Sort 函數介紹

←手機掃碼閱讀火星人 @ 2014-03-26 , reply:0

Sort 函數介紹

Definition and syntax
Sort by numerical order
Sort by ASCII order
Sort by dictionary order
Sort by reverse order
Sort using multiple keys
Generate an array of sort ranks
Sort a hash by its keys (sort + map)
Sort a hash by its values (sort + map)
Sort words in a file and eliminate duplicates
Efficient sorting (sort + map)
Sorting lines on the last field (sort + map)
More efficient sorting (sort + map)
CPAN modules for sorting
Sort words with 4 or more consecutive vowels (sort map grep)
Print the canonical sort order (sort map grep)

The sort function
sort LIST
sort BLOCK LIST
sort SUBNAME LIST The sort function sorts the LIST and returns the sorted list value. If SUBNAME or BLOCK is omitted, the sort is in standard string comparison order (i.e., an ASCII sort). If SUBNAME is specified, it gives the name of a subroutine that compares two list elements and returns an integer less than, equal to, or greater than 0, depending on whether the elements are in ascending sort order, equal, or descending sort order, respectively. In place of a SUBNAME, you can provide a BLOCK as an anonymous, in-line sort subroutine.
sort函數的作用就是對LIST排序然後返回一個排好序的列表。如果沒有SUBNAME或者BLOCK，那麼sort函數按照標準的字元串比較順序來排序（即ASCII排序）。如果指定了SUBNAME，也就是說，指定一個子程序名來比較列表中的兩個元素，根據它們的關係是升序排列、相等，還是降序排列，來返回一個小於0的整數值、0，大於0的整數值。也可以用一個匿名的、內置有排序的子程序塊BLOCK來代替SUBNAME。
The two elements to be compared are passed as the variables $a and $b. They are passed by reference, so don』t modify $a or $b. If you use a subroutine, it cannot be recursive.
用於比較的兩個元素分別作為$a和$b傳遞給SUBNAME或BLOCK。它們是作為引用傳遞的，所以千萬不要修改$a或者$b。如果使用子程序的話，它不能是遞歸的。
Perl versions prior to 5.6 use the Quicksort algorithm for sorting; version 5.6 and higher use the slightly more efficient Mergesort algorithm. Both are efficient N log N algorithms.
在5.6版本之前，perl使用快速排序演算法來排序；5.6之後的版本使用略微高效的歸併排序演算法。這兩種演算法都是高效的時間複雜度為Nlog N的演算法。?
Sort by numerical order
按數字順序排序
@array = (8, 2, 32, 1, 4, 16);
print join(' ', sort { $a <=> $b } @array), "\n";

輸出結果:
1 2 4 8 16 32 Equivalently:
相同結果地：
sub numerically { $a <=> $b }
print join(' ', sort numerically @array), "\n";

輸出結果:
1 2 4 8 16 32 Sort by ASCII order (not dictionary order)
按ASCII順序排序（不是字典順序）
@languages = qw(fortran lisp c c++ Perl python java);
print join(』』, sort @languages), 」\n」;
Perl c c++ fortran java lisp python
Equivalently:
相同結果地：
print join(' ', sort { $a cmp $b } @languages), "\n"; Watch out if you have some numbers in the data:
如果數據中有數字的話要小心
print join(' ', sort 1 .. 11), "\n";

輸出結果:
1 10 11 2 3 4 5 6 7 8 9 Sort by dictionary order
按照字典順序
use locale;
@array = qw(ASCII ascap at_large atlarge A ARP arp);
@sorted = sort { ($da = lc $a) =~ s/[\W_]+//g;
($db = lc $b) =~ s/[\W_]+//g;
$da cmp $db;
} @array;
print "@sorted\n";

輸出結果:
A ARP arp ascap ASCII atlarge at_large The use locale pragma is optional ? it makes the code more portable if the data contains international characters. This pragma affects the operators cmp, lt, le, ge, gt and some functions ? see the perllocale man page for details.
編譯指示use locale是可選的－如果數據中包含國際字元的話，這會讓代碼更加便攜。這個編譯指示會影響操作符cmp，lt，le，ge，gt和一些函數 ? 細節參看perllocal的man頁。
Notice that the order of atlarge and at_large was reversed on output, even though their sort order was identical. (The substitution removed the underscore before the sort.) This happened because this example was run using Perl version 5.005_02. Before Perl version 5.6, the sort function would not preserve the order of keys with identical values. The sort function in Perl versions 5.6 and higher preserves this order. (A sorting algorithm that preserves the order of identical keys is called 「stable」.)
注意，上面的例子中，atlarge和at_large的位置在結果中反過來了，儘管它們排序的順序是相同的。（子程序在排序之前移除了下劃線。）位置之所以會反過來，是因為這個程序當前正運行在perl5.005_02之下。在5.6版本之前，對於值相同的鍵，sort函數是不會保留它們的順序的。而5.6之後的版本，sort函數則會保留這種順序。(這種保留相同值的排列順序的演算法被稱之為「固定」。）
Sort by reverse order
逆序排序
To reverse the sort order, reverse the position of the operands of the cmp or <=> operators:
為了做逆序排序，只需要把操作符<=>兩邊的操作數位置調換即可。
sort { $b <=> $a } @array; or change the sign of the block』s or subroutine』s return value:
或者改變程序塊兒或子程序返回值的符號：
sort { -($a <=> $b) } @array; or add the reverse function (slightly less efficient, but perhaps more readable):
或者添加reverse函數（效率低一些，但是可讀性更強）：
reverse sort { $a <=> $b } @array; Sort using multiple keys
利用多鍵排序
To sort by multiple keys, use a sort subroutine with multiple tests connected by Perl』s or operator. Put the primary sort test first, the secondary sort test second, and so on.
為了用多鍵排序，用來排序的子程序需要用or操作符來連接多個測試，第一個鍵的測試在前，第二個鍵的測試在後，依此類推。
# An array of references to anonymous hashes
# 一個匿名哈希表的數組引用

@employees = (
{ FIRST => 'Bill', LAST => 'Gates',
SALARY => 600000, AGE => 45 },
{ FIRST => 'George', LAST => 'Tester'
SALARY => 55000, AGE => 29 },
{ FIRST => 'Steve', LAST => 'Ballmer',
SALARY => 600000, AGE => 41 }
{ FIRST => 'Sally', LAST => 'Developer',
SALARY => 55000, AGE => 29 },
{ FIRST => 'Joe', LAST => 'Tester',
SALARY => 55000, AGE => 29 },
);
sub seniority {
$b->{SALARY} <=> $a->{SALARY}
or $b->{AGE} <=> $a->{AGE}
or $a->{LAST} cmp $b->{LAST}
or $a->{FIRST} cmp $b->{FIRST}
}
@ranked = sort seniority @employees;
foreach $emp (@ranked) {
print "$emp->{SALARY}\t$emp->{AGE}\t$emp->{FIRST}
$emp->{LAST}\n";
}

輸出結果:
600000 45 Bill Gates
600000 41 Steve Ballmer
55000 29 Sally Developer
55000 29 George Tester
55000 29 Joe Tester Generate an array of sort ranks for a list
針對列表部分範圍進行排序，返回數組。
@x = qw(matt elroy jane sally);
@rank[sort { $x[$a] cmp $x[$b] } 0 .. $#x] = 0 .. $#x;
print "@rank\n";

輸出結果:
2 0 1 3 Sort a hash by its keys
針對鍵來對哈希表排序
Hashes are stored in an order determined by Perl』s internal hashing algorithm; the order changes when you add or delete a hash key. If you need to sort a hash, consider storing the data in an array instead and using the preceding recipe (generate an array of sort ranks). Alternatively, you can sort a hash by key value and store the results in an array in which each element is a reference to a hash containing only one key/value pair:
哈希表的存儲是根據perl內部的哈希演算法決定的順序來排列的；當你添加或者刪除一個哈希鍵，則該順序將會發生改變。如果你需要對一個哈希排序，那么最好將數據存儲在數組中，然後使用前面提到的方法（針對列表部分範圍進行排序）。或者，你可以根據哈希表的鍵來排序，並將結果存儲在一個數組中，在該數組中，每個元素都將是哈希表中一個鍵/值對兒的引用：
%hash = (Donald => Knuth, Alan => Turing, John => Neumann);
@sorted = map { { ($_ => $hash{$_}) } } sort keys %hash;
foreach $hashref (@sorted) {
($key, $value) = each %$hashref;
print "$key => $value\n";
}

輸出結果:
Alan => Turing
Donald => Knuth
John => Neumann Sort a hash by its values
Unlike hash keys, hash values are not guaranteed to be unique. If you sort a hash by only its values, the sort order of two elements with the same value may change when you add or delete other values. To do a stable sort by hash values, do a primary sort by value and a secondary sort by key:
與哈希表中的鍵不通，哈希表中的值沒有唯一性。如果你只根據值來排序的化，那麼當添加或者刪除其他值的時候，相同值的兩個元素的排序順序將會發生改變。為了根據哈希值來作一個穩定的排序，需要先根據值來做第一次排序，然後根據鍵再做第二次排序：
%hash = ( Elliot => Babbage,
Charles => Babbage,
Grace => Hopper,
Herman => Hollerith
);
@sorted = map { { ($_ => $hash{$_}) } }
sort { $hash{$a} cmp $hash{$b}
or $a cmp $b
} keys %hash;
foreach $hashref (@sorted) {
($key, $value) = each %$hashref;
print "$key => $value\n";
}

輸出結果:
Charles => Babbage
Elliot => Babbage
Herman => Hollerith
Grace => Hopper (Elliot Babbage was Charles』s younger brother. He died in abject poverty after numerous attempts to use Charles』s Analytical Engine to predict horse races. And did I tell you about Einstein』s secret life as a circus performer?)
（Elliot Babbage 是 Charles 的小弟弟。他在多次嘗試用查理斯分析機預報賽馬結果之後死於貧苦之中。我告訴過你愛因斯坦曾當過馬戲團演員的故事嗎？）
Sort words in a file and eliminate duplicates
對文件中的單詞進行排序，並去掉重複的單詞。
This Unix 「one-liner」 displays a sorted list of the unique words in a file. (The \ escapes the following newline so that Unix sees the command as a single line.) The statement uniq{@F} = () uses a hash slice to create a hash whose keys are the unique words in the file; it is semantically equivalent to $uniq{ $F[0], $F[1], ... $F[$#F] } = (); Hash slices have a confusing syntax that combines the array prefix symbol with the hash key symbols {}, but they solve this problem succinctly.
下面這個unix俳句將列印出一個文件中的出現的唯一的單詞的列表。（\ 符號忽略新行符，所以unix認為這兩行實際是一行程序。）語句uniq{@F} = () 使用一個哈希片斷來創造一個哈希表，這個哈希表的鍵就是文件中唯一的單詞；在語義上，它相當於 $uniq{$F0,$F1, ... $F[$#F]} = (); 哈希片斷有一個相當容易讓人迷惑的語法，就是把數組的標誌@和哈希表符號{}組合在一起，但這樣處理更加簡潔。
perl -0777ane '$, = "\n"; \
@uniq{@F} = (); print sort keys %uniq' file

輸出結果:
-0777 - Slurp entire file instead of single line
-a - Autosplit mode, split lines into @F array
-e - Read scrīpt from command line
-n - Loop over scrīpt specified on command line: while (<>) { ... }
$, - Output field separator used by print function
file - File name Efficient sorting: the Orcish maneuver and the Schwartzian transform
The sort subroutine will usually be called many times for each key. If the run time of the sort is critical, use the Orcish maneuver or Schwartzian transform so that each key is evaluated only once.
Consider an example where evaluating the sort key involves a disk access: sorting a list of file names by file modification date.

Brute force ? multiple disk accesses for each file
@sorted = sort { -M $a <=> -M $b } @filenames;

# Cache keys in a hash (the Orcish maneuver)
@sorted = sort { ($modtimes{$a} ||= -M $a) <=>
($modtimes{$b} ||= -M $b)
} @filenames; The Orcish maneuver was popularized by Joseph Hall. The name is derived from the phrase 「or cache」, which is the term for the 」||=」 operator. (Orcish may also be a reference to the Orcs, a warrior race from the Lord of the Rings trilogy.)
# Precompute keys, sort, extract results
# (The famous Schwartzian transform)
@sorted = map( { $_->[0] }
sort( { $a->[1] <=> $b->[1] }
map({ [$_, -M] } @filenames)
)
); The Schwartzian transform is named for its originator, Randall Schwartz, who is the co-author of the book Learning Perl.
Benchmark results (usr + sys time, as reported by the Perl Benchmark module):
Linux 2.2.14, 333 MHz CPU, 1649 files:
Brute force: 0.14 s Orcish: 0.07 s Schwartzian: 0.09 s
WinNT 4.0 SP6, 133 MHz CPU, 1130 files:
Brute force: 7.38 s Orcish: 1.43 s Schwartzian: 1.38 s The Orcish maneuver is usually more difficult to code and less elegant than the Schwartzian transform. I recommend that you use the Schwartzian transform as your method of choice.
Also, remember these basic rules of optimization: (1) Don』t do it (or don』t do it yet), (2) Make it right before you make it faster, and (3) Make it clear before you make it faster.
Sorting lines on the last field (Schwartzian transform)
Given $str with the contents below (each line is terminated by the newline character, \n):

eir 11 9 2 6 3 1 1 81% 63% 13
oos 10 6 4 3 3 0 4 60% 70% 25
hrh 10 6 4 5 1 2 2 60% 70% 15
spp 10 6 4 3 3 1 3 60% 60% 14
Sort the contents using the last field as the sort key.

$str = join "\n",
map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [ $_, (split)[-1] ] }
split /\n/, $str;
Another approach is to install and use the CPAN module Sort::Fields. For example:
use Sort::Fields;
@sorted = fieldsort [ 6, '2n', '-3n' ] @lines; This statement uses the default column definition, which is a split on whitespace.
Primary sort is an alphabetic (ASCII) sort on column 6.
Secondary sort is a numeric sort on column 2.
Tertiary sort is a reverse numeric sort on column 3.
Efficient sorting revisited: the Guttman-Rosler transform
Given that the Schwartzian transform is the usually best option for efficient sorting in Perl, is there any way to improve on it? Yes, there is! The Perl sort function is optimized for the default sort, which is in ASCII order. The Guttman-Rosler transform (GRT) does some additional work in the enclosing map functions so that the sort is done in the default order. The Guttman-Rosler transform was first published by Michal Rutka and popularized by Uri Guttman and Larry Rosler.
Consider an example where you want to sort an array of dates. Given data in the format YYYY/MM/DD:
@dates = qw(2001/1/1 2001/07/04 1999/12/25); what is the most efficient way to sort it in order of increasing date?
The straightforward Schwartzian transform would be:
@sorted = map { $_->[0] }
sort { $a->[1] <=> $b->[1]
or $a->[2] <=> $b->[2]
or $a->[3] <=> $b->[3]
}
map { [ $_, split m $_, 3 ] } @dates; The more efficient Guttman-Rosler transform is:
@sorted = map { substr $_, 10 }
sort
map { m|(\d\d\d\d)/(\d+)/(\d+)|;
sprintf "%d-%02d-%02d%s", $1, $2, $3, $_
} @dates; The GRT solution is harder to code and less readable than the Schwartzian transform solution, so I recommend you use the GRT only in extreme circumstances. For tests using large datasets, Perl 5.005_03 and Linux 2.2.14, the GRT was 1.7 times faster than the Schwartzian transform. For tests using Perl 5.005_02 and Windows NT 4.0 SP6, the GRT was 2.5 times faster. Using the timing data from the first efficiency example, the Guttman-Rosler transform was faster than a brute force sort by a factor of 2.7 and 13 on Linux and Windows, respectively.
If you are still not satisfied, you may be able to speed up your sorts by upgrading to a more recent version of Perl. The sort function in Perl versions 5.6 and higher uses the Mergesort algorithm, which (depending on the data) is slightly faster than the Quicksort algorithm used by the sort function in previous versions.
Again, remember these basic rules of optimization: (1) Don』t do it (or don』t do it yet), (2) Make it right before you make it faster, and (3) Make it clear before you make it faster.
Some CPAN modules for sorting
These modules can be downloaded from http://search.cpan.org/.
File::Sort ? Sort one or more text files by lines
Sort::Fields ? Sort lines using one or more columns as the sort key(s)
Sort::ArbBiLex ? Construct sort functions for arbitrary sort orders
Text::BibTeX::BibSort ? Generate sort keys for bibliographic entries.
Sort::Maker ? Automatic generation of ST, GRT, etc. sorters.
Sort::Key ? The fastest way to sort in Perl. The open source community is adding to CPAN regularly ? use the search engine to check for new modules. If one of the CPAN sorting modules can be modified to suit your needs, contact the author and/or post your idea to the Usenet group comp.lang.perl.modules. If you write a useful, generalized sorting module, please contribute it to CPAN!
The holy grail of Perl data transforms

Challenge: Find a practical problem that can be
effectively solved by a statement using map, sort and grep. The code should run faster and be more compact than alternative solutions.
Print sorted list of words with 4 or more consecutive vowels
perl -e 'print sort map { uc } grep /[aeiou]{4}/, <>' \
/usr/dict/words

輸出結果:
AQUEOUS
DEQUEUE
DEQUEUED
DEQUEUES
DEQUEUING
ENQUEUE
ENQUEUED
ENQUEUES
HAWAIIAN
OBSEQUIOUS
PHARMACOPOEIA
QUEUE
QUEUED
QUEUEING
QUEUER
QUEUERS
QUEUES
QUEUING
SEQUOIA (Pharmacopoeia is an official book containing a list of drugs with articles on their preparation and use.)
Print the canonical sort order
This prints the canonical order of characters used by the cmp, lt, gt, le and ge operators:
print +(sort grep /\w/, map { chr } 0 .. 255), "\n";

輸出結果:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz The map converts the numbers to their ASCII value; the grep gets rid of special characters and funky control characters that mess up your screen. The plus sign helps Perl interpret the syntax print (...) correctly.
This example shows that, in this case, the expression '_' lt 'a' is TRUE. If your program has the 「use locale」 pragma, the canonical order will depend on the program』s current locale.

Tags: linux system 內核

[火星人 ] Sort 函數介紹已經有362次圍觀

本文地址：http://coctec.com/docs/linux/show-post-188434.html

Sort 函數介紹

熱門文章

最新文章