Linux Classes

LINUX CLASSES - DATA MANIPULATION

How Can I Sort Linux Files?

The sort command sorts the lines of a file according to fields, the individual pieces of data on each line. By default, sort assumes that the fields are just words separated by blanks, but you can specify an alternative field delimiter if you want (such as commas or colons). If you don't name any fields, sort compares entire lines. Output from sort is printed to the screen unless you redirect it to a file; the original file is left unchanged.

If you had a file like the one shown here containing information on people who contributed to your presidential reelection campaign, for example, you might want to sort it by last name, donation amount, or location. (Using a text editor, enter these three lines into a file and save it as donors.data.)

Bay Ching 500000 China
Jack Arta 250000 Indonesia
Cruella Lumper 725000 Malaysia

Let's take this sample donors file and sort it by last name. The following shows the command to sort the file on the second field (last name) and the output from the command:

sort +1 -2 donors.data
Jack Arta 250000 Indonesia
Bay Ching 500000 China
Cruella Lumper 725000 Malaysia
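Here is a minimal sketch that recreates donors.data from the shell and sorts it with the POSIX -k option, which newer versions of sort prefer over +1 -2. It assumes a POSIX shell and sort; LC_ALL=C is added only so the ordering is predictable byte order. The second command sorts on field 3 as a number, largest donation first.

```shell
# Recreate the sample file from the lesson.
cat > donors.data <<'EOF'
Bay Ching 500000 China
Jack Arta 250000 Indonesia
Cruella Lumper 725000 Malaysia
EOF

# Same key as "sort +1 -2": start and end at field 2 (last name).
by_name=$(LC_ALL=C sort -k2,2 donors.data)
echo "$by_name"

# Key is field 3 only (donation amount); n = numeric, r = reverse.
by_amount=$(LC_ALL=C sort -k3,3nr donors.data)
echo "$by_amount"
```

In -k3,3nr, the trailing n and r apply only to that key, which is handy when you combine several keys that need different treatment.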

The syntax of the sort command is pretty strange, but if you study the following examples, you should be able to adapt one of them for your own use. The general form of the sort command is

sort <flags> <sort fields> <file name>

The most common flags are as follows:

-f Fold lowercase letters to uppercase when comparing (so "Bill" and "bill" are treated the same); the lines are printed with their original case.
-r Sort in reverse order (so "Z" starts the list instead of "A").
-n Sort a field by its numeric value (so 9 comes before 10).
-tx Use x as the field delimiter (replace x with a comma or other character).
-u Suppress all but one line in each set of lines with equal sort fields (so if you sort on a field containing last names, only one "Smith" will appear even if there are several).
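A quick way to see what each flag does is to pipe a few made-up lines into sort. This sketch assumes a POSIX shell and sort; the sample strings are hypothetical, and LC_ALL=C is used so that uppercase letters sort before lowercase ones (plain byte order), which makes the effect of -f visible.

```shell
# Without -f, byte order puts "Banana" (uppercase B) before "apple".
plain=$(printf 'apple\nBanana\n' | LC_ALL=C sort)
# -f folds case for the comparison only; output keeps the original case.
folded=$(printf 'apple\nBanana\n' | LC_ALL=C sort -f)
# Without -n, "10" sorts before "9" because '1' comes before '9'.
text_order=$(printf '9\n10\n' | LC_ALL=C sort)
# -n compares the numeric values instead.
numeric=$(printf '9\n10\n' | LC_ALL=C sort -n)
# -r reverses the order that would otherwise be produced.
reversed=$(printf '9\n10\n' | LC_ALL=C sort -rn)
# -u keeps one line from each group of equal keys.
unique=$(printf 'Smith\nJones\nSmith\n' | LC_ALL=C sort -u)
# -t, makes the comma the delimiter; the key is the second field (city).
by_city=$(printf 'Ann,Reno\nBob,Austin\n' | LC_ALL=C sort -t, -k2,2)

printf '%s\n' "$plain" "$folded" "$text_order" "$numeric" "$reversed" "$unique" "$by_city"
```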

Specify the sort keys like this:

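+m Start at the first character of field m+1 (in other words, skip the first m fields).
-n End at the last character of field n (if -n is omitted, the key runs to the end of the line).

Here m and n stand for field numbers; this -n has nothing to do with the -n (numeric) flag above. Newer versions of sort prefer the POSIX -k option, where fields are counted from 1: +m -n is written -k m+1,n, so +1 -2 becomes -k2,2. Current GNU sort still accepts +m, but only when a -n follows it; a lone +2 is treated as a file name.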
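To see what leaving off the end field does, here is a small sketch using the POSIX -k spelling (-k2,2 corresponds to +1 -2, and -k2 to +1 with no end field). The two-line keys.data file is made up for the demonstration, and LC_ALL=C is set only so the ordering is plain byte order.

```shell
printf 'a x 2\nb x 1\n' > keys.data

# Key is field 2 only. Both lines have "x" there, so they tie and sort
# falls back to comparing whole lines: "a x 2" comes before "b x 1".
field_only=$(LC_ALL=C sort -k2,2 keys.data)

# No end field: the key runs from field 2 to the end of the line,
# so "x 1" sorts before "x 2" and the second line comes first.
to_eol=$(LC_ALL=C sort -k2 keys.data)

echo "$field_only"
echo "$to_eol"
```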

Looks weird, huh? Let's look at a few more examples with the sample company.data file shown here, and you'll get the hang of it. (Each line of the file contains four fields: first name, last name, serial number, and department name.)

Jan Itorre 406378 Sales
Jim Nasium 031762 Marketing
Mel Ancholie 636496 Research
Ed Jucacion 396082 Sales

To sort the file on the third field (serial number) in reverse order and save the results in sorted.data, use this command (the contents of sorted.data afterward are shown below it):

sort -r +2 -3 company.data > sorted.data
Mel Ancholie 636496 Research
Jan Itorre 406378 Sales
Ed Jucacion 396082 Sales
Jim Nasium 031762 Marketing
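The same reverse sort, sketched with -k and assuming a POSIX shell and sort. The heredoc simply recreates the sample company.data, and LC_ALL=C keeps the ordering to plain byte order.

```shell
cat > company.data <<'EOF'
Jan Itorre 406378 Sales
Jim Nasium 031762 Marketing
Mel Ancholie 636496 Research
Ed Jucacion 396082 Sales
EOF

# Same as "sort -r +2 -3": key is field 3 (serial number), reversed.
LC_ALL=C sort -r -k3,3 company.data > sorted.data
cat sorted.data
```

Every serial number here has six digits, so text order and numeric order agree. With numbers of different lengths you would want -k3,3nr instead (and -n reads 031762 as 31762).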

Now let's look at a situation where the fields are separated by colons instead of spaces. In this case, we will use the -t: flag to tell the sort command how to find the fields on each line. Let's start with this version of company.data:

Itorre, Jan:406378:Sales
Nasium, Jim:031762:Marketing
Ancholie, Mel:636496:Research
Jucacion, Ed:396082:Sales

To sort the file on the second field (serial number), use this command:

sort -t: +1 -2 company.data
Nasium, Jim:031762:Marketing
Jucacion, Ed:396082:Sales
Itorre, Jan:406378:Sales
Ancholie, Mel:636496:Research
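A sketch of the colon-delimited example in -k form (-t: +1 -2 becomes -t: -k2,2), assuming a POSIX shell and sort. Because -t: makes the colon the only separator, the comma inside "Itorre, Jan" is just part of field 1.

```shell
cat > company.data <<'EOF'
Itorre, Jan:406378:Sales
Nasium, Jim:031762:Marketing
Ancholie, Mel:636496:Research
Jucacion, Ed:396082:Sales
EOF

# Colon-delimited fields; the key is field 2 (serial number) only.
by_serial=$(LC_ALL=C sort -t: -k2,2 company.data)
echo "$by_serial"
```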

To sort the file on the third field (department name) and suppress the duplicates, use this command:

sort -t: -u +2 -3 company.data
Nasium, Jim:031762:Marketing
Ancholie, Mel:636496:Research
Itorre, Jan:406378:Sales

Note that the line for Ed Jucacion did not print. He's in Sales, like Jan Itorre, and the -u flag told sort to keep only one line from each set of lines with the same sort field.
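The duplicate-suppressing sort, sketched with -k3,3 and assuming a POSIX shell and sort. POSIX only promises that one line from each group survives -u. GNU sort documents that it keeps the first line of each equal run (here Jan Itorre, who comes first in the file), but other implementations may differ, so it is safest not to depend on which Sales line you get.

```shell
cat > company.data <<'EOF'
Itorre, Jan:406378:Sales
Nasium, Jim:031762:Marketing
Ancholie, Mel:636496:Research
Jucacion, Ed:396082:Sales
EOF

# Key is field 3 (department) only; -u keeps one line per department.
depts=$(LC_ALL=C sort -t: -u -k3,3 company.data)
echo "$depts"
```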

There are lots of fancy (and a few obscure) things you can do with the sort command. If you need to do any sorting that's not quite as straightforward as these examples, try the man sort command for more information.

Comments (most recent first)

Bob Rankin     (17 Aug 2010, 09:56)
Yogesh, once again, they are the sort keys, which are explained above.
yogesh     (17 Aug 2010, 09:19)
sort -r +2 -3 company.data > sorted.data

I wonder what the role of +2 is here, and of

+1 in: sort +1 -2 donors.data

I have this same query.
moomin     (15 Jul 2010, 14:01)
sort -t'f' -k2 -rn test.txt
blabla.ref110.f0
blabla.ref102.f0
blabla.ref11.f0

moomin     (15 Jul 2010, 13:55)
I find the -k option easier to use.

If you want to sort your existing groups by GID, i.e.:
sort -t: -k3 -n /etc/group
or use -rn for reverse order.
John Hopkins     (09 Jul 2010, 02:05)
I just can't "sort" out the following problem.
I got a list of filenames looking like this:

blabla.ref102.f0
blabla.ref11.f0
blabla.ref110.f0

and I need to sort it by the ref-numbers.
There is no way to get proper fields. Anything I can do about that?
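Another approach to John's question, sketched under the assumption that the part before ".ref" never contains a dot: split on dots with -t. and start the key partway into field 2, so the "ref" prefix is skipped. refs.data is a hypothetical file holding his three names.

```shell
printf '%s\n' blabla.ref102.f0 blabla.ref11.f0 blabla.ref110.f0 > refs.data

# -t. splits on dots, so field 2 is "ref102". "2.4" starts the key at the
# 4th character of field 2 (just past "ref"), ",2" ends it at the end of
# field 2, and n compares the remaining digits as numbers.
by_ref=$(LC_ALL=C sort -t. -k2.4,2n refs.data)
echo "$by_ref"
```

Adding r (-k2.4,2nr) would list the highest ref number first, as in moomin's reply.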
umar ayaz     (16 Jun 2010, 05:29)
Good for understanding
McSort     (14 Jun 2010, 15:47)
To combine multiple files that are each already sorted, you can merge them with the '-m' flag.

Example:

> sort -m file1.data file2.data file3.data
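A sketch of what -m does, assuming a POSIX shell and sort: it merges files that are already sorted, without sorting them again. For unsorted input, listing the files without -m sorts everything together. The file contents here are made up.

```shell
# Two files that are each already in order.
printf 'apple\ncherry\n' > file1.data
printf 'banana\ndate\n' > file2.data
# -m only merges; it relies on each input already being sorted.
merged=$(LC_ALL=C sort -m file1.data file2.data)

# An unsorted third file: without -m, sort handles all input together.
printf 'pear\nfig\n' > file3.data
all_sorted=$(LC_ALL=C sort file1.data file2.data file3.data)

echo "$merged"
echo "$all_sorted"
```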
Bob Rankin     (10 Jun 2010, 08:58)
@checkerbum - Believe it or not, that's the expected result! I didn't believe it either, until I tried it and looked up the specs for the sort command. The trick is to set the LC_ALL environment variable before the sort command.

export LC_ALL=C

Then run your sort command.
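Bob's fix applied to checkerbum's list (from the comment just below), as a sketch assuming a POSIX shell. Putting LC_ALL=C in front of a single command, instead of exporting it, limits the byte-order comparison to that one sort. names.data is a hypothetical file name.

```shell
printf '%s\n' en_aud_sw_digo enb_m0_digo enb_mchrg_digo_chrg en_cmp_vkp_digo > names.data

# In the C locale every byte counts, and "_" (0x5F) sorts before "b" (0x62),
# so all the "en_" names come before the "enb_" names.
c_order=$(LC_ALL=C sort names.data)
echo "$c_order"
```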
checkerbum     (09 Jun 2010, 11:52)
sort ignores some characters while sorting.
The following list is considered by sort to be in order. The '_' is ignored.
I have tried all the switches: -d, -g, -i, -n. None of them gives the desired results.
Is there another sort utility that works as one would expect?

en_aud_sw_digo
enb_m0_digo
enb_mchrg_digo_chrg
en_cmp_vkp_digo
gilberto dos santos alves     (31 Mar 2010, 15:07)
Bob, could you please explain one more example? The file has three fields: 1 = title ID, 2 = editor ID, 3 = title of the book/article. The fields are separated by spaces, and the problem is that the third field contains spaces too. I think if you show this sample, we will all understand +m -n. Thank you.
=====file start======
66 365 ACCENT, DIALECT AND THE SCHOOL
1454 5436 A COURSE IN MODERN LINGUISTICS
1 30 A COURSE OF PHONETICS
67 370 ACQUIRING LANGUAGE IN CONVERS.
68 375 ACTANTS ET ACTIONS DANS L'EXPRESSION D'UNE REGLE DE JEU
69 377 ACTES DU PREMIER CONGRES INTERNATIONAL DE LINGUISTES
1020 5002 ACTES DU PREMIER CONGRES INTERNATIONAL DE LINGUISTES
70 378 ACTION GESTURE AND SYMBOLI THE EMERGENCE OF LANGUAGE
=====file end======
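One possible answer to Gilberto's question, sketched under the assumption that the file is saved as books.data (a hypothetical name) and that a POSIX shell and sort are available. The trick is to give a start field and no end field: -k3 (the lesson's +2 with no -n) runs from the title to the end of the line, so the spaces inside titles don't matter. The second key, -k1,1n, breaks ties between identical titles by the numeric title ID.

```shell
cat > books.data <<'EOF'
66 365 ACCENT, DIALECT AND THE SCHOOL
1454 5436 A COURSE IN MODERN LINGUISTICS
1 30 A COURSE OF PHONETICS
67 370 ACQUIRING LANGUAGE IN CONVERS.
68 375 ACTANTS ET ACTIONS DANS L'EXPRESSION D'UNE REGLE DE JEU
69 377 ACTES DU PREMIER CONGRES INTERNATIONAL DE LINGUISTES
1020 5002 ACTES DU PREMIER CONGRES INTERNATIONAL DE LINGUISTES
70 378 ACTION GESTURE AND SYMBOLI THE EMERGENCE OF LANGUAGE
EOF

# Primary key: field 3 through the end of the line (the whole title).
# Secondary key: field 1 only, compared numerically (the title ID).
by_title=$(LC_ALL=C sort -k3 -k1,1n books.data)
echo "$by_title"
```

To sort by title ID instead, -k1,1n on its own is enough.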
gilberto dos santos alves     (31 Mar 2010, 14:57)
By default, sort uses the entire line of the file as the key.
hari     (27 Mar 2010, 12:11)
What is the default sort key?
Bob Rankin     (17 Mar 2010, 06:17)
I suppose you could launch both sort commands in the background and let them run at the same time...
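Bob's suggestion, sketched for a POSIX shell: start each sort as a background job with & and use wait to pause until both have finished. file1 and file2 get made-up contents here, and the .sorted output names are hypothetical.

```shell
printf 'b\na\n' > file1
printf 'd\nc\n' > file2

# Each sort runs as a background job; the two run at the same time.
LC_ALL=C sort file1 > file1.sorted &
LC_ALL=C sort file2 > file2.sorted &

# Block until every background job started by this shell has finished.
wait
cat file1.sorted file2.sorted
```

To replace the originals instead, POSIX allows the -o output file to be one of the input files, e.g. sort -o file1 file1.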
Gouled     (17 Mar 2010, 04:21)
Could I sort two files (i.e. file1 and file2) simultaneously, or would I have to do each separately?
Thank you.
Interesting article.
Bob Rankin     (01 Mar 2010, 06:45)
They are the sort keys. Read the article again, they are explained above.
mauludi     (28 Feb 2010, 23:46)
sort -r +2 -3 company.data > sorted.data

I wonder what the role of +2 is here, and of

+1 in: sort +1 -2 donors.data

Thank you in advance.

Copyright © by Bob Rankin
All rights reserved - Redistribution is allowed only with permission.