How Can I Eliminate Duplicates in a Linux File?
Suppose you have a file named my.books that contains this list of book titles:
Atopic Dermatitis for Dummies
Atopic Dermatitis for Dummies
Chronic Rhinitis Unleashed
Chronic Rhinitis Unleashed
Chronic Rhinitis Unleashed
Learn Nasal Endoscopy in 21 Days
To remove all the duplicates from the list of books, use this command:
uniq my.books
Atopic Dermatitis for Dummies
Chronic Rhinitis Unleashed
Learn Nasal Endoscopy in 21 Days
If you want to print only the book titles that are not duplicated (to find out which books you have one copy of), add the -u flag, like this:
uniq -u my.books
Learn Nasal Endoscopy in 21 Days
Conversely, you might want to exclude the titles that appear only once. If so, add the -d flag, like this:
uniq -d my.books
Atopic Dermatitis for Dummies
Chronic Rhinitis Unleashed
Now let's take inventory. To summarize the list of books and add a count of the number of times each one appears in the list, add the -c flag, like this:
uniq -c my.books
2 Atopic Dermatitis for Dummies
3 Chronic Rhinitis Unleashed
1 Learn Nasal Endoscopy in 21 Days
Note that the uniq command compares only adjacent lines, so it misses duplicates that are not next to each other. It does not sort the input file, so you may want to use the sort command to prepare the data for uniq in advance. (See the end of this section for an example.)
Here's a recap of the flags you can use with the uniq command:
-u Print only lines that appear once in the input file.
-d Print only the lines that appear more than once in the input file.
-c Precede each output line with a count of the number of times it was found.
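Since uniq only compares neighboring lines, an unsorted list needs to pass through sort first. The following is a minimal sketch of that pipeline. The file name unsorted.books and its contents are made up for illustration; they are not part of the my.books file used in this lesson.

```shell
# Hypothetical sample data: the same titles as above, but out of order.
workdir=$(mktemp -d)
printf '%s\n' \
    'Chronic Rhinitis Unleashed' \
    'Atopic Dermatitis for Dummies' \
    'Chronic Rhinitis Unleashed' \
    'Learn Nasal Endoscopy in 21 Days' \
    'Atopic Dermatitis for Dummies' \
    'Chronic Rhinitis Unleashed' > "$workdir/unsorted.books"

# uniq alone finds no adjacent duplicates here, so all six lines survive.
uniq_only=$(uniq "$workdir/unsorted.books")

# Sorting first puts identical lines next to each other so uniq can collapse them.
sorted_uniq=$(sort "$workdir/unsorted.books" | uniq)

echo "$sorted_uniq"
```

If all you need is the deduplicated list, sort -u gives the same result in one step. The pipeline form is still useful because it lets you add the -c, -d, or -u flags to uniq.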
Comments - most recent first
(Please feel free to answer questions posted by others!)
Is it possible to deduplicate a list of items in a .txt file without sorting them, and if so, how? They are already in the correct order, but I know there are duplicates. Cheers, Steve
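One common way to do this, offered as a sketch rather than as part of the lesson: awk can remember every line it has already printed and skip the repeats, so the original order is kept and no sorting is needed. The file name items.txt and its contents are placeholders.

```shell
# Placeholder data: duplicates are scattered, and the order must be preserved.
workdir=$(mktemp -d)
printf '%s\n' pear apple pear banana apple > "$workdir/items.txt"

# seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++ is true
# only for first occurrences; awk prints those lines and skips later repeats.
deduped=$(awk '!seen[$0]++' "$workdir/items.txt")

echo "$deduped"
```

The trade-off is memory: awk keeps one entry per distinct line, which is fine for ordinary text files but worth keeping in mind for very large ones.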
Biko, all of these commands read from stdin and write to stdout unless directed to do otherwise. So when you use a file as input, the modified data still ends up on stdout unless you redirect stdout to a file to save the output. For example, put the sample data above in a file called nums, one value per line:
$ printf '%s\n' 1 1 1 2 2 3 3 3 4 4 4 4 1 1 1 5 5 5 > nums
$ sort nums | uniq > nums.new && mv nums.new nums
$
See, no output between the prompts is what you should get. The > nums.new is the redirect for stdout, putting the output into a new file, which mv then renames to nums. Don't redirect straight to nums: the shell empties that file before sort gets to read it, and you lose your data.
$ cat nums
1
2
3
4
5
$
This is what should be in the nums file now.
Check.
(That means we need to sort first, doesn't it?)

