uniq
Compared to sort, the uniq program is a lightweight. uniq performs a seemingly trivial task. When given a sorted file (or standard input), it removes any duplicate lines and sends the results to standard output. It is often used in conjunction with sort to clean the output of duplicates.
Tip: While uniq is a traditional Unix tool often used with sort, the GNU version of sort supports a -u option, which removes duplicates from the sorted output.
Let’s make a text file to try this out:
[me@linuxbox ~]$ cat > foo.txt
a
b
c
a
b
c
Remember to type Ctrl-d to terminate standard input. Now, if we run uniq on our text file:
[me@linuxbox ~]$ uniq foo.txt
a
b
c
a
b
c
the results are no different from our original file; the duplicates were not removed. For uniq to do its job, the input must be sorted first:
[me@linuxbox ~]$ sort foo.txt | uniq
a
b
c
This is because uniq only removes duplicate lines which are adjacent to each other.
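Note that if we are using the GNU version of sort, we could get the same result with the -u option mentioned in the tip above. Here is a sketch using the same foo.txt file:
[me@linuxbox ~]$ sort -u foo.txt
a
b
c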
uniq has several options. Here are the common ones:
Table 20-2: Common uniq Options
Option Description
-c Output a list of duplicate lines preceded by the number of times the line occurs.
-d Only output repeated lines, rather than unique lines.
-f n Ignore n leading fields in each line. Fields are separated by whitespace as they are in sort; however, unlike sort, uniq has no option for setting an alternate field separator.
-i Ignore case during the line comparisons.
-s n Skip (ignore) the leading n characters of each line.
-u Only output unique lines. Lines with duplicates are ignored.
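To see a couple of these options in action, we might try -d on our file. Since every line in foo.txt occurs twice, the output (sketched below) lists each repeated line once, while -u would produce no output at all:
[me@linuxbox ~]$ sort foo.txt | uniq -d
a
b
c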
Here we see uniq used to report the number of duplicates found in our text file, using the -c option:
[me@linuxbox ~]$ sort foo.txt | uniq -c
2 a
2 b
2 c
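A common combination built from these same pieces pipes the counted output back through sort to rank lines by how often they occur. This is just a sketch; with our small file every count is 2, so the ordering of the ties depends on sort's fallback comparison, but on real data the most frequent lines rise to the top:
[me@linuxbox ~]$ sort foo.txt | uniq -c | sort -nr
2 c
2 b
2 a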