Changes: Uniq - Removing duplicate lines

Revision as of 19:58, 28 March 2014

Tip 648

created 2004 · complexity intermediate · author Michael Geddes · version 7.0

The following command sorts all lines and removes duplicates, keeping one copy of each distinct line:

:sort u
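
As a brief sketch of how the command can be narrowed, :sort also accepts a range and can combine the u flag with the i (ignore case) flag. The line numbers below are arbitrary examples, not taken from the tip.

```vim
" Sort the whole buffer and keep one copy of each distinct line
:sort u
" Same, but only for lines 10 through 20 (example range)
:10,20sort u
" Ignore case, so lines differing only in case count as duplicates
:sort iu
```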

If you need more control, here are some alternatives.

There are two versions (plus a \v "very magic" variant of the second). Both act on adjacent duplicate lines: the first keeps only the last line of each run, the second keeps only the first. (Use \zs for speed.)

g/^\(.*\)\n\1$/d
g/\%(^\1\n\)\@<=\(.*\)$/d
g/\v%(^\1\n)@<=(.*)$/d
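
To see what the first version does, here is a small sketch meant to be run in an empty scratch buffer (for example after :new). The sample lines are made up for illustration.

```vim
" Hypothetical demo data: two runs of adjacent duplicates around a 'b'
call setline(1, ['a', 'a', 'b', 'a', 'a', 'a'])
" Delete every line that is immediately followed by an identical line,
" so only the last line of each run survives
g/^\(.*\)\n\1$/d
" Show the remaining lines: a, b, a
echo getline(1, '$')
```

The non-adjacent 'a' lines on either side of 'b' are both kept, because the pattern only compares a line with the one directly after it.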

Breakdown of the second version:

g/\%(^\1\n\)\@<=\(.*\)$/d
g/                     /d  <-- Delete the lines matching the regexp
            \@<=           <-- Look-behind: the group before \@<= must match directly before the text that follows it
                \(.*\)$    <-- Match the whole line and capture it as group 1
  \%(     \)               <-- Group without capturing (no back-reference is created)
     ^\1\n                 <-- Match the text of group 1 followed by the newline between the two lines

In this simple form (matching the whole line), which version you use makes little difference, but the choice starts to matter when you match on only part of the line, such as the first word.

This does a uniq on the first word of each line, deleting all but the first line of each run (the \v "very magic" version follows):

g/\%(^\1\>.*\n\)\@<=\(\k\+\).*$/d
g/\v%(^\1>.*\n)@<=(\k+).*$/d

Comments

Here are some more Vim-native ways for removing duplicate lines. This time they don't have to be adjacent. Line order is preserved.

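This one can be a bit slow. The pattern also matches a single empty line, which is then deleted as well. The ":g/^/m0<CR>" part at the beginning and end of the command may be optional: it reverses the line order, so that the first occurrence of each line is kept rather than the last.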

:nno \d1 :g/^/m0<CR>:g/^\(.*\)\n\_.*\%(^\1$\)/d<CR>:g/^/m0<CR>

This is faster. Uses mark l.

:nno \d2 :g/^/kl\|if search('^'.escape(getline('.'),'\.*[]^$/').'$','bW')\|'ld<CR>
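
For comparison, the same order-preserving result can be sketched with a Dictionary of lines already seen. This is not one of the methods from the comments above; it swaps the regexp search for a lookup table, and the names UniqKeepOrder, seen, and key are hypothetical.

```vim
" Hypothetical helper (not part of the original tip): delete every line
" that already appeared earlier in the range, keeping first occurrences.
function! UniqKeepOrder() range
  let seen = {}
  let lnum = a:firstline
  let last = a:lastline
  while lnum <= last
    " Prefix the key so that an empty line is still a valid Dictionary key
    let key = 'x' . getline(lnum)
    if has_key(seen, key)
      " Duplicate: delete into the black hole register and stay on lnum
      execute lnum . 'delete _'
      let last -= 1
    else
      let seen[key] = 1
      let lnum += 1
    endif
  endwhile
endfunction

" Hypothetical command name; with no range it covers the whole buffer
command! -range=% UniqKeepOrder <line1>,<line2>call UniqKeepOrder()
```

With this loaded, :UniqKeepOrder works on the whole buffer and :10,50UniqKeepOrder on a range. Each line is looked up once in the Dictionary instead of being searched for backwards through the buffer.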

The following uses a substitute command to collapse each run of repeated lines, keeping the first line and deleting the adjacent duplicates that follow it. This is a variation on the g//d method.

%s/^\(.*\)\(\n\1\)\+$/\1/
%s/\v^(.*)(\n\1)+$/\1/

@@ Line 1: / Line 1: @@
+{{TipImported
-{{review}}
-{{Tip
 |id=648
+|previous=647
-|title=Uniq - Removing duplicate lines
+|next=649
-|created=February 1, 2004 20:45
+|created=2004
 |complexity=intermediate
 |author=Michael Geddes
-|version=6.0
+|version=7.0
 |rating=18/9
+|category1=
-|text=
+|category2=
-There are two versions, the first leaves only the last line, the second leaves only the first line.
-g/^\(.*\)$\n\1$/d
-g/\%(^\1$\n\)\@&lt;=\(.*\)$/d
-Breakdown of the second version:
-g//d &lt;-- Delete the lines matching the regexp
-\@&lt;= &lt;-- If the bit following matches, make sure the bit preceding this symbol directly precedes the match
-\(.*\)$ &lt;-- Match the line into subst register 1
-\%( ) &lt;--- Group without placing in a subst register.
-^\1$\n &lt;--- Match subst register 1 followed by end of line and the new line between the 2 lines
-In this simple format (matching the whole line), it's not going to make much difference, but it will start to matter if you want to do stuff like match the first word only
-This does a uniq on the first word in the line, and deletes all but the first line:
-g/\%(^\1\&gt;.*$\n\)\@&lt;=\(\k\+\).*$/d
 }}
+The following command sorts all lines and removes duplicates, keeping one copy of each distinct line:
+<pre>
+:sort u
+</pre>
+If you need more control, here are some alternatives.
-== Comments ==
-Or you could simply pipe the file, (Or range of lines) through uniq(1) thusly:
-:%!uniq
+There are two versions (plus a \v "very magic" variant of the second). Both act on adjacent duplicate lines: the first keeps only the last line of each run, the second keeps only the first. (Use \zs for speed.)
-Cheers,
+<pre>
-Morel.
+g/^\(.*\)\n\1$/d
+g/\%(^\1\n\)\@<=\(.*\)$/d
+g/\v%(^\1\n)@<=(.*)$/d
+</pre>
+Breakdown of the second version:
+<pre>
+g/\%(^\1\n\)\@<=\(.*\)$/d
+g/                     /d  <-- Delete the lines matching the regexp
+            \@<=           <-- Look-behind: the group before \@<= must match directly before the text that follows it
+                \(.*\)$    <-- Match the whole line and capture it as group 1
+  \%(     \)               <-- Group without capturing (no back-reference is created)
+     ^\1\n                 <-- Match the text of group 1 followed by the newline between the two lines
+</pre>
+In this simple form (matching the whole line), which version you use makes little difference, but the choice starts to matter when you match on only part of the line, such as the first word.
-box1024--AT--post.com
-, February 2, 2004 4:57
-----
-unless you are stuck inside a windows machine using vim, in which case this tip is most appreciated :)
+This does a uniq on the first word of each line, deleting all but the first line of each run (the \v "very magic" version follows):
--- RS
+<pre>
+g/\%(^\1\>.*\n\)\@<=\(\k\+\).*$/d
+g/\v%(^\1>.*\n)@<=(\k+).*$/d
+</pre>
+==See also==
-'''Anonymous'''
+*[[VimTip1148|Unique sorting]] script to 'sort unique' a List (not text lines)
-, February 2, 2004 5:35
+*[http://code.google.com/p/lh-vim/source/browse/system-tools/trunk/plugin/system_utils.vim system_utils.vim] command to remove duplicate lines in a range (uses <code>g//d</code> method)
-----
+*[[VimTip1166|Sort lines]] how to sort lines
-then again, windows users can have 'sort', 'uniq', 'grep' and a host of others if they visit the unxutils.sourceforge.net site
+==Comments==
-scott2237--AT--yahoo.com
+Here are some more Vim-native ways for removing duplicate lines. This time they don't have to be adjacent. Line order is preserved.
-, February 2, 2004 10:33
-----
-Or http://www.cygwin.com
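+This one can be a bit slow. The pattern also matches a single empty line, which is then deleted as well. The ":g/^/m0<CR>" part at the beginning and end of the command may be optional: it reverses the line order, so that the first occurrence of each line is kept rather than the last.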
-'''Anonymous'''
+<pre>
-, February 2, 2004 15:34
+:nno \d1 :g/^/m0<CR>:g/^\(.*\)\n\_.*\%(^\1$\)/d<CR>:g/^/m0<CR>
-----
+</pre>
-Of course, personally, I use sort | uniq whether on my Windows or my Unix box. However if you were (for example) going to make a script that wanted to use uniq, then you shouldn't be assuming either exists.
+This is faster. Uses mark <code>l</code>.
-As sombody else has come up with sort, I thought I'd give a go at a pure vim version of uniq.
+<pre>
+:nno \d2 :g/^/kl\|if search('^'.escape(getline('.'),'\.*[]^$/').'$','bW')\|'ld<CR>
+</pre>
-I would definitely use this in a script over assuming an environment. Not everybody wants to download cygwin or friends for the lack of one or two commands. (Not that M*cr*s*ft doesn't suck in so many ways). I'm sure that with the multitude of platforms that Vim runs on, there a few out there that don't have convenient ports of/alternates to unix commands.
-//.ichael Geddes
-'''Anonymous'''
-, February 2, 2004 16:38
 ----
+The following uses a substitute command to collapse each run of repeated lines, keeping the first line and deleting the adjacent duplicates that follow it. This is a variation on the <code>g//d</code> method.
-Here are some more vim-native ways for removing duplicate
+<pre>
-lines. This time they don't have to be adjacent. Line order
+%s/^\(.*\)\(\n\1\)\+$/\1/
-is preserved.
+%s/\v^(.*)(\n\1)+$/\1/
+</pre>
-This one can be a bit slow.
-:nno \d1 :g/^/m0&lt;CR&gt;:g/^\(.*\)\n\_.*\%(^\1$\)/d&lt;CR&gt;:g/^/m0&lt;CR&gt;
-This is faster (some help from Preben Guldberg with this one).
-Uses mark l.
-:nno \d2 :g/^/kl\|if search('^'.escape(getline('.'),'\.*[]^$/').'$','bW')\|'ld&lt;CR&gt;
-Antony
-ads--AT--metawire.org
-, February 4, 2004 8:08
-----
-Here are some more vim-native ways for removing duplicate
-lines. This time they don't have to be adjacent. Line order
-is preserved.
-This one can be a bit slow.
-:nno \d1 :g/^/m0&lt;CR&gt;:g/^\(.*\)\n\_.*\%(^\1$\)/d&lt;CR&gt;:g/^/m0&lt;CR&gt;
-This is faster (some help from Preben Guldberg with this one).
-Uses mark l.
-:nno \d2 :g/^/kl\|if search('^'.escape(getline('.'),'\.*[]^$/').'$','bW')\|'ld&lt;CR&gt;
-Antony
-ads--AT--metawire.org
-, February 4, 2004 8:10
-----
-Here are some more vim-native ways for removing duplicate
-lines. This time they don't have to be adjacent. Line order
-is preserved.
-This one can be a bit slow.
-:nno \d1 :g/^/m0&lt;CR&gt;:g/^\(.*\)\n\_.*\%(^\1$\)/d&lt;CR&gt;:g/^/m0&lt;CR&gt;
-This is faster (some help from Preben Guldberg with this one).
-Uses mark l.
-:nno \d2 :g/^/kl\|if search('^'.escape(getline('.'),'\.*[]^$/').'$','bW')\|'ld&lt;CR&gt;
-Antony
-ads--AT--metawire.org
-, February 4, 2004 8:10
 ----
-<!-- parsed by vimtips.py in 0.463149  seconds-->