Vim Tips Wiki

Changes: Uniq - Removing duplicate lines

(Change <tt> to <code>, perhaps also minor tweak.)
(Spaced the detailed breakdown a bit better, and added some verymagic versions)
 
(4 intermediate revisions by 3 users not shown)
Line 18: Line 18:
 
If you need more control, here are some alternatives.
 
If you need more control, here are some alternatives.
   
There are two versions, the first leaves only the last line, the second leaves only the first line.
+
There are two versions (and \v "verymagic" version as a variant of the second): the first leaves only the last line, the second leaves only the first line. (Use \zs for speed reason.)
 
<pre>
 
<pre>
g/^\(.*\)$\n\1$/d
+
g/^\(.*\)\n\1$/d
g/\%(^\1$\n\)\@<=\(.*\)$/d
+
g/\%(^\1\n\)\@<=\(.*\)$/d
  +
g/\v%(^\1\n)@<=(.*)$/d
 
</pre>
 
</pre>
   
 
Breakdown of the second version:
 
Breakdown of the second version:
 
<pre>
 
<pre>
g//d <-- Delete the lines matching the regexp
+
g/\%(^\1\n\)\@<=\(.*\)$/d
\@<= <-- If the bit following matches, make sure the bit preceding this symbol directly precedes the match
+
g/ /d <-- Delete the lines matching the regexp
\(.*\)$ <-- Match the line into subst register 1
+
\@<= <-- If the bit following matches, make sure the bit preceding this symbol directly precedes the match
\%( ) <--- Group without placing in a subst register.
+
\(.*\)$ <-- Match the line into subst register 1
^\1$\n <--- Match subst register 1 followed by end of line and the new line between the 2 lines
+
\%( \) <-- Group without placing in a subst register.
  +
^\1\n <-- Match subst register 1 followed the new line between the 2 lines
 
</pre>
 
</pre>
   
 
In this simple format (matching the whole line), it's not going to make much difference, but it will start to matter if you want to do stuff like match the first word only.
 
In this simple format (matching the whole line), it's not going to make much difference, but it will start to matter if you want to do stuff like match the first word only.
   
This does a uniq on the first word in the line, and deletes all but the first line:
+
This does a uniq on the first word in the line (with the \v "verymagic" version included after), and deletes all but the first line:
 
<pre>
 
<pre>
g/\%(^\1\>.*$\n\)\@<=\(\k\+\).*$/d
+
g/\%(^\1\>.*\n\)\@<=\(\k\+\).*$/d
  +
g/\v%(^\1>.*\n)@<=(\k+).*$/d
 
</pre>
 
</pre>
   
Line 48: Line 48:
 
Here are some more Vim-native ways for removing duplicate lines. This time they don't have to be adjacent. Line order is preserved.
 
Here are some more Vim-native ways for removing duplicate lines. This time they don't have to be adjacent. Line order is preserved.
   
This one can be a bit slow.
+
This one can be a bit slow. And the pattern would match a single empty line which would also be deleted. The part ":g/^m0<CR>" at beginning and end of the command maybe optional.
 
<pre>
 
<pre>
 
:nno \d1 :g/^/m0<CR>:g/^\(.*\)\n\_.*\%(^\1$\)/d<CR>:g/^/m0<CR>
 
:nno \d1 :g/^/m0<CR>:g/^\(.*\)\n\_.*\%(^\1$\)/d<CR>:g/^/m0<CR>
Line 62: Line 62:
 
<pre>
 
<pre>
 
%s/^\(.*\)\(\n\1\)\+$/\1/
 
%s/^\(.*\)\(\n\1\)\+$/\1/
  +
%s/\v^(.*)(\n\1)+$/\1/
 
</pre>
 
</pre>
   

Latest revision as of 19:58, March 28, 2014

Tip 648

created 2004 · complexity intermediate · author Michael Geddes · version 7.0


The following command will sort all lines and remove duplicates (keeping unique lines):

:sort u
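
As a rough model of the result outside Vim, the Python sketch below sorts a list of lines and drops duplicates. It assumes plain ASCII lines and the default, case-sensitive `:sort` options; `sort_unique` is a hypothetical helper name, not a Vim function. Note that the original line order is lost, which is one reason to reach for the alternatives.

```python
def sort_unique(lines):
    """Return the distinct lines in sorted order (a model of ':sort u' on ASCII text)."""
    return sorted(set(lines))


lines = ["pear", "apple", "pear", "banana", "apple"]
print(sort_unique(lines))  # ['apple', 'banana', 'pear']
```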

If you need more control, here are some alternatives.

There are two versions, plus a \v ("very magic") variant of the second: the first keeps only the last line of each run of adjacent duplicates, and the second keeps only the first. (For speed, consider \zs instead of \@<= where it can be used.)

g/^\(.*\)\n\1$/d
g/\%(^\1\n\)\@<=\(.*\)$/d
g/\v%(^\1\n)@<=(.*)$/d

Breakdown of the second version:

g/\%(^\1\n\)\@<=\(.*\)$/d
g/                     /d  <-- Delete the lines matching the regexp
            \@<=           <-- If the bit following matches, make sure the bit preceding this symbol directly precedes the match
                \(.*\)$    <-- Match the line into subst register 1
  \%(     \)               <-- Group without placing in a subst register.
     ^\1\n                 <-- Match subst register 1 followed by the newline between the two lines

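In this simple form (matching the whole line), it makes little difference which version you use, because the duplicate lines are identical. The choice starts to matter when you match only part of the line, such as the first word, because the lines in a run then differ and you decide which one survives.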

This does a uniq on the first word of each line and deletes all but the first line of each run (the \v "very magic" version follows):

g/\%(^\1\>.*\n\)\@<=\(\k\+\).*$/d
g/\v%(^\1>.*\n)@<=(\k+).*$/d
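
To see why keeping the first or the last line matters once only the first word is compared, here is a small Python model of the adjacent-line uniq. It is a sketch under assumptions: `uniq_adjacent` and `first_word` are hypothetical names, `\w+` only approximates Vim's `\k\+` (which depends on 'iskeyword'), and regex backtracking corner cases are not reproduced.

```python
import re


def first_word(line):
    """Leading word of the line, or None if the line does not start with one."""
    m = re.match(r"\w+", line)
    return m.group(0) if m else None


def uniq_adjacent(lines, key, keep="first"):
    """Collapse runs of adjacent lines with equal keys, keeping the first or last line."""
    out = []
    prev_key = None
    for line in lines:
        k = key(line)
        if out and k is not None and k == prev_key:
            if keep == "last":
                out[-1] = line  # the later line replaces the one kept so far
        else:
            out.append(line)
        prev_key = k
    return out


lines = ["apple 1", "apple 2", "pear 3", "apple 4"]
print(uniq_adjacent(lines, first_word, "first"))  # ['apple 1', 'pear 3', 'apple 4']
print(uniq_adjacent(lines, first_word, "last"))   # ['apple 2', 'pear 3', 'apple 4']
```

With a whole-line key (`lambda s: s`) both settings give the same text, which matches the remark that the choice makes little difference in the simple case.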

See also

Comments

Here are some more Vim-native ways for removing duplicate lines. This time they don't have to be adjacent. Line order is preserved.

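This one can be a bit slow. The pattern can also match a single empty line, which would then be deleted as well. The ":g/^/m0<CR>" at the beginning and end of the command reverses the line order and then restores it, so that the first occurrence of each line is kept rather than the last; it may be optional if you do not mind which occurrence survives.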

:nno \d1 :g/^/m0<CR>:g/^\(.*\)\n\_.*\%(^\1$\)/d<CR>:g/^/m0<CR>
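
The three steps of this mapping (reverse, delete every line that is repeated later, reverse again) can be modelled in Python as follows. This is only a sketch of the logic, with hypothetical function names, and it does not reproduce the empty-line behaviour of the Vim pattern.

```python
def drop_if_repeated_later(lines):
    """Keep a line only if no identical line follows it (keeps last occurrences)."""
    return [line for i, line in enumerate(lines) if line not in lines[i + 1:]]


def dedupe_keep_first(lines):
    """Reverse, drop lines repeated later, reverse back (keeps first occurrences)."""
    reversed_lines = lines[::-1]                   # models the first :g/^/m0
    kept = drop_if_repeated_later(reversed_lines)  # models the :g/.../d step
    return kept[::-1]                              # models the final :g/^/m0


lines = ["b", "a", "b", "c", "a"]
print(dedupe_keep_first(lines))       # ['b', 'a', 'c']
print(drop_if_repeated_later(lines))  # ['b', 'c', 'a']
```

Like the pattern it models, `drop_if_repeated_later` rescans the rest of the list for every line, so the work grows quadratically with the number of lines.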

This is faster. It uses mark l.

:nno \d2 :g/^/kl\|if search('^'.escape(getline('.'),'\.*[]^$/').'$','bW')\|'ld<CR>

The following uses a substitute command to collapse each run of repeated adjacent lines into one, keeping the first line and deleting the duplicates that follow it. This is a variation on the g//d method.

%s/^\(.*\)\(\n\1\)\+$/\1/
%s/\v^(.*)(\n\1)+$/\1/
