Changes: Search across multiple lines

Revision as of 20:52, 28 December 2012

Tip 242

created 2002 · complexity intermediate · version 6.0

Vim can search for text that spans multiple lines. For example, the search /hello\_sworld finds "hello world" in a single line, and also finds "hello" ending one line, with "world" starting the next line. In a search, \s finds space or tab, while \_s finds newline or space or tab: an underscore adds a newline to any character class.

This tip shows how to search over multiple lines, and presents a useful command so entering :S hello world finds "hello" followed by "world" separated by spaces or tabs or newlines, and :S! hello world allows any non-word characters, including newlines, between the words.

Patterns including end-of-line

The search /^abc finds abc at the beginning of a line, and /abc$ finds abc at the end of a line. However, in /abc^def and /abc$def the ^ and $ are just ordinary characters with no special meaning. By contrast, each of the following has a special meaning anywhere in a search pattern.

`\n`	a newline character (line ending)
`\_s`	a whitespace (space or tab) or newline character
`\_^`	the beginning of a line (zero width)
`\_$`	the end of a line (zero width)
`\_.`	any character including a newline

Example searches:

/abc\n*def: Finds abc followed by zero or more newlines then def. Finds abcdef or abc followed by blank lines and def. The blank lines have to be empty (no space or tab characters).

/abc\_s*def: Finds abc followed by any whitespace or newlines then def. Finds abcdef or abc followed by blank lines and def. The blank lines can contain any number of space or tab characters. There may be whitespace after abc or before def.

/abc\_$\_s*def: Finds abc at end-of-line followed by any whitespace or newlines then def. There must be no characters (other than a newline) following abc. There can be any number of space, tab or newline characters before def.

/abc\_s*\_^def: Finds abc followed by any whitespace or newlines then def where def begins a line. There must be no characters (other than a newline) before def. There can be any number of space, tab or newline characters after abc.

/abc\_$def: Finds nothing because \_$ is "zero width" so the search is looking for abcdef where abc is also at end-of-line (which cannot occur).

/abc\_^def: Finds nothing because \_^ is "zero width" so the search is looking for abcdef where def is also at beginning-of-line (which cannot occur).

/abc\_.\{-}def: Finds abc followed by any characters or newlines (as few as possible) then def. Finds abcdef or abc followed by any characters then def.

/abc\(\_s.*\)\{0,18\}\_sdef: Finds a block of 0 to 18 lines enclosed by abc and def. Limiting the number of lines is important: replacing the \{0,18\} with a star will cause Vim to consume 100% CPU.
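The differences between these patterns are easiest to see by running them against a small buffer. The following is a minimal sketch, not part of the original tip: FirstMatchLine is a hypothetical helper name that loads some lines into a scratch window and returns the line number where search() first matches (0 when nothing matches). It assumes the default 'magic' setting.

```vim
" Hypothetical helper (not from the tip): put a:lines into a scratch
" window, search from the top without wrapping, and return the line
" number of the first match, or 0 if the pattern does not match.
function! FirstMatchLine(lines, pattern)
  new
  call setline(1, a:lines)
  call cursor(1, 1)
  " c = accept a match at the cursor, W = do not wrap around
  let lnum = search(a:pattern, 'cW')
  bwipeout!
  return lnum
endfunction

" \n* only crosses completely empty lines.
echo FirstMatchLine(['abc', '', 'def'], 'abc\n*def')      " 1
echo FirstMatchLine(['abc', '  ', 'def'], 'abc\n*def')    " 0
" \_s* also crosses lines that contain spaces or tabs.
echo FirstMatchLine(['abc', '  ', 'def'], 'abc\_s*def')   " 1
" \_$ is zero width, so something must still consume the newline.
echo FirstMatchLine(['abc', 'def'], 'abc\_$def')          " 0
echo FirstMatchLine(['abc', 'def'], 'abc\_$\_s*def')      " 1
```

Sourcing the block in a running Vim prints one result per echo line; the scratch window is wiped again after each call.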

Searching for multiline HTML comments

It is common for comments in HTML documents to span several lines:

<!-- This comment
 covers two lines. -->

The following search finds any HTML comment:

/<!--\_.\{-}-->

The atom \_. finds any character including end-of-line. The multi \{-} matches as few as possible (stopping at the first "-->"; the multi * is too greedy and would stop at the last occurrence).
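The same pattern can be used with :substitute, which also handles matches that span lines. The sketch below is an illustration rather than part of the original tip; StripHtmlComments is a hypothetical function name, and it changes the current buffer, so try it on a copy first.

```vim
" Hypothetical helper (not from the tip): delete every HTML comment in
" the current buffer, including comments that span several lines.
" g = replace all matches on a line, e = no error if nothing matches.
function! StripHtmlComments()
  %substitute/<!--\_.\{-}-->//ge
endfunction

" To count the comments without changing the buffer, the n flag can be
" used from the command line instead:
"   :%s/<!--\_.\{-}-->//gn
```

When a comment spans two lines, removing it joins the text before the comment with the text after it.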

Syntax highlighting may not be accurate, particularly with long comments. The following command will improve the accuracy when jumping in the file, but may be slower (:help :syn-sync):

:syntax sync fromstart

Searching over multiple lines

A pattern can match any one of a set of specified characters; for example, [aeiou] matches 'a' or 'e' or 'i' or 'o' or 'u'. In addition, Vim defines several character classes. For example, \a is [A-Za-z] (matches any alphabetic character), and \A is [^A-Za-z] (opposite of \a; matches any non-alphabetic character). :help /\a

An underscore can be used to extend a character class to include a newline (end of line). For example, searching for \_[aeiou] finds a newline or a vowel, so \_[aeiou]\+ matches any sequence of vowels, even a sequence spanning multiple lines. Similarly, \_a\+ matches any sequence of alphabetic characters, even when spanning multiple lines.

The following search pattern finds "hello world" where any non-alphabetic characters separate the words:

hello\_[^a-zA-Z]*world

The above pattern (which is equivalent to hello\_A*world) matches "helloworld", and "hello? ... world", and similar strings, even if "hello" is on one line and "world" is on a following line.
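The effect of the underscore can be checked with a short script. This is a sketch under the assumption that it is sourced in a running Vim with default options; the variable names are made up for the illustration. It puts "hello?" and "... world" on two lines of a scratch buffer and records the line where each pattern matches (0 means no match).

```vim
new
call setline(1, ['hello?', '... world'])
call cursor(1, 1)

" c = accept a match at the cursor, n = do not move the cursor,
" W = do not wrap around the end of the buffer.
let without_underscore = search('hello\A*world', 'cnW')
let with_underscore    = search('hello\_A*world', 'cnW')
let with_collection    = search('hello\_[^a-zA-Z]*world', 'cnW')
bwipeout!

echo without_underscore   " 0: \A never matches the end of a line
echo with_underscore      " 1: \_A also matches the newline
echo with_collection      " 1: same result as \_A
```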

Searching over multiple lines with a user command

The script below defines the command :S that will search for a phrase, even when the words are on different lines. Examples:

:S hello world: Searches for "hello" followed by "world", separated by whitespace including newlines.
:S! hello world: Searches for "hello" followed by "world", separated by any non-word characters (whitespace, newlines, punctuation). Finds, for example, "hello, world" and "hello+world" and "hello ... world". The words can be on different lines.

After entering the command, press n or N to search for the next or previous occurrence.

Put the following in your vimrc (or in file searchmultiline.vim in your plugin directory):

" Search for the ... arguments separated with whitespace (if no '!'),
" or with non-word characters (if '!' added to command).
function! SearchMultiLine(bang, ...)
  if a:0 > 0
    let sep = (a:bang) ? '\_W\+' : '\_s\+'
    let @/ = join(a:000, sep)
  endif
endfunction
command! -bang -nargs=* -complete=tag S call SearchMultiLine(<bang>0, <f-args>)|normal! /<C-R>/<CR>
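The function builds the search pattern by joining the words with a separator. The short sketch below (the variable name words is only for illustration) shows the two patterns that result for the words "hello" and "world"; these are the values the function assigns to the search register @/.

```vim
let words = ['hello', 'world']

" Separator used by :S (whitespace, including newlines).
echo join(words, '\_s\+')   " hello\_s\+world

" Separator used by :S! (any non-word characters, including newlines).
echo join(words, '\_W\+')   " hello\_W\+world
```

Because the words are inserted into the pattern unchanged, any pattern characters typed as part of a word (for example a dot or a star) keep their special meaning in the search.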

References

:help pattern

Comments

@@ Line 5: / Line 5: @@
 |created=2002
 |complexity=intermediate
-|author=vim_power
+|author=
 |version=6.0
 |rating=31/16
@@ Line 11: / Line 11: @@
 |category2=
 }}
+Vim can search for text that spans multiple lines. For example, the search <code>/hello\_sworld</code> finds "hello world" in a single line, and also finds "hello" ending one line, with "world" starting the next line. In a search, <code>\s</code> finds space or tab, while <code>\_s</code> finds newline or space or tab: an underscore adds a newline to any character class.
-One of the most uncelebrated features of Vim is the ability to span a search across multiple lines.
+This tip shows how to search over multiple lines, and presents a useful command so entering <code>:S&nbsp;hello&nbsp;world</code> finds "hello" followed by "world" separated by spaces or tabs or newlines, and <code>:S!&nbsp;hello&nbsp;world</code> allows any non-word characters, including newlines, between the words.
-All of the following match line beginnings or endings anywhere in the search pattern, unlike ^ and $.
+==Patterns including end-of-line==
-<pre>
+The search <code>/^abc</code> finds <code>abc</code> at the beginning of a line, and <code>/abc$</code> finds <code>abc</code> at the end of a line. However, in <code>/abc^def</code> and <code>/abc$def</code> the <code>^</code> and <code>$</code> are just ordinary characters with no special meaning. By contrast, each of the following has a special meaning anywhere in a search pattern.
-\n   the new-line character itself
-\_^  the beginning of a line
-\_$  the end of a line but before any new-line character
-\_s  a space, tab character, or new-line character
-</pre>
+{| class="cleartable"
-e.g /{\_s will match all white-space characters and new-line chars after a "{"
+| <code>\n</code> || a newline character (line ending)
+|-
+| <code>\_s</code> || a whitespace (space or tab) or newline character
+|-
+| <code>\_^</code> || the beginning of a line (zero width)
+|-
+| <code>\_$</code> || the end of a line (zero width)
+|-
+| <code>\_.</code> || any character including a newline
+|}
+Example searches:
-These are odd beasts to work with.
+;<code>/abc\n*def</code>
+:Finds <code>abc</code> followed by zero or more newlines then <code>def</code>.
+:Finds <code>abcdef</code> or <code>abc</code> followed by blank lines and <code>def</code>.
+:The blank lines have to be empty (no space or tab characters).
+;<code>/abc\_s*def</code>
-\_^ is a useful marker.
+:Finds <code>abc</code> followed by any whitespace or newlines then <code>def</code>.
+:Finds <code>abcdef</code> or <code>abc</code> followed by blank lines and <code>def</code>.
+:The blank lines can contain any number of space or tab characters.
+:There may be whitespace after <code>abc</code> or before <code>def</code>.
+;<code>/abc\_$\_s*def</code>
-<pre>  end one line\_^begin the next</pre>
+:Finds <code>abc</code> at end-of-line followed by any whitespace or newlines then <code>def</code>.
+:There must be no characters (other than a newline) following <code>abc</code>.
+:There can be any number of space, tab or newline characters before <code>def</code>.
+;<code>/abc\_s*\_^def</code>
-\_$ is not equivalent. It also is a zero-length marker, but that means
+:Finds <code>abc</code> followed by any whitespace or newlines then <code>def</code> where <code>def</code> begins a line.
-the end-of-line characters remain between it and the next line.
+:There must be no characters (other than a newline) before <code>def</code>.
-The following never matches, because u doesn't match the end-of-line character.
+:There can be any number of space, tab or newline characters after <code>abc</code>.
-<pre>  end one line\_$um</pre>
+;<code>/abc\_$def</code>
+:Finds nothing because <code>\_$</code> is "zero width" so the search is looking for <code>abcdef</code> where <code>abc</code> is also at end-of-line (which cannot occur).
+;<code>/abc\_^def</code>
-This does what you want, though:
+:Finds nothing because <code>\_^</code> is "zero width" so the search is looking for <code>abcdef</code> where <code>def</code> is also at beginning-of-line (which cannot occur).
+;<code>/abc\_.\{-}def</code>
-<pre>  end one line\nnext line</pre>
+:Finds <code>abc</code> followed by any characters or newlines (as few as possible) then <code>def</code>.
+:Finds <code>abcdef</code> or <code>abc</code> followed by any characters then <code>def</code>.
+;<code>/abc\(\_s.*\)\{0,18\}\_sdef</code>
-\_s is a different kind of beast. You can insert the underscore in any
+:Finds a block of 0 to 18 lines enclosed by <code>abc</code> and <code>def</code>.
-of the character-class atoms to include line-ends in the class. In this case
+:limiting the number of lines is important, replacing this by a star will cause vim to consume 100% CPU.
-the match position moves past a line-end when it matches. So \_S* matches
-any sequence of NON-white-space characters, even across multiple lines.
+==Searching for multiline HTML comments==
-The last member of the set is \_., which matches any character in the buffer,
+It is common for comments in HTML documents to span several lines:
-including line-ends.  \_.* matches the rest of the buffer from the current
+<pre>
-position.
+<!-- This comment
+ covers two lines. -->
+</pre>
+The following search finds any HTML comment:
+<pre>
+/<!--\_.\{-}-->
+</pre>
+The atom <code>\_.</code> finds any character including end-of-line. The multi <code>\{-}</code> matches as few as possible (stopping at the first "<code>--></code>"; the multi <code>*</code> is too greedy and would stop at the last occurrence).
-==References==
-*{{help|pattern}}
+Syntax highlighting may be not be accurate, particularly with long comments. The following command will improve the accuracy when jumping in the file, but may be slower ({{help|:syn-sync}}):
-==Comments==
-To seek out HTML comments over ''multiple'' lines, for example:
 <pre>
+:syntax sync fromstart
-<!-- foobar does
- not exist -->
 </pre>
+==Searching over multiple lines==
-Use the search:
+A pattern can find any specified characters, for example, <code>[aeiou]</code> matches 'a' or 'e' or 'i' or 'o' or 'u'. In addition, Vim defines several character classes. For example, <code>\a</code> is <code>[A-Za-z]</code> (matches any alphabetic character), and <code>\A</code> is <code>[^A-Za-z]</code> (opposite of <code>\a</code>; matches any non-alphabetic character). {{help|/\a}}
-<pre>/<!--\_p\{-}--></pre>
+An underscore can be used to extend a character class to include a newline (end of line). For example, searching for <code>\_[aeiou]</code> finds a newline or a vowel, so <code>\_[aeiou]\+</code> matches any sequence of vowels, even a sequence spanning multiple lines. Similarly, <code>\_a\+</code> matches any sequence of alphabetic characters, even when spanning multiple lines.
-We used \{-} the "few as possible" operator rather than * which is too greedy when there are many such comments in the file.
-The key is of course \_p which is printable characters including EOL end-of-lines.
+The following search pattern finds "hello world" where any non-alphabetic characters separate the words:
+<pre>
+hello\_[^a-zA-Z]*world
+</pre>
+The above pattern (which is equivalent to <code>hello\_A*world</code>) matches "helloworld", and "hello? ... world", and similar strings, even if "hello" is on one line and "world" is on a following line.
-However, the highlighting is very erratic when the span over number of lines exceeds, say, 30. And highlighting is rather spotty when there are shifts in screen views. This is due to the default that improves highlighting performance.
+==Searching over multiple lines with a user command==
-If you want to ensure the most accurate highlighting, try:
+The script below defines the command <code>:S</code> that will search for a phrase, even when the words are on different lines. Examples:
-<pre>:syntax sync fromstart</pre>
+;<code>:S hello world</code>
+:Searches for "hello" followed by "world", separated by whitespace including newlines.
+;<code>:S! hello world</code>
+:Searches for "hello" followed by "world", separated by any non-word characters (whitespace, newlines, punctuation).
+:Finds, for example, "hello, world" and "hello+world" and "hello ... world". The words can be on different lines.
+After entering the command, press <code>n</code> or <code>N</code> to search for the next or previous occurrence.
-This can slow things down on large files with complex highlighting
-{{help|:syn-sync}}
+Put the following in your [[vimrc]] (or in file <code>searchmultiline.vim</code> in your plugin directory):
-----
+<source lang="vim">
-For some reason <!--\_p\{-}--> doesn't work if your comments are indented (with opening and closing comment tag indented).
+" Search for the ... arguments separated with whitespace (if no '!'),
+" or with non-word characters (if '!' added to command).
+function! SearchMultiLine(bang, ...)
+  if a:0 > 0
+    let sep = (a:bang) ? '\_W\+' : '\_s\+'
+    let @/ = join(a:000, sep)
+  endif
+endfunction
+command! -bang -nargs=* -complete=tag S call SearchMultiLine(<bang>0, <f-args>)|normal! /<C-R>/<CR>
+</source>
+==See also==
-Here's another way to highlight HTML comments using conventional regex:
+*[[Searching]] how to search
-<pre>/<\!--\(.\|\n\)*--></pre>
+*[[Search patterns]] regex information and examples
+*[[Search for visually selected text]] search for selected text; finds targets on multiple lines
+==References==
-However, this one will spill over to the next comment if there's more than one so it's not too useful.
+*{{help|pattern}}
+==Comments==
-----
-The TAB character is among the control chars, thus not matched with \p per default.
-----
-The script attached to http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=256743 offers a convenient way to address this issue, you can type ":S foo bar" and it is translated to "/sfoo\_s\+bar"
-The following works even better for me:
-copy
-<pre>
-:py <<EOF
-import vim
-def MySearch(*args):
-    s="\\_s\\+".join(args)
-    vim.command("/"+s)
-EOF
-command -nargs=* -complete=tag S :py MySearch(<f-args>)
-</pre>
-into a file ~/.vim/project/blanksearch.vim
-Note the tab (not spaces!!!) in the two indented lines.
-The advantage of this version is that N and n work afterwards.
-----