Vim Tips Wiki
No edit summary
(Change <tt> to <code>, perhaps also minor tweak.)
(31 intermediate revisions by 10 users not shown)
  +
{{TipNew
{{review}}
 
  +
|id=1585
{{TipImported
 
  +
|previous=1584
|id=26
 
  +
|next=1586
|previous=25
 
  +
|created=2008
|next=27
 
|created=March 5, 2001
 
 
|complexity=basic
 
|complexity=basic
|author=scrott
+
|author=
|version=6.0
+
|version=7.0
  +
|subpage=/200802
|rating=2031/784
 
|category1=Duplicate
+
|category1=Fileformat
|category2=File_Handling
+
|category2=
|category3=Windows
 
 
}}
 
}}
  +
Vim recognizes three file formats (unix, dos, mac) that determine what line ending characters (line terminators) are removed from each line when a file is read, or are added to each line when a file is written. A file format problem can display <code>^M</code> characters, or can prevent scripts from running correctly. This tip explains how to avoid problems, and how to convert from one file format to another. Use of the <code>'fileformat'</code> and <code>'fileformats'</code> options is also explained. See [[#Converting the current file|below]] if all you want to know is how to remove ^M characters, or how to fix the line endings in the file you are working on (in brief, enter <code>:e ++ff=dos</code> to remove <code>^M</code> when viewing a file).
'''See [[File format]] for other suggestions. May merge these tips later.'''
 
----
 
If you work in a mixed environment (Unix and DOS or Windows), you will often see ^M when you open a file. Under Unix, a line is supposed to be terminated with an LF (linefeed) character, but DOS/Windows use CR LF (carriage return followed by linefeed). Sometimes, LF is written as NL (newline).
 
   
  +
The line terminator expected for each file format is:
If Vim believes that a file has unix format, any CR will display as ^M (because CR is Ctrl-M).
 
  +
{| class="cleartable"
  +
|unix || LF only (each line ends with an LF character).
  +
|-
  +
|dos || CRLF (each line ends with two characters, CR then LF).
  +
|-
  +
|mac || CR only (each line ends with a CR character).
  +
|}
  +
CR is ''carriage return'' (return cursor to left margin), which is Ctrl-M or ^M or hex 0D.
   
  +
LF is ''linefeed'' (move cursor down), which is Ctrl-J or ^J or hex 0A. Sometimes, LF is written as NL (newline).
Vim supports three file formats: '''unix''' (lines end with LF), '''dos''' (lines end with CR LF), and '''mac''' (lines end with CR). Mac OS version 9 and earlier use mac line endings, while Mac OS X and later use LF.
 
   
  +
Mac OS version 9 and earlier use mac line endings, while Mac OS X and later use unix line endings.
Use <tt>:set ffs?</tt> to see the value of your <tt>ffs</tt> (fileformats) option. {{help|'ffs'}} explains how Vim interprets a file when it is read.
 
   
  +
==File format options==
Use <tt>:set ff?</tt> to see the value of <tt>ff</tt> (fileformat) for the current buffer. This setting shows how Vim interpreted the file when it was read. If you change <tt>ff</tt>, the file will be written with different line endings when you next write the file. See {{help|'ff'}}.
 
  +
The <code>'fileformat'</code> option is local to each buffer. It is set by Vim when a file is read, or can be specified in a command telling Vim how to read a file. In addition, the <code>'fileformat'</code> option can be changed to specify the line endings that will be added to each line when the buffer is written to a file.
   
  +
The <code>'fileformats'</code> option is global and specifies which file formats will be tried when Vim reads a file (unless otherwise specified, Vim attempts to automatically detect which file format should be used to read a file). The first file format in <code>'fileformats'</code> is also used as the default for a new buffer.
Some programs create files with inconsistent line endings. For example, the first few lines may end with LF, and Vim will decide the file has the unix file format. But later lines may use CR LF and Vim may show the CR as ^M.
 
   
  +
The following command displays the <code>fileformat</code> option (abbreviated as <code>ff</code>) for the current buffer, and the <code>fileformats</code> global option (abbreviated as <code>ffs</code>) which determines how Vim reads and writes files: {{help|'ff'}} {{help|'ffs'}}
To replace every CR with LF (when searching, <tt>\r</tt> matches CR, but when replacing, <tt>\r</tt> inserts LF):
 
 
<pre>
 
<pre>
  +
:set ff? ffs?
:%s/\r/\r/g
 
 
</pre>
 
</pre>
   
  +
This command also shows where each option was last set:
To delete every ^M from the buffer:
 
 
<pre>
 
<pre>
  +
:verbose set ff? ffs?
:%s/\r//g
 
 
</pre>
 
</pre>
   
  +
The <code>fileformats</code> option is often not explicitly set (the defaults are usually adequate). However, the above command may indicate that the option was set in your [[vimrc]] because that file probably contains <code>set nocompatible</code> which sets many options.
The <tt>\r</tt> is two characters that are interpreted as the single character CR ({{help|/\r}}). The above command will delete all CR characters, regardless of where they occur in a line.
 
   
  +
==File format detection==
To delete ^M only when it occurs at the end of a line:
 
  +
The <code>'fileformats'</code> option (<code>'ffs'</code>) has these defaults:
<pre>
 
  +
{| class="cleartable"
:%s/\r$//
 
  +
|<code>ffs=unix,dos</code> || Unix based systems
</pre>
 
  +
|-
  +
|<code>ffs=dos,unix</code> || Windows and DOS systems
  +
|-
  +
|<code>ffs=mac,unix,dos</code> || Mac OS 9 systems
  +
|}
   
  +
When a file is read, the order of the items specified in <code>'ffs'</code> has no effect (for example, <code>ffs=unix,dos</code> has the same effect as <code>ffs=dos,unix</code> when reading). The order is only important when a new buffer is created (if not empty, the first item in <code>'ffs'</code> is used as the file format for a new buffer; this determines which line endings will be added when the buffer is saved).
To delete ^M at line endings, and replace it with a space everywhere else:
 
<pre>
 
:%s/\r$//
 
:%s/\r/ /g
 
</pre>
 
   
  +
Suppose your system has <code>ffs=dos,unix</code> and you open an existing file. Vim will look for both dos and unix line endings, but Vim has a built-in preference for the unix format.
It is possible to use an actual CR instead of the two characters <tt>\r</tt> in the above commands. To enter a CR, you type Ctrl-V Ctrl-M (or Ctrl-Q Ctrl-M under Windows). The <Enter> key can be used in place of Ctrl-M. You can also use Ctrl-K (enter digraph) to insert the CR, by typing Ctrl-k and then <Enter> twice (once for the key value, once to accept the entry). However, using <tt>\r</tt> is more easily read, copied, pasted, etc., and thus is the recommended procedure.
 
  +
*If all lines in the file end with CRLF, the dos file format will be applied, meaning that each CRLF is removed when reading the lines into a buffer, and the buffer <code>'ff'</code> option will be dos.
  +
*If one or more lines end with LF only, the unix file format will be applied, meaning that each LF is removed (but each CR will be present in the buffer, and will display as <code>^M</code>), and the buffer <code>'ff'</code> option will be unix.
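The rule in the two points above can be modelled outside Vim. The following Python sketch is only an illustration under stated assumptions: the function name <code>detect_fileformat</code> is hypothetical, it models <code>ffs=dos,unix</code> only, it ignores mac detection, and it is not Vim's actual implementation.

```python
def detect_fileformat(data):
    """Rough model of the rule above for ffs=dos,unix (not Vim's real code)."""
    pieces = data.split(b"\n")
    # The piece after the final LF is not a terminated line, so drop it.
    terminated = pieces[:-1]
    # dos only if there is at least one LF and every LF is preceded by CR.
    if terminated and all(piece.endswith(b"\r") for piece in terminated):
        return "dos"
    # Simplification: data with no LF at all is also reported as unix here.
    return "unix"
```

For example, <code>detect_fileformat(b"a\r\nb\n")</code> gives <code>unix</code> because the second line ends with LF only, which is why the CR on the first line then appears as <code>^M</code>.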
   
  +
==Converting the current file==
==Unix users==
 
  +
A common problem is that you open a file and see <code>^M</code> at the end of many lines. Entering <code>:set ff?</code> will probably show that the file was read as unix: the problem is that some lines actually end with CRLF while others end with LF. To fix this, you need to tell Vim to read the file again using dos file format. When reading as dos, all CRLF line endings, and all LF-only line endings, are removed. Then you need to change the file format for the buffer and save the file. The following procedures will easily handle this situation, but they only work reliably on reasonably recent versions of Vim (7.2.40 or higher).
Under Unix, you may find that a Vim script does not work because you have downloaded a script that contains Carriage Return (CR) characters. Each CR displays as ^M, and will cause some scripts to fail.
 
   
  +
;Convert from dos/unix to unix
If you put script.vim in your plugin directory, you may not see any useful error messages about CR characters. You can source the script after starting Vim (for example, <tt>:source ~/.vim/plugin/script.vim</tt>) to see if errors including "^M" are shown.
 
  +
To convert the current file from any mixture of CRLF/LF-only line endings, so all lines end with LF only:
  +
{| class="cleartable"
  +
|<code>:update</code> || Save any changes.
  +
|-
  +
|<code>:e ++ff=dos</code> || Edit file again, using dos file format (<code>'fileformats'</code> is ignored).<ref name="broken" group="A">The <code>:e</code> command reads the current file again, using the <code>++ff=dos</code> option so the read will omit all CRLF and LF-only line terminators (dos file format). Each <code>^M</code> at the end of a line should disappear. Some older versions of Vim do not perform this step correctly and the <code>^M</code> endings are not removed; upgrade Vim to fix. {{help|:e}}</ref>
  +
|-
  +
|<code>:setlocal ff=unix</code> || This buffer will use LF-only line endings when written.<ref group="A">Use <code>:setlocal</code> (or <code>:setl</code>) to avoid changing the global default.</ref>
  +
|-
  +
|<code>:w</code> || Write buffer using unix (LF-only) line endings.
  +
|}
   
  +
In the above, replacing <code>:setlocal ff=unix</code> with <code>:setlocal ff=mac</code> would write the file with mac (CR-only) line endings. Or, if it was a mac file to start with, you would use <code>:e ++ff=mac</code> to read the file correctly, so you could convert the line endings to unix or dos.
If there are many files that you need to fix, you can use the Unix <tt>dos2unix</tt> utility, for example:
 
   
  +
;Convert from dos/unix to dos
<pre>
 
  +
To convert the current file from any mixture of CRLF/LF-only line endings, so all lines end with CRLF only:
cd .vim
 
  +
{| class="cleartable"
dos2unix */*.vim
 
  +
|<code>:update</code> || Save any changes.
# Or, if run as root, you can preserve timestamps:
 
  +
|-
dos2unix -p */*.vim
 
  +
|<code>:e ++ff=dos</code> || Edit file again, using dos file format (<code>'fileformats'</code> is ignored).<ref name="broken" group="A"/>
</pre>
 
  +
|-
  +
|<code>:w</code> || Write buffer using dos (CRLF) line endings.
  +
|}
   
  +
;Notes A
==Comments==
 
  +
<references group="A"/>
{{Todo}}
 
Explain use of:
 
<pre>
 
:set ff=unix
 
:set ff=dos
 
</pre>
 
   
  +
==Converting clean files==
The following reads file foo.txt using dos file format (CR LF is the expected line ending, but LF will also be accepted as a line ending). The second command writes the file using the unix file format, so every line will end with LF only. So this converts from dos format (or a mixture of dos/unix), to unix format.
 
  +
When working with "clean" files (where every line has the same line ending), Vim's default settings provide reliable file format detection, and conversion is easy.
<pre>
 
  +
:e ++ff=dos foo.txt
 
  +
Suppose you have a collection of files where some are dos (every line ends with CRLF), and others are unix (every line ends with LF only). To convert all the dos files to unix (while not modifying the unix files):<ref group="B" name="failmix">This procedure will fail if a file has a mixture of dos and unix line endings because such files are detected as unix, and the CR characters are retained in the buffer.</ref>
:w ++ff=unix
 
  +
{| class="cleartable"
</pre>
 
  +
|<code>:args *.c *.h</code> || Specify the files to convert.<ref group="B">This example processes all <code>*.c</code> and <code>*.h</code> files in the current directory by setting the argument list to the wanted names. {{help|:args}}</ref>
  +
|-
  +
|<code>:argdo set ff=unix&#124;update</code> || For each argument, set unix file format for the buffer, and save the file if needed.<ref group="B">The <code>:argdo</code> command operates on each file in the argument list. For each file, it sets the buffer to use unix file format. That sets the modified flag for buffers that were detected as dos. The <code>:update</code> command writes the buffer if its modified flag is set.</ref>
  +
|}
  +
  +
Suppose you have a collection of files where some are dos (every line ends with CRLF), and others are unix (every line ends with LF only). To convert all the unix files to dos (while not modifying the dos files):<ref group="B" name="failmix"/>
  +
{| class="cleartable"
  +
|<code>:args *.c *.h</code> || Specify the files to convert.
  +
|-
  +
|<code>:argdo set ff=dos&#124;update</code> || For each argument, set dos file format for the buffer, and save the file if needed.
  +
|}
  +
  +
If you have opened several files where some are dos and some are unix, you can convert the dos files to unix:<ref group="B" name="failmix"/>
  +
{| class="cleartable"
  +
|<code>:bufdo! set ff=unix&#124;w</code> || For each buffer, set unix file format, and write the file.
  +
|}
  +
  +
If you have opened several files where some are dos and some are unix, you can convert the unix files to dos:<ref group="B" name="failmix"/>
  +
{| class="cleartable"
  +
|<code>:bufdo! set ff=dos&#124;w</code> || For each buffer, set dos file format, and write the file.
  +
|}
  +
  +
;Notes B
  +
<references group="B"/>
  +
  +
==Converting mixed files==
  +
When working with "mixed" files (where some lines have one kind of terminator, while other lines have a different terminator), reliable conversion requires more effort. Some methods do not work reliably with older Vim 7.2 versions. The procedures here should work in Vim 7.2 and later.
  +
  +
;Convert from dos/unix to unix
  +
To convert from any mixture of CRLF endings and LF-only endings, to LF-only endings:<ref group="C" name="defect">A defect with this procedure is that all files are modified, even if no change was required. That is, a file is written even if it originally had all lines in the wanted format.</ref>
  +
{| class="cleartable"
  +
|<code>:set hidden</code> || Allow modified buffers to be hidden.
  +
|-
  +
|<code>:set ffs=dos</code> || Assume dos line endings (CRLF or LF-only) when reading files.
  +
|-
  +
|<code>:args *.c *.h</code> || Specify the files to convert.
  +
|-
  +
|<code>:argdo set ff=unix&#124;w</code> || For each argument, set unix file format for the buffer, and write the file.<ref group="C">The <code>:argdo</code> command operates on each file in the argument list. For each file, it sets the buffer to use unix file format, and writes the file (even if the file has not been marked as modified). An alternative would be to use <code>:argdo w ++ff=unix</code> which will write each file as unix, with the potential problem that the buffer will still be marked as using dos format (so if you later make a change and save it, the file will be written in dos format).</ref>
  +
|}
  +
  +
;Convert from dos/unix to dos
  +
To convert from any mixture of CRLF endings and LF-only endings, to CRLF endings:<ref group="C" name="defect"/>
  +
{| class="cleartable"
  +
|<code>:set ffs=dos</code> || Assume dos line endings (CRLF or LF-only) when reading files.
  +
|-
  +
|<code>:args *.c *.h</code> || Specify the files to convert.
  +
|-
  +
|<code>:argdo w</code> || Write each file with CRLF line endings.
  +
|}
  +
  +
;Notes C
  +
<references group="C"/>
  +
  +
==Removing unwanted CR or LF characters==
  +
First ensure you have read the file with the appropriate file format. For example, use <code>:e ++ff=dos</code> to remove all CRLF and LF-only line terminators, or use <code>:e ++ff=mac</code> if the file uses CR as a line terminator.
  +
  +
After reading with the correct file format, the buffer may still contain unwanted CR characters. You can search for these with <code>/\r</code> (slash starts a search; backslash <code>r</code> represents CR when searching; press Enter to search).
   
  +
To delete <code>^M</code> at line endings, and replace it with a space everywhere else (the <code>c</code> flag will prompt to confirm that you want each replacement, and the <code>e</code> flag prevents an error message if the string is not found):
The following converts from mac format to unix format.
 
 
<pre>
 
<pre>
  +
:%s/\r\+$//e
:e ++ff=mac foo.txt
 
  +
:%s/\r/ /gce
:w ++ff=unix
 
 
</pre>
 
</pre>
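For readers who want to check what the two substitutions do, here is a rough Python equivalent applied to a list of buffer lines. It is a sketch only: the function name <code>clean_cr</code> is hypothetical, and the interactive confirmation given by the <code>c</code> flag is not modelled.

```python
import re


def clean_cr(lines):
    """Apply the two-step CR cleanup to a list of buffer lines (str)."""
    cleaned = []
    for line in lines:
        # Like :%s/\r\+$//e  (delete one or more CR at the end of the line).
        line = re.sub(r"\r+$", "", line)
        # Like :%s/\r/ /ge   (replace every remaining CR with a space).
        line = line.replace("\r", " ")
        cleaned.append(line)
    return cleaned
```

The order matters: trailing CR characters are deleted first, so only CR characters embedded within a line are turned into spaces.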
   
  +
To process, say, all <code>*.txt</code> files in the current directory:
----
 
To process, say, all *.txt files in the current directory:
 
 
<pre>
 
<pre>
 
vim *.txt
 
vim *.txt
 
:set hidden
 
:set hidden
:bufdo %s/\r$//e
+
:bufdo %s/\r\+$//e
:bufdo %s/\r/ /eg
+
:bufdo %s/\r/ /ge
 
:xa
 
:xa
 
</pre>
 
</pre>
   
  +
To delete every <code>^M</code>, regardless of where it occurs in a line (this is not a good idea if two lines were separated only by a CR because the command joins the lines together):
----
 
Suppose you do not want to change the file, but you want to hide the ^M characters. You can use:
 
 
<pre>
 
<pre>
  +
:%s/\r//g
:hi SpecialKey guifg=bg
 
 
</pre>
 
</pre>
   
  +
To replace every CR with LF (when searching, <code>\r</code> matches CR, but when replacing, <code>\r</code> inserts LF; this is not a good idea if LF occurs at the end of a line, because an extra blank line will be created):
Unfortunately, this hides ''all'' special characters.
 
:As discussed above, just doing <tt>:e ++ff=dos</tt> will re-read the file in DOS mode. This will also not modify the file, until you save anyway.
 
 
If you want to remove ^M at the end of each line, yet have Vim regard the file as unchanged, enter:
 
 
<pre>
 
<pre>
:%s/\r$//e | set nomod
+
:%s/\r/\r/g
 
</pre>
 
</pre>
   
  +
If a file uses CR line terminators, it should be read as mac (using <code>:e ++ff=mac</code>). After doing that, you may see unwanted ^J (LF) characters. In a mac buffer, all CR characters will have been removed because CR is the line terminator, and searching for <code>\r</code> will find unwanted LF characters. Use these commands to remove ^J from the start of all lines, and to replace all other ^J with a line break:
The <tt>e</tt> flag means no error occurs if no matches are found. Also see {{help|'mod'}}.
 
 
----
 
The following is a little dangerous as it attempts to delete all CR characters from every file that you open. You could put this in your vimrc file.
 
 
 
<pre>
 
<pre>
  +
%s/^\r//e
" Delete one or more CR at the end of each line.
 
  +
%s/\r/\r/ge
autocmd BufRead * silent! %s/\r\+$//
 
" Alternative: Also delete trailing whitespace.
 
autocmd BufRead * silent! %s/[\r \t]\+$//
 
 
</pre>
 
</pre>
   
  +
==Terminator after last line==
----
 
  +
Every line in a text file should have a terminator (for example, a dos file should end with CRLF). When reading a file, Vim accepts the last line as a normal line, even if it has no terminator. Normally, Vim writes a terminator after every line, including the last. For rare occasions, it is possible to save a file with no terminator after the last line:
If you want to write a file with no linebreak at the end:
 
 
<pre>
 
<pre>
:set noeol bin
+
:set noendofline binary
  +
:w
 
</pre>
 
</pre>
   
  +
The above only works in Unix, and must be manually triggered. With some scripting, it is possible to [[Preserve_missing_end-of-line_at_end_of_text_files|automatically preserve a missing end-of-line]] on any file format.
Now, when you write the file, the last line will have no line ending. You do ''not'' normally want this. Every line, including the last, should have a line ending.
 
   
  +
Some obsolete dos files use Ctrl-Z as an end-of-file character. When reading a dos file, Vim accepts any Ctrl-Z bytes within the file as normal characters (these will appear in the buffer as ^Z), however if Ctrl-Z is the last byte in the file, it is omitted.
----
 
I had a bunch of HTML files that needed to be converted. As they were all open I just did the following:
 
<pre>
 
:bufdo! set ff=unix | update
 
</pre>
 
   
  +
==How file format conversion works==
----
 
  +
Understanding the principles involved in converting file formats can help avoid mistakes.
You can display the file format for the current buffer in the status line:
 
<pre>
 
set statusline=%<%f%h%m%r%=%{&ff}\ %l,%c%V\ %P
 
</pre>
 
   
  +
Suppose you have some files that use a mixture of CRLF and LF-only line endings (all line terminators use CRLF, or all use LF-only, or there are some of each). These steps are required when converting each file:
----
 
  +
*Read the file as dos so any text ending with CRLF or LF-only is regarded as a line. These line endings (CRLF and LF) are removed and are not present in the buffer.
All I care about is if the file format is ''not'' unix. If it's not, I want a big red warning. That way I'm not the jerk who checks in a file that causes every line to get modified by the diff patch.
 
  +
*If you want to force all line endings to CRLF, write as dos. The <code>:w</code> command is required (not <code>:update</code> or <code>:wa</code>, because these only write a buffer that has been modified, and no modification has occurred).
  +
*If you want to force all line endings to LF-only, write as unix.
  +
*If you want to force all line endings to CR-only, write as mac.
   
  +
If all lines in a file end with LF-only, the file can be converted to use CRLF endings by reading as unix and writing as dos. However, if some lines end with CRLF, reading a file as unix will keep each CR in the buffer, and writing the file using any format will write each CR to the file, as if it were a normal character. When writing, line endings are added, so any CR characters that were in the original file, will be written in addition to line endings.
So, I added this to my existing statusline:
 
<pre>
 
%9*%{&ff=='unix'?'':&ff.'\ format'}%*
 
</pre>
 
   
  +
An LF-only file can also be converted to CRLF by reading as dos and writing as dos.
Here's what it does:
 
<pre>
 
%9*
 
\- Change highlighting to user setting #9 (see :he hl-User1..9)
 
%{
 
\- Begin evaluating as expression until } is encountered
 
&ff=='unix'?'':&ff.'\ format'
 
\- This is a ternary that returns either an empty string, or 'XX format'
 
}
 
\- This marks the end of the expression
 
%*
 
\- Restores normal highlight
 
</pre>
 
   
  +
When reading a file as dos, if a CR followed by LF is encountered (CRLF), those two bytes are removed, and the preceding text is regarded as a line. Similarly, if LF is encountered, it is removed, and the preceding text is regarded as a line. However, if a CR is encountered (without a following LF), the CR will be regarded as a normal character and will be copied into the buffer where it will be displayed as <code>^M</code> (Ctrl-M, the code for CR).
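The read and write steps described in this section can be modelled with a few lines of Python. This is a simplified sketch, not Vim's implementation: the names <code>TERMINATOR</code>, <code>read_buffer</code> and <code>write_buffer</code> are hypothetical, and details such as Ctrl-Z handling and encodings are ignored.

```python
# Terminator added to each line when writing, per file format.
TERMINATOR = {"unix": b"\n", "dos": b"\r\n", "mac": b"\r"}


def read_buffer(data, ff):
    """Split file bytes into buffer lines for the given file format."""
    pieces = data.split(b"\r" if ff == "mac" else b"\n")
    last = pieces.pop()  # text after the final terminator (may be empty)
    if ff == "dos":
        # CRLF and LF-only both end a line; a CR not followed by LF stays.
        pieces = [p[:-1] if p.endswith(b"\r") else p for p in pieces]
    if last:
        pieces.append(last)  # an unterminated last line is still a line
    return pieces


def write_buffer(lines, ff):
    """Write every buffer line followed by the terminator for ff."""
    return b"".join(line + TERMINATOR[ff] for line in lines)
```

With the mixed input <code>b"one\r\ntwo\n"</code>, reading as dos gives the lines <code>one</code> and <code>two</code>, which can then be written cleanly in any format. Reading the same input as unix keeps the CR in the first line, so writing as dos produces <code>one</code> followed by CR CR LF.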
So, how do you use it? First I call:
 
  +
  +
==Results of incorrect file format detection==
  +
Suppose a file contains two lines:
 
<pre>
 
<pre>
  +
Line 1
:set statusline?
 
  +
Line 2
 
</pre>
 
</pre>
   
  +
When reading the file, if Vim does not correctly detect the file format, here is what you will see in the buffer (<code>^J</code> is Ctrl-J or LF; <code>^M</code> is Ctrl-M or CR).
Which returns:
 
<pre>
 
statusline=%<%f :: %{TagName()} %(%h%m%r %)%=%-15.15(%l,%c%V%)%P
 
</pre>
 
   
  +
File has unix line endings; file read with <code>ff=dos</code>:
But, remember that if you want to set a status line you must escape all white space. So that line would have to be entered as:
 
 
<pre>
 
<pre>
  +
Line 1
statusline=%<%f\ ::\ %{TagName()}\ %(%h%m%r\ %)%=%-15.15(%l,%c%V%)%P
 
  +
Line 2
 
</pre>
 
</pre>
   
  +
File has unix line endings; file read with <code>ff=mac</code>:
So when I added my modification I have:
 
 
<pre>
 
<pre>
  +
Line 1^JLine 2^J
:set statusline=%<%f\ :%9*%{&ff=='unix'?'':&ff.'\ format'}%*:\ %{TagName()}\ %(%h%m%r\ %)%=%-15.15(%l,%c%V%)%P
 
 
</pre>
 
</pre>
   
  +
File has dos line endings; file read with <code>ff=unix</code>:
But then, to make the user highlighting #9 big and red I started by viewing all the existing highlighting configurations (I'm too lazy to write my own) by calling:
 
 
<pre>
 
<pre>
  +
Line 1^M
:hi
 
  +
Line 2^M
 
</pre>
 
</pre>
   
  +
File has dos line endings; file read with <code>ff=mac</code>:
The entry titled ErrorMsg looked good to me so I copied its settings which were:
 
 
<pre>
 
<pre>
  +
Line 1
term=standout cterm=bold ctermfg=7 ctermbg=1
 
  +
^JLine 2
  +
^J
 
</pre>
 
</pre>
   
  +
File has mac line endings; file read with <code>ff=unix</code> or <code>ff=dos</code>:
I then called:
 
 
<pre>
 
<pre>
  +
Line 1^MLine 2^M
:hi User9 term=standout cterm=bold ctermfg=7 ctermbg=1
 
 
</pre>
 
</pre>
   
  +
==Vim script problems==
Now my status line is unchanged and uncluttered, unless I have opened a dos file. That's pretty cool.
 
  +
Under Unix, you may find that a Vim script does not work because you have downloaded a script that contains CR characters. Each CR displays as <code>^M</code> and will cause some scripts to fail.
   
  +
If you put, say, <code>script.vim</code> in your plugin directory, you may not see any useful error messages about CR characters when using Vim. You can source the script after starting Vim (for example, <code>:source ~/.vim/plugin/script.vim</code>) to see if errors including <code>^M</code> are shown. To fix, you need to convert the file to unix format.
I wrote this in REAL basic terms, because I really wish someone had explained it to me like this. I hope it's well received.
 
   
  +
==Pitfalls==
----
 
  +
Some suggestions for working with file formats suffer from pitfalls that are described here.
If you always want a particular file to have, say, dos file format, you can put a modeline in that file. For example, add this line (a C-style comment) near the beginning or end of the file:
 
  +
  +
You are editing a file which you expect to be in unix format, yet you see many <code>^M</code> characters. The following attempt to convert the file to unix format does not work:
  +
{| class="cleartable"
  +
|<code>:setlocal ff=unix</code> || Set unix file format for current buffer.
  +
|-
  +
|<code>:w</code> || Write buffer to file.
  +
|}
  +
  +
The file was probably already detected as unix format, so the <code>:setlocal ff=unix</code> command will do nothing (the problem is that the file uses dos format, but Vim read it as unix because at least one line had an LF-only ending). Furthermore, each <code>^M</code> represents a CR character that is in the current buffer, and writing the buffer will write that CR to the file (not what you want).
  +
  +
You are editing a file which you expect to be in unix format, yet you see many <code>^M</code> characters. You perform the following to convert it to unix format, then perform further edits:
  +
{| class="cleartable"
  +
|<code>:e ++ff=dos</code> || Read file again in dos format, to accept both CRLF and LF-only line endings.
  +
|-
  +
|<code>:w ++ff=unix</code> || Write buffer to file using LF-only line endings.
  +
|-
  +
|... || Do some edits.
  +
|-
  +
|<code>:w</code> || Save the edits.
  +
|}
  +
  +
The first two steps above are correct, and the file will initially be written in unix format. However, the buffer is still marked as dos format, so the <code>:w</code> will overwrite the file using CRLF line endings. The <code>:e ++ff=dos</code> command tells Vim to read the file again, forcing dos file format. Vim will remove CRLF and LF-only line endings, leaving only the text of each line in the buffer. However, if you are going to edit the file, you need to use these commands:
  +
{| class="cleartable"
  +
|<code>:e ++ff=dos</code> || Read file again in dos format.
  +
|-
  +
|<code>:setlocal ff=unix</code> || Mark buffer so LF-only line endings will be used when buffer is written.
  +
|-
  +
|... || Do some edits.
  +
|-
  +
|<code>:w</code> || Save the edits (will use LF-only line endings).
  +
|}
  +
  +
Here is a mistaken attempt to convert certain files from dos to unix format by starting Vim with commands to convert all <code>*.c</code> and <code>*.h</code> files in the current directory:
  +
{| class="cleartable"
  +
|<code>vim +"argdo setlocal ff=unix" +wqa *.c *.h</code>
  +
|}
  +
  +
This will work if <code>'fileformats'</code> includes dos and if the files have only CRLF line endings. However, if <code>'fileformats'</code> includes both dos and unix, and if a file has at least one LF-only line ending, that file will be detected as unix, and any CR in the file will be shown in the buffer as <code>^M</code>. The <code>:setlocal ff=unix</code> will not flag a unix file as modified, so the <code>+wqa</code> command (same as <code>:xa</code>) will not save that file. If <code>:w</code> is used to write the buffer, nothing useful will be achieved because the CR characters will be written to the file.
  +
  +
==Other approaches==
  +
You may find a discussion of other techniques for handling line endings elsewhere. Some drawbacks of other procedures are mentioned here.
  +
  +
You can specify a file format for a particular file by inserting a modeline in that file. For example, in file <code>my.c</code>, you may put the following comment near the top or bottom of the file in an attempt to maintain dos line endings, regardless of what system is used to edit the file:
 
<pre>
 
<pre>
 
/* vim: set ff=dos: */
 
/* vim: set ff=dos: */
 
</pre>
 
</pre>
   
  +
In general, using a modeline is useless in this context, although it may help ''if'' the file format is correctly detected when the file is read, because the next write will save the file in the preferred format specified in the modeline. However, the modeline does not avoid problems, and may make problems worse. For example, if file <code>my.c</code> has one or more lines that end with LF only, and the file is edited on a default Windows system, the file will be detected as having unix format, and the modeline will then change the format to dos, which will set the buffer modified flag. The buffer will display each CR as <code>^M</code>. If you now save the file, each line will be written with a CRLF ending. However, the <code>^M</code> characters that were visible in the buffer will be written to the file, so some lines will now end with CRCRLF (two CR characters).
----
 
====Hide unwanted text====
 
Here is a technique to hide ^M line endings by highlighting them with the Ignore highlight group. This is from a tip that was proposed by Frienddaniel (the tip has been merged to here).
 
   
  +
Another unhelpful approach is to hide <code>^M</code> characters which occur at the end of a line by highlighting them with the Ignore highlight group:
 
<pre>
 
<pre>
 
:match Ignore /\r$/
 
:match Ignore /\r$/
 
</pre>
 
</pre>
   
  +
While this may be helpful as a quick workaround when viewing a file, in general, it is a misguided approach because the characters are hidden, but present, which will inevitably cause trouble when editing. In addition, it is much better to correctly handle the problem rather than temporarily hide it.
Any ^M characters that are embedded within lines (not at the line endings) will be displayed as normal because they are problems that need to be fixed.
 
   
  +
==Tools==
'''''Note''''' If ^M characters are at the end of every line, Vim ''should'' open the file in DOS file format. If you see the ^M, there are probably a few lines that do not have one, and the file has been opened in Unix file format. If this is the case, delete all the ^M characters with a simple <tt>:s</tt> command, and save the file (after <tt>:set fileformat=dos</tt> if desired). Just hiding the problem with a highlight group is probably a bad idea. If you hide the problem, you will never know if a file has pesky line endings that might screw up your regular expressions (^M doesn't match at the end of line '$' atom), plugins (TagList, for example, has no idea where to jump in a file with inconsistent line endings), or even your use of the file (some compilers, interpreters, parsers, etc will get thrown for a loop with unexpected line-endings, makefiles refuse to work, and all sorts of other trouble can happen). You should always prefer fixing a problem over hiding it!
 
  +
Several tools are available to convert files from one type of line ending to another. These need to be run at the command line, and are not related to Vim.
   
  +
On Unix-based systems, the <code>file</code> utility can display what kind of line endings are present in a file. For example, <code>file *.c</code> will report what line terminators (CRLF, CR, LF) are present in each <code>*.c</code> file. The <code>dos2unix</code> utility can convert from dos or mac format to unix, and the <code>unix2dos</code> utility can convert from unix to dos format, optionally while preserving file timestamps.
----
 
  +
  +
Many other conversion tools are available. For example, [https://ccrma.stanford.edu/~craig/utility/flip/ <code>flip</code>] can convert files between dos/mac/unix formats, and versions for each platform are available.
  +
  +
==See also==
  +
*[[VimTip736|Display file format in the status line]] show an alert if a non-native file format is used
  +
*[[Automatically reload files with mixed line-endings in DOS fileformat]] to load files in the correct format with no user intervention
  +
  +
==Comments==

Revision as of 12:34, 15 July 2012

Tip 1585

created 2008 · complexity basic · version 7.0


Vim recognizes three file formats (unix, dos, mac) that determine what line ending characters (line terminators) are removed from each line when a file is read, or are added to each line when a file is written. A file format problem can display ^M characters, or can prevent scripts from running correctly. This tip explains how to avoid problems, and how to convert from one file format to another. Use of the 'fileformat' and 'fileformats' options is also explained. See below if all you want to know is how to remove ^M characters, or how to fix the line endings in the file you are working on (in brief, enter :e ++ff=dos to remove ^M when viewing a file).

The line terminator expected for each file format is:

unix    LF only (each line ends with an LF character).
dos     CRLF (each line ends with two characters, CR then LF).
mac     CR only (each line ends with a CR character).

CR is carriage return (return cursor to left margin), which is Ctrl-M or ^M or hex 0D.

LF is linefeed (move cursor down), which is Ctrl-J or ^J or hex 0A. Sometimes, LF is written as NL (newline).

Mac OS version 9 and earlier use mac line endings, while Mac OS X and later use unix line endings.

File format options

The 'fileformat' option is local to each buffer. It is set by Vim when a file is read, or can be specified in a command telling Vim how to read a file. In addition, the 'fileformat' option can be changed to specify the line endings that will be added to each line when the buffer is written to a file.

The 'fileformats' option is global and specifies which file formats will be tried when Vim reads a file (unless otherwise specified, Vim attempts to automatically detect which file format should be used to read a file). The first file format in 'fileformats' is also used as the default for a new buffer.

The following command displays the 'fileformat' option (abbreviated as ff) for the current buffer, and the global 'fileformats' option (abbreviated as ffs). Together these determine how Vim reads and writes files (see :help 'ff' and :help 'ffs'):

:set ff? ffs?

This command also shows where each option was last set:

:verbose set ff? ffs?

The fileformats option is often not explicitly set (the defaults are usually adequate). However, the above command may indicate that the option was set in your vimrc because that file probably contains set nocompatible which sets many options.

File format detection

The 'fileformats' option ('ffs') has these defaults:

ffs=unix,dos        Unix based systems
ffs=dos,unix        Windows and DOS systems
ffs=mac,unix,dos    Mac OS 9 systems

When a file is read, the order of the items specified in 'ffs' has no effect (for example, ffs=unix,dos has the same effect as ffs=dos,unix when reading). The order is only important when a new buffer is created (if not empty, the first item in 'ffs' is used as the file format for a new buffer; this determines which line endings will be added when the buffer is saved).

Suppose your system has ffs=dos,unix and you open an existing file. Vim will look for both dos and unix line endings, but Vim has a built-in preference for the unix format.

  • If all lines in the file end with CRLF, the dos file format will be applied, meaning that each CRLF is removed when reading the lines into a buffer, and the buffer 'ff' option will be dos.
  • If one or more lines end with LF only, the unix file format will be applied, meaning that each LF is removed (but each CR will be present in the buffer, and will display as ^M), and the buffer 'ff' option will be unix.
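
To see which of the two cases applies to a particular file, the terminators can be counted directly. The following is a minimal sketch and not part of Vim: the name CountLineEndings is hypothetical, and the function expects the list returned by readfile() in binary mode, which splits the file at each LF and leaves any CR in place.

```vim
" Hypothetical helper (the name is an assumption, not a Vim built-in).
" a:lines is the result of readfile({file}, 'b'). In binary mode the last
" item is the text after the final LF (empty when the file ends with LF).
function! CountLineEndings(lines) abort
  let counts = {'crlf': 0, 'lf': 0}
  " Every item except the last one was followed by an LF in the file.
  for item in a:lines[:-2]
    if item =~# '\r$'
      let counts.crlf += 1
    else
      let counts.lf += 1
    endif
  endfor
  return counts
endfunction
```

For example, :echo CountLineEndings(readfile(expand('%'), 'b')) reports the counts for the current file. According to the rules above, a file with a nonzero lf count is read as unix, and any CRLF lines in it will then show ^M.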

Converting the current file

A common problem is that you open a file and see ^M at the end of many lines. Entering :set ff? will probably show that the file was read as unix: the problem is that some lines actually end with CRLF while others end with LF. To fix this, you need to tell Vim to read the file again using dos file format. When reading as dos, all CRLF line endings, and all LF-only line endings, are removed. Then you need to change the file format for the buffer and save the file. The following procedures will easily handle this situation, but they only work reliably on reasonably recent versions of Vim (7.2.40 or higher).

Convert from dos/unix to unix

To convert the current file from any mixture of CRLF/LF-only line endings, so all lines end with LF only:

:update              Save any changes.
:e ++ff=dos          Edit file again, using dos file format ('fileformats' is ignored).[A 1]
:setlocal ff=unix    This buffer will use LF-only line endings when written.[A 2]
:w                   Write buffer using unix (LF-only) line endings.

In the above, replacing :setlocal ff=unix with :setlocal ff=mac would write the file with mac (CR-only) line endings. Or, if it was a mac file to start with, you would use :e ++ff=mac to read the file correctly, so you could convert the line endings to unix or dos.

Convert from dos/unix to dos

To convert the current file from any mixture of CRLF/LF-only line endings, so all lines end with CRLF only:

:update        Save any changes.
:e ++ff=dos    Edit file again, using dos file format ('fileformats' is ignored).[A 1]
:w             Write buffer using dos (CRLF) line endings.

Notes A
  1. The :e command reads the current file again, using the ++ff=dos option so the read will omit all CRLF and LF-only line terminators (dos file format). Each ^M at the end of a line should disappear. Some older versions of Vim do not perform this step correctly and the ^M endings are not removed; upgrade Vim to fix (see :help :e).
  2. Use :setlocal (or :setl) to avoid changing the global default.
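
The two procedures in this section differ only in the format chosen before writing, so they can be wrapped in one function. This is a sketch under assumptions: the function and command names are hypothetical, the file is assumed to contain CRLF and/or LF-only endings (not mac), and a Vim version that handles :e ++ff=dos correctly is required.

```vim
" Hypothetical wrapper around the steps above; the function and command
" names are assumptions. a:target is 'unix', 'dos' or 'mac'.
function! ConvertLineEndings(target) abort
  " Save pending changes so that re-reading the file loses nothing.
  update
  " Re-read as dos: CRLF and LF-only terminators are both removed.
  edit ++ff=dos
  " Choose the terminator that will be added when the buffer is written.
  execute 'setlocal fileformat=' . a:target
  " Use :write rather than :update so the file is rewritten even when the
  " buffer is not flagged as modified (as happens for the dos target).
  write
endfunction

command! -nargs=1 ConvertLineEndings call ConvertLineEndings(<q-args>)
```

With this in a vimrc, :ConvertLineEndings unix would perform the first procedure and :ConvertLineEndings dos the second.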

Converting clean files

When working with "clean" files (where every line has the same line ending), Vim's default settings provide reliable file format detection, and conversion is easy.

Suppose you have a collection of files where some are dos (every line ends with CRLF), and others are unix (every line ends with LF only). To convert all the dos files to unix (while not modifying the unix files):[B 1]

:args *.c *.h                Specify the files to convert.[B 2]
:argdo set ff=unix|update    For each argument, set unix file format for the buffer, and save the file if needed.[B 3]

Suppose you have a collection of files where some are dos (every line ends with CRLF), and others are unix (every line ends with LF only). To convert all the unix files to dos (while not modifying the dos files):[B 1]

:args *.c *.h               Specify the files to convert.
:argdo set ff=dos|update    For each argument, set dos file format for the buffer, and save the file if needed.

If you have opened several files where some are dos and some are unix, you can convert the dos files to unix:[B 1]

:bufdo! set ff=unix|w    For each buffer, set unix file format, and write the file.

If you have opened several files where some are dos and some are unix, you can convert the unix files to dos:[B 1]

:bufdo! set ff=dos|w    For each buffer, set dos file format, and write the file.

Notes B
  1. This procedure will fail if a file has a mixture of dos and unix line endings because such files are detected as unix, and the CR characters are retained in the buffer.
  2. This example processes all *.c and *.h files in the current directory by setting the argument list to the wanted names (see :help :args).
  3. The :argdo command operates on each file in the argument list. For each file, it sets the buffer to use unix file format. That sets the modified flag for buffers that were detected as dos. The :update command writes the buffer if its modified flag is set.

Converting mixed files

When working with "mixed" files (where some lines have one kind of terminator, while other lines have a different terminator), reliable conversion requires more effort. Some methods do not work reliably with older Vim 7.2 versions. The procedures here should work in Vim 7.2 and later.

Convert from dos/unix to unix

To convert from any mixture of CRLF endings and LF-only endings, to LF-only endings:[C 1]

:set hidden              Allow modified buffers to be hidden.
:set ffs=dos             Assume dos line endings (CRLF or LF-only) when reading files.
:args *.c *.h            Specify the files to convert.
:argdo set ff=unix|w     For each argument, set unix file format for the buffer, and write the file.[C 2]

Convert from dos/unix to dos

To convert from any mixture of CRLF endings and LF-only endings, to CRLF endings:[C 1]

:set ffs=dos      Assume dos line endings (CRLF or LF-only) when reading files.
:args *.c *.h     Specify the files to convert.
:argdo w          Write each file with CRLF line endings.

Notes C
  1. A defect with this procedure is that all files are modified, even if no change was required. That is, a file is written even if it originally had all lines in the wanted format.
  2. The :argdo command operates on each file in the argument list. For each file, it sets the buffer to use unix file format, and writes the file (even if the file has not been marked as modified). An alternative would be to use :argdo w ++ff=unix which will write each file as unix, with the potential problem that the buffer will still be marked as using dos format (so if you later make a change and save it, the file will be written in dos format).
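
One possible way to avoid the defect described in the first note is to test each file before rewriting it. The sketch below is an assumption, not an established procedure: HasCRLF is a hypothetical name, and it inspects the list returned by readfile() in binary mode (where each CR is kept) to decide whether a conversion to unix is needed.

```vim
" Hypothetical helper (the name is an assumption). a:lines comes from
" readfile({file}, 'b'). Returns 1 when at least one LF-terminated line
" ends with CR, that is, when the file contains a CRLF terminator.
function! HasCRLF(lines) abort
  " The last item is the text after the final LF, so it is excluded.
  return match(a:lines[:-2], '\r$') >= 0
endfunction

" Possible usage for converting to unix, touching only files that need it:
"   :set hidden
"   :args *.c *.h
"   :argdo if HasCRLF(readfile(expand('%'), 'b')) | edit ++ff=dos | setlocal ff=unix | write | endif
```

Files that already use LF-only endings throughout are skipped, so they should keep their timestamps. A conversion to dos would need the opposite test (whether any line lacks a CR before its LF).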

Removing unwanted CR or LF characters

First ensure you have read the file with the appropriate file format. For example, use :e ++ff=dos to remove all CRLF and LF-only line terminators, or use :e ++ff=mac if the file uses CR as a line terminator.

After reading with the correct file format, the buffer may still contain unwanted CR characters. You can search for these with /\r (slash starts a search; backslash r represents CR when searching; press Enter to search).

To delete ^M at line endings, and replace it with a space everywhere else (the c flag will prompt to confirm that you want each replacement, and the e flag prevents an error message if the string is not found):

:%s/\r\+$//e
:%s/\r/ /gce

To process, say, all *.txt files in the current directory:

vim *.txt
:set hidden
:bufdo %s/\r\+$//e
:bufdo %s/\r/ /ge
:xa

To delete every ^M, regardless of where they occur in a line (this is not a good idea if two lines were separated only by a CR because the command joins the lines together):

:%s/\r//g

To replace every CR with LF (when searching, \r matches CR, but when replacing, \r inserts LF; this is not a good idea if LF occurs at the end of a line, because an extra blank line will be created):

:%s/\r/\r/g

If a file uses CR line terminators, it should be read as mac (using :e ++ff=mac). After doing that, you may see unwanted ^J (LF) characters. In a mac buffer, all CR characters will have been removed because CR is the line terminator, and searching for \r will find unwanted LF characters. Use these commands to remove ^J from the start of all lines, and to replace all other ^J with a line break:

:%s/^\r//e
:%s/\r/\r/ge

Terminator after last line

Every line in a text file should have a terminator (for example, a dos file should end with CRLF). When reading a file, Vim accepts the last line as a normal line, even if it has no terminator. Normally, Vim writes a terminator after every line, including the last. For rare occasions, it is possible to save a file with no terminator after the last line:

:set noendofline binary
:w

The above only works in Unix, and must be manually triggered. With some scripting, it is possible to automatically preserve a missing end-of-line on any file format.
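
Newer versions of Vim (7.4.785 and later) provide the 'fixendofline' option, which offers another way to preserve a missing final terminator without setting 'binary'. The following vimrc fragment is a minimal sketch that assumes such a version; on older versions the test fails and nothing is changed.

```vim
" Assumes Vim 7.4.785 or later, where the 'fixendofline' option exists.
if exists('+fixendofline')
  " Keep the end of the file as it was read: if the last line had no
  " terminator, none is added when the buffer is written.
  set nofixendofline
endif
```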

Some obsolete dos files use Ctrl-Z as an end-of-file character. When reading a dos file, Vim accepts any Ctrl-Z bytes within the file as normal characters (these will appear in the buffer as ^Z), however if Ctrl-Z is the last byte in the file, it is omitted.

How file format conversion works

Understanding the principles involved in converting file formats can help avoid mistakes.

Suppose you have some files that use a mixture of CRLF and LF-only line endings (all line terminators use CRLF, or all use LF-only, or there are some of each). These steps are required when converting each file:

  • Read the file as dos so any text ending with CRLF or LF-only is regarded as a line. These line endings (CRLF and LF) are removed and are not present in the buffer.
  • If you want to force all line endings to CRLF, write as dos. The :w command is required (not :update or :wa, because these write only if the buffer has been modified, and no modification has occurred).
  • If you want to force all line endings to LF-only, write as unix.
  • If you want to force all line endings to CR-only, write as mac.

If all lines in a file end with LF-only, the file can be converted to use CRLF endings by reading as unix and writing as dos. However, if some lines end with CRLF, reading a file as unix will keep each CR in the buffer, and writing the file using any format will write each CR to the file, as if it were a normal character. When writing, line endings are added, so any CR characters that were in the original file, will be written in addition to line endings.

An LF-only file can also be converted to CRLF by reading as dos and writing as dos.

When reading a file as dos, if a CR followed by LF is encountered (CRLF), those two bytes are removed, and the preceding text is regarded as a line. Similarly, if LF is encountered, it is removed, and the preceding text is regarded as a line. However, if a CR is encountered (without a following LF), the CR will be regarded as a normal character and will be copied into the buffer where it will be displayed as ^M (Ctrl-M, the code for CR).
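
The rules above can be modelled with a few string functions. This is only an illustration of the description in this section, not Vim's actual reader or writer, and all three function names are hypothetical. The raw file content is passed as a string, and each function works on the text between LF characters.

```vim
" A model of the rules above; the function names are assumptions.

" Read as unix: split at each LF and keep every CR as ordinary text.
function! ReadAsUnix(raw) abort
  let items = split(a:raw, "\n", 1)
  " The last item is the text after the final LF (a line with no
  " terminator); it is kept only when it is not empty.
  let tail = remove(items, -1)
  return tail ==# '' ? items : items + [tail]
endfunction

" Read as dos: split at each LF and drop a CR that comes just before it.
function! ReadAsDos(raw) abort
  let items = split(a:raw, "\n", 1)
  let tail = remove(items, -1)
  " Any CR that is not immediately before an LF stays in the line.
  call map(items, 'substitute(v:val, "\r$", "", "")')
  return tail ==# '' ? items : items + [tail]
endfunction

" Write: add the terminator for the chosen format after every line.
function! WriteAs(lines, ff) abort
  let eol = {'unix': "\n", 'dos': "\r\n", 'mac': "\r"}[a:ff]
  return join(a:lines, eol) . (empty(a:lines) ? '' : eol)
endfunction
```

In this model, WriteAs(ReadAsUnix("one\r\ntwo\n"), 'dos') produces a first line ending with CRCRLF, which is the failure described above, while WriteAs(ReadAsDos("one\r\ntwo\n"), 'unix') gives clean LF-only endings.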

Results of incorrect file format detection

Suppose a file contains two lines:

Line 1
Line 2

When reading the file, if Vim does not correctly detect the file format, here is what you will see in the buffer (^J is Ctrl-J or LF; ^M is Ctrl-M or CR).

File has unix line endings; file read with ff=dos:

Line 1
Line 2

File has unix line endings; file read with ff=mac:

Line 1^JLine 2^J

File has dos line endings; file read with ff=unix:

Line 1^M
Line 2^M

File has dos line endings; file read with ff=mac:

Line 1
^JLine 2
^J

File has mac line endings; file read with ff=unix or ff=dos:

Line 1^MLine 2^M

Vim script problems

Under Unix, you may find that a Vim script does not work because you have downloaded a script that contains CR characters. Each CR displays as ^M and will cause some scripts to fail.

If you put, say, script.vim in your plugin directory, you may not see any useful error messages about CR characters when using Vim. You can source the script after starting Vim (for example, :source ~/.vim/plugin/script.vim) to see if errors including ^M are shown. To fix, you need to convert the file to unix format.
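
A script can also be checked for CR characters without sourcing it. The following sketch uses a hypothetical function name, LinesWithCR, and reads the file in binary mode because readfile() otherwise removes a CR that appears before an LF.

```vim
" Hypothetical helper (the name is an assumption): return the numbers of
" the lines in a file that contain a CR character.
function! LinesWithCR(fname) abort
  let found = []
  let lnum = 0
  " Binary mode keeps each CR, so it can be detected here.
  for item in readfile(a:fname, 'b')
    let lnum += 1
    if item =~# '\r'
      call add(found, lnum)
    endif
  endfor
  return found
endfunction
```

For example, :echo LinesWithCR(expand('~/.vim/plugin/script.vim')) shows an empty list for a clean unix file, and a list of line numbers for a file that needs converting.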

Pitfalls

Some suggestions for working with file formats suffer from pitfalls that are described here.

You are editing a file which you expect to be in unix format, yet you see many ^M characters. The following attempt to convert the file to unix format does not work:

:setlocal ff=unix    Set unix file format for current buffer.
:w                   Write buffer to file.

The file was probably already detected as unix format, so the :setlocal ff=unix command will do nothing (the problem is that the file uses dos format, but Vim read it as unix because at least one line had an LF-only ending). Furthermore, each ^M represents a CR character that is in the current buffer, and writing the buffer will write that CR to the file (not what you want).

You are editing a file which you expect to be in unix format, yet you see many ^M characters. You perform the following to convert it to unix format, then perform further edits:

:e ++ff=dos     Read file again in dos format, to accept both CRLF and LF-only line endings.
:w ++ff=unix    Write buffer to file using LF-only line endings.
...             Do some edits.
:w              Save the edits.

The first two steps above are correct, and the file will initially be written in unix format. However, the buffer is still marked as dos format, so the final :w (after the edits) will overwrite the file using CRLF line endings. The :e ++ff=dos command tells Vim to read the file again, forcing dos file format. Vim will remove CRLF and LF-only line endings, leaving only the text of each line in the buffer. However, if you are going to edit the file, you need to use these commands:

:e ++ff=dos          Read file again in dos format.
:setlocal ff=unix    Mark buffer so LF-only line endings will be used when buffer is written.
...                  Do some edits.
:w                   Save the edits (will use LF-only line endings).

Here is a mistaken attempt to convert certain files from dos to unix format by starting Vim with commands to convert all *.c and *.h files in the current directory:

vim +"argdo setlocal ff=unix" +wqa *.c *.h

This will work if 'fileformats' includes dos and if the files have only CRLF line endings. However, if 'fileformats' includes both dos and unix, and if a file has at least one LF-only line ending, that file will be detected as unix, and any CR in the file will be shown in the buffer as ^M. The :setlocal ff=unix will not flag a unix file as modified, so the +wqa command (same as :xa) will not save that file. If :w is used to write the buffer, nothing useful will be achieved because the CR characters will be written to the file.

Other approaches

You may find a discussion of other techniques for handling line endings elsewhere. Some drawbacks of other procedures are mentioned here.

You can specify a file format for a particular file by inserting a modeline in that file. For example, in file my.c, you may put the following comment near the top or bottom of the file in an attempt to maintain dos line endings, regardless of what system is used to edit the file:

/* vim: set ff=dos: */

In general, using a modeline is useless in this context, although it may help if the file format is correctly detected when the file is read, because the next write will save the file in the preferred format specified in the modeline. However, the modeline does not avoid problems, and may make problems worse. For example, if file my.c has one or more lines that end with LF only, and the file is edited on a default Windows system, the file will be detected as having unix format, and the modeline will then change the format to dos, which will set the buffer modified flag. The buffer will display each CR as ^M. If you now save the file, each line will be written with a CRLF ending. However, the ^M characters that were visible in the buffer will be written to the file, so some lines will now end with CRCRLF (two CR characters).

Another unhelpful approach is to hide ^M characters which occur at the end of a line by highlighting them with the Ignore highlight group:

:match Ignore /\r$/

While this may be helpful as a quick workaround when viewing a file, in general, it is a misguided approach because the characters are hidden, but present, which will inevitably cause trouble when editing. In addition, it is much better to correctly handle the problem rather than temporarily hide it.

Tools

Several tools are available to convert files from one type of line ending to another. These need to be run at the command line, and are not related to Vim.

On Unix-based systems, the file utility can display what kind of line endings are present in a file. For example, file *.c will report what line terminators (CRLF, CR, LF) are present in each *.c file. The dos2unix utility can convert from dos or mac format to unix, and the unix2dos utility can convert from unix to dos format, optionally while preserving file timestamps.

Many other conversion tools are available. For example, flip can convert files between dos/mac/unix formats, and versions for each platform are available.

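See also

  • Display file format in the status line: show an alert if a non-native file format is used
  • Automatically reload files with mixed line-endings in DOS fileformat: load files in the correct format with no user intervention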