Changes: File format

Revision as of 12:34, 15 July 2012

Tip 1585 Printable Monobook Previous Next

created 2008 · complexity basic · version 7.0

Vim recognizes three file formats (unix, dos, mac) that determine what line ending characters (line terminators) are removed from each line when a file is read, or are added to each line when a file is written. A file format problem can display ^M characters, or can prevent scripts from running correctly. This tip explains how to avoid problems, and how to convert from one file format to another. Use of the 'fileformat' and 'fileformats' options is also explained. See below if all you want to know is how to remove ^M characters, or how to fix the line endings in the file you are working on (in brief, enter :e ++ff=dos to remove ^M when viewing a file).

The line terminator expected for each file format is:

unix	LF only (each line ends with an LF character).
dos	CRLF (each line ends with two characters, CR then LF).
mac	CR only (each line ends with a CR character).

CR is carriage return (return cursor to left margin), which is Ctrl-M or ^M or hex 0D.

LF is linefeed (move cursor down), which is Ctrl-J or ^J or hex 0A. Sometimes, LF is written as NL (newline).

Mac OS version 9 and earlier use mac line endings, while Mac OS X and later use unix line endings.

File format options

The 'fileformat' option is local to each buffer. It is set by Vim when a file is read, or can be specified in a command telling Vim how to read a file. In addition, the 'fileformat' option can be changed to specify the line endings that will be added to each line when the buffer is written to a file.

The 'fileformats' option is global and specifies which file formats will be tried when Vim reads a file (unless otherwise specified, Vim attempts to automatically detect which file format should be used to read a file). The first file format in 'fileformats' is also used as the default for a new buffer.

The following command displays the fileformat option (abbreviated as ff) for the current buffer, and the fileformats global option (abbreviated as ffs) which determines how Vim reads and writes files: :help 'ff' :help 'ffs'

:set ff? ffs?

This command also shows where each option was last set:

:verbose set ff? ffs?

The fileformats option is often not explicitly set (the defaults are usually adequate). However, the above command may indicate that the option was set in your vimrc because that file probably contains set nocompatible which sets many options.

File format detection

The 'fileformats' option ('ffs') has these defaults:

`ffs=unix,dos`	Unix based systems
`ffs=dos,unix`	Windows and DOS systems
`ffs=mac,unix,dos`	Mac OS 9 systems

When a file is read, the order of the items specified in 'ffs' has no effect (for example, ffs=unix,dos has the same effect as ffs=dos,unix when reading). The order is only important when a new buffer is created (if not empty, the first item in 'ffs' is used as the file format for a new buffer; this determines which line endings will be added when the buffer is saved).

Suppose your system has ffs=dos,unix and you open an existing file. Vim will look for both dos and unix line endings, but Vim has a built-in preference for the unix format.

If all lines in the file end with CRLF, the dos file format will be applied, meaning that each CRLF is removed when reading the lines into a buffer, and the buffer 'ff' option will be dos.
If one or more lines end with LF only, the unix file format will be applied, meaning that each LF is removed (but each CR will be present in the buffer, and will display as ^M), and the buffer 'ff' option will be unix.

Converting the current file

A common problem is that you open a file and see ^M at the end of many lines. Entering :set ff? will probably show that the file was read as unix: the problem is that some lines actually end with CRLF while others end with LF. To fix this, you need to tell Vim to read the file again using dos file format. When reading as dos, all CRLF line endings, and all LF-only line endings, are removed. Then you need to change the file format for the buffer and save the file. The following procedures will easily handle this situation, but they only work reliably on reasonably recent versions of Vim (7.2.40 or higher).

Convert from dos/unix to unix

To convert the current file from any mixture of CRLF/LF-only line endings, so all lines end with LF only:

`:update`	Save any changes.
`:e ++ff=dos`	Edit file again, using dos file format (`'fileformats'` is ignored).^{[A 1]}
`:setlocal ff=unix`	This buffer will use LF-only line endings when written.^{[A 2]}
`:w`	Write buffer using unix (LF-only) line endings.

In the above, replacing :set ff=unix with :set ff=mac would write the file with mac (CR-only) line endings. Or, if it was a mac file to start with, you would use :e ++ff=mac to read the file correctly, so you could convert the line endings to unix or dos.

Convert from dos/unix to dos

To convert the current file from any mixture of CRLF/LF-only line endings, so all lines end with CRLF only:

`:update`	Save any changes.
`:e ++ff=dos`	Edit file again, using dos file format (`'fileformats'` is ignored).^{[A 1]}
`:w`	Write buffer using dos (CRLF) line endings.

Notes A

^ ^a ^b The :e command reads the current file again, using the ++ff=dos option so the read will omit all CRLF and LF-only line terminators (dos file format). Each ^M at the end of a line should disappear. Some older versions of Vim do not perform this step correctly and the ^M endings are not removed; upgrade Vim to fix. :help :e
^ Use :setlocal (or :setl) to avoid changing the global default.

Converting clean files

When working with "clean" files (where every line has the same line ending), Vim's default settings provide reliable file format detection, and conversion is easy.

Suppose you have a collection of files where some are dos (every line ends with CRLF), and others are unix (every line ends with LF only). To convert all the dos files to unix (while not modifying the unix files):^{[B 1]}

`:args .c .h`	Specify the files to convert.^{[B 2]}
`:argdo set ff=unix\|update`	For each argument, set unix file format for the buffer, and save the file if needed.^{[B 3]}

Suppose you have a collection of files where some are dos (every line ends with CRLF), and others are unix (every line ends with LF only). To convert all the unix files to dos (while not modifying the dos files):^{[B 1]}

`:args .c .h`	Specify the files to convert.
`:argdo set ff=dos\|update`	For each argument, set dos file format for the buffer, and save the file if needed.

If you have opened several files where some are dos and some are unix, you can convert the dos files to unix:^{[B 1]}

:bufdo! set ff=unix|w For each buffer, set unix file format, and write the file.

If you have opened several files where some are dos and some are unix, you can convert the unix files to dos:^{[B 1]}

:bufdo! set ff=dos|w For each buffer, set dos file format, and write the file.

Notes B

^ ^a ^b ^c ^d This procedure will fail if a file has a mixture of dos and unix line endings because such files are detected as unix, and the CR characters are retained in the buffer.
^ This example processes all *.c and *.h files in the current directory by setting the argument list to the wanted names. :help :args
^ The :argdo command operates on each file in the argument list. For each file, it sets the buffer to use unix file format. That sets the modified flag for buffers that were detected as dos. The :update command writes the buffer if its modified flag is set.

Converting mixed files

When working with "mixed" files (where some lines have one kind of terminator, while other lines have a different terminator), reliable conversion requires more effort. Some methods do not work reliably with older Vim 7.2 versions. The procedures here should work in Vim 7.2 and later.

Convert from dos/unix to unix

To convert from any mixture of CRLF endings and LF-only endings, to LF-only endings:^{[C 1]}

`:set hidden`	Allow modified buffers to be hidden.
`:set ffs=dos`	Assume dos line endings (CRLF or LF-only) when reading files.
`:args .c .h`	Specify the files to convert.
`:argdo set ff=unix\|w`	For each argument, set unix file format for the buffer, and write the file.^{[C 2]}

Convert from dos/unix to dos

To convert from any mixture of CRLF endings and LF-only endings, to CRLF endings:^{[C 1]}

`:set ffs=dos`	Assume dos line endings (CRLF or LF-only) when reading files.
`:args .c .h`	Specify the files to convert.
`:argdo w`	Write each file with CRLF line endings.

Notes C

^ ^a ^b A defect with this procedure is that all files are modified, even if no change was required. That is, a file is written even if it originally had all lines in the wanted format.
^ The :argdo command operates on each file in the argument list. For each file, it sets the buffer to use unix file format, and writes the file (even if the file has not been marked as modified). An alternative would be to use :argdo w ++ff=unix which will write each file as unix, with the potential problem that the buffer will still be marked as using dos format (so if you later make a change and save it, the file will be written in dos format).

Removing unwanted CR or LF characters

First ensure you have read the file with the appropriate file format. For example, use :e ++ff=dos to remove all CRLF and LF-only line terminators, or use :e ++ff=mac if the file uses CR as a line terminator,.

After reading with the correct file format, the buffer may still contain unwanted CR characters. You can search for these with /\r (slash starts a search; backslash r represents CR when searching; press Enter to search).

To delete ^M at line endings, and replace it with a space everywhere else (the c flag will prompt to confirm that you want each replacement, and the e flag prevents an error message if the string is not found):

:%s/\r\+$//e
:%s/\r/ /gce

To process, say, all *.txt files in the current directory:

vim *.txt
:set hidden
:bufdo %s/\r\+$//e
:bufdo %s/\r/ /ge
:xa

To delete every ^M, regardless of where they occur in a line (this is not a good idea if two lines were separated only by a CR because the command joins the lines together):

:%s/\r//g

To replace every CR with LF (when searching, \r matches CR, but when replacing, \r inserts LF; this is not a good idea if LF occurs at the end of a line, because an extra blank line will be created):

:%s/\r/\r/g

If a file uses CR line terminators, it should be read as mac (using :e ++ff=mac). After doing that, you may see unwanted ^J (LF) characters. In a mac buffer, all CR characters will have been removed because CR is the line terminator, and searching for \r will find unwanted LF characters. Use these commands to remove ^J from the start of all lines, and to replace all other ^J with a line break:

%s/^\r//e
%s/\r/\r/ge

Terminator after last line

Every line in a text file should have a terminator (for example, a dos file should end with CRLF). When reading a file, Vim accepts the last line as a normal line, even if it has no terminator. Normally, Vim writes a terminator after every line, including the last. For rare occasions, it is possible to save a file with no terminator after the last line:

:set noendofline binary
:w

The above only works in Unix, and must be manually triggered. With some scripting, it is possible to automatically preserve a missing end-of-line on any file format.

Some obsolete dos files use Ctrl-Z as an end-of-file character. When reading a dos file, Vim accepts any Ctrl-Z bytes within the file as normal characters (these will appear in the buffer as ^Z), however if Ctrl-Z is the last byte in the file, it is omitted.

How file format conversion works

Understanding the principles involved in converting file formats can help avoid mistakes.

Suppose you have some files that use a mixture of CRLF and LF-only line endings (all line terminators use CRLF, or all use LF-only, or there are some of each). These steps are required when converting each file:

Read the file as dos so any text ending with CRLF or LF-only is regarded as a line. These line endings (CRLF and LF) are removed and are not present in the buffer.
If you want to force all line endings to CRLF, write as dos. The :w command is required (not :update or :wa because these only write if the buffer has not been modified, and no modification has occurred).
If you want to force all line endings to LF-only, write as unix.
If you want to force all line endings to CR-only, write as mac.

If all lines in a file end with LF-only, the file can be converted to use CRLF endings by reading as unix and writing as dos. However, if some lines end with CRLF, reading a file as unix will keep each CR in the buffer, and writing the file using any format will write each CR to the file, as if it were a normal character. When writing, line endings are added, so any CR characters that were in the original file, will be written in addition to line endings.

An LF-only file can also be converted to CRLF by reading as dos and writing as dos.

When reading a file as dos, if a CR followed by LF is encountered (CRLF), those two bytes are removed, and the preceding text is regarded as a line. Similarly, if LF is encountered, it is removed, and the preceding text is regarded as a line. However, if a CR is encountered (without a following LF), the CR will be regarded as a normal character and will be copied into the buffer where it will be displayed as ^M (Ctrl-M, the code for CR).

Results of incorrect file format detection

Suppose a file contains two lines:

Line 1
Line 2

When reading the file, if Vim does not correctly detect the file format, here is what you will see in the buffer (^J is Ctrl-J or LF; ^M is Ctrl-M or CR).

File has unix line endings; file read with ff=dos:

Line 1
Line 2

File has unix line endings; file read with ff=mac:

Line 1^JLine 2^J

File has dos line endings; file read with ff=unix:

Line 1^M
Line 2^M

File has dos line endings; file read with ff=mac:

Line 1
^JLine 2
^J

File has mac line endings; file read with ff=unix or ff=dos:

Line 1^MLine 2^M

Vim script problems

Under Unix, you may find that a Vim script does not work because you have downloaded a script that contains CR characters. Each CR displays as ^M and will cause some scripts to fail.

If you put, say, script.vim in your plugin directory, you may not see any useful error messages about CR characters when using Vim. You can source the script after starting Vim (for example, :source ~/.vim/plugin/script.vim) to see if errors including ^M are shown. To fix, you need to convert the file to unix format.

Pitfalls

Some suggestions for working with file formats suffer from pitfalls that are described here.

You are editing a file which you expect to be in unix format, yet you see many ^M characters. The following attempt to convert the file to unix format does not work:

`:setlocal ff=unix`	Set unix file format for current buffer.
`:w`	Write buffer to file.

The file was probably already detected as unix format, so the :set ff=unix command will do nothing (the problem is that the file uses dos format, but Vim read it as unix because at least one line had an LF-only ending). Furthermore, each ^M represents a CR character that is in the current buffer, and writing the buffer will write that CR to the file (not what you want).

You are editing a file which you expect to be in unix format, yet you see many ^M characters. You perform the following to convert it to unix format, then perform further edits:

`:e ++ff=dos`	Read file again in dos format, to accept both CRLF and LF-only line endings.
`:w ++ff=unix`	Write buffer to file using LF-only line endings.
...	Do some edits.
`:w`	Save the edits.

The first two steps above are correct, and the file will initially be written in unix format. However, the buffer is still marked as dos format, so the :w will overwrite the file using CRLF line endings. The :e ++ff=dos command tells Vim to read the file again, forcing dos file format. Vim will remove CRLF and LF-only line endings, leaving only the text of each line in the buffer. However, if you are going to edit the file, you need to use these commands:

`:e ++ff=dos`	Read file again in dos format.
`:setlocal ff=unix`	Mark buffer so LF-only line endings will be used when buffer is written.
...	Do some edits.
`:w`	Save the edits (will use LF-only line endings).

Here is a mistaken attempt to convert certain files from dos to unix format by starting Vim with commands to convert all *.c and *.h files in the current directory:

vim +"argdo setlocal ff=unix" +wqa *.c *.h

This will work if 'fileformats' includes dos and if the files have only CRLF line endings. However, if 'fileformats' includes both dos and unix, and if a file has at least one LF-only line ending, that file will be detected as unix, and any CR in the file will be shown in the buffer as ^M. The :setlocal ff=unix will not flag a unix file as modified, so the +wqa command (same as :xa) will not save that file. If :w is used to write the buffer, nothing useful will be achieved because the CR characters will be written to the file.

Other approaches

You may find a discussion of other techniques for handling line endings elsewhere. Some drawbacks of other procedures are mentioned here.

You can specify a file format for a particular file by inserting a modeline in that file. For example, in file my.c, you may put the following comment near the top or bottom of the file in an attempt to maintain dos line endings, regardless of what system is used to edit the file:

/* vim: set ff=dos: */

In general, using a modeline is useless in this context, although it may help if the file format is correctly detected when the file is read, because the next write will save the file in the preferred format specified in the modeline. However, the modeline does not avoid problems, and may make problems worse. For example, if file my.c has one or more lines that end with LF only, and the file is edited on a default Windows system, the file will be detected as having unix format, and the modeline will then change the format to dos, which will set the buffer modified flag. The buffer will display each CR as ^M. If you now save the file, each line will be written with a CRLF ending. However, the ^M characters that were visible in the buffer will be written to the file, so some lines will now end with CRCRLF (two CR characters).

Another unhelpful approach is to hide ^M characters which occur at the end of a line by highlighting them with the Ignore highlight group:

:match Ignore /\r$/

While this may be helpful as a quick workaround when viewing a file, in general, it is a misguided approach because the characters are hidden, but present, which will inevitably cause trouble when editing. In addition, it is much better to correctly handle the problem rather than temporarily hide it.

Tools

Several tools are available to convert files from one type of line ending to another. These need to be run at the command line, and are not related to Vim.

On Unix-based systems, the file utility can display what kind of line endings are present in a file. For example, file *.c will report what line terminators (CRLF, CR, LF) are present in each *.c file. The dos2unix utility can convert from dos or mac format to unix, and the unix2dos utility can convert from unix to dos format, optionally while preserving file timestamps.

Many other conversion tools are available. For example, flip can convert files between dos/mac/unix formats, and versions for each platform are available.

Comments

[broken-1] The :e command reads the current file again, using the ++ff=dos option so the read will omit all CRLF and LF-only line terminators (dos file format). Each ^M at the end of a line should disappear. Some older versions of Vim do not perform this step correctly and the ^M endings are not removed; upgrade Vim to fix. :help :e

[2] Use :setlocal (or :setl) to avoid changing the global default.

[failmix-3] This procedure will fail if a file has a mixture of dos and unix line endings because such files are detected as unix, and the CR characters are retained in the buffer.

[4] This example processes all *.c and *.h files in the current directory by setting the argument list to the wanted names. :help :args

[5] The :argdo command operates on each file in the argument list. For each file, it sets the buffer to use unix file format. That sets the modified flag for buffers that were detected as dos. The :update command writes the buffer if its modified flag is set.

[defect-6] A defect with this procedure is that all files are modified, even if no change was required. That is, a file is written even if it originally had all lines in the wanted format.

[7] The :argdo command operates on each file in the argument list. For each file, it sets the buffer to use unix file format, and writes the file (even if the file has not been marked as modified). An alternative would be to use :argdo w ++ff=unix which will write each file as unix, with the potential problem that the buffer will still be marked as using dos format (so if you later make a change and save it, the file will be written in dos format).

[A 1]

[A 2]

[B 1]

[B 2]

[B 3]

[C 1]

[C 2]