Vim Tips Wiki

Bash file encoding alias

Revision as of 06:42, July 13, 2012 by JohnBot (Talk | contribs)

Tip 1641

created 2009 · complexity basic · author Richard bw · version 7.0


Vim can be used to detect the file encoding used in a particular file (for example, utf-8, utf-16le, or latin1). This tip shows an alias to invoke Vim with suitable arguments to check the encoding used for a specified file.

Bash shell alias

Here is a simple alias for the Bash shell to display what Vim thinks is a file's encoding:

alias vimenc='vim -c '\''let $enc = &fileencoding | execute "!echo Encoding:  $enc" | q'\'''

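This saves having to start Vim, open the file, check 'fileencoding', and exit by hand. The alias expects a file name as its argument. Because the alias runs a shell command with :!, Vim shows its "Press ENTER or type command to continue" prompt after printing the encoding; pressing Enter returns to the shell.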

Usage examples:

$ vimenc UTF-16.xml
Encoding: utf-16le
Press ENTER or type command to continue

$ vimenc ISO-8859-1.xml
Encoding: latin1
Press ENTER or type command to continue
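
Because an alias is plain text substitution, it cannot check that a file name was actually supplied. A shell function can. The following is a minimal sketch rather than part of the original tip: the function name vimenc_check, its usage message, and its exit status 2 are hypothetical choices, while the Vim command is the one from the alias above.

```shell
# Hypothetical helper (not from the original tip): a function form of the
# vimenc alias that validates its argument before starting Vim.
vimenc_check() {
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "usage: vimenc_check FILE" >&2
        return 2
    fi
    # Same Vim command as the alias. The "--" stops option parsing so that
    # a file name beginning with "-" is still treated as a file.
    vim -c 'let $enc = &fileencoding | execute "!echo Encoding: $enc" | q' -- "$1"
}
```

Called without an argument, or with a path that is not a regular file, the function prints the usage line to standard error and returns without launching Vim.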

Explanation

When an existing file is read, Vim tries to interpret the bytes in the file as characters using each encoding specified in the 'fileencodings' option. The first encoding that produces no conversion error is used, and that encoding is reported as the file encoding by the alias shown above.
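
The "first encoding that converts without error wins" rule can be imitated outside Vim. The sketch below is only an approximation under stated assumptions: it relies on the iconv command being installed, it is not how Vim performs the check internally, and the function name guess_encoding and its default candidate list are hypothetical.

```shell
# Sketch only: try candidate encodings in order and report the first one
# that iconv can decode without error. guess_encoding is a hypothetical name.
guess_encoding() {
    local file=$1 enc
    shift
    # Use the caller's candidate list, or an illustrative default.
    [ "$#" -gt 0 ] || set -- UTF-8 ISO-8859-1
    for enc in "$@"; do
        # iconv exits non-zero when the input is not valid in $enc.
        if iconv -f "$enc" -t UTF-8 "$file" >/dev/null 2>&1; then
            echo "$enc"
            return 0
        fi
    done
    return 1
}
```

For example, guess_encoding notes.txt tries UTF-8 and then ISO-8859-1, while guess_encoding notes.txt UTF-8 SHIFT_JIS ISO-8859-1 tries a caller-supplied order (notes.txt is a placeholder file name). As with 'fileencodings', the order matters: an encoding that accepts every byte sequence should come last.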

After using the global 'fileencodings' option to determine the file encoding, Vim stores the result in the buffer-local option 'fileencoding' (the first option is plural, ending with an 's'; the second is singular). In the alias, the statement let $enc = &fileencoding assigns the value of 'fileencoding' to an environment variable named enc: the '&' prefix reads an option value, and the '$' prefix tells Vim to set an environment variable. The shell started by :! inherits that variable, so its echo command can display it. See :help expr-option, :help :let-environment, and :help :!cmd.

When the 'encoding' option is set to a Unicode value such as utf-8, the default for 'fileencodings' is "ucs-bom,utf-8,default,latin1" which will check, in order:

  1. Presence of a Unicode BOM.
  2. UTF-8.
  3. System locale's default character set.
  4. Latin1 (which will always work, because every byte sequence is valid latin1).
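
The first step of that list, the BOM check, depends only on the first few bytes of the file. As a rough illustration, the sketch below inspects those bytes with head and od. It is not Vim's implementation: the function name detect_bom is hypothetical, and the labels it prints are illustrative and may differ from the names Vim stores in 'fileencoding'.

```shell
# Sketch only: report which Unicode BOM, if any, starts a file.
# detect_bom is a hypothetical helper, not a Vim command.
detect_bom() {
    local head4
    # Hex-dump the first four bytes as one string, e.g. "efbbbf3c".
    head4=$(head -c 4 -- "$1" | od -An -tx1 | tr -d ' \n')
    case $head4 in
        efbbbf*)  echo "utf-8" ;;
        fffe0000) echo "utf-32le" ;;   # must be tested before utf-16le
        0000feff) echo "utf-32be" ;;
        fffe*)    echo "utf-16le" ;;
        feff*)    echo "utf-16be" ;;
        *)        echo "none"; return 1 ;;
    esac
}
```

A file without a BOM yields "none" and a non-zero exit status, which corresponds to moving on to the next entry in the list above.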
