Converting LANG to UTF-8

Please review this tip:

This tip was imported from vim.org and needs general review.
You might clean up comments or merge similar tips.
Add suitable categories so people can find the tip.
Please avoid the discussion page (use the Comments section below for notes).
If the tip contains good advice for current Vim, remove the {{review}} line.

Tip 708 Printable Monobook Previous Next

created April 29, 2004 · complexity basic · author Grant Bowman · version 6.0

On my system I converted from a single byte character set (any of ISO-8859-15 type sets) to use a variable multi-byte UTF-8 encoding. When I did so my mappings that used to work that were set in my vimrc were wrong because my vimrc was written to assume ISO-8859-1/latin1. The LANG environment variable set during user login tells GNU libc6 and most programs written for Unix to use a different character encoding by default. My new setting of LANG=en_US.UTF-8 incorrectly made Vim assume that my ~/.vimrc was also written in UTF-8 and stored as if fileencoding=utf-8. This was a problem for <M-k> meta key bindings. Also any character code above 127 in UTF-8 is represented by two bytes instead of only one, so any characters above 127 will be misinterpreted after converting. A quick solution to make your old file work exactly as intended is to wrap your ~/.vimrc at the top and bottom with 'encoding' commands like this:

set encoding=iso-8859-1
[bulk of vimrc file]
set encoding=utf-8

This allows the keys to be correctly assigned as intended when the vimrc was created. In my case this was before I changed my LANG setting.

In addition to using a new LANG environment variable set in ~/.bashrc (Vim correctly reads it and changes to :set encoding=utf-8) I have also set fileencodings=iso-8859-1 in vimrc so that it matches the system-default locale setting of libc6. This is so that all old (and new) files on my disk match up with what is expected by the rest of my system. Vim will automatically do a file conversion upon reading and writing each file. This seems safe but more testing is required. The best reference I found for these issues is http://www.cl.cam.ac.uk/~mgk25/unicode.html

Related vimtips include VimTip246 VimTip546 and VimTip576.

Comments

Wouldn't it be simpler to just add the line

scriptencoding latin1

at the top of your vimrc? (see :help :scriptencoding). This would tell Vim how your script was encoded so it could read it appropriately, without messing up with 'encoding' which affects things all over Vim. (And since this particular line is actually in 7-bit ASCII it is encoded identically in UTF-8 and in most 8-bit encodings including latin1, so no risk of confusion there, except for systems based on EBCDIC maybe.)

Converting LANG to UTF-8

Comments

Fan Feed