Converting LANG to UTF-8

From Vim Tips Wiki

Tip 708. Created: April 29, 2004. Complexity: basic. Author: Grant Bowman. Version: 6.0


On my system I switched from a single-byte character set (one of the ISO-8859 family, such as ISO-8859-1 or ISO-8859-15) to the variable-width, multi-byte UTF-8 encoding. After the switch, mappings in my vimrc that used to work were wrong, because the vimrc had been written assuming ISO-8859-1 (latin1). The LANG environment variable, set during user login, tells GNU libc6 and most programs written for Unix which character encoding to use by default. My new setting of LANG=en_US.UTF-8 made Vim assume that my vimrc was also written in UTF-8, as if it had been stored with fileencoding=utf-8, which was not the case. This was a problem for <M-k> meta key bindings. In addition, UTF-8 represents every character code above 127 with two or more bytes instead of one (two bytes for the latin1 range 128-255), so any single-byte characters above 127 in the old file are misinterpreted after the switch.

A quick solution that makes your old file work exactly as intended is to wrap your vimrc, at the top and bottom, with 'encoding' commands like this:

set encoding=iso-8859-1
[bulk of vimrc file]
set encoding=utf-8

This lets the keys be assigned as they were intended when the vimrc was written, which in my case was before I changed my LANG setting.
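
To see why characters above 127 are misread, here is a minimal sketch that compares the byte length of one character in both encodings. It assumes your Vim can convert between latin1 and UTF-8, and the variable names are made up for illustration. Save it to a scratch file and :source it.

```vim
" Byte 0xEB is "e with diaeresis" in latin1. It is also 'k' (0x6B) with the
" high bit set, which is the character Vim uses for <M-k>.
let s:latin1_bytes = "\xeb"

" Convert the same character to UTF-8, where it is encoded as the two
" bytes 0xC3 0xAB. iconv() returns an empty string if the conversion fails.
let s:utf8_bytes = iconv(s:latin1_bytes, 'latin1', 'utf-8')

" strlen() counts bytes, not characters.
echo 'latin1 bytes:' strlen(s:latin1_bytes)
echo 'utf-8 bytes:' strlen(s:utf8_bytes)
```

A vimrc saved as latin1 contains the single byte, so reading it as UTF-8 does not yield the character the mapping was written for.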

Besides setting the new LANG environment variable in ~/.bashrc (Vim reads it correctly and switches to encoding=utf-8), I have also set fileencodings=iso-8859-1 in my vimrc so that it matches the system-default locale setting of libc6. This way all old (and new) files on my disk match what the rest of my system expects. Vim automatically converts between the file's encoding and 'encoding' when reading and writing each file. This seems safe, but more testing is required. The best reference I found for these issues is http://www.cl.cam.ac.uk/~mgk25/unicode.html
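
As a rough sketch, the 'fileencodings' setting described above would look like this in a vimrc. The commented-out alternative is an assumption of mine, not part of the original tip, for setups where some files are already UTF-8.

```vim
" The tip's setting: treat every existing file as latin1 when reading it.
set fileencodings=iso-8859-1

" Alternative (an assumption, not from the tip): try UTF-8 first and fall
" back to latin1. An 8-bit encoding never fails to match, so it must come last.
" set fileencodings=ucs-bom,utf-8,latin1
```

After opening a file, :set fileencoding? shows which encoding Vim picked for that buffer.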

Related tips include VimTip246, VimTip546, and VimTip576.

Comments

Wouldn't it be simpler to just add the line

scriptencoding latin1

at the top of your vimrc? (See :help :scriptencoding.) This tells Vim how your script is encoded so that it can read it appropriately, without touching 'encoding', which affects things all over Vim. (And since this particular line is plain 7-bit ASCII, it is encoded identically in UTF-8 and in most 8-bit encodings, including latin1, so there is no risk of confusion, except perhaps on EBCDIC-based systems.)
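
A minimal sketch of that suggestion, assuming the vimrc was saved as latin1. The mapping is a made-up placeholder, so substitute your own.

```vim
" Declare the encoding this file was saved in. Vim converts the lines that
" follow from latin1 to the current value of 'encoding'.
scriptencoding latin1

" Hypothetical mapping, for illustration only.
nnoremap <M-k> :echo 'meta-k pressed'<CR>

" Optional: an empty :scriptencoding stops the conversion, so it applies
" only to the lines between the two commands.
scriptencoding
```

Unlike the wrapper approach, this leaves 'encoding' at the value Vim derived from LANG.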

