Vim Tips Wiki

Tip: #1074 - Detect encoding from the charset specified in HTML files

Created: December 9, 2005 22:41 Complexity: advanced Author: Wu Yongwei Version: 6.0 Karma: 3/3 Imported from: Tip#1074

If you need to edit files in several different legacy encodings, Vim's 'fileencodings' option cannot help much, because it cannot reliably tell one 8-bit legacy encoding from another. Some hacks can be used to record the encoding in the file itself (see Tip #911). HTML files, however, often already contain the encoding information, especially non-Latin1 Web pages, e.g.:


<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
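To illustrate how the charset name can be pulled out of such a line, here is a minimal sketch using matchstr() with the \zs atom. The variable names sample_line and sample_charset are made up for this illustration; the tip itself takes the text from the buffer rather than from a string.

```vim
" Hypothetical sample line, used here instead of real buffer contents.
let sample_line = '<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'

" \zs sets the start of the returned match, so the text before the
" charset name must be present but is not part of the result.
let sample_charset = matchstr(sample_line, 'text/html; charset=\zs[-A-Za-z0-9_]\+')

echo sample_charset
" prints: gb2312
```

If the line contains no charset declaration, matchstr() returns an empty string.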


The following code can be put in your vimrc (_vimrc on Windows) to detect and use such an encoding specification:



code begins -----

if has('autocmd')

  " Map an HTML charset name to the encoding name Vim expects.
  function! ConvertHtmlEncoding(encoding)
    if a:encoding ==? 'gb2312'
      return 'cp936'    " GB2312 imprecisely means CP936 in HTML
    elseif a:encoding ==? 'iso-8859-1'
      return 'latin1'   " The canonical encoding name in Vim
    elseif a:encoding ==? 'utf8'
      return 'utf-8'    " Other encoding aliases should follow here
    else
      return a:encoding
    endif
  endfunction

  " Reload the buffer with the charset declared in its <meta> tag, if any.
  function! DetectHtmlEncoding()
    if &filetype != 'html'
      return
    endif
    normal! m`
    normal! gg
    if search('\c<meta http-equiv=\("\?\)Content-Type\1 content="text/html; charset=[-A-Za-z0-9_]\+">') != 0
      let reg_bak = @"
      normal! y$
      let charset = matchstr(@", 'text/html; charset=\zs[-A-Za-z0-9_]\+')
      let charset = ConvertHtmlEncoding(charset)
      normal! ``
      let @" = reg_bak
      " An empty 'fileencodings' means files are read using 'encoding'
      if &fileencodings == ''
        let auto_encodings = ',' . &encoding . ','
      else
        let auto_encodings = ',' . &fileencodings . ','
      endif
      " Reload only if the current encoding was auto-detected, not forced
      if charset !=? &fileencoding &&
            \ auto_encodings =~ ',' . &fileencoding . ','
        silent! exec 'e ++enc=' . charset
      endif
    else
      normal! ``
    endif
  endfunction

  " Detect charset encoding in an HTML file
  au BufReadPost *.htm* nested call DetectHtmlEncoding()

endif

code ends -----
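As the comment in ConvertHtmlEncoding() suggests, more aliases can be added over time. One possible way to keep them manageable is a dictionary lookup instead of a growing if/elseif chain. The sketch below is an alternative, not part of the original tip: the names s:html_encoding_aliases and ConvertHtmlEncodingDict are hypothetical, and it is meant to live in a script file such as the vimrc because it uses a script-local variable.

```vim
" Hypothetical alias table: lowercase HTML charset name -> Vim encoding name.
let s:html_encoding_aliases = {'gb2312': 'cp936', 'iso-8859-1': 'latin1', 'utf8': 'utf-8'}

" Return the Vim name for a charset, or the charset unchanged if no alias
" is known. The lookup is case-insensitive because the key is lowercased.
function! ConvertHtmlEncodingDict(encoding)
  return get(s:html_encoding_aliases, tolower(a:encoding), a:encoding)
endfunction

echo ConvertHtmlEncodingDict('GB2312')
" prints: cp936
echo ConvertHtmlEncodingDict('koi8-r')
" prints: koi8-r
```

Adding a new alias then only requires a new entry in the dictionary.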


Note that the autocommand is declared with nested so that the reload performed by ":e ++enc=..." triggers the other autocommands again. This ensures that syntax highlighting still works and that the remembered cursor position is kept.
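If the vimrc is sourced more than once, the same autocommand would be registered repeatedly. A common way to avoid that is to wrap it in an autocommand group and clear the group first. This is only a sketch: the group name HtmlEncodingDetect is an arbitrary choice for this illustration, and it assumes DetectHtmlEncoding() from the tip has been defined.

```vim
" Hypothetical group name; any unique name would do.
augroup HtmlEncodingDetect
  " Remove autocommands left in this group by an earlier sourcing.
  autocmd!
  " nested lets the :e ++enc= reload trigger other autocommands.
  autocmd BufReadPost *.htm* nested call DetectHtmlEncoding()
augroup END
```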


It is recommended to use "set encoding=utf-8" in order to ensure successful encoding conversion.
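A minimal vimrc fragment for this recommendation might look as follows. The 'fileencodings' value is only an assumed example of a typical list and should be adapted to the encodings you actually use.

```vim
" Use UTF-8 internally so that text from any file encoding can be represented.
set encoding=utf-8

" Assumed example list of encodings to try, in order, when reading a file.
set fileencodings=ucs-bom,utf-8,latin1
```

It is usually best to set 'encoding' near the top of the vimrc, before any text that contains non-ASCII characters.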

Comments

Remember the final 'endif'...

wolcendo--AT--friko2.onet.pl , December 21, 2005 3:32

