Vim Tips Wiki

Revision as of 12:38, 13 July 2012

Tip 1675

created August 31, 2011 · complexity basic · author Marcmontu · version 7.0


Characters such as 'á' or 'ç' (with diacritics) may be included in code comments and are successfully processed by many tools. However, some tools do not work with these characters.

Instead of changing the tools, it is common to remove the diacritical signs (e.g. replace á with a, and ç with c). This tip provides a script to do the necessary substitutions in a single step.

Script

Create file ~/.vim/plugin/diacritics.vim (Unix) or $HOME/vimfiles/plugin/diacritics.vim (Windows) containing one of the scripts below, then restart Vim. Alternatively, add one of the scripts to your vimrc and restart Vim.

The following script uses Vim's tr() to translate characters with diacritics to characters without. It joins the lines in the given range into a single string in memory and translates all characters in one operation, so it is efficient. However, there is no opportunity to review changes as they occur.

" Remove diacritical signs from characters in specified range of lines.
" Examples of characters replaced: á -> a, ç -> c, Á -> A, Ç -> C.
function! s:RemoveDiacritics(line1, line2)
  let diacs = 'áâãàçéêíóôõüú'  " lowercase diacritical signs
  let repls = 'aaaaceeiooouu'  " corresponding replacements
  let diacs .= toupper(diacs)
  let repls .= toupper(repls)
  let all = join(getline(a:line1, a:line2), "\n")
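  " keepempty=1: otherwise split() drops an empty first or last line,
  " and the remaining lines would be written back one line too early.
  call setline(a:line1, split(tr(all, diacs, repls), "\n", 1))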
endfunction
command! -range=% RemoveDiacritics call s:RemoveDiacritics(<line1>, <line2>)
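
To see what tr() does on its own, here is a minimal sketch that translates a sample string outside of any buffer. The sample text and the shortened character lists are illustrative only and are not part of the tip's script; it assumes 'encoding' is utf-8 so that the multibyte characters are handled one character at a time.

```vim
" Illustrative only: translate a sample string with tr().
" Each character in the second argument is replaced by the character
" at the same position in the third argument.
echo tr('ação útil', 'áãçú', 'aacu')
" Expected to display: acao util
```

Because both lists are matched by position, they must contain the same number of characters; this is why the script keeps diacs and repls aligned.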

The following alternative script uses :s to search and replace, with an opportunity to confirm each change so it can be reviewed (or press a to proceed with all changes). The substitute uses a replacement expression (\=) to look up the translation character in a dictionary.

" Remove diacritical signs from characters in specified range of lines.
" Examples of characters replaced: á -> a, ç -> c, Á -> A, Ç -> C.
" Uses substitute so changes can be confirmed.
function! s:RemoveDiacritics(line1, line2)
  let diacs = 'áâãàçéêíóôõüú'  " lowercase diacritical signs
  let repls = 'aaaaceeiooouu'  " corresponding replacements
  let diacs .= toupper(diacs)
  let repls .= toupper(repls)
  let diaclist = split(diacs, '\zs')
  let repllist = split(repls, '\zs')
  let trans = {}
  for i in range(len(diaclist))
    let trans[diaclist[i]] = repllist[i]
  endfor
  execute a:line1.','.a:line2 . 's/['.diacs.']/\=trans[submatch(0)]/gIce'
endfunction
command! -range=% RemoveDiacritics call s:RemoveDiacritics(<line1>, <line2>)
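
The dictionary lookup in the replacement expression can be tried on a plain string with substitute(), which accepts the same \= syntax as :s. The following is only a sketch: the function name DemoLookup, the three-entry dictionary, and the sample text are hypothetical and chosen for illustration, and 'encoding' is assumed to be utf-8.

```vim
" Illustrative only: the same lookup technique applied to a string.
function! DemoLookup(text)
  " Map each character with a diacritic to its plain replacement.
  let trans = {'ã': 'a', 'á': 'a', 'â': 'a'}
  " For every match, submatch(0) is the matched character; the \=
  " expression looks it up in the dictionary and inserts the result.
  return substitute(a:text, '[ãáâ]', '\=trans[submatch(0)]', 'g')
endfunction

echo DemoLookup('maçã, lá, â')
" Expected to display: maça, la, a
```

Note that 'ç' is left unchanged here because it is in neither the character class nor the dictionary; any character added to the pattern must also have a dictionary entry, otherwise the lookup fails.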

Each alternative script above defines the :RemoveDiacritics command, and the command accepts a range which defaults to the whole buffer. Some examples follow (type the first couple of letters of the command then press Tab for command completion, or press the up arrow for command history):

:RemoveDiacritics               " whole buffer
:.RemoveDiacritics              " current line
:'<,'>RemoveDiacritics          " last selected range of lines

See ranges for more information.

References

Comments

Sorry for being away for so long.

Fritzophrenic, Chrisbra, JohnBot and JohnBeckett: thank you very much for improving the script and the tip!

@Fritzophrenic, thanks for suggesting the command; I have started using this approach for similar tasks. I wasn't aware of the problems with multibyte encodings, but it is certainly better to make the script robust. In my opinion it is easier to maintain the diacritic characters and their replacements in single strings (they are easier to type, and the lines are shorter, so it is less likely that someone adds a new diacritic character and forgets to add its replacement), so I agree with Chrisbra's approach:

let diacCharsList=split(diacChars, '\zs')

I didn't understand the changes to the References section, but that is probably because I'm new to wiki editing. I'd be glad if someone can explain that to me :)

Here are the original intentions:

1) http://en.wikipedia.org/wiki/Diacritic As English is not my first language, I spent some time finding the right keyword when I was searching for a way to remove the diacritical signs. When I posted the tip I thought that if it had already existed and I had found it, my first reaction would have been "what is diacritical??" - and probably my first guess would have been that it is Vim parlance.

2) http://stackoverflow.com/questions/765894/can-i-substitute-multiple-items-in-a-single-regular-expression-in-vim-or-perl After I gave up searching for an existing way of replacing the diacritic characters and decided to create the script, I spent some time thinking about how to write it. I had the idea of performing all the changes with a single :s, but was unable to figure out how. Therefore I copied it from another person, and referencing that page was intended to give him credit for the implementation.

I thought that it could also be useful to understand the implementation. The comment line

"exe ":%s/[ãáâ]/\={'ã':'a','á':'a','â':'a'}[submatch(0)]/gIc"

was also intended to explain the unusual line to anyone attempting to change or improve it.

Thanks -- marcmontu - 12:37, Friday, July 13, 2012 (UTC)