Vim Tips Wiki

Tlgrok/Map Unicode character to script (language) name

Revision as of 12:54, January 27, 2009 by Tlgrok (Talk | contribs)

This is a draft, quarantined as a subpage of my own user page until it is ready (if ever) to face the world. Feel free to edit it.


The function below maps a Unicode character code (a decimal code point) to the name of the Unicode block, and hence the script, to which it belongs. It returns an empty string for codes outside the listed ranges. It may be useful, for example, for reporting the script of the character under the cursor.

" This function deals with all Unicode Modern Scripts. It may easily be
" extended to deal with other scripts.
"
" Codes (in decimal) are taken from
" http://en.wikipedia.org/wiki/Summary_of_Unicode_character_assignments#Modern_Scripts

function! UnicodeCodeToScriptName(code)
    if (a:code>=1536 && a:code<=1791) | return "Arabic" | endif
    if (a:code>=1872 && a:code<=1919) | return "Arabic Supplement" | endif
    if (a:code>=1328 && a:code<=1423) | return "Armenian" | endif
    if (a:code>=6912 && a:code<=7039) | return "Balinese" | endif
    if (a:code>=2432 && a:code<=2559) | return "Bengali" | endif
    if (a:code>=12544 && a:code<=12591) | return "Bopomofo" | endif
    if (a:code>=12704 && a:code<=12735) | return "Bopomofo Extended" | endif
    if (a:code>=6656 && a:code<=6687) | return "Buginese" | endif
    if (a:code>=5952 && a:code<=5983) | return "Buhid" | endif
    if (a:code>=5024 && a:code<=5119) | return "Cherokee" | endif
    if (a:code>=11392 && a:code<=11519) | return "Coptic" | endif
    if (a:code>=1024 && a:code<=1279) | return "Cyrillic" | endif
    if (a:code>=1280 && a:code<=1327) | return "Cyrillic Supplement" | endif
    if (a:code>=2304 && a:code<=2431) | return "Devanagari" | endif
    if (a:code>=4608 && a:code<=4991) | return "Ethiopic" | endif
    if (a:code>=11648 && a:code<=11743) | return "Ethiopic Extended" | endif
    if (a:code>=4992 && a:code<=5023) | return "Ethiopic Supplement" | endif
    if (a:code>=4256 && a:code<=4351) | return "Georgian" | endif
    if (a:code>=11520 && a:code<=11567) | return "Georgian Supplement" | endif
    if (a:code>=11264 && a:code<=11359) | return "Glagolitic" | endif
    if (a:code>=880 && a:code<=1023) | return "Greek and Coptic" | endif
    if (a:code>=7936 && a:code<=8191) | return "Greek Extended" | endif
    if (a:code>=2688 && a:code<=2815) | return "Gujarati" | endif
    if (a:code>=2560 && a:code<=2687) | return "Gurmukhi" | endif
    if (a:code>=4352 && a:code<=4607) | return "Hangul Jamo" | endif
    if (a:code>=44032 && a:code<=55215) | return "Hangul Syllables" | endif
    if (a:code>=5920 && a:code<=5951) | return "Hanunoo" | endif
    if (a:code>=1424 && a:code<=1535) | return "Hebrew" | endif
    if (a:code>=12448 && a:code<=12543) | return "Katakana" | endif
    if (a:code>=12784 && a:code<=12799) | return "Katakana Phonetic Extensions" | endif
    if (a:code>=12352 && a:code<=12447) | return "Hiragana" | endif
    if (a:code>=12688 && a:code<=12703) | return "Kanbun" | endif
    if (a:code>=3200 && a:code<=3327) | return "Kannada" | endif
    if (a:code>=6016 && a:code<=6143) | return "Khmer" | endif
    if (a:code>=6624 && a:code<=6655) | return "Khmer Symbols" | endif
    if (a:code>=3712 && a:code<=3839) | return "Lao" | endif
    if (a:code>=0 && a:code<=127) | return "Basic Latin" | endif
    if (a:code>=128 && a:code<=255) | return "Latin-1 Supplement" | endif
    if (a:code>=7680 && a:code<=7935) | return "Latin Extended Additional" | endif
    if (a:code>=256 && a:code<=383) | return "Latin Extended-A" | endif
    if (a:code>=384 && a:code<=591) | return "Latin Extended-B" | endif
    if (a:code>=11360 && a:code<=11391) | return "Latin Extended-C" | endif
    if (a:code>=42784 && a:code<=43007) | return "Latin Extended-D" | endif
    if (a:code>=6400 && a:code<=6479) | return "Limbu" | endif
    if (a:code>=3328 && a:code<=3455) | return "Malayalam" | endif
    if (a:code>=6144 && a:code<=6319) | return "Mongolian" | endif
    if (a:code>=4096 && a:code<=4255) | return "Myanmar" | endif
    if (a:code>=6528 && a:code<=6623) | return "New Tai Lue" | endif
    if (a:code>=1984 && a:code<=2047) | return "NKo" | endif
    if (a:code>=5760 && a:code<=5791) | return "Ogham" | endif
    if (a:code>=2816 && a:code<=2943) | return "Oriya" | endif
    if (a:code>=43072 && a:code<=43135) | return "Phags-pa" | endif
    if (a:code>=5792 && a:code<=5887) | return "Runic" | endif
    if (a:code>=3456 && a:code<=3583) | return "Sinhala" | endif
    if (a:code>=43008 && a:code<=43055) | return "Syloti Nagri" | endif
    if (a:code>=1792 && a:code<=1871) | return "Syriac" | endif
    if (a:code>=5888 && a:code<=5919) | return "Tagalog" | endif
    if (a:code>=5984 && a:code<=6015) | return "Tagbanwa" | endif
    if (a:code>=6480 && a:code<=6527) | return "Tai Le" | endif
    if (a:code>=2944 && a:code<=3071) | return "Tamil" | endif
    if (a:code>=3072 && a:code<=3199) | return "Telugu" | endif
    if (a:code>=1920 && a:code<=1983) | return "Thaana" | endif
    if (a:code>=3584 && a:code<=3711) | return "Thai" | endif
    if (a:code>=3840 && a:code<=4095) | return "Tibetan" | endif
    if (a:code>=11568 && a:code<=11647) | return "Tifinagh" | endif
    if (a:code>=5120 && a:code<=5759) | return "Unified Canadian Aboriginal Syllabics" | endif
    if (a:code>=42128 && a:code<=42191) | return "Yi Radicals" | endif
    if (a:code>=40960 && a:code<=42127) | return "Yi Syllables" | endif
    " Unknown script
    return ""
endfunction

Comments

Is there a Vim script "switch" statement I am unaware of? tlgrok 23:19, 26 January 2009 (UTC)
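
As far as I know, Vim script has no "switch" statement. One possible alternative to the long chain of "if" lines is a table of ranges that a loop walks through. The sketch below is only an illustration: the names s:script_ranges and ScriptNameFromTable are hypothetical, and only a few of the ranges from the function above are copied in. It uses a script-local variable, so it is meant to be sourced from a script file such as a vimrc rather than typed on the command line.

```vim
" Table-driven sketch (hypothetical names, not part of the original tip).
" Each entry is [first code, last code, name]; only a sample of the
" ranges from UnicodeCodeToScriptName() is listed here.
let s:script_ranges = [
    \ [880,  1023, "Greek and Coptic"],
    \ [1024, 1279, "Cyrillic"],
    \ [1424, 1535, "Hebrew"],
    \ [1536, 1791, "Arabic"]
    \ ]

function! ScriptNameFromTable(code)
    " Walk the table and return the first range that contains the code.
    for [lo, hi, name] in s:script_ranges
        if a:code >= lo && a:code <= hi
            return name
        endif
    endfor
    " Unknown script
    return ""
endfunction
```

With a table like this, adding a script means adding one list entry instead of another "if" line. The lookup is still a linear scan, so it is not faster than the original function.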

It is also possible to write a function which takes a string and uses char2nr() to get the character code of its first character. tlgrok 12:54, 27 January 2009 (UTC)
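
A minimal sketch of such a wrapper follows. The name UnicodeCharToScriptName is hypothetical, and the code assumes that UnicodeCodeToScriptName() from this page has already been sourced and that 'encoding' is set to utf-8, so that char2nr() yields Unicode code points.

```vim
" Hypothetical wrapper (the name is an assumption, not from the original tip).
" Assumes UnicodeCodeToScriptName() from this page is already defined.
function! UnicodeCharToScriptName(str)
    " char2nr('') returns 0, which would otherwise be reported as a
    " Latin character, so an empty string is handled separately.
    if a:str ==# ''
        return ''
    endif
    " char2nr() returns the code of the first character of its argument.
    return UnicodeCodeToScriptName(char2nr(a:str))
endfunction
```

For example, with 'encoding' set to utf-8, `:echo UnicodeCharToScriptName('Жук')` should print Cyrillic, because Ж is code 1046.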