created March 1, 2004 · complexity intermediate · author Benjamin Peterson · version 7.0
CSV files (comma-separated values) are often used to save tables of data in plain text. Following are some useful techniques for working with CSV files. You can:
- Highlight all text in any column.
- View fields (convert csv text to columns or separate lines).
- Navigate using the HJKL keys to go left, down, up, right by cell.
- Search for text in a specific column.
Highlighting a column
It's easy to find a column in csv text if you source the following script. Enter a command like :Csv 23 to highlight column 23.
" Highlight a column in csv text. " :Csv 1 " highlight first column " :Csv 12 " highlight twelfth column " :Csv 0 " switch off highlight function! CSVH(colnr) if a:colnr > 1 let n = a:colnr - 1 execute 'match Keyword /^\([^,]*,\)\{'.n.'}\zs[^,]*/' execute 'normal! 0'.n.'f,' elseif a:colnr == 1 match Keyword /^[^,]*/ normal! 0 else match endif endfunction command! -nargs=1 Csv :call CSVH(<args>)
Viewing csv fields
The following commands can be entered to convert csv text to columns for easy viewing. Work on a temporary copy of your data because these commands will damage it!
" Convert csv text to columns (press u to undo). " Warning: This deletes ',' and crops wide columns. :let width = 20 :let fill = repeat(' ', width) :%s/\([^,]*\),\=/\=strpart(submatch(1).fill, 0, width)/ge :%s/\s\+$//ge
Alternatively, you can change each comma to a newline to put each field on its own line:
" Change CSV fields on current line to a list of separate items. :s/,/\r/g " Same, for all lines. :%s/,/\r/g
In the replace text of a substitute, \r substitutes a newline.
The simple script given above does not provide easy navigation, and it assumes that commas are only used as delimiters. The following code has more features, and it is presented as a plugin due to its length.
Features
- Fields are correctly highlighted, according to the CSV specification: quotes, commas inside quotes, quote inside quotes are all correctly processed.
- It does not go beyond the last column (from a count of the columns in the first and last three lines).
- HJKL go left, down, up, right by "cell". Focus is set to the first character of the cell.
- 0 and $ highlight the first and last cell. Focus is set to the first character of the cell.
- Ctrl-f, Ctrl-b page forward and back, while staying in the same column.
- \J is Join :help J and \K is Keyword :help K (assuming the default \ local leader key :help maplocalleader).
- The column number and heading (from the first line in the buffer) are displayed when moving around.
- Search within column. The command
:SC n=str
will search for str in the n-th column. If "n=" is omitted, the search is within the currently highlighted column. For example::SC 2=john " search for john in the 2nd column only
:SC john " search for john in the currently highlighted column
Case sensitivity is the same as for / (toggle with :set ic! ic? see :help 'ignorecase'). After the search, the @/ variable is set. So, for example, after :SC 2=john, one can use g//d to delete all lines whose second field contains john.
- Sort lines by column. The command
:Sort n
will sort by the n-th column in increasing order. If n is omitted, then sort by the current highlighted column. - Delete column. The command
:DC n
will delete the n-th column. If n is omitted, then delete the current highlighted column.
Usage
Create file ~/.vim/ftplugin/csv.vim (Unix) or $HOME/vimfiles/ftplugin/csv.vim (Windows) containing the script below.
Define file type detection for *.csv files (see file type detection for moin):
autocmd BufNewFile,BufRead *.csv setf csv
Open a file (named anything.csv) that contains fields separated by commas. Use H J K L 0 $ to move from cell to cell.
If you open a *.csv file, the current column will be highlighted. Conversely, you may open a file with csv data, yet the file is not named *.csv, so highlighting will not occur.
You can switch highlighting off by setting the filetype option to nothing:
:set ft=
This command switches highlighting on:
:set ft=csv
The code
This is the ftplugin csv.vim script:
" Filetype plugin for editing CSV files. " From http://vim.wikia.com/wiki/VimTip667 if v:version < 700 || exists("b:did_ftplugin") finish endif let b:did_ftplugin = 1 " Echo a warning message. function! s:Warn(msg) echohl WarningMsg echo a:msg echohl NONE endfunction " Get the number of columns (maximum of number in first and last three " lines; at least one of them should contain typical csv data). function! s:GetNumCols() let b:csv_max_col = 1 for l in [1, 2, 3, line('$') - 2, line('$') - 1, line('$')] " Determine number of columns by counting the (unescaped) commas. " Note: The regexp may also return unbalanced ", so filter out anything " which isn't a comma in the second pass. let c = strlen(substitute(substitute(getline(l), '\%(\%("\%([^"]\|""\)*"\)\|\%([^,"]*\)\)', '', 'g'), '"', '', 'g')) + 1 if b:csv_max_col < c let b:csv_max_col = c endif endfor if b:csv_max_col <= 1 let b:csv_max_col = 10000 call s:Warn("No comma-separated columns were detected.") endif return b:csv_max_col endfunction " Return regex to find the n-th column. function! s:GetExpr(colnr) if a:colnr > 1 return '^\%(\%(\%("\%([^"]\|""\)*"\)\|\%([^,"]*\)\),\)\{' . (a:colnr - 1) . '}\%(\%("\zs\%([^"]\|""\)*\ze"\)\|\%(\zs[^,"]*\ze\)\)' else return '^\%(\%("\zs\%([^"]\|""\)*\ze"\)\|\%(\zs[^,"]*\ze\)\)' endif endfunction " Extract and echo the column header on the status line. function! s:PrintColInfo(colnr) let colHeading = substitute(matchstr(getline(1), s:GetExpr(a:colnr)), '^\s*\(.*\)\s*$', '\1', '') let info = 'Column ' . a:colnr if empty(colHeading) echo info else echon info . ': ' echohl Keyword " Limit length to avoid "Hit ENTER" prompt. echon strpart(colHeading, 0, (&columns / 2)) . (len(colHeading) > (&columns / 2) ? "..." : "") echohl NONE endif endfunction " Highlight n-th column (if n > 0). " Remove previous highlight match (ignore error if none). " matchadd() priority -1 means 'hlsearch' will override the match. function! s:Highlight(colnr) silent! call matchdelete(b:csv_match) if a:colnr > 0 if exists("*matchadd") let b:csv_match = matchadd('Keyword', s:GetExpr(a:colnr), -1) else execute '2match Keyword /' . s:GetExpr(a:colnr) .'/' endif call s:Focus_Col(a:colnr) endif endfunction " Focus the cursor on the n-th column of the current line. function! s:Focus_Col(colnr) normal! 0 call search(s:GetExpr(a:colnr), '', line('.')) call s:PrintColInfo(a:colnr) endfunction " Highlight next column. function! s:HighlightNextCol() if b:csv_column < b:csv_max_col let b:csv_column += 1 endif call s:Highlight(b:csv_column) endfunction " Highlight previous column. function! s:HighlightPrevCol() if b:csv_column > 1 let b:csv_column -= 1 endif call s:Highlight(b:csv_column) endfunction " Wrapping would distort the column-based layout. " Lines must not be broken when typed. setlocal nowrap textwidth=0 " Undo the stuff we changed. let b:undo_ftplugin = "setlocal wrap< textwidth<" \ . "|if exists('*matchdelete')|call matchdelete(b:csv_match)|else|2match none|endif" \ . "|sil! exe 'nunmap <buffer> H'" \ . "|sil! exe 'nunmap <buffer> L'" \ . "|sil! exe 'nunmap <buffer> J'" \ . "|sil! exe 'nunmap <buffer> K'" \ . "|sil! exe 'nunmap <buffer> <C-f>'" \ . "|sil! exe 'nunmap <buffer> <C-b>'" \ . "|sil! exe 'nunmap <buffer> 0'" \ . "|sil! exe 'nunmap <buffer> $'" \ . "|sil exe 'augroup csv' . bufnr('')" \ . "|sil exe 'au!'" \ . "|sil exe 'augroup END'" call s:GetNumCols() " Highlight the first column, but not if reloading or resetting filetype. if ! exists('b:csv_column') let b:csv_column = 1 endif " Following highlights column if set filetype manually " (BufEnter will also do it if filetype is set during load). silent call <SID>Highlight(b:csv_column) " Compare two lines based on the highlighted column. function! s:CompareLines(line1, line2) let col1 = matchstr(a:line1, s:GetExpr(b:csv_column_sort)) let col2 = matchstr(a:line2, s:GetExpr(b:csv_column_sort)) return col1 > col2 endfunction " Sort the n-th column, the highlighted one by default. function! s:SortCol(colnr) if a:colnr == "" let b:csv_column_sort = b:csv_column else let b:csv_column_sort = str2nr(a:colnr) endif if b:csv_column_sort < 1 || b:csv_column_sort > b:csv_max_col call s:Warn("column number out of range.") endif call setline(2, sort(getline(2, '$'), function("s:CompareLines"))) endfunction command! -buffer -nargs=? Sort execute <SID>SortCol("<args>") " Delete the n-th column, the highlighted one by default. function! s:DeleteCol(colnr) if a:colnr == "" let col = b:csv_column else let col = str2nr(a:colnr) endif if col < 1 || col > b:csv_max_col call s:Warn("column number out of range.") endif let regex = substitute(s:GetExpr(col), '"\\zs', '\\zs"', '') let regex = substitute(regex, '\\ze"', '"\\ze', '') let regex = substitute(regex, '\\ze', ',\\?\\ze', 'g') exec 'global/'.regex.'/s///' endfunction command! -buffer -nargs=? DC execute <SID>DeleteCol("<args>") " Search the n-th column. Argument in n=regex form where n is the column " number, and regex is the expression to use. If "n=" is omitted, then " use the current highlighted column. function! s:SearchCol(colexp) let regex = '\%(\([1-9][0-9]*\)=\)\?\(.*\)' let colstr = substitute(a:colexp, regex, '\1', '') let target = substitute(a:colexp, regex, '\2', '') if colstr == "" let col = b:csv_column else let col = str2nr(substitute(a:colexp, '=.*', '', '')) if col < 1 || col > b:csv_max_col call s:Warn("column number out of range") endif endif if col == 1 let @/ = '^\%(\%("\%([^,]\|""\)*\zs'.target.'\ze\%([^,]\|""\)*"\)\|\%([^,"]*\zs'.target.'\ze[^,"]*\)\)' else let @/ = '^\%(\%(\%("\%([^"]\|""\)*"\)\|\%([^,"]*\)\),\)\{' . (col-1) . '}\%(\%("\%([^,]\|""\)*\zs'.target.'\ze\%([^,]\|""\)*"\)\|\%([^,"]*\zs'.target.'\ze[^,"]*\)\)' endif "-- execute 'match Search /\c' . @/ . '/' endfunction " Use :SC n=string<CR> to search for string in the n-th column command! -buffer -nargs=1 SC execute <SID>SearchCol("<args>")|normal n nnoremap <silent> <buffer> H :call <SID>HighlightPrevCol()<CR> nnoremap <silent> <buffer> L :call <SID>HighlightNextCol()<CR> nnoremap <silent> <buffer> J <Down>:call <SID>Focus_Col(b:csv_column)<CR> nnoremap <silent> <buffer> K <Up>:call <SID>Focus_Col(b:csv_column)<CR> nnoremap <silent> <buffer> <C-f> <PageDown>:call <SID>Focus_Col(b:csv_column)<CR> nnoremap <silent> <buffer> <C-b> <PageUp>:call <SID>Focus_Col(b:csv_column)<CR> nnoremap <silent> <buffer> <C-d> <C-d>:call <SID>Focus_Col(b:csv_column)<CR> nnoremap <silent> <buffer> <C-u> <C-u>:call <SID>Focus_Col(b:csv_column)<CR> nnoremap <silent> <buffer> 0 :let b:csv_column=1<CR>:call <SID>Highlight(b:csv_column)<CR> nnoremap <silent> <buffer> $ :let b:csv_column=b:csv_max_col<CR>:call <SID>Highlight(b:csv_column)<CR> nnoremap <silent> <buffer> gm :call <SID>Focus_Col(b:csv_column)<CR> nnoremap <silent> <buffer> <LocalLeader>J J nnoremap <silent> <buffer> <LocalLeader>K K " The column highlighting is window-local, not buffer-local, so it can persist " even when the filetype is undone or the buffer changed. execute 'augroup csv' . bufnr('') autocmd! " These events only highlight in the current window. " Note: Highlighting gets slightly confused if the same buffer is present in " two split windows next to each other, because then the events aren't fired. autocmd BufLeave <buffer> silent call <SID>Highlight(0) autocmd BufEnter <buffer> silent call <SID>Highlight(b:csv_column) augroup END
Explanation of the regular expression
The above code is fairly easy to understand, except perhaps for the regular expressions. The following regex (inspired by RegExLib 1520) is used several times:
\%(\%("\zs\%([^"]\|""\)*\ze"\)\|\%(\zs[^,"]*\ze\)\)
It is explained as follows:
1. \%( # unbackref'ed grouping for one CSV field 2. \%( # first possibility of a field 3. " # starts and ends with " 4. \zs # beginning of matched string 5. \%( 6. [^"]\|"" # anything not ", or a "" (escaped quote) 7. \)* 8. \ze # end of matched string 9. " 10. \) # end first possibility of field 11. \| # or 12. \%( # second possibility 13. \zs 14. [^,"]* # any thing but " and comma 15. \ze 16. \) 17. \)
Working with Excel xls files
One can use the xls2csv Perl script to convert Excel files to CSV, and then view/edit with Vim.
Wish List / Ideas
The header line (line 1) should be differently highlighted.
A future unified interface could be the following:
The regular expressions needed to define the syntax. All dots are literal. Columns are numbered from 1. zero means before the first. col=0|^|\d|$|. # 0-th, 0-th, \d-th, last, current cols=((col|col-col),)*(col|col-col) # a list of column, or column ranges flags=[c|C][a|d] # flag used in sorting sortcol=column[flags]* sortcols=(sortcol.)*sortcol Future commands can be as follows. All commands are two-letter, starting with c or r. The space after the two letter command should be optional. :cd cols? # delete columns cols or delete the current column if empty # change directory if argument are not columns :c/ (cols=)?pattern/? # search forward in columns cols (use current if empty), for pattern :c? (cols=)?pattern/? # search backward in columns cols (use current if empty), for pattern :cx cols # rearrange (eXchange) the columns specified, so that the order after # rearrangement is as specified. Example: after # :cx 6,1,2-4 # columns 1-6 should become columns 6,1,5,2-4 :cm cols col? # move the columns cols to col (use current if empty) # equivalent to :cx col cols :ca col? # append an empty column at col (use current if empty) :ci col? # insert an empty column at col (use current if empty) :cy cols? # yank (copy) the columns cols (use current if empty) # Note: in implementation, only remember the cols for future paste :cp col? # paste (append) the previous yanked columns to col (use current if empty) # clear the yanked list :cP col? # Paste (insert) the previous yanked columns to col (use current if empty) # clear the yanked list :cs sortcols? # sort rows by the columns specified, in that order. c|C denotes # case sensitivity, a|d denotes direction (ascending/decending) # sort current column by default # Note: use string concatenation for implementation # Note: should we use :rs instead? :rh rows? # highlight the given rows (use current if emtpy) :rH # turn of row highlighting
Comments
Please see the comments on the talk page.