Vim Tips Wiki
Advertisement
Tip 667 Printable Monobook Previous Next

created March 1, 2004 · complexity intermediate · author Benjamin Peterson · version 7.0


CSV files (comma-separated values) are often used to save tables of data in plain text. Following are some useful techniques for working with CSV files. You can:

  • Highlight all text in any column.
  • View fields (convert csv text to columns or separate lines).
  • Navigate using the HJKL keys to go left, down, up, right by cell.
  • Search for text in a specific column.

Highlighting a column

It's easy to find a column in csv text if you source the following script. Enter a command like :Csv 23 to highlight column 23.

" Highlight a column in csv text.
" :Csv 1    " highlight first column
" :Csv 12   " highlight twelfth column
" :Csv 0    " switch off highlight
function! CSVH(colnr)
  if a:colnr > 1
    let n = a:colnr - 1
    execute 'match Keyword /^\([^,]*,\)\{'.n.'}\zs[^,]*/'
    execute 'normal! 0'.n.'f,'
  elseif a:colnr == 1
    match Keyword /^[^,]*/
    normal! 0
  else
    match
  endif
endfunction
command! -nargs=1 Csv :call CSVH(<args>)

Highlighting the header row

If the first row is a header row, you can highlight the header fields by putting the following lines in the file syntax/csv.vim

syntax match csvHeading /\%1l\%(\%("\zs\%([^"]\|""\)*\ze"\)\|\%(\zs[^,"]*\ze\)\)/
highlight def link csvHeading Type

Viewing csv fields

The following commands can be entered to convert csv text to columns for easy viewing. Work on a temporary copy of your data because these commands will damage it!

" Convert csv text to columns (press u to undo).
" Warning: This deletes ',' and crops wide columns.
:let width = 20
:let fill = repeat(' ', width)
:%s/\([^,]*\),\=/\=strpart(submatch(1).fill, 0, width)/ge
:%s/\s\+$//ge

Alternatively, you can change each comma to a newline to put each field on its own line:

" Change CSV fields on current line to a list of separate items.
:s/,/\r/g

" Same, for all lines.
:%s/,/\r/g

In the replace text of a substitute, \r substitutes a newline.

Navigating in csv text

The simple script given above does not provide easy navigation, and it assumes that commas are only used as delimiters. The following code has more features, and it is presented as a plugin due to its length.

Features

  • Fields are correctly highlighted, according to the CSV specification: quotes, commas inside quotes, quote inside quotes are all correctly processed.
  • It does not go beyond the last column (from a count of the columns in the first and last three lines).
  • HJKL go left, down, up, right by "cell". Focus is set to the first character of the cell.
  • 0 and $ highlight the first and last cell. Focus is set to the first character of the cell.
  • Ctrl-f, Ctrl-b page forward and back, while staying in the same column.
  • \J is Join :help J and \K is Keyword :help K (assuming the default \ local leader key :help maplocalleader).
  • The column number and heading (from the first line in the buffer) are displayed when moving around.
  • Search within column. The command
    :SC n=str
    will search for str in the n-th column. If "n=" is omitted, the search is within the currently highlighted column. For example:
    :SC 2=john  " search for john in the 2nd column only
    :SC john    " search for john in the currently highlighted column

Case sensitivity is the same as for / (toggle with :set ic! ic? see :help 'ignorecase'). After the search, the @/ variable is set. So, for example, after :SC 2=john, one can use g//d to delete all lines whose second field contains john.

  • Sort lines by column. The command
    :Sort n
    will sort by the n-th column in increasing order. If n is omitted, then sort by the current highlighted column.
  • Delete column. The command
    :DC n
    will delete the n-th column. If n is omitted, then delete the current highlighted column.

Usage

Create file ~/.vim/ftplugin/csv.vim (Unix) or $HOME/vimfiles/ftplugin/csv.vim (Windows) containing the script below.

Define file type detection for *.csv files (see file type detection for moin):

autocmd BufNewFile,BufRead *.csv setf csv

Open a file (named anything.csv) that contains fields separated by commas. Use H J K L 0 $ to move from cell to cell.

If you open a *.csv file, the current column will be highlighted. Conversely, you may open a file with csv data, yet the file is not named *.csv, so highlighting will not occur.

You can switch highlighting off by setting the filetype option to nothing:

:set ft=

This command switches highlighting on:

:set ft=csv

The code

This is the ftplugin csv.vim script:

" Filetype plugin for editing CSV files.
" From http://vim.wikia.com/wiki/VimTip667
if v:version < 700 || exists("b:did_ftplugin")
  finish
endif
let b:did_ftplugin = 1

" Echo a warning message.
function! s:Warn(msg)
  echohl WarningMsg
  echo a:msg
  echohl NONE
endfunction

" Get the number of columns (maximum of number in first and last three
" lines; at least one of them should contain typical csv data).
function! s:GetNumCols()
  let b:csv_max_col = 1
  for l in [1, 2, 3, line('$') - 2, line('$') - 1, line('$')]
    " Determine number of columns by counting the (unescaped) commas.
    " Note: The regexp may also return unbalanced ", so filter out anything
    " which isn't a comma in the second pass.
    let c = strlen(substitute(substitute(getline(l), '\%(\%("\%([^"]\|""\)*"\)\|\%([^,"]*\)\)', '', 'g'), '"', '', 'g')) + 1
    if b:csv_max_col < c
      let b:csv_max_col = c
    endif
  endfor
  if b:csv_max_col <= 1
    let b:csv_max_col = 10000
    call s:Warn("No comma-separated columns were detected.")
  endif
  return b:csv_max_col
endfunction

" Return regex to find the n-th column.
function! s:GetExpr(colnr)
  if a:colnr > 1
    return '^\%(\%(\%("\%([^"]\|""\)*"\)\|\%([^,"]*\)\),\)\{' . (a:colnr - 1) . '}\%(\%("\zs\%([^"]\|""\)*\ze"\)\|\%(\zs[^,"]*\ze\)\)'
  else
    return '^\%(\%("\zs\%([^"]\|""\)*\ze"\)\|\%(\zs[^,"]*\ze\)\)'
  endif
endfunction

" Extract and echo the column header on the status line.
function! s:PrintColInfo(colnr)
  let colHeading = substitute(matchstr(getline(1), s:GetExpr(a:colnr)), '^\s*\(.*\)\s*$', '\1', '')
  let info = 'Column ' . a:colnr
  if empty(colHeading)
    echo info
  else
    echon info . ': '
    echohl Keyword
    " Limit length to avoid "Hit ENTER" prompt.
    echon strpart(colHeading, 0, (&columns / 2)) . (len(colHeading) > (&columns / 2) ? "..." : "")
    echohl NONE
  endif
endfunction

" Highlight n-th column (if n > 0).
" Remove previous highlight match (ignore error if none).
" matchadd() priority -1 means 'hlsearch' will override the match.
function! s:Highlight(colnr)
  silent! call matchdelete(b:csv_match)
  if a:colnr > 0
    if exists("*matchadd")
      let b:csv_match = matchadd('Keyword', s:GetExpr(a:colnr), -1)
    else
      execute '2match Keyword /' . s:GetExpr(a:colnr) .'/'
    endif
    call s:Focus_Col(a:colnr)
  endif
endfunction

" Focus the cursor on the n-th column of the current line.
function! s:Focus_Col(colnr)
  normal! 0
  call search(s:GetExpr(a:colnr), '', line('.'))
  call s:PrintColInfo(a:colnr)
endfunction

" Highlight next column.
function! s:HighlightNextCol()
  if b:csv_column < b:csv_max_col
    let b:csv_column += 1
  endif
  call s:Highlight(b:csv_column)
endfunction

" Highlight previous column.
function! s:HighlightPrevCol()
  if b:csv_column > 1
    let b:csv_column -= 1
  endif
  call s:Highlight(b:csv_column)
endfunction

" Wrapping would distort the column-based layout.
" Lines must not be broken when typed.
setlocal nowrap textwidth=0
" Undo the stuff we changed.
let b:undo_ftplugin = "setlocal wrap< textwidth<"
    \ . "|if exists('*matchdelete')|call matchdelete(b:csv_match)|else|2match none|endif"
    \ . "|sil! exe 'nunmap <buffer> H'"
    \ . "|sil! exe 'nunmap <buffer> L'"
    \ . "|sil! exe 'nunmap <buffer> J'"
    \ . "|sil! exe 'nunmap <buffer> K'"
    \ . "|sil! exe 'nunmap <buffer> <C-f>'"
    \ . "|sil! exe 'nunmap <buffer> <C-b>'"
    \ . "|sil! exe 'nunmap <buffer> 0'"
    \ . "|sil! exe 'nunmap <buffer> $'"
    \ . "|sil exe 'augroup csv' . bufnr('')"
    \ . "|sil exe 'au!'"
    \ . "|sil exe 'augroup END'"

call s:GetNumCols()
" Highlight the first column, but not if reloading or resetting filetype.
if ! exists('b:csv_column')
  let b:csv_column = 1
endif
" Following highlights column if set filetype manually
" (BufEnter will also do it if filetype is set during load).
silent call <SID>Highlight(b:csv_column)

" Compare two lines based on the highlighted column.
function! s:CompareLines(line1, line2)
  let col1 = matchstr(a:line1, s:GetExpr(b:csv_column_sort))
  let col2 = matchstr(a:line2, s:GetExpr(b:csv_column_sort))
  return col1 > col2
endfunction

" Sort the n-th column, the highlighted one by default.
function! s:SortCol(colnr)
  if a:colnr == ""
    let b:csv_column_sort = b:csv_column
  else
    let b:csv_column_sort = str2nr(a:colnr)
  endif
  if b:csv_column_sort < 1 || b:csv_column_sort > b:csv_max_col
    call s:Warn("column number out of range.")
  endif
  call setline(2, sort(getline(2, '$'), function("s:CompareLines")))
endfunction
command! -buffer -nargs=? Sort execute <SID>SortCol("<args>")

" Delete the n-th column, the highlighted one by default.
function! s:DeleteCol(colnr)
  if a:colnr == ""
    let col = b:csv_column
  else
    let col = str2nr(a:colnr)
  endif
  if col < 1 || col > b:csv_max_col
    call s:Warn("column number out of range.")
  endif
  let regex = substitute(s:GetExpr(col), '"\\zs', '\\zs"', '')
  let regex = substitute(regex, '\\ze"', '"\\ze', '')
  let regex = substitute(regex, '\\ze', ',\\?\\ze', 'g')
  exec 'global/'.regex.'/s///'
  let b:csv_max_col-=1
  if b:csv_column>b:csv_max_col
    b:csv_column=b:csv_max_col
  endif
endfunction
command! -buffer -nargs=? DC execute <SID>DeleteCol("<args>")

" Search the n-th column. Argument in n=regex form where n is the column
" number, and regex is the expression to use. If "n=" is omitted, then
" use the current highlighted column.
function! s:SearchCol(colexp)
  let regex = '\%(\([1-9][0-9]*\)=\)\?\(.*\)'
  let colstr = substitute(a:colexp, regex, '\1', '')
  let target = substitute(a:colexp, regex, '\2', '')
  if colstr == ""
    let col = b:csv_column
  else
    let col = str2nr(substitute(a:colexp, '=.*', '', ''))
    if col < 1 || col > b:csv_max_col
      call s:Warn("column number out of range")
    endif
  endif
  if col == 1
    let @/ = '^\%(\%("\%([^,]\|""\)*\zs'.target.'\ze\%([^,]\|""\)*"\)\|\%([^,"]*\zs'.target.'\ze[^,"]*\)\)'
  else
    let @/ = '^\%(\%(\%("\%([^"]\|""\)*"\)\|\%([^,"]*\)\),\)\{' . (col-1) . '}\%(\%("\%([^,]\|""\)*\zs'.target.'\ze\%([^,]\|""\)*"\)\|\%([^,"]*\zs'.target.'\ze[^,"]*\)\)'
  endif
  "-- execute 'match Search /\c' . @/ . '/'
endfunction
" Use :SC n=string<CR> to search for string in the n-th column
command! -buffer -nargs=1 SC execute <SID>SearchCol("<args>")|normal! n
nnoremap <silent> <buffer> H :call <SID>HighlightPrevCol()<CR>
nnoremap <silent> <buffer> L :call <SID>HighlightNextCol()<CR>
nnoremap <silent> <buffer> J <Down>:call <SID>Focus_Col(b:csv_column)<CR>
nnoremap <silent> <buffer> K <Up>:call <SID>Focus_Col(b:csv_column)<CR>
nnoremap <silent> <buffer> <C-f> <PageDown>:call <SID>Focus_Col(b:csv_column)<CR>
nnoremap <silent> <buffer> <C-b> <PageUp>:call <SID>Focus_Col(b:csv_column)<CR>
nnoremap <silent> <buffer> <C-d> <C-d>:call <SID>Focus_Col(b:csv_column)<CR>
nnoremap <silent> <buffer> <C-u> <C-u>:call <SID>Focus_Col(b:csv_column)<CR>
nnoremap <silent> <buffer> 0 :let b:csv_column=1<CR>:call <SID>Highlight(b:csv_column)<CR>
nnoremap <silent> <buffer> $ :let b:csv_column=b:csv_max_col<CR>:call <SID>Highlight(b:csv_column)<CR>
nnoremap <silent> <buffer> gm :call <SID>Focus_Col(b:csv_column)<CR>
nnoremap <silent> <buffer> <LocalLeader>J J
nnoremap <silent> <buffer> <LocalLeader>K K

" The column highlighting is window-local, not buffer-local, so it can persist
" even when the filetype is undone or the buffer changed.
execute 'augroup csv' . bufnr('')
  autocmd!
  " These events only highlight in the current window.
  " Note: Highlighting gets slightly confused if the same buffer is present in
  " two split windows next to each other, because then the events aren't fired.
  autocmd BufLeave <buffer> silent call <SID>Highlight(0)
  autocmd BufEnter <buffer> silent call <SID>Highlight(b:csv_column)
augroup END

Explanation of the regular expression

The above code is fairly easy to understand, except perhaps for the regular expressions. The following regex (inspired by RegExLib 1520) is used several times:

\%(\%("\zs\%([^"]\|""\)*\ze"\)\|\%(\zs[^,"]*\ze\)\)

It is explained as follows:

1.  \%(                 # unbackref'ed grouping for one CSV field
2.     \%(              # first possibility of a field
3.        "             # starts and ends with "
4.        \zs           # beginning of matched string
5.        \%(
6.           [^"]\|""   # anything not ", or a "" (escaped quote)
7.        \)*
8.        \ze           # end of matched string
9.        "
10.    \)               # end first possibility of field
11.   \|                # or
12.    \%(              # second possibility
13.       \zs
14.       [^,"]*        # any thing but " and comma
15.       \ze
16.    \)
17. \)

Working with Excel xls files

One can use the xls2csv Perl script to convert Excel files to CSV, and then view/edit with Vim.

Wish List / Ideas

The header line (line 1) can be differently highlighted. Does anybody know how this can be done? I cannot think of a way of doing it. --Lpb612 19:05, 23 July 2008 (UTC)

How about this? (The main trick is the \%1l regexp item, which matches line 1; you know the rest of the regexp :-) Strictly, syntax highlighting shouldn't be done in the ftplugin, but in a separate syntax/csv.vim, though. -- Inkarkat 20:37, 23 July 2008 (UTC)
syntax match csvHeading /\%1l\%(\%("\zs\%([^"]\|""\)*\ze"\)\|\%(\zs[^,"]*\ze\)\)/
highlight def link csvHeading Type

A future unified interface could be the following:


The regular expressions needed to define the syntax. All dots are literal.
Columns are numbered from 1. zero means before the first. 

col=0|^|\d|$|.   # 0-th, 0-th, \d-th, last, current 
cols=((col|col-col),)*(col|col-col) # a list of column, or column ranges
rows=cols

flags=[c|C][a|d]  # flag used in sorting
sortcol=column[flags]*   
sortcols=(sortcol.)*sortcol

Future commands can be as follows. All commands are two-letter, starting
with c or r.  The space after the two letter command should be optional. 

:cd cols?
  # delete columns cols or delete the current column if empty
  # change directory if argument are not columns

:c/ (cols=)?pattern/?
  # search forward in columns cols (use current if empty), for pattern

:c? (cols=)?pattern/?
  # search backward in columns cols (use current if empty), for pattern

:cx cols
  # rearrange (eXchange) the columns specified, so that the order after
  # rearrangement is as specified. Example: after
  # :cx 6,1,2-4 
  # columns 1-6 should become columns 6,1,5,2-4

:cm cols col?
  # move the columns cols to col (use current if empty)
  # equivalent to :cx col cols

:ca col?
  # append an empty column at col (use current if empty)

:ci col?
  # insert an empty column at col (use current if empty)

:cy cols?
  # yank (copy) the columns cols (use current if empty)
  # Note: in implementation, only remember the cols for future paste

:cp col?
  # paste (append) the previous yanked columns to col (use current if empty)
  # clear the yanked list

:cP col?
  # Paste (insert) the previous yanked columns to col (use current if empty)
  # clear the yanked list

:cs sortcols?
  # sort rows by the columns specified, in that order. c|C denotes 
  # case sensitivity, a|d denotes direction (ascending/decending)
  # sort current column by default
  # Note: use string concatenation for implementation
  # Note: should we use :rs instead? 

:rh rows?
  # highlight the given rows (use current if emtpy)

:rH
  # turn of row highlighting

Comments

Please see the comments on the talk page.

Advertisement