Working with CSV files

Tip 667

created March 1, 2004 · complexity intermediate · author Benjamin Peterson · version 7.0

CSV (comma-separated values) files are often used to store tables of data in plain text. The following techniques are useful when working with CSV files in Vim. You can:

Highlight all text in any column.
View fields (convert csv text to columns or separate lines).
Navigate using the HJKL keys to go left, down, up, right by cell.
Search for text in a specific column.

Highlighting a column

Sourcing the following script makes it easy to find a column in CSV text. Enter a command like :Csv 23 to highlight column 23. The script assumes that commas are used only as delimiters.

" Highlight a column in csv text.
" :Csv 1    " highlight first column
" :Csv 12   " highlight twelfth column
" :Csv 0    " switch off highlight
function! CSVH(colnr)
  if a:colnr > 1
    let n = a:colnr - 1
    execute 'match Keyword /^\([^,]*,\)\{'.n.'}\zs[^,]*/'
    execute 'normal! 0'.n.'f,'
  elseif a:colnr == 1
    match Keyword /^[^,]*/
    normal! 0
  else
    match
  endif
endfunction
command! -nargs=1 Csv :call CSVH(<args>)
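
The pattern built by the script can also be tried on a plain string, without touching the buffer. The following is a minimal sketch: the function name CsvField is hypothetical and not part of the script above, and like that script it assumes that commas appear only as delimiters.

```vim
" Return the text of field number colnr (1-based) from a CSV line.
" Hypothetical helper for illustration; no support for quoted fields.
function! CsvField(line, colnr)
  if a:colnr < 1
    return ''
  endif
  if a:colnr == 1
    let pat = '^[^,]*'
  else
    " Skip (colnr - 1) 'field,' groups, then start the match with \zs.
    let pat = '^\([^,]*,\)\{' . (a:colnr - 1) . '}\zs[^,]*'
  endif
  return matchstr(a:line, pat)
endfunction

echo CsvField('alice,30,london', 2)   " 30
echo CsvField('alice,30,london', 1)   " alice
```

If the line has fewer fields than requested, matchstr() finds no match and the function returns an empty string.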

Viewing csv fields

The following commands can be entered to convert csv text to columns for easy viewing. Work on a temporary copy of your data because these commands damage it: the commas are deleted and wide columns are cropped.

" Convert csv text to columns (press u to undo).
" Warning: This deletes ',' and crops wide columns.
:let width = 20
:let fill = repeat(' ', width)
:%s/\([^,]*\),\=/\=strpart(submatch(1).fill, 0, width)/ge
:%s/\s\+$//ge
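
To see what the substitution does to one line before running it on a whole buffer, the same padding logic can be applied to a string. This is only a sketch under the same assumptions (no quoted fields); CsvAlign is a hypothetical name, not something defined elsewhere in this tip.

```vim
" Pad or crop every field of a CSV line to a fixed width.
" Hypothetical helper; mirrors the strpart() trick used above.
function! CsvAlign(line, width)
  let fill = repeat(' ', a:width)
  let out = ''
  " The third argument of split() keeps empty fields.
  for field in split(a:line, ',', 1)
    let out .= strpart(field . fill, 0, a:width)
  endfor
  " Drop the padding after the last field.
  return substitute(out, '\s\+$', '', '')
endfunction

echo CsvAlign('id,name,city', 6)   " id    name  city
echo CsvAlign('abcdefgh,x', 4)     " abcdx
```

The second call shows the cropping mentioned in the warning: a field wider than the chosen width loses its tail.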

Alternatively, you can change each comma to a newline to put each field on its own line:

" Change CSV fields on current line to a list of separate items.
:s/,/\r/g

" Same, for all lines.
:%s/,/\r/g

In the replacement part of a substitute command, \r inserts a line break (see :help sub-replace-special).
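
If the goal is only to look at the fields, the buffer does not have to be changed at all. The sketch below lists the fields of a line with their column numbers; CsvListFields is a hypothetical helper, and it assumes that commas are used only as delimiters.

```vim
" Return a list such as ['1: a', '2: b'] for the fields of a CSV line.
" Hypothetical helper; quoted fields are not handled.
function! CsvListFields(line)
  let out = []
  let i = 1
  for field in split(a:line, ',', 1)
    call add(out, i . ': ' . field)
    let i += 1
  endfor
  return out
endfunction

" Show the fields of the current line, one per row.
echo join(CsvListFields(getline('.')), "\n")
```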

Navigating in csv text

The simple script given above does not provide easy navigation, and it assumes that commas are only used as delimiters. The following code has more features, and it is presented as a plugin due to its length.

Features

Fields are highlighted correctly according to the CSV specification: quoted fields, commas inside quotes, and quotes inside quotes (written as "") are all handled.
The highlight does not move beyond the last column; the column count is taken from the first three and last three lines of the buffer.
HJKL go left, down, up, right by "cell". Focus is set to the first character of the cell.
0 and $ highlight the first and last cell. Focus is set to the first character of the cell.
Ctrl-f, Ctrl-b page forward and back, while staying in the same column.
\J is Join (:help J) and \K is Keyword lookup (:help K), because J and K themselves are remapped (assuming the default \ local leader key; see :help maplocalleader).
The column number and heading (from the first line in the buffer) are displayed when moving around.
Search within column. The command
```
:SC n=str
```
will search for str in the n-th column. If "n=" is omitted, the search is within the currently highlighted column. For example:
```
" search for john in the 2nd column only
:SC 2=john
" search for john in the currently highlighted column
:SC john
```

Case sensitivity is the same as for / (toggle with :set ic! ic? and see :help 'ignorecase'). After the search, the @/ register holds the search pattern. So, for example, after :SC 2=john, you can use :g//d to delete all lines whose second field contains john.
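
Because the pattern is stored in @/, any command that accepts an empty pattern reuses it. A few possibilities are sketched here, assuming the plugin from this tip is installed; the values john and jane are only sample data.

```vim
" Search for john in the 2nd column.
:SC 2=john
" Delete every line whose 2nd field contains john.
:g//d
" Alternatively, keep only those lines and delete all others.
:v//d
" Alternatively, replace john with jane, but only in the 2nd column.
:%s//jane/
```

The comments are placed on their own lines because :SC takes the rest of the command line as its argument, so a trailing comment would become part of the search text.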

Usage

Create file ~/.vim/ftplugin/csv.vim (Unix) or $HOME/vimfiles/ftplugin/csv.vim (Windows) containing the script below.

Define file type detection for *.csv files (see file type detection for moin):

autocmd BufNewFile,BufRead *.csv setf csv

Open a file (named anything.csv) that contains fields separated by commas. Use H J K L 0 $ to move from cell to cell.

If you open a *.csv file, the current column is highlighted automatically. If a file contains csv data but is not named *.csv, no highlighting occurs until the filetype is set manually.

You can switch highlighting off by setting the filetype option to nothing:

:set ft=

This command switches highlighting on:

:set ft=csv
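
If you regularly work with CSV data stored under another extension, the detection rule can be extended in the same way. The extension *.dat below is purely a hypothetical example; substitute whatever your files use.

```vim
" Hypothetical example: also treat *.dat files as CSV.
autocmd BufNewFile,BufRead *.dat setf csv

" For a single buffer, set the filetype locally instead.
:setlocal filetype=csv
```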

The code

This is the ftplugin csv.vim script:

" Filetype plugin for editing CSV files.
" From http://vim.wikia.com/wiki/VimTip667
if v:version < 700 || exists("b:did_ftplugin")
  finish
endif
let b:did_ftplugin = 1

" Get the number of columns (maximum of number in first and last three
" lines; at least one of them should contain typical csv data).
function! s:GetNumCols()
  let b:csv_max_col = 1
  for l in [1, 2, 3, line('$') - 2, line('$') - 1, line('$')]
    " Determine number of columns by counting the (unescaped) commas.
    " Note: The regexp may also return unbalanced ", so filter out anything
    " which isn't a comma in the second pass.
    let c = strlen(substitute(substitute(getline(l), '\%(\%("\%([^"]\|""\)*"\)\|\%(\%([^,"]\|""\)*\)\)', '', 'g'), '"', '', 'g')) + 1
    if b:csv_max_col < c
      let b:csv_max_col = c
    endif
  endfor
  if b:csv_max_col <= 1
    let b:csv_max_col = 10000
    echohl WarningMsg
    echo "No comma-separated columns were detected. "
    echohl NONE
  endif
  return b:csv_max_col
endfunction

" Return regex to find the n-th column.
function! s:GetExpr(colnr)
  if a:colnr > 1
    return '^\%(\%(\%("\%([^"]\|""\)*"\)\|\%(\%([^,"]\|""\)*\)\),\)\{' . (a:colnr - 1) . '}\%(\%("\zs\%([^"]\|""\)*\ze"\)\|\%(\zs\%([^,"]\|""\)*\ze\)\)'
  else
    return '^\%(\%("\zs\%([^"]\|""\)*\ze"\)\|\%(\zs\%([^,"]\|""\)*\ze\)\)'
  endif
endfunction

" Extract and echo the column header on the status line.
function! s:PrintColInfo(colnr)
  let colHeading = substitute( matchstr( getline(1), s:GetExpr(a:colnr) ), '^\s*\(.*\)\s*$', '\1', '' )
  let info = 'Column ' . a:colnr
  if ! empty(colHeading)
    let info .= ': ' . colHeading
  endif
  " Limit length to avoid "Hit ENTER" prompt.
  echo strpart(info, 0, (&columns / 2)) . (len(info) > (&columns / 2) ? "..." : "")
endfunction

" Highlight n-th column (if n > 0).
" Remove previous highlight match (ignore error if none).
" matchadd() priority -1 means 'hlsearch' will override the match.
function! s:Highlight(colnr)
  silent! call matchdelete(b:csv_match)
  if a:colnr > 0
    let b:csv_match = matchadd('Keyword', s:GetExpr(a:colnr), -1)
    call s:Focus_Col(a:colnr)
  endif
endfunction

" Focus the cursor on the n-th column of the current line.
function! s:Focus_Col(colnr)
  normal! 0
  call search(s:GetExpr(a:colnr), '', line('.'))
  call s:PrintColInfo(a:colnr)
endfunction

" Highlight next column.
function! s:HighlightNextCol()
  if b:csv_column < b:csv_max_col
    let b:csv_column += 1
  endif
  call s:Highlight(b:csv_column)
endfunction

" Highlight previous column.
function! s:HighlightPrevCol()
  if b:csv_column > 1
    let b:csv_column -= 1
  endif
  call s:Highlight(b:csv_column)
endfunction

" Wrapping would distort the column-based layout.
" Lines must not be broken when typed.
setlocal nowrap textwidth=0
" Undo the stuff we changed.
let b:undo_ftplugin = "setlocal wrap< textwidth<"
    \ . "|sil! call matchdelete(b:csv_match)"
    \ . "|sil! exe 'nunmap <buffer> H'"
    \ . "|sil! exe 'nunmap <buffer> L'"
    \ . "|sil! exe 'nunmap <buffer> J'"
    \ . "|sil! exe 'nunmap <buffer> K'"
    \ . "|sil! exe 'nunmap <buffer> <C-f>'"
    \ . "|sil! exe 'nunmap <buffer> <C-b>'"
    \ . "|sil! exe 'nunmap <buffer> 0'"
    \ . "|sil! exe 'nunmap <buffer> $'"
    \ . "|sil exe 'augroup csv' . bufnr('')"
    \ . "|sil exe 'au!'"
    \ . "|sil exe 'augroup END'"

call s:GetNumCols()
" Highlight the first column, but not if reloading or resetting filetype.
if ! exists('b:csv_column')
  let b:csv_column = 1
endif
" Following highlights column if set filetype manually
" (BufEnter will also do it if filetype is set during load).
silent call <SID>Highlight(b:csv_column)

" Search the n-th column. Argument in n=regex form where n is the column
" number, and regex is the expression to use. If "n=" is omitted, then
" use the current highlighted column.
function! s:SearchCol(colexp)
  let regex = '\%(\([1-9][0-9]*\)=\)\?\(.*\)'
  let colstr = substitute(a:colexp, regex, '\1', '')
  let target = substitute(a:colexp, regex, '\2', '')
  if colstr == ""
    let col = b:csv_column
  else
    let col = str2nr(colstr)
    if col<1 || col>b:csv_max_col
      echoerr "column number out of range"
      " Stop here so that an invalid column does not change the search pattern.
      return
    endif
  endif
  if col == 1
    let @/ = '^\%(\%("\%([^,]\|""\)*\zs'.target.'\ze\%([^,]\|""\)*"\)\|\%(\%([^,"]\|""\)*\zs'.target.'\ze\%([^,"]\|""\)*\)\)'
  else
    let @/ = '^\%(\%(\%("\%([^"]\|""\)*"\)\|\%(\%([^,"]\|""\)*\)\),\)\{' . (col-1) . '}\%(\%("\%([^,]\|""\)*\zs'.target.'\ze\%([^,]\|""\)*"\)\|\%(\%([^,"]\|""\)*\zs'.target.'\ze\%([^,"]\|""\)*\)\)'
  endif
endfunction
" Use :SC n=string<CR> to search for string in the n-th column
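" <q-args> passes the argument as a quoted string, so backslashes and double
" quotes in the regex reach the function unchanged.
command! -nargs=1 SC execute <SID>SearchCol(<q-args>)|normal n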
" to avoid press-enter message due to long regex
nnoremap <silent> <buffer> n n
nnoremap <silent> <buffer> N N

nnoremap <silent> <buffer> H :call <SID>HighlightPrevCol()<CR>
nnoremap <silent> <buffer> L :call <SID>HighlightNextCol()<CR>
nnoremap <silent> <buffer> J <Down>:call <SID>Focus_Col(b:csv_column)<CR>
nnoremap <silent> <buffer> K <Up>:call <SID>Focus_Col(b:csv_column)<CR>
nnoremap <silent> <buffer> <C-f> <PageDown>:call <SID>Focus_Col(b:csv_column)<CR>
nnoremap <silent> <buffer> <C-b> <PageUp>:call <SID>Focus_Col(b:csv_column)<CR>
nnoremap <silent> <buffer> 0 :let b:csv_column=1<CR>:call <SID>Highlight(b:csv_column)<CR>
nnoremap <silent> <buffer> $ :let b:csv_column=b:csv_max_col<CR>:call <SID>Highlight(b:csv_column)<CR>
nnoremap <silent> <buffer> <LocalLeader>J J
nnoremap <silent> <buffer> <LocalLeader>K K

" The match is window-local, not buffer-local, so it can persist even when the
" filetype is undone or the buffer changed.
execute 'augroup csv' . bufnr('')
  autocmd!
  " These events only highlight in the current window.
  " Note: Highlighting gets slightly confused if the same buffer is present in
  " two split windows next to each other, because then the events aren't fired.
  autocmd BufLeave <buffer> silent call <SID>Highlight(0)
  autocmd BufEnter <buffer> silent call <SID>Highlight(b:csv_column)
augroup END
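
The column count in s:GetNumCols() works in two passes: first every complete field is deleted, then any leftover quote characters are removed, so that only the delimiting commas remain. The following standalone sketch applies the same idea to a string; the name CsvCountCols is hypothetical, and the pattern is copied from the plugin.

```vim
" Count the columns of a CSV line by counting unquoted commas.
" Hypothetical helper; same two-pass approach as s:GetNumCols().
function! CsvCountCols(line)
  " One whole field: either "quoted" (with "" as escaped quote) or unquoted.
  let field = '\%(\%("\%([^"]\|""\)*"\)\|\%(\%([^,"]\|""\)*\)\)'
  " Pass 1: delete the fields. Pass 2: delete stray unbalanced quotes.
  let commas = substitute(substitute(a:line, field, '', 'g'), '"', '', 'g')
  return strlen(commas) + 1
endfunction

echo CsvCountCols('a,b,c')                                " 3
echo CsvCountCols('1,"Smith, John","He said ""hi""",x')   " 4
```

The comma inside "Smith, John" is not counted, because the whole quoted field is removed in the first pass.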

Explanation of the regular expression used

Most of the code above is straightforward, with the possible exception of the regular expression. This section explains it, which may help anyone who wants to improve the code or reuse the expression in other projects. The expression was inspired by a similar one given at http://regexlib.com/REDetails.aspx?regexp_id=1520

The following regular expression is used several times:

\%(\%("\zs\%([^"]\|""\)*\ze"\)\|\%(\zs\%([^,"]\|""\)*\ze\)\)

It is explained as follows:

\%(   /* grouping without a back-reference, for one CSV field */
    \%( /* first possibility of a field: one that starts and ends with " */
        "
        \zs /* beginning of matched string */
        \%(
            [^"]\|"" /* anything not ", or a "" (escaped quote) */
        \)*
        \ze /* end of matched string */
        "
    \) /* end first possibility of field */
    \| /* or */
    \%( /* second possibility: an unquoted field */
        \zs
        \%([^,"]\|""\)* /* anything but comma and " (except for "") */
        \ze
    \)
\)
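
As a way to experiment with the expression, it can be combined with matchstr() to pull one field out of a string. This is a sketch rather than part of the plugin: CsvQuotedField is a hypothetical name, and the two sub-patterns are copied from s:GetExpr() above.

```vim
" Return field number colnr (1-based) of a CSV line, honoring quotes.
" Hypothetical helper built from the expressions used in s:GetExpr().
function! CsvQuotedField(line, colnr)
  " A complete field followed by its comma (used to skip earlier columns).
  let skip = '\%(\%("\%([^"]\|""\)*"\)\|\%(\%([^,"]\|""\)*\)\),'
  " The wanted field; \zs and \ze exclude the surrounding quotes.
  let field = '\%(\%("\zs\%([^"]\|""\)*\ze"\)\|\%(\zs\%([^,"]\|""\)*\ze\)\)'
  let pat = '^\%(' . skip . '\)\{' . (a:colnr - 1) . '}' . field
  return matchstr(a:line, pat)
endfunction

let line = '1,"Smith, John","He said ""hi""",x'
echo CsvQuotedField(line, 2)   " Smith, John
echo CsvQuotedField(line, 3)   " He said ""hi""
echo CsvQuotedField(line, 4)   " x
```

Note that escaped quotes are returned as they appear in the file (doubled); the expression only locates the field and does not unescape it.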

Working with Excel xls files

One can use the xls2csv Perl script to convert Excel files to CSV, and then view/edit with Vim. See http://search.cpan.org/perldoc?xls2csv
