Munging CSV with Emacs regexp
(This post is mostly for me, as I end up doing things like this infrequently and tend to forget how.)
I recently found myself with a CSV file that looked like so:
First,Last,Affiliation
....
and wanted to get it into a form for emailing:
First Last (Affiliation)
...
Emacs' replace-regexp is my tool of choice, so I messed around a bit and constructed the following regexp (it should be one unbroken line in the Emacs regexp minibuffer):
\([[:alnum:] ]+\),
\([[:alnum:] ]+\),
\([[:alnum:] ,\.\"\&\/\)\(]+\)
and replaced it with:
\1 \2 (\3)
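For anyone who wants to try the same substitution outside Emacs, here is a rough Python sketch of it. It is not the Emacs command itself: Python's re module has no [[:alnum:]] class, so the sketch assumes ASCII names and spells the class out as A-Za-z0-9, and the function name reformat_csv_text is made up for illustration.

```python
import re

# Approximation of the Emacs regexp: three comma-separated chunks.
# Assumption: ASCII-only input, so [[:alnum:]] is written as A-Za-z0-9.
PATTERN = re.compile(
    r'([A-Za-z0-9 ]+),'          # chunk 1: first name
    r'([A-Za-z0-9 ]+),'          # chunk 2: last name
    r'([A-Za-z0-9 ,."&/)(]+)'    # chunk 3: affiliation, may contain commas
)

def reformat_csv_text(text):
    """Rewrite every 'First,Last,Affiliation' match as 'First Last (Affiliation)'."""
    return PATTERN.sub(r'\1 \2 (\3)', text)

if __name__ == "__main__":
    sample = 'First,Last,Affiliation\nJane,Doe,"Univ. of Ca., Berkeley"'
    print(reformat_csv_text(sample))
```

Note that the quotes around a quoted affiliation are kept in the output, since the third chunk simply swallows them along with everything else up to the end of the line.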
The regexp does the following:
\(...\) is a regexp container (a group) that you can reference in the replacement with \1 (and for successive containers, use \2, \3, etc.).
[[:alnum:] ]+ says find any sequence of one or more characters made up of A-Z, a-z, 0-9 or the space character.
[[:alnum:] ,\.\"\&\/\)\(]+ does the same but also includes a number of non-letter characters that people seem to use in their affiliations when presented with free-text entry, specifically the comma, period, double quote, ampersand, slash and both parentheses. (The comma doesn't need to be escaped. In fact, none of these characters do: inside square brackets Emacs treats a backslash as just another literal character, so the backslashes above are harmless but also let the set match a backslash. We want to include the comma because someone may have an affiliation like "Univ. of Ca., Berkeley" where there's a comma inside the quoted string that's not a CSV delimiter.)
So, this is a long-winded way of saying: grab the first chunk of stuff before a comma, remember it; grab the next chunk of stuff before a comma, remember it; grab the rest of the stuff to the end of the line, remember it. And the replacement says: put down the first chunk, then a space, then the second chunk, then a space, then an open parenthesis, then the final chunk, and finally a close parenthesis.
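The grab-and-remember steps can also be done without a regexp at all. As a hedged alternative (a different technique from the Emacs one above, not a replacement for it), here is a minimal sketch using Python's standard csv module, which understands quoted fields, so a comma inside "Univ. of Ca., Berkeley" is not treated as a delimiter. The function name format_rows and the choice to skip rows that do not have exactly three fields are assumptions made for illustration.

```python
import csv
import io

def format_rows(csv_text):
    """Return 'First Last (Affiliation)' for each three-field CSV row."""
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            # Assumption: blank or malformed rows are silently skipped.
            continue
        first, last, affiliation = row
        lines.append(f"{first} {last} ({affiliation})")
    return lines

if __name__ == "__main__":
    sample = 'First,Last,Affiliation\nJane,Doe,"Univ. of Ca., Berkeley"\n'
    for line in format_rows(sample):
        print(line)
```

Unlike the regexp, this drops the surrounding quotes from a quoted affiliation, because the CSV parser treats them as field delimiters rather than data.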