Project Development: How to parse Comma Separated Values (CSV) file content with Regex in Java

Sunday, October 6, 2013

A CSV file is a plain-text file in which each line is a record and the fields of a record are separated by commas. It is widely used in many fields, e.g. as the Google Contacts export format. It's a really good tool if your data is organized in columns, like a table. It might look like:

Name    Address   Cell #
John    12333     123-456-7890
Peter   13444     234-567-8910

So the information is very well structured, just like the data in an Excel table. It is therefore a good idea to export this data to a CSV file and then use the CSV-formatted data in your program.

KEY: The most important reason to use a CSV file: it is well structured and easy to use.

Now, for programmers like me, how can we parse it?
The data in a CSV file is like a table: it is stored as rows, with the values in each row separated by commas, as the name suggests.
So the intuitive way to handle the data is to read the file line by line, then split each line you read on commas.
The code looks like:
String s = in.readLine();
String[] str = s.split(",");
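
To make the two lines above easier to try out, here is a minimal, self-contained sketch of the same idea. The class name NaiveCsvReader and the method readRows are names I made up for illustration, and the sample rows are fed in through a StringReader instead of a real file so the example runs on its own. It assumes that no field contains a comma.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
class NaiveCsvReader {

    // Reads the input line by line and splits every non-empty line on commas.
    // Assumption: no field contains a comma.
    static List<String[]> readRows(BufferedReader in) {
        List<String[]> rows = new ArrayList<>();
        try {
            String s;
            while ((s = in.readLine()) != null) {
                if (s.isEmpty()) {
                    continue; // skip blank lines
                }
                rows.add(s.split(","));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Sample data standing in for the content of a CSV file.
        String data = "Name,Address,Cell #\n"
                + "John,12333,123-456-7890\n"
                + "Peter,13444,234-567-8910\n";
        try (BufferedReader in = new BufferedReader(new StringReader(data))) {
            for (String[] row : readRows(in)) {
                System.out.println(String.join(" | ", row));
            }
        }
    }
}
```

To read a real file, the StringReader can be swapped for a FileReader. One detail to be aware of: split(",") drops trailing empty fields, so a line ending in a comma yields one field fewer than expected; split(",", -1) keeps them.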

But if you think this is done, you are 50% wrong, since many field values may themselves contain commas, in which case the value is wrapped in double quotes. So it's a wise choice to use a regular expression that splits the string on a comma only if that comma is not part of a quoted string.

The regular expression I use is:

s.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
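
Here is a small sketch of how that expression behaves, compared with the plain split(","). The lookahead (?=([^"]*"[^"]*")*[^"]*$) only lets a comma match when an even number of double quotes follows it up to the end of the line, which means the comma sits outside any quoted field. The class name QuotedCsvSplit, the method splitLine, and the sample address are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class QuotedCsvSplit {

    // A comma is a separator only if it is followed by an even number of
    // double quotes up to the end of the line (i.e. it is outside quotes).
    private static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

    static String[] splitLine(String s) {
        return COMMA_OUTSIDE_QUOTES.split(s);
    }

    public static void main(String[] args) {
        // Made-up sample line whose second field contains a comma.
        String s = "John,\"12 Main St, Apt 4\",123-456-7890";

        String[] naive = s.split(",");
        String[] quoted = splitLine(s);

        System.out.println("plain split: " + naive.length + " fields");  // 4 fields
        System.out.println("regex split: " + quoted.length + " fields"); // 3 fields
        for (String field : quoted) {
            System.out.println(field);
        }
    }
}
```

Note that the fields come back with their surrounding double quotes still attached, so you may want to strip them afterwards. Since the file is read line by line, this approach also does not cover quoted fields that contain line breaks.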
Hope this helps you solve the problem.

Happy Coding.
