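Project home page: https://github.com/uniVocity/univocity-parsers
uniVocity-parsers is a collection of fast, reliable Java parsers. It offers a consistent interface for handling several file formats and a solid framework for developing new parsers.
The CSV parser handles many delimiter-separated formats, such as pipe-separated files and other CSV variants. uniVocity-parsers will support more formats over time. If you need a format that is not supported yet, email parsers@univocity.com; new parsers are added based on user requests.
Every class is documented, so you can build custom parsers for your own needs. Commercial support for building parsers is also available (email support@univocity.com).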
<dependency>
<groupId>com.univocity</groupId>
<artifactId>univocity-parsers</artifactId>
<version>2.5.7</version>
<type>jar</type>
</dependency>
uniVocity-parsers provides the following features:
1.1 CSV files
1.2 Fixed-width files
1.3 TSV files
2.1 File comments
2.2 Partial reading
2.3 Skipping records
# This example was extracted from Wikipedia (en.wikipedia.org/wiki/Comma-separated_values)
#
# 2 double quotes ("") are used as the escape sequence for quoted fields, as per the RFC4180 standard
#
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
# Below is an example of a multi-line value with blank rows around it
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00
,,"Venture ""Extended Edition""","",4900.00
public Reader getReader(String relativePath) {
...
return new InputStreamReader(this.getClass().getResourceAsStream(relativePath), "UTF-8");
...
}
CsvParserSettings settings = new CsvParserSettings();
// The file uses '\n' as its line separator.
// Setting it explicitly ensures the file is also parsed correctly on systems
// such as MacOS and Windows (MacOS uses '\r'; Windows uses '\r\n').
settings.getFormat().setLineSeparator("\n");
// Create the CSV parser
CsvParser parser = new CsvParser(settings);
// Parse all rows in one go
List<String[]> allRows = parser.parseAll(getReader("/examples/example.csv"));
1 [Year, Make, Model, Description, Price]
-----------------------
2 [1997, Ford, E350, ac, abs, moon, 3000.00]
-----------------------
3 [1999, Chevy, Venture "Extended Edition", null, 4900.00]
-----------------------
4 [1996, Jeep, Grand Cherokee, MUST SELL!
air, moon roof, loaded, 4799.00]
-----------------------
5 [1999, Chevy, Venture "Extended Edition, Very Large", null, 5000.00]
-----------------------
6 [null, null, Venture "Extended Edition", null, 4900.00]
-----------------------
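The parsed rows above show the quoting rule at work: the enclosing quotes are removed, commas inside quotes stay in the value, and "" becomes a single quote character. As a rough illustration of that rule only, here is a minimal sketch in plain Java. It is not how uniVocity-parsers is implemented; the names `Rfc4180Sketch` and `parseRecord` are made up for this example, the delimiter is assumed to be ',', and the sketch returns an empty String where the parser output above shows null.

```java
import java.util.ArrayList;
import java.util.List;

class Rfc4180Sketch {

    /** Splits one CSV record into fields; "" inside a quoted field yields a single quote. */
    static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: one literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas and line breaks are kept as-is
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field of the record
        return fields;
    }

    public static void main(String[] args) {
        String record = "1999,Chevy,\"Venture \"\"Extended Edition\"\"\",\"\",4900.00";
        System.out.println(parseRecord(record));
    }
}
```

A real parser also has to deal with comments, line separators, buffering and limits, which is what the settings shown in this document configure.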
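// Create the CSV parser
CsvParser parser = new CsvParser(settings);
// Call beginParsing to read records one by one, iterator-style
parser.beginParsing(getReader("/examples/example.csv"));
String[] row;
while ((row = parser.parseNext()) != null) {
println(out, Arrays.toString(row));
}
// Resources are closed automatically when the end of the input is reached
// or when an error occurs, but you can call stopParsing() at any time.
// You only need to call it if you are not reading the entire input,
// but calling it anyway does no harm.
parser.stopParsing();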
"You are \"beautiful\""
"Yes, \\\"in the inside\"\\"
// Quotes inside quoted values are escaped with a backslash: \"
settings.getFormat().setQuoteEscape('\\');
// Two backslashes inside a quoted value represent a single backslash in the value
settings.getFormat().setCharToEscapeQuoteEscaping('\\');
[You are "beautiful"]
[Yes, \"in the inside"\]
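// The settings object provides many configuration options
CsvParserSettings parserSettings = new CsvParserSettings();
// The parser can be configured to detect the line separator used in the input automatically
parserSettings.setLineSeparatorDetectionEnabled(true);
// A RowListProcessor stores each parsed row in a List
RowListProcessor rowProcessor = new RowListProcessor();
// The parser can be configured to hand every parsed row to a RowProcessor.
// RowProcessors are available in the 'com.univocity.parsers.common.processor' package, and you can also write your own
parserSettings.setRowProcessor(rowProcessor);
// Treat the first parsed row as the column headers
parserSettings.setHeaderExtractionEnabled(true);
// Create a parser instance with the given settings
CsvParser parser = new CsvParser(parserSettings);
// The 'parse' method parses the file and delegates each parsed row to the RowProcessor you defined
parser.parse(getReader("/examples/example.csv"));
// Get the parsed records from the RowListProcessor.
// Note that different RowProcessor implementations provide different sets of functionality
String[] headers = rowProcessor.getHeaders();
List<String[]> rows = rowProcessor.getRows();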
[Year, Make, Model, Description, Price]
=======================
1 [1997, Ford, E350, ac, abs, moon, 3000.00]
-----------------------
2 [1999, Chevy, Venture "Extended Edition", null, 4900.00]
-----------------------
3 [1996, Jeep, Grand Cherokee, MUST SELL!
air, moon roof, loaded, 4799.00]
-----------------------
4 [1999, Chevy, Venture "Extended Edition, Very Large", null, 5000.00]
-----------------------
5 [null, null, Venture "Extended Edition", null, 4900.00]
-----------------------
// ObjectRowProcessor converts the parsed values and hands you the resulting row
ObjectRowProcessor rowProcessor = new ObjectRowProcessor() {
@Override
public void rowProcessed(Object[] row, ParsingContext context) {
// Process the row here; this example just prints it
println(out, Arrays.toString(row));
}
};
// Convert values in the "Price" column (index 4) to BigDecimal
rowProcessor.convertIndexes(Conversions.toBigDecimal()).set(4);
// Convert values in the "Make", "Model" and "Description" columns to lower case, and turn the value "chevy" into null
rowProcessor.convertFields(Conversions.toLowerCase(), Conversions.toNull("chevy")).set("Make", "Model", "Description");
// Convert values in the "year" column (index 0) to BigInteger; nulls become BigInteger.ZERO
rowProcessor.convertFields(new BigIntegerConversion(BigInteger.ZERO, "0")).set("year");
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.getFormat().setLineSeparator("\n");
parserSettings.setRowProcessor(rowProcessor);
parserSettings.setHeaderExtractionEnabled(true);
CsvParser parser = new CsvParser(parserSettings);
// The rowProcessor is executed here
parser.parse(getReader("/examples/example.csv"));
[1997, ford, e350, ac, abs, moon, 3000.00]
[1999, null, venture "extended edition", null, 4900.00]
[1996, jeep, grand cherokee, must sell!
air, moon roof, loaded, 4799.00]
[1999, null, venture "extended edition, very large", null, 5000.00]
[0, null, venture "extended edition", null, 4900.00]
class TestBean {
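// If the quantity column contains "?" or "-", the value is replaced by null
@NullString(nulls = { "?", "-" })
// If the value resolves to null, it is converted to the String "0"
@Parsed(defaultNullRead = "0")
private Integer quantity; // The conversion is chosen from the attribute type;
// in this case an IntegerConversion is used.
// The attribute name is matched against the column headers in the file automatically
@Trim
@LowerCase
// The comments attribute is matched with the column at index 4 (0 is the first column, so this is the fifth column in the file)
@Parsed(index = 4)
private String comments;
// You can also specify the column name explicitly
@Parsed(field = "amount")
private BigDecimal amount;
@Trim
@LowerCase
// "no", "n" and "null" are converted to false; "yes" and "y" are converted to true
@BooleanString(falseStrings = { "no", "n", "null" }, trueStrings = { "yes", "y" })
@Parsed
private Boolean pending;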
//
// BeanListProcessor converts each parsed row into an instance of the given class and stores the results in a list
BeanListProcessor<TestBean> rowProcessor = new BeanListProcessor<TestBean>(TestBean.class);
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setRowProcessor(rowProcessor);
parserSettings.setHeaderExtractionEnabled(true);
CsvParser parser = new CsvParser(parserSettings);
parser.parse(getReader("/examples/bean_test.csv"));
// The BeanListProcessor provides the list of objects extracted from the input
List<TestBean> beans = rowProcessor.getBeans();
[TestBean [quantity=1, comments=?, amount=555.999, pending=true], TestBean [quantity=0, comments=" something ", amount=null, pending=false]]
class WordsToSetConversion implements Conversion<String, Set<String>> {
private final String separator;
private final boolean toUpperCase;
public WordsToSetConversion(String... args) {
String separator = ",";
boolean toUpperCase = true;
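// The first argument, if present, is the separator; the second, if present, toggles upper-casing
if (args.length >= 1) {
separator = args[0];
}
if (args.length >= 2) {
toUpperCase = Boolean.parseBoolean(args[1]);
}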
this.separator = separator;
this.toUpperCase = toUpperCase;
}
public WordsToSetConversion(String separator, boolean toUpperCase) {
this.separator = separator;
this.toUpperCase = toUpperCase;
}
@Override
public Set<String> execute(String input) {
if (input == null) {
return Collections.emptySet();
}
if (toUpperCase) {
input = input.toUpperCase();
}
Set<String> out = new TreeSet<String>();
for (String token : input.split(separator)) {
// Extract words separated by whitespace
for (String word : token.trim().split("\\s")) {
out.add(word.trim());
}
}
return out;
}
//
class Car {
@Parsed
private Integer year;
@Convert(conversionClass = WordsToSetConversion.class, args = { ",", "true" })
@Parsed
private Set<String> description;
//
BeanListProcessor<Car> rowProcessor = new BeanListProcessor<Car>(Car.class);
parserSettings.setRowProcessor(rowProcessor);
CsvParser parser = new CsvParser(parserSettings);
parser.parse(getReader("/examples/example.csv"));
// Get the Car objects
List<Car> cars = rowProcessor.getBeans();
for (Car car : cars) {
// Only process cars that have a description
if (!car.getDescription().isEmpty()) {
println(out, car.getDescription() + " - " + car.toString());
}
}
[ABS, AC, MOON] - year=1997, make=Ford, model=E350, price=3000.00
[AIR, LOADED, MOON, MUST, ROOF, SELL!] - year=1996, make=Jeep, model=Grand Cherokee, price=4799.00
// First, create a RowProcessor to handle all the "detail" rows
ObjectRowListProcessor detailProcessor = new ObjectRowListProcessor();
// Convert values in the "Amount" column (position 1 in the file) to integer
detailProcessor.convertIndexes(Conversions.toInteger()).set(1);
// Then create a MasterDetailProcessor to identify whether a row is a "master" row.
// The row placement argument indicates whether the master row appears before or after its detail rows
MasterDetailListProcessor masterRowProcessor = new MasterDetailListProcessor(RowPlacement.BOTTOM, detailProcessor) {
@Override
protected boolean isMasterRecord(String[] row, ParsingContext context) {
// Return true if the row is a master row.
// In this example, rows whose first column is "Total" are master rows
return "Total".equals(row[0]);
}
};
// We want the "Amount" column of master rows to hold BigInteger values
masterRowProcessor.convertIndexes(Conversions.toBigInteger()).set(1);
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setHeaderExtractionEnabled(true);
// Set the RowProcessor to masterRowProcessor
parserSettings.setRowProcessor(masterRowProcessor);
CsvParser parser = new CsvParser(parserSettings);
parser.parse(getReader("/examples/master_detail.csv"));
// Get the MasterDetailRecord elements
List<MasterDetailRecord> rows = masterRowProcessor.getRecords();
MasterDetailRecord masterRecord = rows.get(0);
// A master record holds one master row and multiple detail rows
Object[] masterRow = masterRecord.getMasterRow();
List<Object[]> detailRows = masterRecord.getDetailRows();
[Total, 100]
=======================
1 [Item1, 50]
-----------------------
2 [Item2, 40]
-----------------------
3 [Item3, 10]
-----------------------
YearMake_Model___________________________________Description_____________________________Price___
1997Ford_E350____________________________________ac, abs, moon___________________________3000.00_
1999ChevyVenture "Extended Edition"______________________________________________________4900.00_
1996Jeep_Grand Cherokee__________________________MUST SELL!
air, moon roof, loaded_______4799.00_
1999ChevyVenture "Extended Edition, Very Large"__________________________________________5000.00_
_________Venture "Extended Edition"______________________________________________________4900.00_
// Define the field lengths of the file to be parsed
FixedWidthFieldLengths lengths = new FixedWidthFieldLengths(4, 5, 40, 40, 8);
// Create the default settings for a fixed-width parser
FixedWidthParserSettings settings = new FixedWidthParserSettings(lengths);
// Set the character used to pad unwritten space in the file
settings.getFormat().setPadding('_');
// The example file uses '\n' as its line separator.
// Setting it explicitly ensures the file is also parsed correctly on systems
// such as MacOS and Windows (MacOS uses '\r'; Windows uses '\r\n').
settings.getFormat().setLineSeparator("\n");
// Create a fixed-width parser with the given settings
FixedWidthParser parser = new FixedWidthParser(settings);
// Parse all rows in one go
List<String[]> allRows = parser.parseAll(getReader("/examples/example.txt"));
1 [Year, Make, Model, Description, Price]
-----------------------
2 [1997, Ford, E350, ac, abs, moon, 3000.00]
-----------------------
3 [1999, Chevy, Venture "Extended Edition", null, 4900.00]
-----------------------
4 [1996, Jeep, Grand Cherokee, MUST SELL!
air, moon roof, loaded, 4799.00]
-----------------------
5 [1999, Chevy, Venture "Extended Edition, Very Large", null, 5000.00]
-----------------------
6 [null, null, Venture "Extended Edition", null, 4900.00]
-----------------------
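To make the roles of the field lengths and the padding character concrete, the following plain-Java sketch cuts a single line into consecutive fields and strips the padding from both ends. It is only an illustration under simplifying assumptions, not the library's implementation: `FixedWidthSketch`, `split` and `strip` are hypothetical names, only one line is handled (no multi-line values), and a field made up entirely of padding is returned as null to mirror the output above.

```java
import java.util.Arrays;

class FixedWidthSketch {

    /** Cuts a line into consecutive fields of the given lengths and strips the padding. */
    static String[] split(String line, char padding, int... lengths) {
        String[] fields = new String[lengths.length];
        int start = 0;
        for (int i = 0; i < lengths.length; i++) {
            // Clamp both ends so that a short line produces empty trailing fields
            int from = Math.min(start, line.length());
            int to = Math.min(start + lengths[i], line.length());
            fields[i] = strip(line.substring(from, to), padding);
            start += lengths[i];
        }
        return fields;
    }

    /** Removes leading and trailing padding; returns null if nothing is left. */
    static String strip(String value, char padding) {
        int from = 0;
        int to = value.length();
        while (from < to && value.charAt(from) == padding) {
            from++;
        }
        while (to > from && value.charAt(to - 1) == padding) {
            to--;
        }
        return from == to ? null : value.substring(from, to);
    }

    public static void main(String[] args) {
        // A shortened record: 4 + 5 + 6 + 8 characters
        String line = "1997Ford_E350__3000.00_";
        System.out.println(Arrays.toString(split(line, '_', 4, 5, 6, 8)));
    }
}
```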
# TSV files can also have comments
# Multi-line records are escaped with \n
# Accepted escape sequences are: \n, \t, \r and \\
Year Make Model Description Price
1997 Ford E350 ac, abs, moon 3000.00
1999 Chevy Venture "Extended Edition" 4900.00
# An example of a multi-line value with blank rows around it!
1996 Jeep Grand Cherokee MUST SELL!\nair, moon roof, loaded 4799.00
1999 Chevy Venture "Extended Edition, Very Large" 5000.00
Venture "Extended Edition" 4900.00
TsvParserSettings settings = new TsvParserSettings();
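// The example file uses '\n' as its line separator.
// Setting it explicitly ensures the file is also parsed correctly on systems
// such as MacOS and Windows (MacOS uses '\r'; Windows uses '\r\n').
settings.getFormat().setLineSeparator("\n");
// Create the TSV parser
TsvParser parser = new TsvParser(settings);
// Parse all rows in one go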
List<String[]> allRows = parser.parseAll(getReader("/examples/example.tsv"));
1 [Year, Make, Model, Description, Price]
-----------------------
2 [1997, Ford, E350, ac, abs, moon, 3000.00]
-----------------------
3 [1999, Chevy, Venture "Extended Edition", null, 4900.00]
-----------------------
4 [1996, Jeep, Grand Cherokee, MUST SELL!
air, moon roof, loaded, 4799.00]
-----------------------
5 [1999, Chevy, Venture "Extended Edition, Very Large", null, 5000.00]
-----------------------
6 [null, null, Venture "Extended Edition", null, 4900.00]
-----------------------
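Row 4 above shows the escaped \n from the input turned back into a real line break. A minimal plain-Java sketch of that unescaping step follows, covering the four escape sequences listed in the sample file's comments. This is an illustration, not the library's code: `TsvEscapeSketch`, `unescape` and `parseLine` are hypothetical names, and leaving unknown escape sequences untouched is an assumption of this sketch.

```java
import java.util.Arrays;

class TsvEscapeSketch {

    /** Replaces the escape sequences \n, \t, \r and \\ with the characters they stand for. */
    static String unescape(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' && i + 1 < value.length()) {
                char next = value.charAt(++i);
                switch (next) {
                    case 'n': out.append('\n'); break;
                    case 't': out.append('\t'); break;
                    case 'r': out.append('\r'); break;
                    case '\\': out.append('\\'); break;
                    default: out.append(c).append(next); // unknown sequence: keep as written
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Splits one TSV line on tab characters and unescapes every value. */
    static String[] parseLine(String line) {
        String[] values = line.split("\t", -1); // -1 keeps trailing empty values
        for (int i = 0; i < values.length; i++) {
            values[i] = unescape(values[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "1996\tJeep\tMUST SELL!\\nair, moon roof, loaded\t4799.00";
        System.out.println(Arrays.toString(parseLine(line)));
    }
}
```

Because tabs and line breaks inside values are always escaped, a TSV line can be split on the tab character first and unescaped afterwards, with no quoting state to track.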
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
...
// Select only the "Price", "Year" and "Make" columns.
// The parser skips the other fields
parserSettings.selectFields("Price", "Year", "Make");
// Parse with these settings and print the parsed rows
List<String[]> parsedRows = parseWithSettings(parserSettings);
1 [3000.00, 1997, Ford]
-----------------------
2 [4900.00, 1999, Chevy]
-----------------------
...
// Here the columns are selected by index only.
// The parser skips the other columns
parserSettings.selectIndexes(4, 0, 1);
// Parse with these settings and print the result
List<String[]> parsedRows = parseWithSettings(parserSettings);
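// Select only the "Price", "Year" and "Make" columns.
// The parser skips the other fields
parserSettings.selectFields("Price", "Year", "Make");
// Columns are reordered by default to match the selection.
// When reordering is disabled, all columns are output in the order defined in the file;
// fields that were not selected are not parsed and come out as null
parserSettings.setColumnReorderingEnabled(false);
// Parse with the settings above and print the result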
List<String[]> parsedRows = parseWithSettings(parserSettings);
1 [1997, Ford, null, null, 3000.00]
-----------------------
2 [1999, Chevy, null, null, 4900.00]
-----------------------
3 [1996, Jeep, null, null, 4799.00]
...
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.getFormat().setLineSeparator("\n");
parserSettings.setHeaderExtractionEnabled(true);
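// Use a ColumnProcessor to collect the values of every column
ColumnProcessor rowProcessor = new ColumnProcessor();
parserSettings.setRowProcessor(rowProcessor);
CsvParser parser = new CsvParser(parserSettings);
// Feed the parsed rows to the column processor
parser.parse(getReader("/examples/example.csv"));
// Finally, get the values parsed for each column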
Map<String, List<String>> columnValues = rowProcessor.getColumnValuesAsMapOfNames();
Year -> [1997, 1999, 1996, 1999, null]
Description -> [ac, abs, moon, null, MUST SELL!
air, moon roof, loaded, null, null]
Model -> [E350, Venture "Extended Edition", Grand Cherokee, Venture "Extended Edition, Very Large", Venture "Extended Edition"]
Price -> [3000.00, 4900.00, 4799.00, 5000.00, 4900.00]
Make -> [Ford, Chevy, Jeep, Chevy, null]
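Conceptually, the column-wise result above is a transposition of the parsed rows, keyed by header name. The sketch below shows that idea in plain Java; it is an assumption-laden illustration rather than the ColumnProcessor implementation. `ColumnsSketch` and `toColumns` are hypothetical names, and filling missing trailing values with null is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ColumnsSketch {

    /** Groups row values by column, keyed by header, keeping the header order. */
    static Map<String, List<String>> toColumns(String[] headers, List<String[]> rows) {
        Map<String, List<String>> columns = new LinkedHashMap<>();
        for (String header : headers) {
            columns.put(header, new ArrayList<>());
        }
        for (String[] row : rows) {
            for (int i = 0; i < headers.length; i++) {
                // Rows shorter than the header contribute null for the missing columns
                columns.get(headers[i]).add(i < row.length ? row[i] : null);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        String[] headers = {"Year", "Make"};
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1997", "Ford"});
        rows.add(new String[] {"1999", "Chevy"});
        rows.add(new String[] {null, null});
        System.out.println(toColumns(headers, rows));
    }
}
```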
// For large files, use a batched column processor.
// The batch size is set to 3, so each batch holds at most 3 rows
settings.setRowProcessor(new BatchedColumnProcessor(3) {
@Override
public void batchProcessed(int rowsInThisBatch) {
List<List<String>> columnValues = getColumnValuesAsList();
println(out, "Batch " + getBatchesProcessed() + ":");
int i = 0;
for (List<String> column : columnValues) {
println(out, "Column " + (i++) + ":" + column);
}
}
});
FixedWidthParser parser = new FixedWidthParser(settings);
parser.parse(getReader("/examples/example.txt"));
Batch 0:
Column 0:[1997, 1999, 1996]
Column 1:[Ford, Chevy, Jeep]
Column 2:[E350, Venture "Extended Edition", Grand Cherokee]
Column 3:[ac, abs, moon, null, MUST SELL!
air, moon roof, loaded]
Column 4:[3000.00, 4900.00, 4799.00]
Batch 1:
Column 0:[1999, null]
Column 1:[Chevy, null]
Column 2:[Venture "Extended Edition, Very Large", Venture "Extended Edition"]
Column 3:[null, null]
Column 4:[5000.00, 4900.00]
// ObjectColumnProcessor converts the parsed values and stores them in columns.
// Use BatchedObjectColumnProcessor to process the columns in batches
ObjectColumnProcessor rowProcessor = new ObjectColumnProcessor();
// Convert values in the "Price" column (index 4) to BigDecimal
rowProcessor.convertIndexes(Conversions.toBigDecimal()).set(4);
// Convert values in the "Make", "Model" and "Description" columns to lower case, and turn the value "chevy" into null
rowProcessor.convertFields(Conversions.toLowerCase(), Conversions.toNull("chevy")).set("Make", "Model", "Description");
// Convert values in the "year" column (index 0) to BigInteger; nulls become BigInteger.ZERO
rowProcessor.convertFields(new BigIntegerConversion(BigInteger.ZERO, "0")).set("year");
parserSettings.setRowProcessor(rowProcessor);
TsvParser parser = new TsvParser(parserSettings);
// The rowProcessor is invoked here
parser.parse(getReader("/examples/example.tsv"));
// Get the column values:
Map<Integer, List<Object>> columnValues = rowProcessor.getColumnValuesAsMapOfIndexes();
0 -> [1997, 1999, 1996, 1999, 0]
1 -> [ford, null, jeep, null, null]
2 -> [e350, venture "extended edition", grand cherokee, venture "extended edition, very large", venture "extended edition"]
3 -> [ac, abs, moon, null, must sell!
air, moon roof, loaded, null, null]
4 -> [3000.00, 4900.00, 4799.00, 5000.00, 4900.00]
parserSettings.setRowProcessor(new ConcurrentRowProcessor(rowProcessor));
// Create a TSV parser
TsvParser parser = new TsvParser(new TsvParserSettings());
String[] line;
line = parser.parseLine("A B C");
println(out, Arrays.toString(line));
line = parser.parseLine("1 2 3 4");
println(out, Arrays.toString(line));
[A, B, C]
[1, 2, 3, 4]
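// The parser can be configured to detect the line separator used in the input automatically
parserSettings.setLineSeparatorDetectionEnabled(true);
// Set the default value used when a parsed value is null
parserSettings.setNullValue("<NULL>");
// Set the default value used when a parsed value is empty
parserSettings.setEmptyValue("<EMPTY>"); // for CSV only
// Set the headers of the parsed file. If the headers are set here, 'setHeaderExtractionEnabled(true)'
// makes the parser simply ignore the first input row
parserSettings.setHeaders("a", "b", "c", "d", "e");
// Print the columns in reverse order.
// NOTE: when fields are selected, the columns are output in the selected order, and every row has the same number of columns
parserSettings.selectFields("e", "d", "c", "b", "a");
// Do not skip leading whitespace
parserSettings.setIgnoreLeadingWhitespaces(false);
// Do not skip trailing whitespace
parserSettings.setIgnoreTrailingWhitespaces(false);
// Read a fixed number of records, then stop and close any resources
parserSettings.setNumberOfRecordsToRead(9);
// Do not skip empty lines
parserSettings.setSkipEmptyLines(false);
// Set the maximum number of characters per column; the default is 4096 characters.
// This avoids OutOfMemoryErrors when the file format is wrong:
// in that case the parser would keep reading until the end of the input or until memory runs out.
// Setting a limit avoids unexpected JVM crashes.
parserSettings.setMaxCharsPerColumn(100);
// For the same reason, set the maximum number of columns in the input.
// The default is 512
parserSettings.setMaxColumns(10);
// Set the size of the parser's input buffer
parserSettings.setInputBufferSize(1000);
// Disable the separate thread that loads the input buffer. By default, the input is loaded incrementally
// on a separate thread when more than one processor is available. Leave it enabled for better performance
// when parsing large files (> 100 MB).
parserSettings.setReadInputOnSeparateThread(false);
// Parse with these settings and print the parsed rows
List<String[]> parsedRows = parseWithSettings(parserSettings);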
1 [<NULL>, <NULL>, <NULL>, <NULL>, <NULL>]
-----------------------
2 [Price, Description, Model, Make, Year]
-----------------------
3 [3000.00, ac, abs, moon, E350, Ford, 1997]
-----------------------
4 [4900.00, <EMPTY>, Venture "Extended Edition", Chevy, 1999]
-----------------------
5 [<NULL>, <NULL>, <NULL>, <NULL>, ]
-----------------------
6 [<NULL>, <NULL>, <NULL>, <NULL>, ]
-----------------------
7 [4799.00, MUST SELL!
air, moon roof, loaded, Grand Cherokee, Jeep, 1996]
-----------------------
8 [5000.00, <NULL>, Venture "Extended Edition, Very Large", Chevy, 1999]
-----------------------
9 [4900.00, <EMPTY>, Venture "Extended Edition", <NULL>, <NULL>]
-----------------------
...
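// For the sake of the example, we will not read the last 8 characters (the Price column).
// We also do NOT set the padding character to '_', so the output shows
// exactly which characters are being processed
FixedWidthParserSettings parserSettings = new FixedWidthParserSettings(new FixedWidthFieldLengths(4, 5, 40, 40 /*, 8*/));
// The example file uses '\n' as its line separator.
// Setting it explicitly ensures the file is also parsed correctly on systems
// such as MacOS and Windows (MacOS uses '\r'; Windows uses '\r\n').
parserSettings.getFormat().setLineSeparator("\n");
// The fixed-width parser settings are similar to the CSV settings; only a few extra options need explaining:
// If a row has more characters than defined, the excess is skipped up to the end of the line
parserSettings.setSkipTrailingCharsUntilNewline(true);
// If a row is shorter than defined and a new line is found, the record is considered complete. The content of the next line is parsed as a new record
parserSettings.setRecordEndsOnNewline(true);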
RowListProcessor rowProcessor = new RowListProcessor();
parserSettings.setRowProcessor(rowProcessor);
parserSettings.setHeaderExtractionEnabled(true);
FixedWidthParser parser = new FixedWidthParser(parserSettings);
parser.parse(getReader("/examples/example.txt"));
List<String[]> rows = rowProcessor.getRows();
1 [1997, Ford_, E350____________________________________, ac, abs, moon___________________________]
-----------------------
2 [1999, Chevy, Venture "Extended Edition"______________, ________________________________________]
-----------------------
3 [1996, Jeep_, Grand Cherokee__________________________, MUST SELL!]
-----------------------
4 [air,, moon, roof, loaded_______4799.00_]
-----------------------
5 [1999, Chevy, Venture "Extended Edition, Very Large"__, ________________________________________]
-----------------------
6 [____, _____, Venture "Extended Edition"______________, ________________________________________]
-----------------------
// All you need is to create a CsvWriter instance with the default CsvWriterSettings.
// By default, only values that contain a field separator are enclosed in quotes.
// If quotes are part of the value, they are escaped automatically.
// Empty rows are discarded automatically.
CsvWriter writer = new CsvWriter(outputWriter, new CsvWriterSettings());
// Write the file headers
writer.writeHeaders("Year", "Make", "Model", "Description", "Price");
// Write all the rows and close the Writer instance
writer.writeRowsAndClose(rows);
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,Venture "Extended Edition",,4900.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00
,,Venture "Extended Edition",,4900.00
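The output above suggests the default quoting behavior: a value is enclosed in quotes only when it needs to be, and quotes inside an enclosed value are doubled. The plain-Java sketch below reproduces two of the rows above under that reading. It is not the CsvWriter implementation: `CsvQuoteSketch`, `formatField` and `formatRow` are made-up names, and treating a line break as a reason to quote is inferred from the multi-line row in the output.

```java
class CsvQuoteSketch {

    /** Formats one value: quoted only if it contains a comma or a line break. */
    static String formatField(String value) {
        if (value == null) {
            return ""; // nulls are written as nothing between the delimiters
        }
        boolean needsQuotes = value.indexOf(',') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value; // written as-is, including any quote characters
        }
        // Inside an enclosed value, every quote is escaped by doubling it
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Joins the formatted values with commas. */
    static String formatRow(String... values) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                row.append(',');
            }
            row.append(formatField(values[i]));
        }
        return row.toString();
    }

    public static void main(String[] args) {
        System.out.println(formatRow("1999", "Chevy", "Venture \"Extended Edition\"", null, "4900.00"));
        System.out.println(formatRow("1999", "Chevy", "Venture \"Extended Edition, Very Large\"", null, "5000.00"));
    }
}
```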
// As with the CsvWriter, all you need is to create a TsvWriter instance with a TsvWriterSettings
TsvWriter writer = new TsvWriter(outputWriter, new TsvWriterSettings());
// Write the file headers
writer.writeHeaders("Year", "Make", "Model", "Description", "Price");
// Write all the rows and close the output Writer instance
writer.writeRowsAndClose(rows);
Year Make Model Description Price
1997 Ford E350 ac, abs, moon 3000.00
1999 Chevy Venture "Extended Edition" 4900.00
1996 Jeep Grand Cherokee MUST SELL!\nair, moon roof, loaded 4799.00
1999 Chevy Venture "Extended Edition, Very Large" 5000.00
Venture "Extended Edition" 4900.00
CsvWriterSettings settings = new CsvWriterSettings();
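// Set the string written for fields whose value is null
settings.setNullValue("?");
// Change the comment character to -
settings.getFormat().setComment('-');
// Set the string written for empty values
settings.setEmptyValue("!");
// Ask the writer to write empty lines instead of skipping them
settings.setSkipEmptyLines(false);
// Create a writer with the settings above
CsvWriter writer = new CsvWriter(outputWriter, settings);
// Write the headers
writer.writeHeaders("a", "b", "c", "d", "e");
// Write the rows one by one (the first row is skipped)
for (int i = 1; i < rows.size(); i++) {
// You can write a comment above each row
writer.commentRow("This is row " + i);
// Write the row
writer.writeRow(rows.get(i));
}
// The writer must be closed. This also closes the java.io.Writer used to create the CsvWriter instance.
// Note that no checked exceptions are thrown here. If something goes wrong, you get an IllegalStateException wrapping the original error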
writer.close();
a,b,c,d,e
-This is row 1
1999,Chevy,Venture "Extended Edition",!,4900.00
-This is row 2
1996,Jeep,Grand Cherokee,"MUST SELL!
...
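You can write to only some of the fields of a CSV file "transparently" while keeping the output format consistent. Suppose you have a CSV file with 5 columns, but only have data for 3 of them, in a different order. All you need to do is configure the file headers and select the fields you have values for.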
CsvWriterSettings settings = new CsvWriterSettings();
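// When writing, nulls are printed using the empty value (defaults to "").
// Here the writer is configured to print ? to represent null values
settings.setNullValue("?");
// If the value is not null but is an empty String (e.g. ""), the writer can be configured to
// print a default representation for such non-null, empty values
settings.setEmptyValue("!");
// Enclose all fields in quotes
settings.setQuoteAllFields(true);
// Set the file headers (used for selection only; they are not written automatically)
settings.setHeaders("Year", "Make", "Model", "Description", "Price");
// Select which input fields should be written. In this example, the "make" and "model" fields will be empty.
// The field selection is not case sensitive
settings.selectFields("description", "price", "year");
// Create a writer with the settings above
CsvWriter writer = new CsvWriter(outputWriter, settings);
// Write the headers specified in the settings
writer.writeHeaders();
// Write each row, providing values for the selected fields (note that the values must follow the field selection order)
writer.writeRow("ac, abs, moon", 3000.00, 1997);
writer.writeRow("", 4900.00, 1999); // NOTE: the empty String is replaced by "!", as configured with setEmptyValue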
writer.writeRow("MUST SELL!\nair, moon roof, loaded", 4799.00, 1996);
writer.close();
"Year","Make","Model","Description","Price"
"1997","?","?","ac, abs, moon","3000.0"
"1999","?","?","!","4900.0"
"1996","?","?","MUST SELL!
...
FixedWidthFieldLengths lengths = new FixedWidthFieldLengths(15, 10, 35);
FixedWidthWriterSettings settings = new FixedWidthWriterSettings(lengths);
// Null values are written as nil
settings.setNullValue("nil");
settings.getFormat().setPadding('_');
settings.setIgnoreLeadingWhitespaces(false);
settings.setIgnoreTrailingWhitespaces(false);
// Create an ObjectRowWriterProcessor, which applies conversions to the values of each row before they are written
ObjectRowWriterProcessor processor = new ObjectRowWriterProcessor();
settings.setRowWriterProcessor(processor);
// Format values of the "date" field with the pattern " yyyy MMM dd ", then trim the result
processor.convertFields(Conversions.toDate(" yyyy MMM dd "), Conversions.trim()).add("date");
// Trim and upper-case the String at position 2 of the input row
processor.convertIndexes(Conversions.trim(), Conversions.toUpperCase()).add(2);
// Set the file headers so the writer knows which position each field name refers to
settings.setHeaders("date", "quantity", "comments");
// Create a writer with the settings above
FixedWidthWriter writer = new FixedWidthWriter(outputWriter, settings);
// Write the headers specified in the settings
writer.writeHeaders();
// Write fixed-width rows from plain values. The null quantity in the first record
// and the null date in the second are written as nil
writer.processRecord(new Date(0), null, " a comment ");
writer.processRecord(null, 1000, "");
writer.close();
date___________quantity__comments___________________________
1970 Jan 01____nil_______A COMMENT__________________________
nil____________1000_________________________________________
FixedWidthFieldLengths lengths = new FixedWidthFieldLengths(10, 10, 35, 10, 40);
FixedWidthWriterSettings settings = new FixedWidthWriterSettings(lengths);
// Any null values will be written as ?
settings.setNullValue("?");
// Creates a BeanWriterProcessor that handles annotated fields in the TestBean class.
settings.setRowWriterProcessor(new BeanWriterProcessor<TestBean>(TestBean.class));
// Sets the file headers so the writer knows the correct order when writing values taken from a TestBean instance
settings.setHeaders("amount", "pending", "date", "quantity", "comments");
// Creates a writer with the above settings;
FixedWidthWriter writer = new FixedWidthWriter(outputWriter, settings);
// Writes the headers specified in the settings
writer.writeHeaders();
// writes a fixed width row with empty values (as nothing was set in the TestBean instance).
writer.processRecord(new TestBean());
TestBean bean = new TestBean();
bean.setAmount(new BigDecimal("500.33"));
bean.setComments("Blah,blah");
bean.setPending(false);
bean.setQuantity(100);
// Writes a fixed-width row with the values set in "bean". Notice that there is no annotated
// attribute for the "date" column, so it will just be null (and then converted to ?, as we have settings.setNullValue("?");)
writer.processRecord(bean);
// you can still write rows passing in its values directly.
writer.writeRow(BigDecimal.ONE, true, "1990-01-10", 3, null);
writer.close();
amount pending date quantity comments
? ? ? ? ?
500.33 no ? 100 blah,blah
1 true 1990-01-10 3 ?
TsvWriter writer = new TsvWriter(outputWriter, new TsvWriterSettings());
writer.writeHeaders("A", "B", "C", "D", "E");
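// Write a value to the first column
writer.writeValue(10);
// Write a value to the second column
writer.writeValue(20);
// Write a value to the fourth column (index 3 is the 4th column, the one with header D)
writer.writeValue(3, 40);
// Overwrite the value in the first column, the one with header A
writer.writeValue("A", 100.0);
// Flush all values to the output, creating a row
writer.writeValuesToRow();
Output: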
A B C D E
100.0 20 40