我需要读取一个非常大的Excel文件,该文件既有日期编号,也有非日期编号。我发现的所有示例似乎都能做到这一点或那一点(将单元格标识为日期值或在恒定内存中读取文件)。
对于非常大的文件,唯一有效的解决方案是此处所述的StreamingReader方法(此处所述的其他示例要么不适用于我的文件格式,要么出现内存不足堆错误)。
用java读取大型Excel文件(500K行)
http://poi.apache.org/components/spreadsheet/how-to.html#event_api
我读取文件的步骤如下所示。整个示例使用test excel。xmls(一个小测试文件)在github中提供,如下所示:
https://github.com/greshje/example-poi-streaming
POM。XML:
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<build.version>1.0.4-001</build.version>
</properties>
<modelVersion>4.0.0</modelVersion>
<groupId>com.greshje.examples</groupId>
<artifactId>poi-streaming-example</artifactId>
<version>1.0.4-SNAPSHOT</version>
<packaging>jar</packaging>
<!--
*
* dependencies
*
-->
<dependencies>
<!-- JUNIT https://mvnrepository.com/artifact/junit/junit -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
<!-- JUNIT-TOOLBOX https://mvnrepository.com/artifact/com.googlecode.junit-toolbox/junit-toolbox -->
<dependency>
<groupId>com.googlecode.junit-toolbox</groupId>
<artifactId>junit-toolbox</artifactId>
<version>2.4</version>
<scope>test</scope>
</dependency>
<!-- SLF4J LOGBACK CLASSIC https://mvnrepository.com/artifact/ch.qos.logback/logback-classic -->
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
<version>1.2.3</version>
</dependency>
<!-- POI https://mvnrepository.com/artifact/org.apache.poi/poi -->
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>4.1.2</version>
</dependency>
<!-- POI-OOXML https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml -->
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>4.1.2</version>
</dependency>
<!-- XERCES https://mvnrepository.com/artifact/xerces/xerces -->
<dependency>
<groupId>xerces</groupId>
<artifactId>xerces</artifactId>
<version>2.4.0</version>
</dependency>
<!-- XERCES-IMPL https://mvnrepository.com/artifact/xerces/xercesImpl -->
<dependency>
<groupId>xerces</groupId>
<artifactId>xercesImpl</artifactId>
<version>2.12.0</version>
</dependency>
<!-- XLSX-STREAMER https://mvnrepository.com/artifact/com.monitorjbl/xlsx-streamer -->
<dependency>
<groupId>com.monitorjbl</groupId>
<artifactId>xlsx-streamer</artifactId>
<version>0.2.3</version>
</dependency>
</dependencies>
<!--
*
* build
*
-->
<build>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.7.0</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>2.3.2</version>
<!--
<configuration>
<finalName></finalName>
</configuration>
-->
</plugin>
</plugins>
</build>
Java代码:
package com.greshje.example.poi.streaming;
import java.io.InputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.monitorjbl.xlsx.StreamingReader;
public class PoiStreamingExample {
private static final Logger log = LoggerFactory.getLogger(PoiStreamingExample.class);
private static final String FILE_NAME = "/com/greshje/example/poi/streaming/test-file.xlsx";
public static void main(String[] args) {
log.info("Starting test...");
log.info("Getting file");
InputStream in = PoiStreamingExample.class.getResourceAsStream(FILE_NAME);
log.info("Got file");
StreamingReader reader = getReader(in, 0);
log.info("File contents:");
for (Row row : reader) {
String rowString = "";
for (Cell cell : row) {
if (rowString != "") {
rowString += ",";
}
// NEED A WAY TO GET A DATE WHERE APPROPRIATE HERE
rowString += cell.getStringCellValue();
}
log.info(rowString);
}
log.info("Done.");
}
public static StreamingReader getReader(InputStream in, int sheetIndex) {
try {
StreamingReader reader = StreamingReader.builder()
.rowCacheSize(100) // number of rows to keep in memory (defaults to 10)
.bufferSize(4096) // buffer size to use when reading InputStream to file (defaults to 1024)
.sheetIndex(sheetIndex) // index of sheet to use
.read(in); // read the file
return reader;
} catch (Exception exp) {
throw new RuntimeException(exp);
}
}
}
我的测试数据如下所示:
输出如下所示(日期值和非日期值表示为数字)。
2020-09-06 10:47:13,814 10:47:13.814 [main] INFO (PoiStreamingExample.java:19) - Starting test...
2020-09-06 10:47:13,822 10:47:13.822 [main] INFO (PoiStreamingExample.java:20) - Getting file
2020-09-06 10:47:13,823 10:47:13.823 [main] INFO (PoiStreamingExample.java:22) - Got file
2020-09-06 10:47:15,117 10:47:15.117 [main] INFO (PoiStreamingExample.java:24) - File contents:
2020-09-06 10:47:15,149 10:47:15.149 [main] INFO (PoiStreamingExample.java:33) - Number,Date (mostly),Date (mostly)
2020-09-06 10:47:15,150 10:47:15.150 [main] INFO (PoiStreamingExample.java:33) - 123456,43550
2020-09-06 10:47:15,150 10:47:15.150 [main] INFO (PoiStreamingExample.java:33) - 123456,43685,44019
2020-09-06 10:47:15,150 10:47:15.150 [main] INFO (PoiStreamingExample.java:33) - 123456,43522,43535
2020-09-06 10:47:15,151 10:47:15.151 [main] INFO (PoiStreamingExample.java:33) - 123456,43503,43538
2020-09-06 10:47:15,151 10:47:15.151 [main] INFO (PoiStreamingExample.java:33) - 123456,43535,43564
2020-09-06 10:47:15,151 10:47:15.151 [main] INFO (PoiStreamingExample.java:33) - 123456,43536,43574
2020-09-06 10:47:15,151 10:47:15.151 [main] INFO (PoiStreamingExample.java:33) - 7890123,43553,43700
2020-09-06 10:47:15,151 10:47:15.151 [main] INFO (PoiStreamingExample.java:33) - 7890123,44041
2020-09-06 10:47:15,151 10:47:15.151 [main] INFO (PoiStreamingExample.java:33) - 7890123,43521,43550
2020-09-06 10:47:15,151 10:47:15.151 [main] INFO (PoiStreamingExample.java:33) - 7890123,43558,43580
2020-09-06 10:47:15,151 10:47:15.151 [main] INFO (PoiStreamingExample.java:33) - 7890123,43567,43599
2020-09-06 10:47:15,151 10:47:15.151 [main] INFO (PoiStreamingExample.java:33) - 7890123,43633,43641
2020-09-06 10:47:15,151 10:47:15.151 [main] INFO (PoiStreamingExample.java:33) - 7890123,43573,43615
2020-09-06 10:47:15,152 10:47:15.152 [main] INFO (PoiStreamingExample.java:33) - 7890123,43577,43606
2020-09-06 10:47:15,152 10:47:15.152 [main] INFO (PoiStreamingExample.java:33) - 7890123,43719,43754
2020-09-06 10:47:15,152 10:47:15.152 [main] INFO (PoiStreamingExample.java:33) - 7890123,43634,43641
2020-09-06 10:47:15,152 10:47:15.152 [main] INFO (PoiStreamingExample.java:33) - 123,43550
2020-09-06 10:47:15,152 10:47:15.152 [main] INFO (PoiStreamingExample.java:33) - smith,43550
2020-09-06 10:47:15,152 10:47:15.152 [main] INFO (PoiStreamingExample.java:33) - jones,43550
2020-09-06 10:47:15,152 10:47:15.152 [main] INFO (PoiStreamingExample.java:33) - 43550,43550
2020-09-06 10:47:15,152 10:47:15.152 [main] INFO (PoiStreamingExample.java:35) - Done.
---编辑------------------------
我更新了xls流版本,但仍然没有更新到新版本的问题解决了这个问题!!!
<!-- XLSX-STREAMER https://mvnrepository.com/artifact/com.monitorjbl/xlsx-streamer -->
<dependency>
<groupId>com.monitorjbl</groupId>
<artifactId>xlsx-streamer</artifactId>
<version>2.1.0</version>
</dependency>
这是旧版本:似乎没有任何信息可以确定细胞类型:
此外,获取单元格类型的代码似乎不受支持(在旧版本中)
新版本有更多的单元格信息,并与日期和数字一起工作,给出了公认答案中显示的结果。
使用最新版本的Excel Streaming Reader,即2.1.0,此问题已解决。
使用测试文件。xlsx和以下代码:
import java.io.InputStream;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Cell;
import com.monitorjbl.xlsx.StreamingReader;
public class PoiStreamingExample {
private static final String FILE_NAME = "./test-file.xlsx";
public static void main(String[] args) {
try (
InputStream is = PoiStreamingExample.class.getResourceAsStream(FILE_NAME);
Workbook workbook = StreamingReader.builder()
.rowCacheSize(100)
.bufferSize(4096)
.open(is)) {
Sheet sheet = workbook.getSheetAt(0);
for (Row r : sheet) {
String rowString = "";
for (Cell c : r) {
if (rowString != "") {
rowString += ",";
}
rowString += c.getStringCellValue();
}
System.out.println(rowString);
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
它打印:
Number,Date (mostly),Date (mostly)
123456,3/26/19
123456,8/8/19,7/7/20
123456,2/26/19,3/11/19
123456,2/7/19,3/14/19
123456,3/11/19,4/9/19
123456,3/12/19,4/19/19
7890123,3/29/19,8/23/19
7890123,7/29/20
7890123,2/25/19,3/26/19
7890123,4/3/19,4/25/19
7890123,4/12/19,5/14/19
7890123,6/17/19,6/25/19
7890123,4/18/19,5/30/19
7890123,4/22/19,5/21/19
7890123,9/11/19,10/16/19
7890123,6/18/19,6/25/19
123,43550
smith,43550
jones,43550
43550,43550
我有一个Java代码显示的内容的excel表。现在,当我处理数值/字符串/布尔数据时,它总是给出正确的结果。不同的是,为了测试我的代码,我在Excel中做了很少的改动,并添加了一个日期值(例如4-SEP-09)。在运行我的代码时,它返回了不想要的结果。代码和O/P如下所示。如何从excel表中读取和显示日期值?我在用阿帕奇POI。我可以执行类似的操作,检查单元格值是否为date,或者是否使用?这是
这段代码该怎么优化? 这个表格可以看成两部分,左侧部分表头为非日期,右部分表头为日期。 右部分日期可能会有很多,我让左侧固定,右侧的部分可以滚动,但是我下面实现代码的if-else部分有很多冗余的部分。 我觉得这个fixed属性接口的设计可能不太好吧
本文向大家介绍python转化excel数字日期为标准日期操作,包括了python转化excel数字日期为标准日期操作的使用技巧和注意事项,需要的朋友参考一下 伙伴遇到一个关于excel导入数据到python中,日期变成数字而不是日期格式的问题。第一反应这个数字应该是excel里面的时间戳类似的,所以我就实验增加一天是不是对应的数字就加1。最后证明了我的想法,这样就可以倒推excel里面的数字日期
问题内容: 如何将字符串日期格式转换为日期,我的日期字符串格式为 接下来,我没有运气尝试。 以上所有语句都给出了解析错误。 问题答案: 请阅读的文档time.Parse: 该布局通过显示参考时间(定义为 2006年1月2日星期一15:04:05-0700 如果它是值,将被解释;它用作输入格式的示例。然后将对输入字符串进行相同的解释。 所以正确的格式是
一个H2数据库和表。我尝试从CSV文件中插入记录: 错误是 无法解析“日期”常量“29/06/2018”;SQL语句: CSV数据类似于: < code >日期,城市,天数,总计2019年6月30日,洛杉矶,245.2,34708.2 2019年4月12日,旧金山,234.1,45708.4 2019年5月2日,765.4,56087.2 表架构类似于: 我如何将 CONVERT 语句包含为 yy
问题内容: 我正在使用Apache POI库读取Excel文件。我在读取密码单元时被卡住了。如果用户在密码单元格中输入日期作为密码,即16/05/2012。我正在将此值读取为“ 41045”,而该值应为“ 16/05/2012”。这是我的代码: 有人可以帮忙吗? 谢谢。 问题答案: 您在POI中寻找的类是DataFormatter 当Excel写入文件时,某些单元格将存储为文字字符串,而另一些单元