在项目中使用到 poi 做 Excel 的导入导出,记录一下使用过程中遇到的问题。使用的版本为:3.16
没有考虑到大量数据的问题。只使用 XSSF 处理Excel,某次导出数据时,处理时间非常久,并且报错OutOfMemoryError。
使用XSSF允许访问文档中的所有行,所有数据在全部在内存中进行处理,如果数据量大,将会占用极高的内存。
SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. 1
同事尝试使用XSSF输出到Excel中当数据到达10W行时 ,内存占用达到了13G。随着样式的复杂度、列的增加,内存的占用可能会更高。
SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk. 1
主要是研究poi过程中遇到的一些需要注意的情况,在实际开发过程中可能不会出现。
SXSSFSheet在实现的createRow()方法,做了三个判断。
a. 超过最大Excel2007的最大行数:1048576(2^20);
b.小于写入临时文件行数;
c.小于sheet页中最后的行数,应该是对于读取文件而言,createRow()的数 值小于文件sheet页中已有的行数。createSheet()出来的新sheet页不会出现这种情况;
满足其中一个条件就会报出该异常。以下是部分源码:
int maxrow = SpreadsheetVersion.EXCEL2007.getLastRowIndex();
if (rownum < 0 || rownum > maxrow) {
throw new IllegalArgumentException("Invalid row number (" + rownum +
") outside allowable range (0.." + maxrow + ")");
}
// attempt to overwrite a row that is already flushed to disk
if (rownum <= _writer.getLastFlushedRow()) {
throw new IllegalArgumentException(
"Attempting to write a row[" + rownum + "] " +
"in the range [0," + _writer.getLastFlushedRow() + "] that is already written to disk.");
}
// attempt to overwrite a existing row in the input template
if (_sh.getPhysicalNumberOfRows() > 0 && rownum <= _sh.getLastRowNum()) {
throw new IllegalArgumentException(
"Attempting to write a row[" + rownum + "] " +
"in the range [0," + _sh.getLastRowNum() + "] that is already written to disk.");
}
XSSF的createRow()方法,在对应行已经存在的情况下,会删除其内容,即使是读取的文件也是这样。这是和SXSSF有差异的地方之一。以下是对应的源码部分。
XSSFRow prev = _rows.get(rownumI);
if (prev != null) {
// the Cells in an existing row are invalidated on-purpose, in order to clean up correctly, we
// need to call the remove, so things like ArrayFormulas and CalculationChain updates are done
// correctly.
// We remove the cell this way as the internal cell-list is changed by the remove call and
// thus would cause ConcurrentModificationException otherwise
while (prev.getFirstCellNum() != -1) {
prev.removeCell(prev.getCell(prev.getFirstCellNum()));
}
ctRow = prev.getCTRow();
ctRow.set(CTRow.Factory.newInstance());
}
SXSSF解决超大文件的方案是:像滑动窗口一样,只操作部分范围的数据,划过的数据将被写入临时文件中。
默认临时文件为系统缓存的临时目录
Windows:C:\Users\用户~1\AppData\Local\Temp
Linux:/tmp
以下是DefaultTempFileCreationStrategy
中创建临时文件的部分源码
private void createPOIFilesDirectory() throws IOException {
// Identify and create our temp dir, if needed
// The directory is not deleted, even if it was created by this TempFileCreationStrategy
if (dir == null) {
String tmpDir = System.getProperty(JAVA_IO_TMPDIR);
if (tmpDir == null) {
throw new IOException("Systems temporary directory not defined - set the -D" + JAVA_IO_TMPDIR + " jvm property!");
}
dir = new File(tmpDir, POIFILES);
}
createTempDirectory(dir);
}
这部分转载自一份源码分析博客2
1.设置JVM系统变量 -Djava.io.tmpdir=xxx
此方法会改变JVM所有的临时文件目录。
2.实现TempFileCreationStrategy
的接口
3.DefaultTempFileCreationStrategy
的构造方法提供了 dir参数的构造:
public DefaultTempFileCreationStrategy(File dir) { this.dir = dir; }
重新构造DefaultTempFileCreationStrategy实例传值给org.apache.poi.util.TempFile类:
SXSSFWorkbook wb = new SXSSFWorkbook(100);
//更变临时文件目录
TempFile.setTempFileCreationStrategy(new DefaultTempFileCreationStrategy(new File("")));
Sheet sh = wb.createSheet();
...
...
注:3.15之前的版本可能要使用以下这种方法,3.15版本以后这个方法被废弃,推荐使用上面的第三种方法。
TempFile.setTempFileCreationStrategy(new TempFile.DefaultTempFileCreationStrategy(dir));
被TempFile
中废弃的源码
/**
* @deprecated POI 3.15 beta 3. Moved to {@link org.apache.poi.util.DefaultTempFileCreationStrategy}.
*/
public static class DefaultTempFileCreationStrategy extends org.apache.poi.util.DefaultTempFileCreationStrategy {}
以上,感谢您的阅读