Compressing a large folder with ZipFileSystem causes OutOfMemoryError

桓智敏

2023-03-14

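Question:

The java.nio package offers an elegant way to handle zip files: it treats a zip file as a file system. This lets us work with the contents of a zip file as if they were ordinary files. Compressing a whole folder then comes down to copying all of its files into the zip file with Files.copy. Because subfolders have to be copied as well, we need a visitor: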

private static class CopyFileVisitor extends SimpleFileVisitor<Path> {
    private final Path targetPath;
    private Path sourcePath = null;

    public CopyFileVisitor(Path targetPath) {
        this.targetPath = targetPath;
    }

    @Override
    public FileVisitResult preVisitDirectory(final Path dir,
            final BasicFileAttributes attrs) throws IOException {
        if (sourcePath == null) {
            // The first directory visited is the root of the source tree
            sourcePath = dir;
        } else {
            // Resolve via toString() because source and target may belong
            // to different file system providers
            Files.createDirectories(targetPath.resolve(sourcePath
                    .relativize(dir).toString()));
        }
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(final Path file,
            final BasicFileAttributes attrs) throws IOException {
        Files.copy(file,
                targetPath.resolve(sourcePath.relativize(file).toString()),
                StandardCopyOption.REPLACE_EXISTING);
        return FileVisitResult.CONTINUE;
    }
}

This is a simple "copy a directory recursively" visitor. With a ZipFileSystem, however, we can also use it to copy a directory into a zip file, like this:

public static void zipFolder(Path zipFile, Path sourceDir) throws IOException
{
    // Initialize the Zip Filesystem and get its root
    Map<String, String> env = new HashMap<>();
    env.put("create", "true");
    URI uri = URI.create("jar:" + zipFile.toUri());
    // Closing the file system is what writes the archive to disk
    try (FileSystem fileSystem = FileSystems.newFileSystem(uri, env)) {
        Iterable<Path> roots = fileSystem.getRootDirectories();
        Path root = roots.iterator().next();

        // Simply copy the directory into the root of the zip file system
        Files.walkFileTree(sourceDir, new CopyFileVisitor(root));
    }
}

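This is what I would call an elegant way to zip a whole folder. However, when I use this method on a huge folder (about 3 GB), I get an OutOfMemoryError (heap space). A conventional zip library does not raise this error. So it seems that the way ZipFileSystem handles the copy is very inefficient: too much of the data to be written is kept in memory, and that is what triggers the OutOfMemoryError.

Why is that? Is ZipFileSystem generally considered inefficient (in terms of memory consumption), or am I doing something wrong here?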

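Answer:

I looked at ZipFileSystem.java, and I believe I found the source of the memory consumption. By default, the implementation uses a ByteArrayOutputStream as the buffer for compressed file data, which means it is limited by the amount of memory allocated to the JVM.

There is an (undocumented) property, "useTempFile", that we can put in the environment map to make the implementation buffer to temporary files instead. It is used like this: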

Map<String, Object> env = new HashMap<>();
env.put("create", "true");
env.put("useTempFile", Boolean.TRUE);

More details here: http://www.docjar.com/html/api/com/sun/nio/zipfs/ZipFileSystem.java.html; the interesting lines are 96, 1358, and 1362.
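
Putting the pieces together, here is a minimal sketch of the question's zipFolder method with the "useTempFile" property set. The class name ZipFolderSketch is made up for illustration, and the visitor is inlined as an anonymous class to keep the example self-contained. Since "useTempFile" is undocumented, its behavior may differ between JDK versions, so treat this as a starting point rather than a guaranteed fix.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class name, used only for this sketch.
public class ZipFolderSketch {

    public static void zipFolder(Path zipFile, Path sourceDir) throws IOException {
        Map<String, Object> env = new HashMap<>();
        env.put("create", "true");
        // Undocumented zipfs property: buffer new entries in temporary files
        // instead of in-memory byte arrays.
        env.put("useTempFile", Boolean.TRUE);

        URI uri = URI.create("jar:" + zipFile.toUri());

        // try-with-resources closes the zip file system, which is the point
        // at which the archive is written to disk.
        try (FileSystem zipFs = FileSystems.newFileSystem(uri, env)) {
            Path root = zipFs.getPath("/");

            Files.walkFileTree(sourceDir, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                        throws IOException {
                    // The source root maps to the zip root, which already exists.
                    if (!dir.equals(sourceDir)) {
                        Files.createDirectories(
                                root.resolve(sourceDir.relativize(dir).toString()));
                    }
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                        throws IOException {
                    // toString() is needed because the two paths belong to
                    // different file system providers.
                    Files.copy(file,
                            root.resolve(sourceDir.relativize(file).toString()),
                            StandardCopyOption.REPLACE_EXISTING);
                    return FileVisitResult.CONTINUE;
                }
            });
        }
    }
}
```

The target zip file should not already exist as an empty file and should live outside the folder being compressed, for example ZipFolderSketch.zipFolder(Paths.get("backup.zip"), Paths.get("data")) with hypothetical paths.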
