Comparing get/put operations on direct and non-direct ByteBuffers

殷宾白

2023-03-14

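Question:

Is get/put on a non-direct ByteBuffer faster than get/put on a direct ByteBuffer?

If I have to read from or write to a direct ByteBuffer, is it better to first read/write a thread-local byte array and then update the direct ByteBuffer in full from that array (for writes)?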

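Answer:

Is get/put on a non-direct ByteBuffer faster than get/put on a direct ByteBuffer?

If you compare a heap buffer with a direct buffer that does not use native byte order (most systems are little endian, while the default for a direct ByteBuffer is big endian), the performance is very similar.

If you use native-ordered byte buffers, performance can be significantly better for multi-byte values. For a single byte it makes little difference whatever you do.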
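To make the byte-order point concrete, here is a minimal sketch showing that a freshly allocated direct buffer is big endian until `order(ByteOrder.nativeOrder())` is applied. The class name `ByteOrderDemo` and the helper `firstByteOf` are made up for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderDemo {
    /** Writes one int at index 0 and returns the first byte as it is stored. */
    static byte firstByteOf(ByteBuffer bb, int value) {
        bb.putInt(0, value);
        return bb.get(0);
    }

    public static void main(String[] args) {
        // A new buffer is always big endian, whatever the platform.
        ByteBuffer defaultOrder = ByteBuffer.allocateDirect(4);
        // Switching to the platform's order avoids byte swapping on multi-byte values.
        ByteBuffer nativeOrder = ByteBuffer.allocateDirect(4).order(ByteOrder.nativeOrder());

        System.out.println("default order: " + defaultOrder.order());
        System.out.println("native order:  " + nativeOrder.order());
        System.out.printf("first byte, default order: 0x%02x%n",
                firstByteOf(defaultOrder, 0x01020304));
        System.out.printf("first byte, native order:  0x%02x%n",
                firstByteOf(nativeOrder, 0x01020304));
    }
}
```

On a little-endian machine the two buffers store the same int with the bytes reversed, which is why the non-native buffer has to swap bytes on every multi-byte access.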

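In HotSpot/OpenJDK, ByteBuffer uses the Unsafe class, and many of the native methods are treated as intrinsics. This is JVM dependent, and AFAIK the Android VM treats them as intrinsics in recent versions.

If you dump the generated assembly, you can see that the intrinsics in Unsafe are turned into a single machine code instruction, i.e. they do not have the overhead of a JNI call.

In fact, if you are into micro-tuning, you may find that most of the time of a ByteBuffer getXxxx or putXxxx is spent in the bounds checking, not in the actual memory access. For this reason I still use Unsafe directly when I have to for maximum performance. (Note: Oracle discourages this.)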
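The bounds checks mentioned above are observable from plain Java: every ByteBuffer access validates the index or the remaining bytes before touching memory. A small sketch follows; the names `BoundsCheckDemo`, `absoluteGetRejected` and `relativeGetRejected` are hypothetical and exist only for this example.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

final class BoundsCheckDemo {
    /** Returns true if an absolute getInt at the given index fails the bounds check. */
    static boolean absoluteGetRejected(ByteBuffer bb, int index) {
        try {
            bb.getInt(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    /** Returns true if a relative getInt fails because fewer than 4 bytes remain. */
    static boolean relativeGetRejected(ByteBuffer bb) {
        try {
            bb.getInt();
            return false;
        } catch (BufferUnderflowException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocateDirect(8);
        // Index 4 is the last position where a 4-byte int still fits in 8 bytes.
        System.out.println("getInt(4) rejected: " + absoluteGetRejected(bb, 4));
        System.out.println("getInt(5) rejected: " + absoluteGetRejected(bb, 5));
        bb.position(6); // only 2 bytes remain
        System.out.println("relative getInt rejected: " + relativeGetRejected(bb));
    }
}
```

Unsafe performs none of these checks, which is where its speed advantage comes from and also why an off-by-one there can corrupt memory or crash the JVM instead of throwing.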

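If I have to read from or write to a direct ByteBuffer, is it better to first read/write a thread-local byte array and then update the direct ByteBuffer in full from that array (for writes)?

I would hate to see what that is better than. ;) It sounds very complicated.

Often the simplest solution is also the better and faster one.

You can test this yourself with this code.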
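For comparison, this is roughly what the two approaches from the question look like side by side. It is only a sketch under assumptions: the class `StagingDemo`, its method names, and the 1 KiB thread-local scratch array are invented here, and the code checks that both paths produce the same bytes without measuring their speed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class StagingDemo {
    // Hypothetical per-thread scratch array; 1 KiB is an arbitrary size for this sketch.
    private static final ThreadLocal<byte[]> STAGING =
            ThreadLocal.withInitial(() -> new byte[1024]);

    /** Simple approach: put each int straight into the direct buffer. */
    static void writeDirectly(ByteBuffer direct, int[] values) {
        direct.clear();
        for (int v : values)
            direct.putInt(v);
        direct.flip();
    }

    /** Staged approach: fill a heap array first, then bulk-copy it into the direct buffer. */
    static void writeViaArray(ByteBuffer direct, int[] values) {
        byte[] staging = STAGING.get();
        // Wrap the array with the same byte order so both paths encode ints identically.
        ByteBuffer heap = ByteBuffer.wrap(staging).order(direct.order());
        for (int v : values)
            heap.putInt(v);
        direct.clear();
        direct.put(staging, 0, heap.position());
        direct.flip();
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 0x01020304};
        ByteBuffer a = ByteBuffer.allocateDirect(1024).order(ByteOrder.nativeOrder());
        ByteBuffer b = ByteBuffer.allocateDirect(1024).order(ByteOrder.nativeOrder());
        writeDirectly(a, values);
        writeViaArray(b, values);
        System.out.println("same contents: " + a.equals(b));
    }
}
```

The staged version does the same per-int work on the heap side and then adds a bulk copy on top, so it needs more code and more memory traffic to reach the same result.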

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DirectBufferTest {
    public static void main(String... args) {
        ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        // Repeat so that the later runs show the timing after JIT compilation.
        for (int i = 0; i < 10; i++)
            runTest(bb1, bb2);
    }

    private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
        bb1.clear();
        bb2.clear();
        long start = System.nanoTime();
        // Copy bb1 into bb2 one int at a time.
        while (bb2.remaining() > 0)
            bb2.putInt(bb1.getInt());
        long time = System.nanoTime() - start;
        // One getInt plus one putInt for every four bytes.
        int operations = bb1.capacity() / 4 * 2;
        System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
    }
}

prints

Each putInt/getInt took an average of 83.9 ns
Each putInt/getInt took an average of 1.4 ns
Each putInt/getInt took an average of 34.7 ns
Each putInt/getInt took an average of 1.3 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.3 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns

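I am pretty sure a JNI call takes longer than 1.2 ns.

To demonstrate that the delay comes not from the "JNI" call but from the overhead around it, you can write the same loop using Unsafe directly.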

import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import sun.misc.Unsafe;
// Internal API: on JDK 9+ compile and run with --add-exports java.base/sun.nio.ch=ALL-UNNAMED
import sun.nio.ch.DirectBuffer;

public class UnsafeBufferTest {
    public static void main(String... args) {
        ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        for (int i = 0; i < 10; i++)
            runTest(bb1, bb2);
    }

    private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
        Unsafe unsafe = getTheUnsafe();
        long start = System.nanoTime();
        // Raw native addresses of the two direct buffers.
        long addr1 = ((DirectBuffer) bb1).address();
        long addr2 = ((DirectBuffer) bb2).address();
        // Copy bb2 into bb1 one int at a time, with no bounds checks.
        for (int i = 0, len = Math.min(bb1.capacity(), bb2.capacity()); i < len; i += 4)
            unsafe.putInt(addr1 + i, unsafe.getInt(addr2 + i));
        long time = System.nanoTime() - start;
        int operations = bb1.capacity() / 4 * 2;
        System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
    }

    // Obtains the Unsafe singleton via reflection, since Unsafe.getUnsafe() rejects application code.
    public static Unsafe getTheUnsafe() {
        try {
            Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            return (Unsafe) theUnsafe.get(null);
        } catch (Exception e) {
            throw new AssertionError(e);
        }
    }
}

prints

Each putInt/getInt took an average of 40.4 ns
Each putInt/getInt took an average of 44.4 ns
Each putInt/getInt took an average of 0.4 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns

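So you can see that the native call is much faster than you would expect from a JNI call. The main reason for the remaining delay is probably the speed of the L2 cache. ;)

All of this was run on a 3.3 GHz i3.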
