问题：

位运算性能，如何提高

施景同

2023-03-14

我有一个简单的任务：确定需要多少字节来将某个数字（字节数组长度）编码到字节数组并编码最终值（实现本文：编码长度和值字节）。

最初我写了一个快速完成任务的方法：

public static Byte[] Encode(Byte[] rawData, Byte enclosingtag) {
    if (rawData == null) {
        return new Byte[] { enclosingtag, 0 };
    }
    List<Byte> computedRawData = new List<Byte> { enclosingtag };
    // if array size is less than 128, encode length directly. No questions here
    if (rawData.Length < 128) {
        computedRawData.Add((Byte)rawData.Length);
    } else {
        // convert array size to a hex string
        String hexLength = rawData.Length.ToString("x2");
        // if hex string has odd length, align it to even by prepending hex string
        // with '0' character
        if (hexLength.Length % 2 == 1) { hexLength = "0" + hexLength; }
        // take a pair of hex characters and convert each octet to a byte
        Byte[] lengthBytes = Enumerable.Range(0, hexLength.Length)
                .Where(x => x % 2 == 0)
                .Select(x => Convert.ToByte(hexLength.Substring(x, 2), 16))
                .ToArray();
        // insert padding byte, set bit 7 to 1 and add byte count required
        // to encode length bytes
        Byte paddingByte = (Byte)(128 + lengthBytes.Length);
        computedRawData.Add(paddingByte);
        computedRawData.AddRange(lengthBytes);
    }
    computedRawData.AddRange(rawData);
    return computedRawData.ToArray();
}

这是一段旧代码，编写方式很糟糕。

现在我正在尝试使用按位运算符或BitConverter类来优化代码。这是按位版本的示例：

public static Byte[] Encode2(Byte[] rawData, Byte enclosingtag) {
    if (rawData == null) {
        return new Byte[] { enclosingtag, 0 };
    }
    List<Byte> computedRawData = new List<Byte>(rawData);
    if (rawData.Length < 128) {
        computedRawData.Insert(0, (Byte)rawData.Length);
    } else {
        // temp number
        Int32 num = rawData.Length;
        // track byte count, this will be necessary further
        Int32 counter = 1;
        // simply make bitwise AND to extract byte value
        // and shift right while remaining value is still more than 255
        // (there are more than 8 bits)
        while (num >= 256) {
            counter++;
            computedRawData.Insert(0, (Byte)(num & 255));
            num = num >> 8;
        }
        // compose final array
        computedRawData.InsertRange(0, new[] { (Byte)(128 + counter), (Byte)num });
    }
    computedRawData.Insert(0, enclosingtag);
    return computedRawData.ToArray();
}

以及BitConverter类的最终实现：

public static Byte[] Encode3(Byte[] rawData, Byte enclosingtag) {
    if (rawData == null) {
        return new Byte[] { enclosingtag, 0 };
    }
    List<Byte> computedRawData = new List<Byte>(rawData);
    if (rawData.Length < 128) {
        computedRawData.Insert(0, (Byte)rawData.Length);
    } else {
        // convert integer to a byte array
        Byte[] bytes = BitConverter.GetBytes(rawData.Length);
        // start from the end of a byte array to skip unnecessary zero bytes
        for (int i = bytes.Length - 1; i >= 0; i--) {
            // once the byte value is non-zero, take everything starting
            // from the current position up to array start.
            if (bytes[i] > 0) {
                // we need to reverse the array to get the proper byte order
                computedRawData.InsertRange(0, bytes.Take(i + 1).Reverse());
                // compose final array
                computedRawData.Insert(0, (Byte)(128 + i + 1));
                computedRawData.Insert(0, enclosingtag);
                return computedRawData.ToArray();
            }
        }
    }
    return null;
}

所有方法都按预期工作。我使用秒表类页面中的一个示例来衡量性能。性能测试让我惊讶。我的测试方法执行了1000次该方法的运行，以编码一个包含100000个元素的字节数组（实际上，只有数组六个元素），平均时间为：

编码——大约200ms
Encode2--大约270ms
Encode3--大约320ms

我个人喜欢方法Encode2，因为代码看起来更具可读性，但性能没那么好。

问题是：您会建议如何提高方法性能或提高编码的可读性？

任何帮助都将不胜感激。

===========================

更新：感谢所有参与此线程的人。我考虑了所有建议，最终得到了这个解决方案：

public static Byte[] Encode6(Byte[] rawData, Byte enclosingtag) {
    if (rawData == null) {
        return new Byte[] { enclosingtag, 0 };
    }
    Byte[] retValue;
    if (rawData.Length < 128) {
        retValue = new Byte[rawData.Length + 2];
        retValue[0] = enclosingtag;
        retValue[1] = (Byte)rawData.Length;
    } else {
        Byte[] lenBytes = new Byte[3];
        Int32 num = rawData.Length;
        Int32 counter = 0;
        while (num >= 256) {
            lenBytes[counter] = (Byte)(num & 255);
            num >>= 8;
            counter++;
        }
        // 3 is: len byte and enclosing tag
        retValue = new byte[rawData.Length + 3 + counter];
        rawData.CopyTo(retValue, 3 + counter);
        retValue[0] = enclosingtag;
        retValue[1] = (Byte)(129 + counter);
        retValue[2] = (Byte)num;
        Int32 n = 3;
        for (Int32 i = counter - 1; i >= 0; i--) {
            retValue[n] = lenBytes[i];
            n++;
        }
    }
    return retValue;
}

最终我从列表转向了固定大小的字节数组。现在针对同一数据集的平均时间约为65ms。似乎列表（而不是按位操作）在性能上给我带来了很大的损失。

共有2个答案

施驰

2023-03-14

使用“reverse，append，reverse”而不是“insert at front”，并预分配所有内容，可能是这样的：（未测试）

public static byte[] Encode4(byte[] rawData, byte enclosingtag) {
    if (rawData == null) {
        return new byte[] { enclosingtag, 0 };
    }
    List<byte> computedRawData = new List<byte>(rawData.Length + 6);
    computedRawData.AddRange(rawData);
    if (rawData.Length < 128) {
        computedRawData.InsertRange(0, new byte[] { enclosingtag, (byte)rawData.Length });
    } else {
        computedRawData.Reverse();
        // temp number
        int num = rawData.Length;
        // track byte count, this will be necessary further
        int counter = 1;
        // simply cast to byte to extract byte value
        // and shift right while remaining value is still more than 255
        // (there are more than 8 bits)
        while (num >= 256) {
            counter++;
            computedRawData.Add((byte)num);
            num >>= 8;
        }
        // compose final array
        computedRawData.Add((byte)num);
        computedRawData.Add((byte)(counter + 128));
        computedRawData.Add(enclosingtag);
        computedRawData.Reverse();
    }
    return computedRawData.ToArray();
}

我不确定它是否会更快，但这是有意义的——现在，大多数情况下都避免了昂贵的“在前面插入”操作，除非只有一个操作（可能不足以平衡两个反转）。

另一个想法是以另一种方式将前面的插入限制为一次：收集所有必须插入的东西，然后执行一次。可能看起来像这样：（未测试）

public static byte[] Encode5(byte[] rawData, byte enclosingtag) {
    if (rawData == null) {
        return new byte[] { enclosingtag, 0 };
    }
    List<byte> computedRawData = new List<byte>(rawData);
    if (rawData.Length < 128) {
        computedRawData.InsertRange(0, new byte[] { enclosingtag, (byte)rawData.Length });
    } else {
        // list of all things that will be inserted
        List<byte> front = new List<byte>(8);
        // temp number
        int num = rawData.Length;
        // track byte count, this will be necessary further
        int counter = 1;
        // simply cast to byte to extract byte value
        // and shift right while remaining value is still more than 255
        // (there are more than 8 bits)
        while (num >= 256) {
            counter++;
            front.Insert(0, (byte)num);  // inserting in tiny list, not so bad
            num >>= 8;
        }
        // compose final array
        front.InsertRange(0, new[] { (byte)(128 + counter), (byte)num });
        front.Insert(0, enclosingtag);
        computedRawData.InsertRange(0, front);
    }
    return computedRawData.ToArray();
}

如果这还不够好或没有帮助（或者如果情况更糟——嘿，可能是这样），我会尝试提出更多的想法。

孟俊晖

2023-03-14

这里的主要问题几乎肯定是List的分配，以及插入新元素时所需的分配，以及列表最终转换为数组时所需的分配。这段代码可能大部分时间都花在垃圾收集器和内存分配器上。相比之下，使用与不使用按位运算符可能意义不大，我会先研究减少分配内存量的方法。

一种方法是发送对预先分配的字节数组的引用和对您在数组中的位置的索引，而不是分配和返回数据，然后返回一个整数，告诉您已经写入了多少字节。处理大型数组通常比处理许多小型对象更有效。正如其他人提到的，使用分析器，看看您的代码在哪里花费时间。

当然，我提到的优化将使您的代码在本质上更低级，更接近您通常在C中所做的工作，但在可读性和性能之间往往存在权衡。

类似资料：

性能32位与64位算术

问题内容：本机整数算术指令是否比其计数器部件慢（在装有OS的计算机上）？编辑：在当前CPU上，例如Intel Core2 Duo，i5 / i7等。问题答案：这取决于确切的CPU和操作。例如，在64位Pentium IV上，64位寄存器的乘法要慢得多。Core 2和更高版本的CPU从一开始就设计用于64位操作。通常，即使是为64位平台编写的代码也使用32位变量，其中的值将适合它们。这主要
如何提高图中心度计算的并行性能？

我得到的性能下降后，并行化的代码，计算图中心。图比较大，100K顶点。单线程应用程序大约需要7分钟。根据julialang站点（http://julia.readthedocs.org/en/latest/manual/parallel-computing/#man-parallel-computing）的建议，我修改了代码并使用pmap api来并行计算。我开始计算8个过程（julia-p 8c
如何提高子查询性能？

问题内容：我有2张桌子，和。用户可以有很多游戏。我需要所有有人数的人，以及他们的人数（有专栏的）。附言：我需要将所有数据加载到管理表中。由于游戏太多。我决定对数据进行分页和限制。但是，甚至限制以下查询也需要花费相同的时间。如何更好地查询？问题答案：您可以在下面尝试使用表达式
如何提高cassandra的写性能？

我有一个名为Emails的列族，我正在将邮件保存到这个CF中，编写5000封邮件需要100秒。我使用的是i3处理器，8gb内存。我的数据中心有6个节点，复制因子=2。我们存储在卡桑德拉中的数据大小会影响性能吗？影响写入性能的所有因素是什么，如何提高性能？预先感谢..
如何提高数据库的性能？

问题内容：我在公司中多次设计数据库。为了提高数据库的性能，我只寻找标准化和索引。如果要求您提高数据库的性能，该数据库包含大约250个表以及一些具有数百万个记录的表，那么您将寻找什么不同的东西？提前致谢。问题答案：优化逻辑设计逻辑级别是关于查询和表本身的结构。首先尝试最大程度地发挥这一作用。目标是在逻辑级别上访问尽可能少的数据。拥有最高效的SQL查询设计支持应用程序需求的逻辑架构（例
NumPy位运算

主要内容：bitwise_and(),bitwise_or(),Invert(),left_shift(),right_shift()本节重点讲解 NumPy 的位运算，NumPy 中提供了以下按位运算函数： numpy按位运算函数序号函数位运算符描述说明 1 bitwise_and & 计算数组元素之间的按位与运算。 2 bitwise_or | 计算数组元素之间的按位或运算。 3 invert ~ 计算数组元素之间的按位取反运算。 4 left_shift << 将二进制数的位数向左

位运算性能，如何提高

共有2个答案

相关问答

相关文章

相关阅读

相关工具

相关文档