问题：

为什么优化的素因数计数算法运行速度较慢

洪和平

2023-03-14

HiI在网上看到了一个计算一个数的不同素因子的答案，它看起来不是最优的。所以我试图改进它，但在一个简单的基准测试中，我的变体比原始版本慢得多。

该算法计算一个数的不同质因数。最初使用一个HashSet来收集因子，然后使用size来获得它们的数目。我的“改进”版本使用了一个int计数器，并将while循环分解为if/while以避免不必要的调用。

更新：tl/dr（有关详细信息，请参阅接受的答案）

原始代码有一个名为Math的性能错误。编译器不必要地修复了sqrt：

int n = ...;
// sqrt does not need to be recomputed if n does not change
for (int i = 3; i <= Math.sqrt(n); i += 2) {
    while (n % i == 0) {
        n /= i;
    }
}

编译器优化了 sqrt 调用，使其仅在 n 个更改时发生。但是通过使循环内容稍微复杂一些（虽然没有功能变化），编译器停止了这种方式的优化，并且每次迭代都会调用sqrt。

原始问题

public class PrimeFactors {

    // fast version, takes 10s for input 8
    static int countPrimeFactorsSet(int n) {
        Set<Integer> primeFactorSet = new HashSet<>();
        while (n % 2 == 0) {
            primeFactorSet.add(2);
            n /= 2;
        }
        for (int i = 3; i <= Math.sqrt(n); i += 2) {
            while (n % i == 0) {
                primeFactorSet.add(i);
                n /= i;
            }
        }
        if (n > 2) {
            primeFactorSet.add(n);
        }
        return primeFactorSet.size();
    }

    // slow version, takes 19s for input 8
    static int countPrimeFactorsCounter(int n) {
        int count = 0; // using simple int
        if (n % 2 == 0) {
            count ++; // only add on first division
            n /= 2;
            while (n % 2 == 0) {
                n /= 2;
            }
        }
        for (int i = 3; i <= Math.sqrt(n); i += 2) {
            if (n % i == 0) {
                count++; // only add on first division
                n /= i;
                while (n % i == 0) {
                    n /= i;
                }
            }
        }
        if (n > 2) {
            count++;
        }
        return count;
    }

    static int findNumberWithNPrimeFactors(final int n) {
        for (int i = 3; ; i++) {
            // switch implementations
            if (countPrimeFactorsCounter(i) == n) {
            // if (countPrimeFactorsSet(i) == n) {
                return i;
            }
        }
    }

    public static void main(String[] args) {
        findNumberWithNPrimeFactors(8); // benchmark warmup
        findNumberWithNPrimeFactors(8);
        long start = System.currentTimeMillis();
        int result = findNumberWithNPrimeFactors(n);
        long duration = System.currentTimeMillis() - start;

        System.out.println("took ms " + duration + " to find " + result);
    }
}

原始版本的输出始终在 10 秒左右（在 java8 上），而“优化”版本的输出接近 20 秒（两者都打印相同的结果）。实际上，仅将单个 while 循环更改为具有包含 while 循环的 if 块已经将原始方法的速度减慢了一半。

使用-Xint以解释模式运行JVM，优化版本的运行速度提高了3倍。使用-Xcomp可以使两个实现以相似的速度运行。因此，与使用简单的int计数器的版本相比，JIT似乎可以使用单个while循环和HashSet来优化版本。

一个合适的微基准测试（我如何在Java中编写正确的微基准测试？）会告诉我别的东西吗？是否有我忽略的性能优化原则（例如 Java 性能提示）？

共有3个答案

微生宝

2023-03-14

如果这里是：＜code＞for（int i=3；i

应该被排除在外，

其次，您可以用不同的方法来执行此代码，例如:

而（n%2==0）{当前；n/=2；}

你可以用： if（n % 2 ==0） { current ; n=n%2; }

本质上，您应该避免循环中的条件或指令，因为您的方法：

（查找带NPrimeFactors的数字）

你的算法的复杂度是每个循环的复杂度（findNumber的NPrimeFacors）X（迭代数）

如果你在循环中添加一个测试或一个影响，你会得到一个 1 （复杂性（findNumberWithNPrimeFactors） X （迭代编号））

黄高爽

2023-03-14

首先，测试中有两组操作：测试因素，并记录这些因素。当切换实现时，使用Set与使用ArrayList（在我的重写中，如下所示）相比，简单地计算因子将产生不同。

其次，我看到时间上有很大的变化。这是从Eclipse运行的。我不清楚是什么导致了巨大的变化。

我的“经验教训”是要注意它被测量的具体内容。目的是测量因子分解算法本身（while循环的成本加上算术运算）吗？是否应包括记录因素的时间？

一个次要的技术点：在这个实现中，强烈感觉到缺少lisp中可用的多重值设置。人们更愿意将余数和整数除法作为一个单独的操作来执行，而不是将它们写成两个不同的步骤。从语言和算法研究的角度来看，这是值得研究的。

以下是因式分解实现的三种变体的计时结果。第一个来自初始（未优化）实现，但更改为使用简单的 List，而不是更难计时的 Set 来存储因子。第二个是您的优化，但仍使用列表进行跟踪。第三个是你的优化，但包括对因素的改变。

  18 -  3790 1450 2410 (average of 10 iterations)
  64 -  1630 1220  260 (average of 10 iterations)
1091 - 16170 2850 1180 (average of 10 iterations)
1092 -  2720 1370  380 (average of 10 iterations)

4096210 - 28830 5430 9120 (average of  10 iterations, trial 1)
4096210 - 18380 6190 5920 (average of  10 iterations, trial 2)
4096210 - 10072 5816 4836 (average of 100 iterations, trial 1)
4096210 -  7202 5036 3682 (average of 100 iterations, trial 1)

---

Test value [ 18 ]
Warm-up count [ 2 ]
Test count [ 10 ]
Times [non-optimized]
Start [ 1621713914872600 (ns) ]
End   [ 1621713914910500 (ns) ]
Delta [ 37900 (ns) ]
Avg   [ 3790 (ns) ]
Factors: [2, 3, 3]
Times [optimized]
Start [ 1621713915343500 (ns) ]
End   [ 1621713915358000 (ns) ]
Delta [ 14500 (ns) ]
Avg   [ 1450 (ns) ]
Factors: [2, 3, 3]
Times [counting]
Start [ 1621713915550400 (ns) ]
End   [ 1621713915574500 (ns) ]
Delta [ 24100 (ns) ]
Avg   [ 2410 (ns) ]
Factors: 3
---
Test value [ 64 ]
Warm-up count [ 2 ]
Test count [ 10 ]
Times [non-optimized]
Start [ 1621747046013900 (ns) ]
End   [ 1621747046030200 (ns) ]
Delta [ 16300 (ns) ]
Avg   [ 1630 (ns) ]
Factors: [2, 2, 2, 2, 2, 2]
Times [optimized]
Start [ 1621747046337800 (ns) ]
End   [ 1621747046350000 (ns) ]
Delta [ 12200 (ns) ]
Avg   [ 1220 (ns) ]
Factors: [2, 2, 2, 2, 2, 2]
Times [counting]
Start [ 1621747046507900 (ns) ]
End   [ 1621747046510500 (ns) ]
Delta [ 2600 (ns) ]
Avg   [ 260 (ns) ]
Factors: 6
---
Test value [ 1091 ]
Warm-up count [ 2 ]
Test count [ 10 ]
Times [non-optimized]
Start [ 1621687024226500 (ns) ]
End   [ 1621687024388200 (ns) ]
Delta [ 161700 (ns) ]
Avg   [ 16170 (ns) ]
Factors: [1091]
Times [optimized]
Start [ 1621687024773200 (ns) ]
End   [ 1621687024801700 (ns) ]
Delta [ 28500 (ns) ]
Avg   [ 2850 (ns) ]
Factors: [1091]
Times [counting]
Start [ 1621687024954900 (ns) ]
End   [ 1621687024966700 (ns) ]
Delta [ 11800 (ns) ]
Avg   [ 1180 (ns) ]
Factors: 1
---
Test value [ 1092 ]
Warm-up count [ 2 ]
Test count [ 10 ]
Times [non-optimized]
Start [ 1621619636267500 (ns) ]
End   [ 1621619636294700 (ns) ]
Delta [ 27200 (ns) ]
Avg   [ 2720 (ns) ]
Factors: [2, 2, 3, 7, 13]
Times [optimized]
Start [ 1621619636657100 (ns) ]
End   [ 1621619636670800 (ns) ]
Delta [ 13700 (ns) ]
Avg   [ 1370 (ns) ]
Factors: [2, 2, 3, 7, 13]
Times [counting]
Start [ 1621619636895300 (ns) ]
End   [ 1621619636899100 (ns) ]
Delta [ 3800 (ns) ]
Avg   [ 380 (ns) ]
Factors: 5
---
Test value [ 4096210 ]
Warm-up count [ 2 ]
Test count [ 10 ]
Times [non-optimized]
Start [ 1621652753519800 (ns) ]
End   [ 1621652753808100 (ns) ]
Delta [ 288300 (ns) ]
Avg   [ 28830 (ns) ]
Factors: [2, 5, 19, 21559]
Times [optimized]
Start [ 1621652754116300 (ns) ]
End   [ 1621652754170600 (ns) ]
Delta [ 54300 (ns) ]
Avg   [ 5430 (ns) ]
Factors: [2, 5, 19, 21559]
Times [counting]
Start [ 1621652754323500 (ns) ]
End   [ 1621652754414700 (ns) ]
Delta [ 91200 (ns) ]
Avg   [ 9120 (ns) ]
Factors: 4

这是我对测试代码的重写。最感兴趣的是findFactors、findFacorsOpt和。

package my.tests;

import java.util.ArrayList;
import java.util.List;

public class PrimeFactorsTest {

    public static void main(String[] args) {
        if ( args.length < 2 ) {
            System.out.println("Usage: " + PrimeFactorsTest.class.getName() + " testValue warmupIterations testIterations");
            return;
        }

        int testValue = Integer.valueOf(args[0]);
        int warmCount = Integer.valueOf(args[1]);
        int testCount = Integer.valueOf(args[2]);

        if ( testValue <= 2 ) {
            System.out.println("Test value [ " + testValue + " ] must be at least 2.");
            return;
        } else {
            System.out.println("Test value [ " + testValue + " ]");
        }
        if ( warmCount <= 0 ) {
            System.out.println("Warm-up count [ " + testCount + " ] must be at least 1.");
        } else {
            System.out.println("Warm-up count [ " + warmCount + " ]");
        }
        if ( testCount <= 1 ) {
            System.out.println("Test count [ " + testCount + " ] must be at least 1.");
        } else {
            System.out.println("Test count [ " + testCount + " ]");
        }

        timedFactors(testValue, warmCount, testCount);
        timedFactorsOpt(testValue, warmCount, testCount);
        timedFactorsCount(testValue, warmCount, testCount);
    }

    public static void timedFactors(int testValue, int warmCount, int testCount) {
        List<Integer> factors = new ArrayList<Integer>();

        for ( int warmNo = 0; warmNo < warmCount; warmNo++ ) {
            factors.clear();
            findFactors(testValue, factors);
        }

        long startTime = System.nanoTime();
        for ( int testNo = 0; testNo < testCount; testNo++ ) {
            factors.clear();
            findFactors(testValue, factors);
        }
        long endTime = System.nanoTime();

        System.out.println("Times [non-optimized]");
        System.out.println("Start [ " + startTime + " (ns) ]");
        System.out.println("End   [ " + endTime + " (ns) ]");
        System.out.println("Delta [ " + (endTime - startTime) + " (ns) ]");
        System.out.println("Avg   [ " + (endTime - startTime) / testCount + " (ns) ]");
        System.out.println("Factors: " + factors);
    }

    public static void findFactors(int n, List<Integer> factors) {
        while ( n % 2 == 0 ) {
            n /= 2;
            factors.add( Integer.valueOf(2) );
        }

        for ( int factor = 3; factor <= Math.sqrt(n); factor += 2 ) {
            while ( n % factor == 0 ) {
                n /= factor;
                factors.add( Integer.valueOf(factor) );
            }
        }

        if ( n > 2 ) {
            factors.add( Integer.valueOf(n) );
        }
    }

    public static void timedFactorsOpt(int testValue, int warmCount, int testCount) {
        List<Integer> factors = new ArrayList<Integer>();
        for ( int warmNo = 0; warmNo < warmCount; warmNo++ ) {
            factors.clear();
            findFactorsOpt(testValue, factors);
        }

        long startTime = System.nanoTime();
        for ( int testNo = 0; testNo < testCount; testNo++ ) {
            factors.clear();
            findFactorsOpt(testValue, factors);
        }
        long endTime = System.nanoTime();

        System.out.println("Times [optimized]");
        System.out.println("Start [ " + startTime + " (ns) ]");
        System.out.println("End   [ " + endTime + " (ns) ]");
        System.out.println("Delta [ " + (endTime - startTime) + " (ns) ]");
        System.out.println("Avg   [ " + (endTime - startTime) / testCount + " (ns) ]");
        System.out.println("Factors: " + factors);
    }

    public static void findFactorsOpt(int n, List<Integer> factors) {
        if ( n % 2 == 0 ) {
            n /= 2;

            Integer factor = Integer.valueOf(2); 
            factors.add(factor);

            while (n % 2 == 0) {
                n /= 2;

                factors.add(factor);
            }
        }

        for ( int factorValue = 3; factorValue <= Math.sqrt(n); factorValue += 2) {
            if ( n % factorValue == 0 ) {
                n /= factorValue;

                Integer factor = Integer.valueOf(factorValue); 
                factors.add(factor);

                while ( n % factorValue == 0 ) {
                    n /= factorValue;
                    factors.add(factor);
                }
            }
        }

        if (n > 2) {
            factors.add( Integer.valueOf(n) );
        }
    }

    public static void timedFactorsCount(int testValue, int warmCount, int testCount) {
        int numFactors = 0;

        for ( int warmNo = 0; warmNo < warmCount; warmNo++ ) {
            numFactors = findFactorsCount(testValue);
        }

        long startTime = System.nanoTime();
        for ( int testNo = 0; testNo < testCount; testNo++ ) {
            numFactors = findFactorsCount(testValue);
        }
        long endTime = System.nanoTime();

        System.out.println("Times [counting]");
        System.out.println("Start [ " + startTime + " (ns) ]");
        System.out.println("End   [ " + endTime + " (ns) ]");
        System.out.println("Delta [ " + (endTime - startTime) + " (ns) ]");
        System.out.println("Avg   [ " + (endTime - startTime) / testCount + " (ns) ]");
        System.out.println("Factors: " + numFactors);
    }

    public static int findFactorsCount(int n) {
        int numFactors = 0;

        if ( n % 2 == 0 ) {
            n /= 2;
            numFactors++;

            while (n % 2 == 0) {
                n /= 2;
                numFactors++;
            }
        }

        for ( int factorValue = 3; factorValue <= Math.sqrt(n); factorValue += 2) {
            if ( n % factorValue == 0 ) {
                n /= factorValue;
                numFactors++;

                while ( n % factorValue == 0 ) {
                    n /= factorValue;
                    numFactors++;
                }
            }
        }

        if (n > 2) {
            numFactors++;
        }

        return numFactors;
    }
}

刘高峯

2023-03-14

我将您的示例转换为JMH基准测试以进行公平的测量，事实上set变体的出现速度是计数器的两倍：

Benchmark              Mode  Cnt     Score    Error   Units
PrimeFactors.counter  thrpt    5   717,976 ±  7,232  ops/ms
PrimeFactors.set      thrpt    5  1410,705 ± 15,894  ops/ms

为了找出原因，我重新运行了带有内置-prof x的分析器的基准测试。碰巧计数器方法花了60%以上的时间执行vsqrtsd指令——显然，是Math.sqrt（n）的编译对应物。

  0,02%   │  │  │     │  0x0000000002ab8f3e: vsqrtsd %xmm0,%xmm0,%xmm0    <-- Math.sqrt
 61,27%   │  │  │     │  0x0000000002ab8f42: vcvtsi2sd %r10d,%xmm1,%xmm1

同时，集合方法最热门的指令是 idiv，这是 n % i 编译的结果。

             │  │ ││  0x0000000002ecb9e7: idiv   %ebp               ;*irem
 55,81%      │  ↘ ↘│  0x0000000002ecb9e9: test   %edx,%edx

< code>Math.sqrt是一个缓慢的操作，这并不奇怪。但是为什么在第一种情况下执行得更频繁呢？

线索是您在优化期间对代码的转换。您将一个简单的而循环包装成一个额外的if块。这使得控制流变得更加复杂，因此JIT未能将Math.sqrt计算提升到循环之外，并且必须在每次迭代时重新计算它。

为了恢复性能，我们需要帮助JIT编译器一点。让我们手动将< code>Math.sqrt计算提升出循环。

    static int countPrimeFactorsSet(int n) {
        Set<Integer> primeFactorSet = new HashSet<>();
        while (n % 2 == 0) {
            primeFactorSet.add(2);
            n /= 2;
        }
        double sn = Math.sqrt(n);  // compute Math.sqrt out of the loop
        for (int i = 3; i <= sn; i += 2) {
            while (n % i == 0) {
                primeFactorSet.add(i);
                n /= i;
            }
            sn = Math.sqrt(n);     // recompute after n changes
        }
        if (n > 2) {
            primeFactorSet.add(n);
        }
        return primeFactorSet.size();
    }

    static int countPrimeFactorsCounter(int n) {
        int count = 0; // using simple int
        if (n % 2 == 0) {
            count ++; // only add on first division
            n /= 2;
            while (n % 2 == 0) {
                n /= 2;
            }
        }
        double sn = Math.sqrt(n);  // compute Math.sqrt out of the loop
        for (int i = 3; i <= sn; i += 2) {
            if (n % i == 0) {
                count++; // only add on first division
                n /= i;
                while (n % i == 0) {
                    n /= i;
                }
                sn = Math.sqrt(n);     // recompute after n changes
            }
        }
        if (n > 2) {
            count++;
        }
        return count;
    }

现在< code>counter方法变得很快！甚至比< code>set还要快一点(这是意料之中的，因为它做的计算量是相同的，不包括set开销)。

Benchmark              Mode  Cnt     Score    Error   Units
PrimeFactors.counter  thrpt    5  1513,228 ± 13,046  ops/ms
PrimeFactors.set      thrpt    5  1411,573 ± 10,004  ops/ms

请注意，集合性能没有变化，因为 JIT 本身能够执行相同的优化，这要归功于更简单的控制流图。

结论：Java性能是一件非常复杂的事情，尤其是在谈论微优化时。JIT优化是脆弱的，如果没有像JMH和分析器这样的专门工具，很难理解JVM的思想。

类似资料：

优化素数筛（Java）的速度

我正在Java中研究一种创建布尔数组isPrime的方法：其中质数用“真”标记，其余的用“假”标记。< br >同时，我还想数一数找到的素数: 基本思想是使用埃拉托斯特尼筛。到目前为止，我的方法看起来像这样：所以我的问题是因为筛子多次将一些非质数的值设置为“false”（例如45 bcs 3*15=45和9*5=45），所以不起作用那么，有人知道我如何重写这个程序，以便只将所有非质数设置为
优化素数的计算[重复]

我正在尝试欧拉项目的问题3，我的算法太慢了。有人知道如何优化它吗？我试图计算的数字是600851475143L。计算这个需要很长时间，所以我需要一种方法来加快计算速度。逻辑： > 把从3到1的所有数字通读一遍对于这些数字中的每一个，通过将它们除以中间的所有数字来检查它们是否为素数，如果它们不除以任何一个，则它们为素数如果为素数，则将其添加到数组中。 **********更新*********
在Java7下，jmap的运行速度较慢，JVM的运行速度较慢

我们已经使用jmap在Java 6下运行的大型多服务器应用程序上测量堆大小大约2年了。我们每分钟测量一次。每次测量所用的时间（经过的时间）不到1秒。我们现在正在Java 7下测试同一个应用程序。现在突然之间，jmap通常需要10秒、20秒，有时甚至更长时间，而且它似乎慢了下来（甚至可能停止！）在那段时间里，JVM。我们在jmap输出中看到的唯一区别（Java6和Java7之间）是关于有多少字符
为什么Docker中的算术运算较慢？

我有一个简单的程序来测量浮点乘法（和随机生成，编译g-O0）。在主机（Ubuntu 16.04）上运行时，每10000000次乘法得到约1.6秒，在图像“Ubuntu”的容器中运行时（无需重新编译），得到约3.6秒。有人能解释为什么它慢了2.5倍吗？ p、我多次运行程序来消除异常值。我不需要优化它，只需要详细解释那里发生了什么。编译要在构建后使用的容器内运行：
什么是hashCode计算的明智素数？

问题内容： Eclipse 3.5具有一个非常好的功能，可以生成Java hashCode（）函数。例如，它将生成（略微缩短：）（如果类中具有更多属性，则为每个其他属性重复此操作。对于ints，可以省略.hashCode（）。）这似乎很好，但是对于首选的31。它可能取自JavaString的hashCode实现，出于性能原因而使用该特性，在引入硬件乘法器之后就已经不复存在了。在这里，对于i和j
为什么dict.get(key)运行速度比dict[key]慢

在运行一个数字积分器时，我注意到根据我在字典中提取字段值的方式，速度有明显的不同在我的系统上，我注意到以下差异（使用 iPython）相比于这似乎是一个很小的差异，但是对于某些阵列来说，这种差异甚至更大。是什么导致了这种行为，有什么方法可以改变我对< code>get()函数的使用吗？

为什么优化的素因数计数算法运行速度较慢

共有3个答案

相关问答

相关文章

相关阅读

相关工具

相关文档