问题：

为什么这个Haskell代码在-O下运行得更慢？

吴和硕

2023-03-14

这段Haskell代码在使用-O时运行速度要慢得多，但是-O应该不会有危险。谁能告诉我发生了什么事？如果重要的话，这是为了解决这个问题，它使用二进制搜索和持久段树：

import Control.Monad
import Data.Array

data Node =
      Leaf   Int           -- value
    | Branch Int Node Node -- sum, left child, right child
type NodeArray = Array Int Node

-- create an empty node with range [l, r)
create :: Int -> Int -> Node
create l r
    | l + 1 == r = Leaf 0
    | otherwise  = Branch 0 (create l m) (create m r)
    where m = (l + r) `div` 2

-- Get the sum in range [0, r). The range of the node is [nl, nr)
sumof :: Node -> Int -> Int -> Int -> Int
sumof (Leaf val) r nl nr
    | nr <= r   = val
    | otherwise = 0
sumof (Branch sum lc rc) r nl nr
    | nr <= r   = sum
    | r  > nl   = (sumof lc r nl m) + (sumof rc r m nr)
    | otherwise = 0
    where m = (nl + nr) `div` 2

-- Increase the value at x by 1. The range of the node is [nl, nr)
increase :: Node -> Int -> Int -> Int -> Node
increase (Leaf val) x nl nr = Leaf (val + 1)
increase (Branch sum lc rc) x nl nr
    | x < m     = Branch (sum + 1) (increase lc x nl m) rc
    | otherwise = Branch (sum + 1) lc (increase rc x m nr)
    where m = (nl + nr) `div` 2

-- signature said it all
tonodes :: Int -> [Int] -> [Node]
tonodes n = reverse . tonodes' . reverse
    where
        tonodes' :: [Int] -> [Node]
        tonodes' (h:t) = increase h' h 0 n : s' where s'@(h':_) = tonodes' t
        tonodes' _ = [create 0 n]

-- find the minimum m in [l, r] such that (predicate m) is True
binarysearch :: (Int -> Bool) -> Int -> Int -> Int
binarysearch predicate l r
    | l == r      = r
    | predicate m = binarysearch predicate l m
    | otherwise   = binarysearch predicate (m+1) r
    where m = (l + r) `div` 2

-- main, literally
main :: IO ()
main = do
    [n, m] <- fmap (map read . words) getLine
    nodes <- fmap (listArray (0, n) . tonodes n . map (subtract 1) . map read . words) getLine
    replicateM_ m $ query n nodes
    where
        query :: Int -> NodeArray -> IO ()
        query n nodes = do
            [p, k] <- fmap (map read . words) getLine
            print $ binarysearch (ok nodes n p k) 0 n
            where
                ok :: NodeArray -> Int -> Int -> Int -> Int -> Bool
                ok nodes n p k s = (sumof (nodes ! min (p + s + 1) n) s 0 n) - (sumof (nodes ! max (p - s) 0) s 0 n) >= k

（这与代码审查的代码完全相同，但此问题解决了另一个问题。）

这是我的C语言输入生成器：

#include <cstdio>
#include <cstdlib>
using namespace std;
int main (int argc, char * argv[]) {
    srand(1827);
    int n = 100000;
    if(argc > 1)
        sscanf(argv[1], "%d", &n);
    printf("%d %d\n", n, n);
    for(int i = 0; i < n; i++)
        printf("%d%c", rand() % n + 1, i == n - 1 ? '\n' : ' ');
    for(int i = 0; i < n; i++) {
        int p = rand() % n;
        int k = rand() % n + 1;
        printf("%d %d\n", p, k);
    }
}

如果您没有可用的C编译器，这将是的结果/gen.exe 1000。

这是我的计算机上的执行结果：

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.8.3
$ ghc -fforce-recomp 1827.hs
[1 of 1] Compiling Main             ( 1827.hs, 1827.o )
Linking 1827.exe ...
$ time ./gen.exe 1000 | ./1827.exe > /dev/null
real    0m0.088s
user    0m0.015s
sys     0m0.015s
$ ghc -fforce-recomp -O 1827.hs
[1 of 1] Compiling Main             ( 1827.hs, 1827.o )
Linking 1827.exe ...
$ time ./gen.exe 1000 | ./1827.exe > /dev/null
real    0m2.969s
user    0m0.000s
sys     0m0.045s

这是堆配置文件摘要：

$ ghc -fforce-recomp -rtsopts ./1827.hs
[1 of 1] Compiling Main             ( 1827.hs, 1827.o )
Linking 1827.exe ...
$ ./gen.exe 1000 | ./1827.exe +RTS -s > /dev/null
      70,207,096 bytes allocated in the heap
       2,112,416 bytes copied during GC
         613,368 bytes maximum residency (3 sample(s))
          28,816 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)
                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       132 colls,     0 par    0.00s    0.00s     0.0000s    0.0004s
  Gen  1         3 colls,     0 par    0.00s    0.00s     0.0006s    0.0010s
  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.03s  (  0.03s elapsed)
  GC      time    0.00s  (  0.01s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.03s  (  0.04s elapsed)
  %GC     time       0.0%  (14.7% elapsed)
  Alloc rate    2,250,213,011 bytes per MUT second
  Productivity 100.0% of total user, 83.1% of total elapsed
$ ghc -fforce-recomp -O -rtsopts ./1827.hs
[1 of 1] Compiling Main             ( 1827.hs, 1827.o )
Linking 1827.exe ...
$ ./gen.exe 1000 | ./1827.exe +RTS -s > /dev/null
   6,009,233,608 bytes allocated in the heap
     622,682,200 bytes copied during GC
         443,240 bytes maximum residency (505 sample(s))
          48,256 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)
                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     10945 colls,     0 par    0.72s    0.63s     0.0001s    0.0004s
  Gen  1       505 colls,     0 par    0.16s    0.13s     0.0003s    0.0005s
  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    2.00s  (  2.13s elapsed)
  GC      time    0.87s  (  0.76s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    2.89s  (  2.90s elapsed)
  %GC     time      30.3%  (26.4% elapsed)
  Alloc rate    3,009,412,603 bytes per MUT second
  Productivity  69.7% of total user, 69.4% of total elapsed

共有1个答案

钦高峯

2023-03-14

让我放大您的主函数，并稍微重写它：

main :: IO ()
main = do
    [n, m] <- fmap (map read . words) getLine
    line <- getLine
    let nodes = listArray (0, n) . tonodes n . map (subtract 1) . map read . words $ line
    replicateM_ m $ query n nodes

显然，这里的意图是创建一次节点阵列，然后在查询的每个调用中使用。

不幸的是，GHC将这段代码有效地转换为，

main = do
    [n, m] <- fmap (map read . words) getLine
    line <- getLine
    replicateM_ m $ do
        let nodes = listArray (0, n) . tonodes n . map (subtract 1) . map read . words $ line
        query n nodes

您可以立即看到这里的问题。

原因是state hack，它（粗略地）说：“当某个对象的类型为IO a时，假设它只被调用一次。”。官方文件并不复杂：

fno状态黑客

关闭“state hack”，任何以state#token作为参数的lambda都被认为是单个条目，因此可以在其中内联内容。这可以提高IO和ST monad代码的性能，但它存在减少共享的风险。

大致上，想法如下：如果您定义一个带有IO类型和where子句的函数，例如。

foo x = do
    putStrLn y
    putStrLn y
  where y = ...x...

IO a类型的东西可以看作是RealWord类型的东西-

foo x = 
   let y = ...x... in 
   \world1 ->
     let (world2, ()) = putStrLn y world1
     let (world3, ()) = putStrLn y world2
     in  (world3, ())

对foo的调用（通常）如下所示。但是foo的定义只接受一个参数，而另一个参数只在稍后被本地lambda表达式使用！这将是一个非常慢的对foo的调用。如果html" target="_blank">代码如下所示，速度会快得多：

foo x world1 = 
   let y = ...x... in 
   let (world2, ()) = putStrLn y world1
   let (world3, ()) = putStrLn y world2
   in  (world3, ())

这被称为eta扩展，并基于各种理由进行（例如，通过分析函数的定义，通过检查函数的调用方式，以及在这种情况下，通过类型定向启发式）。

不幸的是，如果对foo的调用实际上采用了让fooArgument=foo argument的形式，即使用了一个参数，但没有传递（尚未传递），则会降低性能。在原始代码中，如果随后多次使用了fooArgument，则仍将只计算一次y，并共享。在修改后的代码中，每次都会重新计算y，这正是节点发生的情况。

可能地请参阅#9388以了解这样做的尝试。修复它的问题是，在很多转换正常的情况下，它会降低性能，即使编译器不可能确定这一点。而且可能有一些情况在技术上不可行，即共享丢失，但这仍然是有益的，因为更快的呼叫带来的加速超过了重新计算的额外成本。因此，现在还不清楚该怎么办。

类似资料：

为什么set（）使这段代码运行得更快？

我写了一些欧拉问题35的代码：我想知道为什么这个代码（上图）运行得这么快，当我设置在函数中。这段代码的运行时间约为8秒。最初，我没有设置，我的函数是这样的：我的初始代码（没有）运行了很长时间，我没有等它完成。我很好奇，为什么这两段代码在运行时间上存在如此大的差异，因为我不相信会有任何重复项，这会使迭代花费如此长的时间，以至于迭代。也许我对set（）的想法是错误的。欢迎任何帮助。
为什么此O（n ^ 2）代码比O（n）执行得更快？

问题内容：我编写了两种方法的代码，以找出LeetCode字符串中的第一个唯一字符。问题陈述：给定一个字符串，找到其中的第一个非重复字符并返回其索引。如果不存在，则返回-1。示例测试用例： s =“ leetcode”返回0。 s =“ loveleetcode”，返回2。方法1（O（n））（如果我错了，请纠正我）：方法2（O（n ^ 2））：在方法2中，我认为复杂度应为O（n ^ 2
异步I/O密集型代码比非异步代码运行得慢，为什么？

60次迭代的结果是：这次测试的结果是一致的。无论我在操作方法中对for循环进行了多少次迭代，异步程序的运行速度都比非异步程序慢1秒左右。（我连续多次运行测试，以解释抖动升温的原因。）我做错了什么，导致异步比非异步慢？我是否误解了编写异步代码的基本原理？
为什么代码在一个函数里运行的更快？

rank ▲ ✰ vote url 44 455 162 534 url 为什么代码在一个函数里运行的更快? def main(): for i in xrange(10**8): pass main() 在Python中运行速度: real 0m1.841s user 0m1.828s sys 0m0.012s 然而不把它放在函数里: for i
为什么字节码可能比本机代码运行得更快？

问题内容： Java很慢。这似乎不仅仅是一个“城市传奇”。由于延迟，您不将其用于实时编码，也不将其用于集群/并行计算。有成千上万的基准测试，特别是“ Java vs C＃vs C ++”。 http://benchmarksgame.alioth.debian.org/ 根据以上站点，不仅Java性能几乎与C一样好（远远没有C），而且Scala和Clojure（两种在JVM上运行的功能语言）都具
为什么这个Haskell数组填充操作这么慢？

对于特定的任务，我需要在可变数组中进行大量快速、单独的写操作。为了检查性能，我使用了以下测试：

为什么这个Haskell代码在-O下运行得更慢？

共有1个答案

相关问答

相关文章

相关阅读

相关工具

相关文档