问题：

我可以确定性地和任意排列的浮点数的向量吗？

龙洛城

2023-03-14

假设我有一个（可能很大的）浮点数向量，由一些黑盒过程产生。是否可以计算这些数字的按位可重复求和？

如果黑盒进程总是以相同的顺序产生数字，那么按位可重复求和很容易：只需从左到右求和。

但是如果数字是以随机顺序生成的，也许是因为它们是从异步进程返回和收集的，那么就更难了：我必须对它们进行排序。

但如果有更多的数字，可能分布在不同的机器上，因此不需要移动它们呢？

还有一种方法可以对它们进行决定性的求和吗？

共有3个答案

贺君浩

2023-03-14

对

Ahrens、Demmel和Nguyen（2020）的论文“高效可再现浮点求和算法”中描述了一种实现这一点的方法。我在下面介绍了它的一个实现。

在这里，我们将比较三种算法：

传统的从左到右求和：这对于输入顺序中的排列来说是快速、不准确和不可复制的

传统的求和是不准确的，因为随着求和的增长，我们添加到其中的数字中的最低有效位数会被自动删除。Kahan求和通过保持一个保持最低有效位的连续“补偿”和来克服这一问题。可重复求和使用类似的策略来保持准确性，但保持多个对应于不同补偿水平的箱子。这些箱子还适当截断输入数据的重要性，以获得准确、可重复的总和。

为了研究算法，我们将对100万个数字进行测试，每个数字都从[-10001000]范围内均匀抽取。为了证明算法的答案范围及其再现性，输入将被洗牌并求和100次。

我要断言的一个结果是，对于所研究的每种数据类型，确定性算法对100个数据排序中的每一个都产生相同的结果，下面的代码对此进行了严格的演示。

我的实现包括一个类，该类可以复制传递给它的累加器值，但有几种方法可以传递输入数据。我们将对其中三个进行实验：

1ata测试一种模式，在这种模式下，数字一次传递给累加器一个，而事先不知道输入的最大幅度
许多测试是一种将许多数字传递给累加器的模式，但事先不知道输入的最大幅度
manyc测试一种模式，在这种模式下，许多数字被传递给累加器，并且我们事先知道输入的最大幅度

各种算法的结果如下：

Single-Precision Floats
==================================================================
Average Reproducible sum 1ata time   = 0.0115s (11.5x simple; 2.6x Kahan)
Average Reproducible sum many time   = 0.0047s (4.7x simple; 1.1x Kahan)
Average Reproducible sum manyc time  = 0.0036s (3.6x simple; 0.8x Kahan)
Average Kahan summation time         = 0.0044s
Average simple summation time        = 0.0010s
Reproducible Value                   = 767376.500000000 (0% diff vs Kahan)
Kahan long double accumulator value  = 767376.500000000
Error bound on RV                    = 15.542840004
Distinct Kahan (single accum) values = 1
Distinct Simple values               = 92


Double-Precision Floats
==================================================================
Average Reproducible sum 1ata time   = 0.0112s (10.5x simple; 2.6x Kahan)
Average Reproducible sum many time   = 0.0051s (4.7x simple; 1.2x Kahan)
Average Reproducible sum manyc time  = 0.0037s (3.4x simple; 0.8x Kahan)
Average simple summation time        = 0.0011s
Average Kahan summation time         = 0.0044s
Reproducible Value                   = 310844.70069914369378239 (0% diff vs Kahan)
Kahan long double accumulator value  = 310844.70069914369378239
Error bound on RV                    = 0.00000000048315059
Distinct Kahan (double accum) values = 2
Distinct Simple values               = 96

让我们尝试一个比均匀随机更具挑战性的测试！相反，让我们在[0，2π]范围内生成100万个数据点，并使用它们生成正弦波的y值。与之前一样，我们将这些数据点无序排列100次，看看我们得到的是什么类型的输出。计时值与上述非常相似，因此我们将其忽略。这给了我们：

Single-Precision Floats
==================================================================
Reproducible Value                   = 0.000000000
Kahan long double accumulator value  = -0.000000000
Error bound                          = 14.901161194
Distinct Kahan (single) values       = 96
Distinct Simple values               = 100
Kahan lower value                    = -0.000024229
Kahan upper value                    = 0.000025988
Simple lower value                   = -0.017674208
Simple upper value                   = 0.018800795
Kahan range                          = 0.000050217
Simple range                         = 0.036475003
Simple range / Kahan range           = 726

Double-Precision Floats
==================================================================
Reproducible Value                   = 0.00000000000002185
Kahan long double accumulator value  = 0.00000000000002185
Error bound                          = 0.00000000000000083
Distinct Kahan (double) values       = 93
Distinct Simple values               = 100
Kahan lower value                    = 0.00000000000000017
Kahan upper value                    = 0.00000000000006306
Simple lower value                   = -0.00000000004814216
Simple upper value                   = 0.00000000002462563
Kahan range                          = 0.00000000000006289
Simple range                         = 0.00000000007276779
Simple range / Kahan range           = 1157

这里我们看到，可再现值与我们用作真理来源的长双卡汉求和值完全匹配。

与我们之前的简单测试结果相比，有许多不同的Kahan求和值（其中Kahan求和使用与数据相同的浮点类型）；使用更困难的数据集破坏了我们在简单测试的Kahan求和中观察到的“部分决定论”，即使Kahan求和产生的值的范围比通过简单求和获得的值小几个数量级。

因此，在这种情况下，可再现求和比Kahan求和具有更好的准确性，并且产生单个按位可再现的结果，而不是大约100个不同的结果。

时间。可重复求和所需的时间比简单求和长约3.5倍，与Kahan求和所需的时间大致相同。

记忆力使用3折储物箱需要储能器的空间。即，单精度为24字节，双精度为48字节。

此外，还需要一个静态查找表，该表可以在给定数据类型和折叠的所有累加器之间共享。对于3倍装箱，此表为41个浮点（164字节）或103个双倍（824字节）。

那么，我们学到了什么？

标准求和给出了广泛的结果。我们对双精度类型进行了100次测试，得到了96个独特的结果。这显然是不可复制的，并且很好地说明了我们试图解决的问题。

另一方面，Kahan求和产生的结果很少（因此它是“更”确定的），并且大约需要传统求和的4倍时间。

如果我们对要添加的数字一无所知，那么可复制的算法需要比简单求和长约10倍的时间；然而，如果我们知道求和的最大值，那么可重复的算法只需要比简单求和长约3.5倍的时间。

与Kahan求和相比，可再现的算法有时需要更少的时间（！），同时确保我们得到的答案按位相同。此外，与本答案中详述的方法不同，在我们的简单测试中，可再现的结果与Kahan求和结果在单精度和双精度方面都匹配。我们还得到强有力的保证，即可再现和Kahan结果具有非常低的相对误差。

什么折叠级别最好？3，根据下图。

Fold-1非常糟糕。在这个测试中，2对双精度给出了很好的准确度，但对浮点数给出了10%的相对误差。Fold-3对双精度和浮点数给出了很好的准确度，但时间增加较小。所以我们只希望对双精度进行一次一次的重复求和。高于Fold-3只会带来边际收益。

完整的代码太长，无法在此处包含；但是，您可以在此处找到C版本；和ReproBLAS在这里。

周宏伯

2023-03-14

TL/DR：有一种简单而系统的方法，尽管它不是一种有效的方法：将所有内容转换为定点，并获得精确的值。因为只有转换回浮点的最后一步包含舍入，所以整个过程与求和的顺序无关。

该思想来源于精确点积，它是长间隔算法的基本工具。

观察到所有浮点数也可以表示为适当宽度的定点数。对于定点数，乘法和加法可以精确表示，只要添加一些额外的整数位来解释溢出。

例如，64位二进制IEE754浮点数可以表示为2131位定点数，前1056位，小数点后1075位（有关其他FP格式，请参阅链接中的表1，但请注意，引用的数字因数2太高，因为它们是两个浮点数的乘积）。

为溢出添加64位（假设一次最多可以添加2^64个数字），我们达到了2195位或275字节的固定点宽度。对于64位无符号整数，小数点前为18，小数点后为17。

对于float32，小数点前3位，小数点后3位，或者，如果允许在一个肢体之间使用小数点，则总共只需要5位。

这种算法很容易用多种语言实现。通常不需要动态内存管理，并且可以可靠地检测到无符号整数溢出时（例如在C

该原理可以应用于更复杂的浮点问题，在这些问题中可以获得正确的舍入。它不是很有效，但实现起来很简单，而且是robubst（在可测试性的意义上）。

编辑：更正了数字，因为它们是系数2过高。还添加了float32案例（建议来自njuffa）

归俊捷

2023-03-14

（请参见此处以获得更好的答案。）

是的，有一种方法。

Demmel和Nguyen（2013）的论文“快速可再现浮点求和”中描述了一种实现这一点的方法。我在下面介绍了它的串行和并行实现。

在这里，我们将比较三种算法：

传统的从左到右求和：这对于输入顺序中的排列来说是快速、不准确和不可复制的

传统的求和是不准确的，因为随着总和的增长，我们添加到其中的数字的最低有效位数会被静默删除。Kahan求和通过保持一个持续的“补偿”总和来克服这一点，该总和保留在最低有效位数上。确定性求和使用类似的策略来保持准确性，但在执行求和之前，将输入数字“预四舍五入”到一个公共基数，以便可以准确计算它们的总和而不会出现任何舍入误差。

为了研究算法，我们将对100万个数字进行测试，每个数字都从[-10001000]范围内均匀抽取。为了演示算法的答案范围及其确定性，输入将被洗牌并求和100次。并行算法使用12个线程运行。

“正确”求和（通过在长双模式下使用Kahan求和并选择最常出现的值确定）为

310844.700699143717685046795

我要断言的一个结果是，对于所研究的每种数据类型，确定性算法对100个数据顺序中的每一个都产生相同的结果，并且这些结果对于算法的串行和并行变体都是相同的，下面的代码对此进行了严格的演示。

各种算法的结果如下：

Serial Float
=========================================================
Simple summation accumulation type   = float
Average deterministic summation time = 0.00462s
Average simple summation time        = 0.00144s (3.20x faster than deterministic)
Average Kahan summation time         = 0.00290s (1.59x faster than deterministic)
Deterministic value                  = 256430592.000000000 (very different from actual)
Distinct Kahan values                = 3  (with single-precision accumulator)
Distinct Simple values               = 93 (with single-precision accumulator)
Distinct Simple values               = 1  (with double-precision accumulator)

Parallel Float
=========================================================
Simple summation accumulation type   = float
Average deterministic summation time = 0.00576s
Average simple summation time        = 0.00206s (2.79x faster than deterministic)
Average Kahan summation time         = 0.00599s (0.96x faster than deterministic)
Deterministic value                  = 256430592.000000000 (very different from actual)
Distinct Kahan values                = 3  (with single-precision accumulator)
Distinct Simple values               = 89 (with single-precision accumulator)
Distinct Simple values               = 1  (with double-precision accumulator)

Serial Double
=========================================================
Average deterministic summation time = 0.00600s
Average simple summation time        = 0.00171s (3.49x faster than deterministic)
Average Kahan summation time         = 0.00378s (1.58x faster than deterministic)
Deterministic value                  = 310844.70069914375199005 (epsilon difference from actual value)
Distinct Kahan values                = 4
Distinct Simple values               = 88

Parallel Double
=========================================================
Average deterministic summation time = 0.01074s
Average simple summation time        = 0.00289s (3.71x faster than deterministic)
Average Kahan summation time         = 0.00648s (1.65x faster than deterministic)
Deterministic value                  = 310844.70069914375199005 (epsilon difference from actual value)
Distinct Kahan values                = 2
Distinct Simple values               = 83

Serial Long Double
=========================================================
Average deterministic summation time = 0.01072s
Average simple summation time        = 0.00215s (4.96x faster than deterministic)
Average Kahan summation time         = 0.00448s (2.39x faster than deterministic)
Deterministic value                  = 310844.700699143717685046795 (no discernable difference from actual)
Distinct Kahan values                = 3
Distinct Simple values               = 94

Parallel Long Double
=========================================================
Average deterministic summation time = 0.01854s
Average simple summation time        = 0.00466s (3.97x faster than deterministic)
Average Kahan summation time         = 0.00635s (2.91x faster than deterministic)
Deterministic value                  = 310844.700699143717685046795 (no discernable difference from actual)
Distinct Kahan values                = 1
Distinct Simple values               = 82

那么，我们学到了什么？从单精度结果中，我们可以看到，使用双倍长度累加器可以使传统求和算法在该数据集上重现，尽管对于任意数据集几乎肯定不会出现这种情况。尽管如此，这仍然是一种在工作时提高精度的廉价方法。

当传统求和所用的累加器与所加的数字大小相同时，它的效果非常差：我们进行了100次测试，得到的结果几乎与传统求和相同。

另一方面，Kahan求和产生的结果很少（因此它是“更”确定的），并且大约需要传统求和的两倍时间。

确定性算法比简单求和要长约4倍，比Kahan求和要长约1.5-3倍，但对于双精度和长双精度类型，除了按位确定性外，都会得到大致相同的答案（无论输入数据如何排序，它总是得到相同的答案）。

然而，单精度浮点得到了一个非常糟糕的答案，不像单精度常规和Kahan求和，它们与实际答案非常接近。这是为什么？

我们一直在研究的论文确定，如果输入中有n个数字，我们使用k个折叠轮（论文建议使用2，这是我们在这里使用的），那么确定性和常规求和将提供类似的误差范围

n^k * e^(k-1) << 1

其中e是数据类型的浮点epsilon。这些epsilon值是：

Single-precision epsilon      = 0.000000119
Double-precision epsilon      = 0.00000000000000022
Long double-precision epsilon = 0.000000000000000000108

将这些与n=1M一起插入到方程中，我们得到：

Single condition      = 119000
Double condition      = 0.00022
Long double condition = 0.000000108

所以我们看到，对于这种大小的输入，单精度的性能很差。

另一点需要注意的是，虽然长双精度编码在我的机器上需要16个字节，但这只是为了对齐：用于计算的真正长度只有80位。因此，两个长双精度将在数字上进行相等的比较，但它们未使用的位是任意的。对于真正的逐位再现性，需要确定未使用的位并将其设置为已知值。

//Compile with:
//g++ -g -O3 repro_vector.cpp -fopenmp

//NOTE: Always comile with `-g`. It doesn't slow down your code, but does make
//it debuggable and improves ease of profiling

#include <algorithm>
#include <bitset>               //Used for showing bitwise representations
#include <cfenv>                //Used for setting floating-point rounding modes
#include <chrono>               //Used for timing algorithms
#include <climits>              //Used for showing bitwise representations
#include <iostream>
#include <omp.h>                //OpenMP
#include <random>
#include <stdexcept>
#include <string>
#include <typeinfo>
#include <unordered_map>
#include <vector>

constexpr int ROUNDING_MODE = FE_UPWARD;
constexpr int N = 1'000'000;
constexpr int TESTS = 100;

// Simple timer class for tracking cumulative run time of the different
// algorithms
struct Timer {
  double total = 0;
  std::chrono::high_resolution_clock::time_point start_time;

  Timer() = default;

  void start() {
    start_time = std::chrono::high_resolution_clock::now();
  }

  void stop() {
    const auto now = std::chrono::high_resolution_clock::now();
    const auto time_span = std::chrono::duration_cast<std::chrono::duration<double>>(now - start_time);
    total += time_span.count();
  }
};

//Simple class to enable directed rounding in floating-point math and to reset
//the rounding mode afterwards, when it goes out of scope
struct SetRoundingMode {
  const int old_rounding_mode;

  SetRoundingMode(const int mode) : old_rounding_mode(fegetround()) {
    if(std::fesetround(mode)!=0){
      throw std::runtime_error("Failed to set directed rounding mode!");
    }
  }

  ~SetRoundingMode(){
    if(std::fesetround(old_rounding_mode)!=0){
      throw std::runtime_error("Failed to reset rounding mode to original value!");
    }
  }

  static std::string get_rounding_mode_string() {
    switch (fegetround()) {
      case FE_DOWNWARD:   return "downward";
      case FE_TONEAREST:  return "to-nearest";
      case FE_TOWARDZERO: return "toward-zero";
      case FE_UPWARD:     return "upward";
      default:            return "unknown";
    }
  }
};

// Used to make showing bitwise representations somewhat more intuitive
template<class T>
struct binrep {
  const T val;
  binrep(const T val0) : val(val0) {}
};

// Display the bitwise representation
template<class T>
std::ostream& operator<<(std::ostream& out, const binrep<T> a){
  const char* beg = reinterpret_cast<const char*>(&a.val);
  const char *const end = beg + sizeof(a.val);
  while(beg != end){
    out << std::bitset<CHAR_BIT>(*beg++);
    if(beg < end)
      out << ' ';
  }
  return out;
}



//Simple serial summation algorithm with an accumulation type we can specify
//to more fully explore its behaviour
template<class FloatType, class SimpleAccumType>
FloatType serial_simple_summation(const std::vector<FloatType> &vec){
  SimpleAccumType sum = 0;
  for(const auto &x: vec){
    sum += x;
  }
  return sum;
}

//Parallel variant of the simple summation algorithm above
template<class FloatType, class SimpleAccumType>
FloatType parallel_simple_summation(const std::vector<FloatType> &vec){
  SimpleAccumType sum = 0;
  #pragma omp parallel for default(none) reduction(+:sum) shared(vec)
  for(size_t i=0;i<vec.size();i++){
    sum += vec[i];
  }
  return sum;
}



//Kahan's compensated summation algorithm for accurately calculating sums of
//many numbers with O(1) error
template<class FloatType>
FloatType serial_kahan_summation(const std::vector<FloatType> &vec){
  FloatType sum = 0.0f;
  FloatType c = 0.0f;
  for (const auto &num: vec) {
    const auto y = num - c;
    const auto t = sum + y;
    c = (t - sum) - y;
    sum = t;
  }
  return sum;
}

//Parallel version of Kahan's compensated summation algorithm (could be improved
//by better accounting for the compsenation during the reduction phase)
template<class FloatType>
FloatType parallel_kahan_summation(const std::vector<FloatType> &vec){
  //Parallel phase
  std::vector<FloatType> sum(omp_get_max_threads(), 0);
  FloatType c = 0;
  #pragma omp parallel for default(none) firstprivate(c) shared(sum,vec)
  for (size_t i=0;i<vec.size();i++) {
    const auto tid = omp_get_thread_num();
    const auto y = vec[i] - c;
    const auto t = sum.at(tid) + y;
    c = (t - sum[tid]) - y;
    sum[tid] = t;
  }

  //Serial reduction phase

  //This could be more accurate if it took the remaining compensation values
  //from above into account
  FloatType total_sum = 0.0f;
  FloatType total_c = 0.0f;
  for(const auto &num: sum){
    const auto y = num - total_c;
    const auto t = total_sum + y;
    total_c = (t - total_sum) - y;
    total_sum = t;
  }

  return total_sum;
}



// Error-free vector transformation. Algorithm 4 from Demmel and Nguyen (2013)
template<class FloatType>
FloatType ExtractVectorNew2(
  const FloatType M,
  const typename std::vector<FloatType>::iterator &begin,
  const typename std::vector<FloatType>::iterator &end
){
  // Should use the directed rounding mode of the parent thread

  auto Mold = M;
  for(auto v=begin;v!=end;v++){
    auto Mnew = Mold + (*v);
    auto q = Mnew - Mold;
    (*v) -= q;
    Mold = Mnew;
  }

  //This is the exact sum of high order parts q_i
  //v is now the vector of low order parts r_i
  return Mold - M;
}

template<class FloatType>
FloatType mf_from_deltaf(const FloatType delta_f){
  const int power = std::ceil(std::log2(delta_f));
  return static_cast<FloatType>(3.0) * std::pow(2, power);
}

//Implements the error bound discussed near Equation 6 of
//Demmel and Nguyen (2013).
template<class FloatType>
bool is_error_bound_appropriate(const size_t N, const int k){
  const auto eps = std::numeric_limits<FloatType>::epsilon();
  const auto ratio = std::pow(N, k) * std::pow(eps, k-1);
  //If ratio << 1, then the conventional non-reproducible sum and the
  //deterministic sum will have error bounds of the same order. We arbitrarily
  //choose 1e-4 to represent this
  return ratio < 1e-3;
}

//Serial bitwise deterministic summation.
//Algorithm 8 from Demmel and Nguyen (2013).
template<class FloatType>
FloatType serial_bitwise_deterministic_summation(
  std::vector<FloatType> vec,  // Note that we're making a copy!
  const int k
){
  constexpr FloatType eps = std::numeric_limits<FloatType>::epsilon();
  const auto n = vec.size();
  const auto adr = SetRoundingMode(ROUNDING_MODE);

  if(n==0){
    return 0;
  }

  if(!is_error_bound_appropriate<FloatType>(vec.size(), k)){
    std::cout<<"WARNING! Error bounds of deterministic sum are large relative to conventional summation!"<<std::endl;
  }

  FloatType m = std::abs(vec.front());
  for(const auto &x: vec){
    m = std::max(m, std::abs(x));
  }

  FloatType delta_f = n * m / (1 - 4 * (n + 1) * eps);
  FloatType Mf = mf_from_deltaf(delta_f);

  std::vector<FloatType> Tf(k);
  for(int f=0;f<k-1;f++){
    Tf[f] = ExtractVectorNew2<FloatType>(Mf, vec.begin(), vec.end());
    delta_f = n * (4 * eps * Mf / 3) / (1 - 4 * (n + 1) * eps);
    Mf = mf_from_deltaf(delta_f);
  }

  FloatType M = Mf;
  for(const FloatType &v: vec){
    M += v;
  }
  Tf[k-1] = M - Mf;

  FloatType T = 0;
  for(const FloatType &tf: Tf){
    T += tf;
  }

  return T;
}

//Parallel bitwise deterministic summation.
//Algorithm 9 from Demmel and Nguyen (2013).
template<class FloatType>
FloatType parallel_bitwise_deterministic_summation(
  std::vector<FloatType> vec,  // Note that we're making a copy!
  const int k
){
  constexpr FloatType eps = std::numeric_limits<FloatType>::epsilon();
  const auto n = vec.size();
  const auto adr = SetRoundingMode(ROUNDING_MODE);

  if(n==0){
    return 0;
  }

  if(!is_error_bound_appropriate<FloatType>(vec.size(), k)){
    std::cout<<"WARNING! Error bounds of deterministic sum are large relative to conventional summation!"<<std::endl;
  }

  std::vector<FloatType> Tf(k);

  // Note that this reduction would require communication if we had
  // implemented this to work on multiple nodes
  FloatType m = std::abs(vec.front());
  #pragma omp parallel for default(none) reduction(max:m) shared(vec)
  for(size_t i=0;i<vec.size();i++){
    m = std::max(m, std::abs(vec[i]));
  }

  // Note that this reduction would require communication if we had
  // implemented this to work on multiple nodes
  #pragma omp declare reduction(vec_plus : std::vector<FloatType> : \
    std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), omp_out.begin(), std::plus<FloatType>())) \
    initializer(omp_priv = decltype(omp_orig)(omp_orig.size()))

  #pragma omp parallel default(none) reduction(vec_plus:Tf) shared(k,m,n,vec,std::cout)
  {
    const auto adr = SetRoundingMode(ROUNDING_MODE);
    const auto threads = omp_get_num_threads();
    const auto tid = omp_get_thread_num();
    const auto values_per_thread = n / threads;
    const auto nlow = tid * values_per_thread;
    const auto nhigh = (tid<threads-1) ? ((tid+1) * values_per_thread) : n;

    FloatType delta_f = n * m / (1 - 4 * (n + 1) * eps);
    FloatType Mf = mf_from_deltaf(delta_f);

    for(int f=0;f<k-1;f++){
      Tf[f] = ExtractVectorNew2<FloatType>(Mf, vec.begin() + nlow, vec.begin() + nhigh);
      delta_f = n * (4 * eps * Mf / 3) / (1 - 4 * (n + 1) * eps);
      Mf = mf_from_deltaf(delta_f);
    }

    FloatType M = Mf;
    for(size_t i=nlow;i<nhigh;i++){
      M += vec[i];
    }
    Tf[k-1] = M - Mf;
  }

  FloatType T = 0;
  for(const FloatType &tf: Tf){
    T += tf;
  }

  return T;
}



//Convenience wrappers
template<bool Parallel, class FloatType>
FloatType bitwise_deterministic_summation(
  const std::vector<FloatType> &vec,  // Note that we're making a copy!
  const int k
){
  if(Parallel){
    return parallel_bitwise_deterministic_summation<FloatType>(vec, k);
  } else {
    return serial_bitwise_deterministic_summation<FloatType>(vec, k);
  }
}

template<bool Parallel, class FloatType, class SimpleAccumType>
FloatType simple_summation(const std::vector<FloatType> &vec){
  if(Parallel){
    return parallel_simple_summation<FloatType, SimpleAccumType>(vec);
  } else {
    return serial_simple_summation<FloatType, SimpleAccumType>(vec);
  }
}

template<bool Parallel, class FloatType>
FloatType kahan_summation(const std::vector<FloatType> &vec){
  if(Parallel){
    return serial_kahan_summation<FloatType>(vec);
  } else {
    return parallel_kahan_summation<FloatType>(vec);
  }
}



// Timing tests for the summation algorithms
template<bool Parallel, class FloatType, class SimpleAccumType>
FloatType PerformTestsOnData(
  const int TESTS,
  std::vector<FloatType> floats, //Make a copy so we use the same data for each test
  std::mt19937 gen               //Make a copy so we use the same data for each test
){
  Timer time_deterministic;
  Timer time_kahan;
  Timer time_simple;

  //Very precise output
  std::cout.precision(std::numeric_limits<FloatType>::max_digits10);
  std::cout<<std::fixed;

  std::cout<<"Parallel? "<<Parallel<<std::endl;
  if(Parallel){
    std::cout<<"Max threads = "<<omp_get_max_threads()<<std::endl;
  }
  std::cout<<"Floating type                        = "<<typeid(FloatType).name()<<std::endl;
  std::cout<<"Floating type epsilon                = "<<std::numeric_limits<FloatType>::epsilon()<<std::endl;
  std::cout<<"Simple summation accumulation type   = "<<typeid(SimpleAccumType).name()<<std::endl;
  std::cout<<"Number of tests                      = "<<TESTS<<std::endl;
  std::cout<<"Input sample = "<<std::endl;
  for(size_t i=0;i<10;i++){
    std::cout<<"\t"<<floats[i]<<std::endl;
  }

  //Get a reference value
  std::unordered_map<FloatType, uint32_t> simple_sums;
  std::unordered_map<FloatType, uint32_t> kahan_sums;
  const auto ref_val = bitwise_deterministic_summation<Parallel, FloatType>(floats, 2);
  for(int test=0;test<TESTS;test++){
    std::shuffle(floats.begin(), floats.end(), gen);

    time_deterministic.start();
    const auto my_val = bitwise_deterministic_summation<Parallel, FloatType>(floats, 2);
    time_deterministic.stop();
    if(ref_val!=my_val){
      std::cout<<"ERROR: UNEQUAL VALUES ON TEST #"<<test<<"!"<<std::endl;
      std::cout<<"Reference      = "<<ref_val                   <<std::endl;
      std::cout<<"Current        = "<<my_val                    <<std::endl;
      std::cout<<"Reference bits = "<<binrep<FloatType>(ref_val)<<std::endl;
      std::cout<<"Current   bits = "<<binrep<FloatType>(my_val) <<std::endl;
      throw std::runtime_error("Values were not equal!");
    }

    time_kahan.start();
    const auto kahan_sum = kahan_summation<Parallel, FloatType>(floats);
    kahan_sums[kahan_sum]++;
    time_kahan.stop();

    time_simple.start();
    const auto simple_sum = simple_summation<Parallel, FloatType, SimpleAccumType>(floats);
    simple_sums[simple_sum]++;
    time_simple.stop();
  }

  std::cout<<"Average deterministic summation time = "<<(time_deterministic.total/TESTS)<<std::endl;
  std::cout<<"Average simple summation time        = "<<(time_simple.total/TESTS)<<std::endl;
  std::cout<<"Average Kahan summation time         = "<<(time_kahan.total/TESTS)<<std::endl;
  std::cout<<"Ratio Deterministic to Simple        = "<<(time_deterministic.total/time_simple.total)<<std::endl;
  std::cout<<"Ratio Deterministic to Kahan         = "<<(time_deterministic.total/time_kahan.total)<<std::endl;

  std::cout<<"Reference value                      = "<<std::fixed<<ref_val<<std::endl;
  std::cout<<"Reference bits                       = "<<binrep<FloatType>(ref_val)<<std::endl;

  std::cout<<"Distinct Kahan values                = "<<kahan_sums.size()<<std::endl;
  std::cout<<"Distinct Simple values               = "<<simple_sums.size()<<std::endl;

  int count = 0;
  for(const auto &kv: kahan_sums){
    std::cout<<"\tKahan sum values (N="<<std::fixed<<kv.second<<") "<<kv.first<<" ("<<binrep<FloatType>(kv.first)<<")"<<std::endl;
    if(count++==10){
      break;
    }
  }

  count = 0;
  for(const auto &kv: simple_sums){
    std::cout<<"\tSimple sum values (N="<<std::fixed<<kv.second<<") "<<kv.first<<" ("<<binrep<FloatType>(kv.first)<<")"<<std::endl;
    if(count++==10){
      break;
    }
  }

  std::cout<<std::endl;

  return ref_val;
}

// Use this to make sure the tests are reproducible
template<class FloatType, class SimpleAccumType>
void PerformTests(
  const int TESTS,
  const std::vector<long double> &long_floats,
  std::mt19937 &gen
){
  std::vector<FloatType> floats(long_floats.begin(), long_floats.end());

  const auto serial_val = PerformTestsOnData<false, FloatType, SimpleAccumType>(TESTS, floats, gen);
  const auto parallel_val = PerformTestsOnData<true, FloatType, SimpleAccumType>(TESTS, floats, gen);

  //Note that the `long double` type may only use 12-16 bytes (to maintain
  //alignment), but only 80 bits, resulting in bitwise indeterminism in the last
  //few bits; however, the floating-point values themselves will be equal.
  std::cout<<"########################################"<<std::endl;
  std::cout<<"### Serial and Parallel values match for "
           <<typeid(FloatType).name()
           <<"? "
           <<(serial_val==parallel_val)
           <<std::endl;
  std::cout<<"########################################\n"<<std::endl;
}



int main(){
  std::random_device rd;
  // std::mt19937 gen(rd());   //Enable for randomness
  std::mt19937 gen(123456789); //Enable for reproducibility
  std::uniform_real_distribution<long double> distr(-1000, 1000);
  std::vector<long double> long_floats;
  for(int i=0;i<N;i++){
    long_floats.push_back(distr(gen));
  }

  PerformTests<double, double>(TESTS, long_floats, gen);
  PerformTests<long double, long double>(TESTS, long_floats, gen);
  PerformTests<float, float>(TESTS, long_floats, gen);
  PerformTests<float, double>(TESTS, long_floats, gen);

  return 0;
}

编辑埃里克·波斯皮奇尔

埃里克说：

我所描述的生成器将生成基本上具有相同量子的数字，所有数字都接近除数的倍数。这可能不包括样本中各种各样的指数，因此它可能无法很好地模拟一个分布，其中数字在添加时在有效位内部四舍五入。E、例如，如果我们添加许多123.45形式的数字，它们将在一段时间内添加得很好，尽管舍入误差会随着部分和的累积而增大。但是，如果我们将表格12345、123.45和1.2345的编号相加，不同的错误会更早发生

将1M值123.45相加，得到的单长双卡汉值为123450000.00000000 2837623469532。确定性长双精度值与此相差-0.00000000000727595761，而确定性双精度值相差-0.0000000 1206353772047，而确定性浮点数相差-3.68%（考虑到其较大的epsilon，如预期）。

从集合中随机选择1M值{1.2345、12.345、123.45、1234.5、12345}给出两个长的双卡汉值：A=2749592287.563000000780448317528（N=54）和B=2749592287.563000000547673874（N=46）。确定性长双精度值精确匹配（A）；确定性双精度值与（A）相差0.00000020139850676247；确定性浮点值与（A）的差异为-257%（同样，预期）。

我可以确定性地和任意排列的浮点数的向量吗？

共有3个答案

相关问答

相关文章

相关阅读

相关工具

相关文档