fast_gicp代码解读（一）

龚招

2023-12-01

fast_gicp系统介绍

fast_gicp的首次提出是2016年的文章Voxelized GICP for Fast and Accurate 3D Point Cloud Registration。
github代码链接为：https://github.com/SMRT-AIST/fast_gicp.git
这篇文章是16年提出的，定位精度较高，与目前先进的SLAM方法相比都具有较为明显的优势，但是不知道为什么没有引起广泛的重视，这两年提出来的很多最新的方法，包括MULLS等，是在LOAM的框架下，以及LiTAMIN2，是在概率的框架下。这几年激光SLAM的论文数量有下降的趋势。单纯基于激光的方法的发展方向也就是特征和概率这两个方向。未来在机器人领域，需要深入研究机器人的社会学问题了，而非单纯的基于机器人的信息感知和处理能力进行环境建模的问题了。问题的维度更加丰富了，也意味着不像SLAM这样有充分和可靠的评价体系了。同时，单纯的SLAM问题的研究趋势已经越来越向底层计算和边缘计算的方式发展，如何在硬件层面进行加速，以及如何适应硬件成本较低的移动机器人平台，都是未来重要的发展方向。单纯去做SLAM方法，已经是很难有很大的空间了，除非引入了新的优化方法，能够具有更高的弹性和兼容性。

理论方法分析

论文的核心方法主要围绕GICP的理论框架，相关解读可以参考博客
快速精确的体素GICP三维点云配准算法。

代码框架

本篇中，我们主要关注单线程和无cuda版本的代码实现。
首先看kitti.cpp的main()函数，其是直接能够跑kitti里程计数据的。

读取点云数据

// An highlighted block
KittiLoader kitti(argv[1]);

定义栅格化参数

// An highlighted block
double downsample_resolution = 0.25;
pcl::ApproximateVoxelGrid<pcl::PointXYZ> voxelgrid;
voxelgrid.setLeafSize(downsample_resolution, downsample_resolution, downsample_resolution);

定义gicp
设置最大的对应点距离为1m。

// An highlighted block
fast_gicp::FastGICP<pcl::PointXYZ, pcl::PointXYZ> gicp;
gicp.setMaxCorrespondenceDistance(1.0);
 // set initial frame as target
voxelgrid.setInputCloud(kitti.cloud(0));
pcl::PointCloud<pcl::PointXYZ>::Ptr target(new pcl::PointCloud<pcl::PointXYZ>);
voxelgrid.filter(*target);
gicp.setInputTarget(target);

这里对第一帧点云进行了定义，主要的函数是gicp.setinputTarget();

// An highlighted block
template <typename PointSource, typename PointTarget>
void FastGICP<PointSource, PointTarget>::setInputTarget(const PointCloudTargetConstPtr& cloud) {
if (target_ == cloud) {
 return;
}
pcl::Registration<PointSource, PointTarget, Scalar>::setInputTarget(cloud);
target_kdtree_->setInputCloud(cloud);
target_covs_.clear();
}

定义一些关键的系统变量
包括姿态向量，轨迹点云，以及计算帧率的时间变量。初始化pose[0]为单位阵。

// An highlighted block
std::vector<Eigen::Isometry3d, Eigen::aligned_allocator<Eigen::Isometry3d>> poses(kitti.size());
 poses[0].setIdentity();

 // trajectory for visualization
 pcl::PointCloud<pcl::PointXYZ>::Ptr trajectory(new pcl::PointCloud<pcl::PointXYZ>);
 trajectory->push_back(pcl::PointXYZ(0.0f, 0.0f, 0.0f));

 pcl::visualization::PCLVisualizer vis;
 vis.addPointCloud<pcl::PointXYZ>(trajectory, "trajectory");

 // for calculating FPS
 boost::circular_buffer<std::chrono::high_resolution_clock::time_point> stamps(30);
 stamps.push_back(std::chrono::high_resolution_clock::now());

激光里程计
开始进行scan-to-scan的点云匹配阶段。目前整个代码是没有后端的，但是能够达到这么高的精度，也说明了一点，对于点云这种稀疏和包含噪声的数据，利用概率或者分布进行对准的方式能够达到更高的精度上限，同时对环境具有更高的适应性。

// An highlighted block
 for (int i = 1; i < kitti.size(); i++) {
   // set the current frame as source
   voxelgrid.setInputCloud(kitti.cloud(i));
   pcl::PointCloud<pcl::PointXYZ>::Ptr source(new pcl::PointCloud<pcl::PointXYZ>);
   voxelgrid.filter(*source);
   gicp.setInputSource(source);

   // align and swap source and target cloud for next registration
   pcl::PointCloud<pcl::PointXYZ>::Ptr aligned(new pcl::PointCloud<pcl::PointXYZ>);
   gicp.align(*aligned);
   gicp.swapSourceAndTarget();

   // accumulate pose
   poses[i] = poses[i - 1] * gicp.getFinalTransformation().cast<double>();

   // FPS display
   stamps.push_back(std::chrono::high_resolution_clock::now());
   std::cout << stamps.size() / (std::chrono::duration_cast<std::chrono::nanoseconds>(stamps.back() - stamps.front()).count() / 1e9) << "fps" << std::endl;

   // visualization
   trajectory->push_back(pcl::PointXYZ(poses[i](0, 3), poses[i](1, 3), poses[i](2, 3)));
   vis.updatePointCloud<pcl::PointXYZ>(trajectory, pcl::visualization::PointCloudColorHandlerCustom<pcl::PointXYZ>(trajectory, 255.0, 0.0, 0.0), "trajectory");
   vis.spinOnce();
 }

给出第一个重要的函数gicp.align(*aligned);它的主要作用是计算source和target的相对变换信息.该函数是继承registration.hpp中的pcl::Registration<PointSource, PointTarget, Scalar>类中的align函数，其实现如下：

// An highlighted block
template <typename PointSource, typename PointTarget, typename Scalar> inline void
pcl::Registration<PointSource, PointTarget, Scalar>::align (PointCloudSource &output)
{
 align (output, Matrix4::Identity ());
}

对应的align (output, Matrix4::Identity ())的实现如下。也是在pcl::Registration<PointSource, PointTarget, Scalar>类中的

// An highlighted block
template <typename PointSource, typename PointTarget, typename Scalar> inline void
pcl::Registration<PointSource, PointTarget, Scalar>::align (PointCloudSource &output, const Matrix4& guess)
{
 if (!initCompute ()) 
   return;

 // Resize the output dataset
 if (output.points.size () != indices_->size ())
   output.points.resize (indices_->size ());
 // Copy the header
 output.header   = input_->header;
 // Check if the output will be computed for all points or only a subset
 if (indices_->size () != input_->points.size ())
 {
   output.width    = static_cast<uint32_t> (indices_->size ());
   output.height   = 1;
 }
 else
 {
   output.width    = static_cast<uint32_t> (input_->width);
   output.height   = input_->height;
 }
 output.is_dense = input_->is_dense;

 // Copy the point data to output
 for (size_t i = 0; i < indices_->size (); ++i)
   output.points[i] = input_->points[(*indices_)[i]];

 // Set the internal point representation of choice unless otherwise noted
 if (point_representation_ && !force_no_recompute_) 
   tree_->setPointRepresentation (point_representation_);

 // Perform the actual transformation computation
 converged_ = false;
 final_transformation_ = transformation_ = previous_transformation_ = Matrix4::Identity ();

 // Right before we estimate the transformation, we set all the point.data[3] values to 1 to aid the rigid 
 // transformation
 for (size_t i = 0; i < indices_->size (); ++i)
   output.points[i].data[3] = 1.0;

 computeTransformation (output, guess);

 deinitCompute ();
}

核心的函数在 computeTransformation (output, guess)中，其在子类LsqRegistration<PointTarget, PointSource>中有具体实现。这里是先进入fast_gicp类中的computeTransformation (output, guess)的函数的。

// An highlighted block
void LsqRegistration<PointTarget, PointSource>::computeTransformation(PointCloudSource& output, const Matrix4& guess) {
 Eigen::Isometry3d x0 = Eigen::Isometry3d(guess.template cast<double>());

 lm_lambda_ = -1.0;
 converged_ = false;
 //lm_debug_print_ = true;
 if (lm_debug_print_) {
   std::cout << "********************************************" << std::endl;
   std::cout << "*****************Debug optimize *****************" << std::endl;
   std::cout << "********************************************" << std::endl;
 }

 for (int i = 0; i < max_iterations_ && !converged_; i++) {
   nr_iterations_ = i;

   Eigen::Isometry3d delta;
   if (!step_optimize(x0, delta)) {
     std::cerr << "lm not converged!!" << std::endl;
     break;
   }
   converged_ = is_converged(delta);
 }

 final_transformation_ = x0.cast<float>().matrix();
 pcl::transformPointCloud(*input_, output, final_transformation_);
}

优化部分在step_optimize(x0, delta)函数中，具体如下：

// An highlighted block
bool LsqRegistration<PointTarget, PointSource>::step_optimize(Eigen::Isometry3d& x0, Eigen::Isometry3d& delta) {
  switch (lsq_optimizer_type_) {
    case LSQ_OPTIMIZER_TYPE::LevenbergMarquardt:
      return step_lm(x0, delta);
    case LSQ_OPTIMIZER_TYPE::GaussNewton:
      return step_gn(x0, delta);
  }

  return step_lm(x0, delta);
}

主要关注lm优化

// An highlighted block
bool LsqRegistration<PointTarget, PointSource>::step_lm(Eigen::Isometry3d& x0, Eigen::Isometry3d& delta) {
  Eigen::Matrix<double, 6, 6> H;
  Eigen::Matrix<double, 6, 1> b;
  double y0 = linearize(x0, &H, &b);

  if (lm_lambda_ < 0.0) {
    lm_lambda_ = lm_init_lambda_factor_ * H.diagonal().array().abs().maxCoeff();
  }

  double nu = 2.0;
  for (int i = 0; i < lm_max_iterations_; i++) {
    Eigen::LDLT<Eigen::Matrix<double, 6, 6>> solver(H + lm_lambda_ * Eigen::Matrix<double, 6, 6>::Identity());
    Eigen::Matrix<double, 6, 1> d = solver.solve(-b);

    delta.setIdentity();
    delta.linear() = so3_exp(d.head<3>()).toRotationMatrix();
    delta.translation() = d.tail<3>();

    Eigen::Isometry3d xi = delta * x0;
    double yi = compute_error(xi);
    double rho = (y0 - yi) / (d.dot(lm_lambda_ * d - b));

    if (lm_debug_print_) {
      if (i == 0) {
        std::cout << boost::format("--- LM optimization ---\n%5s %15s %15s %15s %15s %15s %5s\n") % "i" % "y0" % "yi" % "rho" % "lambda" % "|delta|" % "dec";
      }
      char dec = rho > 0.0 ? 'x' : ' ';
      std::cout << boost::format("%5d %15g %15g %15g %15g %15g %5c") % i % y0 % yi % rho % lm_lambda_ % d.norm() % dec << std::endl;
    }

    if (rho < 0) {
      if (is_converged(delta)) {
        return true;
      }

      lm_lambda_ = nu * lm_lambda_;
      nu = 2 * nu;
      continue;
    }

    x0 = xi;
    lm_lambda_ = lm_lambda_ * std::max(1.0 / 3.0, 1 - std::pow(2 * rho - 1, 3));
    final_hessian_ = H;
    return true;
  }

  return false;
}

首先看下函数double y0 = linearize(x0, &H, &b);的实现，是通过多线程对H矩阵和b矩阵进行计算的。在fast_gicp_impl.hpp中的FastGICP<PointSource, PointTarget>类中实现的

// An highlighted block
double FastGICP<PointSource, PointTarget>::linearize(const Eigen::Isometry3d& trans, Eigen::Matrix<double, 6, 6>* H, Eigen::Matrix<double, 6, 1>* b) {
 update_correspondences(trans);

 double sum_errors = 0.0;
 std::vector<Eigen::Matrix<double, 6, 6>, Eigen::aligned_allocator<Eigen::Matrix<double, 6, 6>>> Hs(num_threads_);
 std::vector<Eigen::Matrix<double, 6, 1>, Eigen::aligned_allocator<Eigen::Matrix<double, 6, 1>>> bs(num_threads_);
 for (int i = 0; i < num_threads_; i++) {
   Hs[i].setZero();
   bs[i].setZero();
 }

#pragma omp parallel for num_threads(num_threads_) reduction(+ : sum_errors) schedule(guided, 8)
 for (int i = 0; i < input_->size(); i++) {
   int target_index = correspondences_[i];
   if (target_index < 0) {
     continue;
   }

   const Eigen::Vector4d mean_A = input_->at(i).getVector4fMap().template cast<double>();
   const auto& cov_A = source_covs_[i];

   const Eigen::Vector4d mean_B = target_->at(target_index).getVector4fMap().template cast<double>();
   const auto& cov_B = target_covs_[target_index];

   const Eigen::Vector4d transed_mean_A = trans * mean_A;
   const Eigen::Vector4d error = mean_B - transed_mean_A;

   sum_errors += error.transpose() * mahalanobis_[i] * error;

   if (H == nullptr || b == nullptr) {
     continue;
   }

   Eigen::Matrix<double, 4, 6> dtdx0 = Eigen::Matrix<double, 4, 6>::Zero();
   dtdx0.block<3, 3>(0, 0) = skewd(transed_mean_A.head<3>());
   dtdx0.block<3, 3>(0, 3) = -Eigen::Matrix3d::Identity();

   Eigen::Matrix<double, 4, 6> jlossexp = dtdx0;

   Eigen::Matrix<double, 6, 6> Hi = jlossexp.transpose() * mahalanobis_[i] * jlossexp;
   Eigen::Matrix<double, 6, 1> bi = jlossexp.transpose() * mahalanobis_[i] * error;

   Hs[omp_get_thread_num()] += Hi;
   bs[omp_get_thread_num()] += bi;
 }

 if (H && b) {
   H->setZero();
   b->setZero();
   for (int i = 0; i < num_threads_; i++) {
     (*H) += Hs[i];
     (*b) += bs[i];
   }
 }

 return sum_errors;
}

这其中最重要的是对雅可比矩阵的求导，是直接对点对距离的误差公式进行求导的，b-Ta，注意由于是对-Ta进行求导，因此雅可比矩阵要加个负号。使用的是SE(3)上的李代数求导模型，具体可参见视觉SLAM14讲第85页。接下来就是正常的LM更新流程了，其中有李代数到李群的变换。优化部分的整体实现比lego-loam系列的SLAM系统更简洁。
update_correspondences(trans)是根据当前的trans给每个点寻找target点云中的对应点，并且计算相应的距离协方差，如下。

// An highlighted block
void FastGICP<PointSource, PointTarget>::update_correspondences(const Eigen::Isometry3d& trans) {
 assert(source_covs_.size() == input_->size());
 assert(target_covs_.size() == target_->size());

 Eigen::Isometry3f trans_f = trans.cast<float>();

 correspondences_.resize(input_->size());
 sq_distances_.resize(input_->size());
 mahalanobis_.resize(input_->size());

 std::vector<int> k_indices(1);
 std::vector<float> k_sq_dists(1);

#pragma omp parallel for num_threads(num_threads_) firstprivate(k_indices, k_sq_dists) schedule(guided, 8)
 for (int i = 0; i < input_->size(); i++) {
   PointTarget pt;
   pt.getVector4fMap() = trans_f * input_->at(i).getVector4fMap();

   target_kdtree_->nearestKSearch(pt, 1, k_indices, k_sq_dists);

   sq_distances_[i] = k_sq_dists[0];
   correspondences_[i] = k_sq_dists[0] < corr_dist_threshold_ * corr_dist_threshold_ ? k_indices[0] : -1;

   if (correspondences_[i] < 0) {
     continue;
   }

   const int target_index = correspondences_[i];
   const auto& cov_A = source_covs_[i];
   const auto& cov_B = target_covs_[target_index];

   Eigen::Matrix4d RCR = cov_B + trans.matrix() * cov_A * trans.matrix().transpose();
   RCR(3, 3) = 1.0;

   mahalanobis_[i] = RCR.inverse();
   mahalanobis_[i](3, 3) = 0.0f;
 }
}

然后给出第二个重要的函数gicp.swapSorceAndTarget();它的主要作用是交换source和target的数据信息.

// An highlighted block
template <typename PointSource, typename PointTarget>
void FastGICP<PointSource, PointTarget>::swapSourceAndTarget() {
 input_.swap(target_);
 source_kdtree_.swap(target_kdtree_);
 source_covs_.swap(target_covs_);

 correspondences_.clear();
 sq_distances_.clear();
}

至于为什么要交换数据信息，主要是为了方便下一步的计算。

然后就是gicp的位姿叠加操作了

// An highlighted block
poses[i] = poses[i - 1] * gicp.getFinalTransformation().cast<double>();

接下来就是帧率的计算和可视化部分了。

fast_gicp在kitti上的部分实验结果

一个简单的表格是这么创建的：

序列	ape (m)
01	152.4
07	247.9
09	390.1

总结

fast_gicp的核心思想很简单，就是将点云匹配的方法应用到里程计中，并进行效率方面的提升和优化。由于其侧重点不同，因此作者没有增加后端模块。
概率和特征的两条激光里程计之路，可能最终会走向融合。但是仍然在各自擅长的领域发挥作用。

fast_gicp代码解读（一）

fast_gicp系统介绍

理论方法分析

代码框架

fast_gicp在kitti上的部分实验结果

总结

相关阅读

相关文章

相关问答

相关文档