by Oxidane Lin
本文翻译自FLANN的官方文档,原文是LATEX,改成了md格式方便阅读。能力时间所限,只翻译了C++部分,C/MATLAB/PYTHON还请自行对应。
We can define the nearest neighbor search (NNS) problem in the
following way: given a set of points
P
=
p
1
,
p
2
,
…
,
p
n
P=p_1,p_2,\dots,p_n
P=p1,p2,…,pn in a metric
space
X
X
X, these points must be preprocessed in such a way that given a
new query point
q
∈
X
q \in X
q∈X, finding the point in
P
P
P that is nearest to
q
q
q can be done quickly.
定义“最邻近搜索”问题如下:在某个空间 X X X中任意给定一组点 P = p 1 , p 2 , … , p n P=p_1,p_2,\dots,p_n P=p1,p2,…,pn,这些点经过预处理后,需要达到以下效果:任意给定一个新的查询点 q ∈ X q \in X q∈X,可以快速找到距离 q q q最近的点 P P P
The problem of nearest neighbor search is one of major importance in a
variety of applications such as image recognition, data compression,
pattern recognition and classification, machine learning, document
retrieval systems, statistics and data analysis. However, solving this
problem in high dimensional spaces seems to be a very difficult task and
there is no algorithm that performs significantly better than the
standard brute-force search. This has lead to an increasing interest in
a class of algorithms that perform approximate nearest neighbor
searches, which have proven to be a good-enough approximation in most
practical applications and in most cases, orders of magnitude faster
that the algorithms performing the exact searches.
在图像识别、数据压缩、模式识别和分类、机器学习、文档检索系统、统计和数据分析等各种应用中,最近邻搜索问题是一个非常重要的问题。然而,在高维空间中解决这个问题似乎是一项非常困难的任务,而且没有任何算法能比标准的蛮力搜索表现得更好。这使得人们对使用近似最近邻搜索的一类算法越来越感兴趣,在大多数实际应用中,这已经证明是一个足够好的近似,在大多数情况下,比执行精确搜索的算法快一个数量级。
FLANN (Fast Library for Approximate Nearest Neighbors) is a library for
performing fast approximate nearest neighbor searches. FLANN is written
in the C++ programming language. FLANN can be easily used in many
contexts through the C, MATLAB, Python, and Ruby bindings provided with
the library.
FLANN就是用C++写的一个快速近似最邻近算法。
This section contains small examples of how to use the FLANN library
from different programming languages (C++, C, MATLAB, Python, and Ruby).
// file flann_example.cpp
// 最简单的例子
#include <flann/flann.hpp>
#include <flann/io/hdf5.h>
#include <stdio.h>
int main(int argc, char** argv)
{
int nn = 3;
flann::Matrix<float> dataset;
flann::Matrix<float> query;
flann::load_from_file(dataset, "dataset.hdf5","dataset");
flann::load_from_file(query, "dataset.hdf5","query");
flann::Matrix<int> indices(new int[query.rows*nn], query.rows, nn);
flann::Matrix<float> dists(new float[query.rows*nn], query.rows, nn);
// construct an randomized kd-tree index using 4 kd-trees
flann::Index<flann::L2<float> > index(dataset, flann::KDTreeIndexParams(4));
index.buildIndex();
// do a knn search, using 128 checks
index.knnSearch(query, indices, dists, nn, flann::SearchParams(128));
flann::save_to_file(indices,"result.hdf5","result");
delete[] dataset.ptr();
delete[] query.ptr();
delete[] indices.ptr();
delete[] dists.ptr();
return 0;
}
FLANN can be downloaded from the following address:
http://www.cs.ubc.ca/\simmariusm/flann
After downloading and unpacking, the following files and directories
should be present:
bin
: directory various for scripts and binary files
doc
: directory containg this documentation
examples
: directory containg examples of using FLANN
src
: directory containg the source files
test
: directory containg unit tests for FLANN
To compile the FLANN library the CMake[^1] build system is required.
Below is an example of how FLANN can be compiled on Linux (replace x.y.z
with the corresponding version number).
$ cd flann-x.y.z-src
$ mkdir build
$ cd build
$ cmake ..
$ make
On windows the steps are very similar:
> "C:\Program Files\Microsoft Visual Studio 9.0\VC\vcvarsall.bat"
> cd flann-x.y.z-src
> mkdir build
> cd build
> cmake ..
> nmake
There are several compile options that can be configured before FLANN is
compiled, for example the build type (Release, RelWithDebInfo, Debug) or
whether to compile the C, Python or the MATLAB bindings. To change any
of this options use the cmake-gui
application after cmake
has
finished (see figure [fig:cmake-gui]).
> cmake-gui .
Configuring the FLANN compile options
This section contains changes that you need to be aware of when
upgrading from a previous version of FLANN.
Due to changes in the library, the on-disk format of the saved
indexes has changed and it is not possible to load indexes
previously saved with an older version of the library.
A small breaking API change in flann::Matrix
requires client code
to be updated. In order to release the memory used by a matrix, use:
delete[] matrix.ptr();
instead of:
delete[] matrix.data;
or:
matrix.free();
The member data
of flann::Matrix
is not publicly accessible any
more, use the ptr()
method to obtain the pointer to the memory
buffer.
For taking advantage of multithreaded search, the project that uses FLANN needs to be compiled with a compiler that supports the OpenMP standard and the OpenMP support must be enabled. The number of cores to be used can be selected with the cores
field in the SearchParams
structure. By default a single core will be used. Setting the cores
field to zero will automatically use as many threads as cores available on the machine.
SearchParams
里把cores
设为零可以自动用最多的核实现快速多线程搜索。
The core of the FLANN library is written in C++. To make use of the full power and flexibility of the templated code one should use the C++ bindings if possible. To use the C++ bindings you only need to include the the library header file flann.hpp
. An example of the compile command that must be used will look something like this:
g++ flann_example.cpp -I $FLANN_ROOT/include -o flann_example_cpp
where $FLANN_ROOT
is the library main directory.
The following sections describe the public C++ API.
用C++最能体现模板代码的优势所在,最好用C++进行编程。编译过程如上↑。
The FLANN nearest neighbor index class. This class is used to abstract different types of nearest neighbor search indexes. The class is templated on the distance functor to be used for computing distances between pairs of features.
Index类。是各种最临近搜索索引的抽象。该类用到了距离因子的模版,以计算特征之间的距离。
namespace flann
{
template<typename Distance>
class Index
{
typedef typename Distance::ElementType ElementType;
typedef typename Distance::ResultType DistanceType;
public:
Index(const IndexParams& params, Distance distance = Distance() );
Index(const Matrix<ElementType>& points, const IndexParams& params,
Distance distance = Distance() );
~Index();
void buildIndex();
void buildIndex(const Matrix<ElementType>& points);
void addPoints(const Matrix<ElementType>& points,
float rebuild_threshold = 2);
void removePoint(size_t point_id);
ElementType* getPoint(size_t point_id);
int knnSearch(const Matrix<ElementType>& queries,
Matrix<int>& indices,
Matrix<DistanceType>& dists,
size_t knn,
const SearchParams& params);
int knnSearch(const Matrix<ElementType>& queries,
std::vector< std::vector<int> >& indices,
std::vector<std::vector<DistanceType> >& dists,
size_t knn,
const SearchParams& params);
int radiusSearch(const Matrix<ElementType>& queries,
Matrix<int>& indices,
Matrix<DistanceType>& dists,
float radius,
const SearchParams& params);
int radiusSearch(const Matrix<ElementType>& queries,
std::vector< std::vector<int> >& indices,
std::vector<std::vector<DistanceType> >& dists,
float radius,
const SearchParams& params);
void save(std::string filename);
int veclen() const;
int size() const;
IndexParams getParameters() const;
flann_algorithm_t getType() const;
};
}
The Distance functor
The distance functor is a class whose operator()
computes the distance between two features. If the distance is also a kd-tree compatible distance it should also provide an accum_dist()
method that computes the distance between individual feature dimensions. A typical distance functor looks like this (see the dist.h
file for more examples):
距离因子是一个类,它的operator()
用于计算特征之间的“距离”。如果该距离兼容k-d树(多维树),它同时会提供一个accum_dist()
方法,用于计算单独维度中的距离。一个典型的距离因子计算方法如下,更多请参考dist.h
:
template<class T>
struct L2
{
typedef bool is_kdtree_distance;//判断是否为多维树
typedef T ElementType;//元素的类型和模板类型一致
typedef typename Accumulator<T>::Type ResultType; //
template <typename Iterator1, typename Iterator2>
ResultType operator()(Iterator1 a, Iterator2 b, size_t size,
ResultType /*worst_dist*/ = -1) const
{
ResultType result = ResultType();
ResultType diff;
for(size_t i = 0; i < size; ++i ) {
diff = *a++ - *b++;//对应位置上的a-b为差值
result += diff*diff;//a-b的平方和为距离
}
return result;
}
template <typename U, typename V>
inline ResultType accum_dist(const U& a, const V& b, int) const
{
return (a-b)*(a-b);
}
};
In addition to operator()
and accum_dist()
, a distance functor
should also define the ElementType
and the ResultType
as the types
of the elements it operates on and the type of the result it computes.
程序还应该指明元素的类型 ElementType
和ResultType
。
If a distance functor can be used as a kd-tree distance (meaning that the full distance between a pair of features can be accumulated from the partial distances between the individual dimensions) a typedef is_kdtree_distance
should be present inside the distance functor. If the distance is not a kd-tree distance, but it’s a distance in a vector space (the individual dimensions of the elements it operates on can be accessed independently) a typedef is_vector_space_distance
should be defined inside the functor. If neither typedef is defined, the distance is assumed to be a metric distance and will only be used with indexes operating on generic metric distances.\
如果一个距离因子是kdtree的距离,也就是说,一对特征之间的总距离可以从各个独立维度的微分距离累加得到,那么变量is_kdtree_distance
应该存在;如果某个距离因子不是kdtree的距离,而是一个向量空间的距离,(操作的每个独立的维度都可以独立获取),is_vector_space_distance
变量应该定义。如果这两者都没有被定义,这个距离会被认为是一个度量距离,只会用通用的度量距离算子进行处理。
flann::Index::Index Constructs a nearest neighbor search index for a given dataset.
Index(const IndexParams& params, Distance distance = Distance() );
Index(const Matrix<ElementType>& points, const IndexParams& params,
Distance distance = Distance() );
points 需要建立索引的特征形成的矩阵,每行一个特征,矩阵大小是特征数×维度。
: Matrix containing the features(points) that should be indexed,
stored in a row-major order (one point on each row of the matrix).
The size of the matrix is $num\_features \times dimensionality$.
params 包含索引参数的结构,构建索引的类型取决于这个参数,可用的类型有:
: Structure containing the index parameters. The type of index that
will be constructed depends on the type of this parameter. The
possible parameter types are:
LinearIndexParams 传递一个该类型的对象时,索引会做线性暴力搜索。
When passing an object of this type, the index will perform a linear, brute-force search.
struct LinearIndexParams : public IndexParams
{
};
KDTreeIndexParams 传递一个该类型的对象时,索引将由一系列随机的kdtrees并行搜索。
When passing an object of this type the index constructed will consist of a set of randomized kd-trees which will be searched in parallel.
struct KDTreeIndexParams : public IndexParams
{
KDTreeIndexParams( int trees = 4 );
};
trees 并行kdtree的数量,好的取值在1-16之间。
: The number of parallel kd-trees to use. Good values are in the
range [1..16]
KMeansIndexParams 传递一个该类型的对象时,索引将被构建成一个层次k均值树。
k均值聚类:以空间中k个点为中心进行聚类,对最靠近他们的对象归类。通过迭代的方法,逐次更新各聚类中心的值,直至得到最好的聚类结果。
When passing an object of this type the index constructed will be a hierarchical k-means tree.
struct KMeansIndexParams : public IndexParams
{
KMeansIndexParams( int branching = 32,
int iterations = 11,
flann_centers_init_t centers_init = FLANN_CENTERS_RANDOM,
float cb_index = 0.2 );
};
branching 分支数
: <span> The branching factor to use for the hierarchical k-means
tree </span>
iterations 聚类过程中的最大迭代数。如果是-1,聚类将一直进行到收敛。
: <span> The maximum number of iterations to use in the k-means
clustering stage when building the k-means tree. If a value of
-1 is used here, it means that the k-means clustering should be
iterated until convergence</span>
centers\_init 进行k均值聚类的时候,挑选初始中心点的算法。可以用的参数有:
CENTERS\_RANDOM 随机选
CENTERS\_GONZALES Gonzales算法
CENTERS\_KMEANSPP Kmeanspp算法
: <span> The algorithm to use for selecting the initial centers
when performing a k-means clustering step. The possible values
are CENTERS\_RANDOM (picks the initial cluster centers
randomly), CENTERS\_GONZALES (picks the initial centers using
Gonzales’ algorithm) and CENTERS\_KMEANSPP (picks the initial
centers using the algorithm suggested in @arthur_kmeanspp_2007)
</span>
cb\_index 聚类边界系数:该参数影响分层k均值树中“探索”进行的方式。如果是0,
下一个探索的区域就是最近中心点的区域;大于0的值,会把区域的大小也考虑在内。
: <span> This parameter (cluster boundary index) influences the
way exploration is performed in the hierarchical kmeans tree.
When `cb_index` is zero the next kmeans domain to be explored is
choosen to be the one with the closest center. A value greater
then zero also takes into account the size of the domain.</span>
CompositeIndexParams 组合索引,用这种索引时,随机kdtree和层次k均值树会被组合使用。
When using a parameters object of this type the index created combines the randomized kd-trees and the hierarchical k-means tree.
struct CompositeIndexParams : public IndexParams
{
CompositeIndexParams( int trees = 4,
int branching = 32,
int iterations = 11,
flann_centers_init_t centers_init = FLANN_CENTERS_RANDOM,
float cb_index = 0.2 );
};
KDTreeSingleIndexParams 传递一个该类型的对象时,索引将会包含一个单独的kdtree,用于搜索低维度的数据,例如3d点云。
When passing an object of this type the index will contain a single kd-tree optimized for searching lower dimensionality data (for example 3D point clouds)
struct KDTreeSingleIndexParams : public IndexParams
{
KDTreeSingleIndexParams( int leaf_max_size = 10 );
};
max\_leaf\_size 每片叶子上包含点的最大数量。
: The maximum number of points to have in a leaf for not branching
the tree any more.
KDTreeCuda3dIndexParams 传递一个该类型的对象时,索引将会包含一个单独的kdtree,并且在CUDA上运算。大量搜索和查询点时,搜索表现最好。
When passing an object of this type the index will be a single kd-tree that is built and performs searches on a CUDA compatible GPU. Search performance is best for large numbers of search and query points. For more information, see section [sec:flann::cuda]
struct KDTreeCuda3dIndexParams : public IndexParams
{
KDTreeCuda3dIndexParams( int leaf_max_size = 64 );
};
max\_leaf\_size 每片叶子上包含点的最大数量。
: The maximum number of points to have in a leaf for not branching
the tree any more.
HierarchicalClusteringIndexParams 传递一个该类型的对象时,索引类型会是一个层次聚类索引。这种索引适用于任何度量距离,并且可以用汉明距离匹配二进制特征。
When passing an object of this type the index constructed will be a hierarchical clustering index. This type of index works with any metric distance and can be used for matching binary features using Hamming distances.
struct HierarchicalClusteringIndexParams : public IndexParams
{
HierarchicalClusteringIndexParams(int branching = 32,
flann_centers_init_t centers_init = FLANN_CENTERS_RANDOM,
int trees = 4, int leaf_max_size = 100)
};
branching 分支数
: <span> The branching factor to use for the hierarchical
clustering tree </span>
centers\_init 中心点选取算法
: <span> The algorithm to use for selecting the initial centers
when performing a k-means clustering step. The possible values
are CENTERS\_RANDOM (picks the initial cluster centers
randomly), CENTERS\_GONZALES (picks the initial centers using
Gonzales’ algorithm) and CENTERS\_KMEANSPP (picks the initial
centers using the algorithm suggested in @arthur_kmeanspp_2007)
</span>
trees 并行树,最好在3-8.
: The number of parallel trees to use. Good values are in the
range [3..8]
leaf\_size 叶片大小
: The maximum number of points a leaf node should contain.
LshIndexParams 传递一个该类型的对象时,索引将会是一种多探寻的局部敏感哈希方法,只适用于汉明距离匹配二进制特征。
When passing an object of this type the index constructed will be a multi-probe LSH (Locality-Sensitive Hashing) index. This type of index can only be used for matching binary features using Hamming distances.
struct LshIndexParams : public IndexParams
{
LshIndexParams(unsigned int table_number = 12,
unsigned int key_size = 20,
unsigned int multi_probe_level = 2);
};
table\_number 使用哈希表的数量
: <span> The number of hash tables to use </span>
key\_size 哈希表键的长度
: <span> The length of the key in the hash tables</span>
multi\_probe\_level 多探寻中用到的级别,0是标准LSH
: Number of levels to use in multi-probe (0 for standard LSH)
AutotunedIndexParams 传递一个该类型的对象时,自动选择最佳的索引类型和参数(随机kdtree,层次k均值,线性)
When passing an object of this type the index created is automatically tuned to offer the best performance, by choosing the optimal index type (randomized kd-trees, hierarchical kmeans, linear) and parameters for the dataset provided.
struct AutotunedIndexParams : public IndexParams
{
AutotunedIndexParams( float target_precision = 0.9,
float build_weight = 0.01,
float memory_weight = 0,
float sample_fraction = 0.1 );
};
target\_precision 0-1之间表征近似最临近搜索返回值是真正最临近值的百分比,
此值越大,结果越精确,但搜索时间更长。最佳值一般取决于应用场景。
: <span> Is a number between 0 and 1 specifying the percentage of
the approximate nearest-neighbor searches that return the exact
nearest-neighbor. Using a higher value for this parameter gives
more accurate results, but the search takes longer. The optimum
value usually depends on the application. </span>
build\_weight 表征索引构建时间相比于最临近搜索时间重要性的比例。有些应用场合,
构建索引花费很长时间,随后的搜索非常快是允许的;另一些场景中要求索引被快速建立,
即使这会使搜索时间更长。
: <span> Specifies the importance of the index build time raported
to the nearest-neighbor search time. In some applications it’s
acceptable for the index build step to take a long time if the
subsequent searches in the index can be performed very fast. In
other applications it’s required that the index be build as fast
as possible even if that leads to slightly longer search
times.</span>
memory\_weight 表征索引建立和搜索所需时间和消耗内存之间的权重关系,
小于1的值更在意时间,大于1的值更在意内存消耗。
: <span>Is used to specify the tradeoff between time (index build
time and search time) and memory used by the index. A value less
than 1 gives more importance to the time spent and a value
greater than 1 gives more importance to the memory usage.</span>
sample\_fraction 采样比例:0-1之间表征自动参数认证算法中使用的数据集的比例大小。
当数据集过大时,耗时会非常久,用一部分表示整体也能得出很好的评估结果。
: <span>Is a number between 0 and 1 indicating what fraction of
the dataset to use in the automatic parameter configuration
algorithm. Running the algorithm on the full dataset gives the
most accurate results, but for very large datasets can take
longer than desired. In such case using just a fraction of the
data helps speeding up this algorithm while still giving good
approximations of the optimum parameters.</span>
SavedIndexParams 从硬盘读取一个之前保存好的索引。
This object type is used for loading a previously saved index from the disk.
struct SavedIndexParams : public IndexParams
{
SavedIndexParams( std::string filename );
};
filename 文件名字
: <span> The filename in which the index was saved. </span>
构建最临近搜索索引,有两个重载,一个以点为参数,另一个以对象创建时提供的点为参数。
Builds the nearest neighbor search index. There are two versions of the
buildIndex
method, one that uses the points provided as argument and
one that uses the points provided to the constructor when the object was
constructed.
void buildIndex();
void buildIndex(const Matrix<ElementType>& points);
addPoints
添加点的方法,用于向建好的索引中加点,为了防止索引失衡,有阈值参数控制何时重建索引,默认为2.
The addPoints
method can be used to incrementally add points to the
index after the index was build. To avoid the index getting unbalanced,
the addPoints
method has the option of rebuilding the index after a
large number of points have been added. The rebuild_threshold
parameter controls when the index is rebuild, by default this is done
when it doubles in size (rebuild_threshold
= 2).
void addPoints(const Matrix<ElementType>& points, float rebuild_threshold = 2);
removePoint
移除点的方法,留下的点的索引号不会改变。
The removePoint
method removes one point with the specified point_id
from the index. The indices (of the remaining points) returned by the
nearest neighbor operations do not change when points are removed from
the index.
void removePoint(size_t point_id);
getPoint
返回一个点的id号,指向数据中的一个点。
The getPoint
method returns a pointer to the data point with the specified point_id
.
ElementType* getPoint(size_t point_id);
进行k最近邻搜索,就是指最接近的k个邻居(数据),即每个样本都可以由它的K个邻居来表达。这个方法有两个重载,一个是预先分配内存的flann::Matrix
对象,用于存放返回的索引号和距离;另一个是两层向量,会自动调整大小。
Performs a K-nearest neighbor search for a set of query points. There
are two signatures for this method, one that takes pre-allocated
flann::Matrix
objects for returning the indices of and distances to
the neighbors found, and one that takes std::vector<std::vector>
that
will re resized automatically as needed.
int Index::knnSearch(const Matrix<ElementType>& queries,
Matrix<int>& indices,
Matrix<DistanceType>& dists,
size_t knn,
const SearchParams& params);
int Index::knnSearch(const Matrix<ElementType>& queries,
std::vector< std::vector<int> >& indices,
std::vector<std::vector<DistanceType> >& dists,
size_t knn,
const SearchParams& params);
queries 包含查询点的矩阵,大小是查询点数×维度。
: <span>Matrix containing the query points. Size of matrix is
($num\_queries \times dimentionality $)</span>
indices 包含找到的k临近结果的索引号,对预分配版本,最少要查询点数×knn的值。
: <span>Matrix that will contain the indices of the K-nearest
neighbors found (size should be at least $num\_queries \times knn$
for the pre-allocated version).</span>
dists 包含找到的k临近结果的距离信息,对预分配版本,最少要查询点数×knn的值。
注意:对于欧式距离,`flann::L2`计算的是平方距离,所以传到此处的值也应该是平方值。
: <span>Matrix that will contain the distances to the K-nearest
neighbors found (size should be at least $num\_queries \times knn$
for the pre-allocated version).\
*Note:* For Euclidean distances, the `flann::L2` functor computes
squared distances, so the value passed here needs to be a squared
distance as well. </span>
knn 最临近搜索要找的点的数量
: <span>Number of nearest neighbors to search for.</span>
params 搜索参数,搜索中要用的参数的结构。
: <span>Search parameters.</span> Structure containing parameters used
during search.
SearchParameters 搜索参数
struct SearchParams
{
SearchParams(int checks = 32,
float eps = 0,
bool sorted = true);
int checks;
float eps;
bool sorted;
int max_neighbors;
tri_type use_heap;
int cores;
bool matrices_in_gpu_ram;
};
checks 搜索临近点时最大叶子的数量。该值越大,搜索精度越高,耗时越久。所有叶片都搜到,用
`FLANN_CHECKS_UNLIMITED`;如果是自动认证的情况,达到预期精度所需checks的数量也会被计算,
想用该值的话,设为`FLANN_CHECKS_AUTOTUNED`。
: specifies the maximum leafs to visit when searching for
neighbours. A higher value for this parameter would give better
search precision, but also take more time. For all leafs to be
checked use the value `FLANN_CHECKS_UNLIMITED`. If automatic
configuration was used when the index was created, the number of
checks required to achieve the specified precision was also
computed, to use that value specify `FLANN_CHECKS_AUTOTUNED`.
eps 搜索 eps-大致临近的点,大概是epsilon的意思?只用于KDTreeSingleIndex/KDTreeCuda3dIndex。
: Search for eps-approximate neighbors (only used by
KDTreeSingleIndex and KDTreeCuda3dIndex).
sorted 只用于半径搜索,表征返回的临近点是否按距离排序。
: Used only by radius search, specifies if the neighbors returned
should be sorted by distance.
max\_neighbours 表征半径搜索返回临近值的最大数量,仅用于半径搜索。
: Specifies the maximum number of neighbors radius search should
return (default: -1 = unlimited). Only used for radius search.
use\_heap 用堆排序。用堆的数据结构来管理内部的临近列表(可选值 FLANN\_False, FLANN\_True,
FLANN\_Undefined)。堆对于大量临近点更高效,对少量值的效果不明显。默认值是Undefined,
让代码根据临近点的数量自己选择最佳值。该选项只用与knn搜索。
: Use a heap data structure to manage the list of neighbors
internally (possible values: FLANN\_False, FLANN\_True,
FLANN\_Undefined). A heap is more efficient for a large number
of neighbors and less efficient for a small number of neighbors.
Default value is FLANN\_Undefined, which lets the code choose
the best option depending on the number of neighbors requested.
Only used for KNN search.
cores 搜索中内核的数量。
: How many cores to assign to the search (specify 0 for automatic
core selection).
matrices\_in\_gpu\_ram GPU搜索中,表征矩阵是否已经在GPU内存中。
: for GPU search indicates if matrices are already in GPU ram.
进行半径最近邻搜索。这个方法有两个重载,一个是预先分配内存的flann::Matrix
对象,用于存放返回的索引号和距离;另一个是两层向量,会自动调整大小。
Performs a radius nearest neighbor search for a set of query points.
There are two signatures for this method, one that takes pre-allocated
flann::Matrix
objects for returning the indices of and distances to
the neighbors found, and one that takes std::vector<std::vector>
that
will resized automatically as needed.
int Index::radiusSearch(const Matrix<ElementType>& queries,
Matrix<int>& indices,
Matrix<DistanceType>& dists,
float radius,
const SearchParams& params);
int Index::radiusSearch(const Matrix<ElementType>& queries,
std::vector< std::vector<int> >& indices,
std::vector<std::vector<DistanceType> >& dists,
float radius,
const SearchParams& params);
indices
matrix.flann::L2
functor computesflann::L2
functor computesThe method returns the number of nearest neighbors found.
保存索引到一个文件
Saves the index to a file.
void Index::save(std::string filename);
多层聚类:将给定的点构造成一个层次k均值树,并且选择其中使聚类方差最小的一支cut。
Clusters the given points by constructing a hierarchical k-means tree
and choosing a cut in the tree that minimizes the clusters’ variance.
template <typename Distance>
int hierarchicalClustering(const Matrix<typename Distance::ElementType>& points,
Matrix<typename Distance::ResultType>& centers,
const KMeansIndexParams& params,
Distance d = Distance())
features 要被聚类的点
: <span>The points to be clustered</span>
centers 聚类得到的中心点。这个矩阵中的行数表征了所需聚类数量。然而,基于层次树中分支的选择,计算得到的聚类数量会是
$(branching-1)*k+1$ 的最大值,该值会比所求聚类数小。$branching$是分支数。
: <span>The centers of the clusters obtained. The number of rows in
this matrix represents the number of clusters desired. However,
because of the way the cut in the hierarchical tree is choosen, the
number of clusters computed will be the highest number of the form
$(branching-1)*k+1$ that’s lower than the number of clusters
desired, where $branching$ is the tree’s branching factor (see
description of the KMeansIndexParams). </span>
params
: <span>Parameters used in the construction of the hierarchical
k-means tree</span>
该方法的返回值是计算得到聚类的数量。
The function returns the number of clusters computed.
提供CUDA接口,用于大量3d数据集的提速。
FLANN provides a CUDA implementation of the kd-tree build and search
algorithms to improve the build and query speed for large 3d data sets.
This section will provide all the necessary information to use the
KdTreeCuda3dIndex
index type.
编译:如果CMake检测到CUDA模块,libflann_cuda.so
库将会被编译。
Building: If CMake detects a CUDA install during the build (see
section [sec:downloadingandcompiling]), a library libflann_cuda.so
will be built.
基本用法:
Basic Usage: To be able to use the new index type, you have to include the FLANN header this way:
#define FLANN_USE_CUDA
#include <flann/flann.hpp>
如果在头文件之前定义了FLANN_USE_CUDA
,你需要手动链接libflann_cuda.so
到你的项目。这样的好处是你不用编译的时候加上nvcc
参数了,除非你还要用其它的CUDA代码。
If you define the symbol FLANN_USE_CUDA
before including the FLANN
header, you will have to link libflann_cuda.so
or libflann_cuda_s.a
with your project. However, you will not have to compile your source
code with nvcc
, except if you use other CUDA code, of course.
然后,可以用KDTreeCuda3dIndexParams
方法创建索引,这个索引会负责所有索引创建和搜索的数据拷贝工作。
You can then create your index by using the KDTreeCuda3dIndexParams
to
create the index. The index will take care of copying all the data from
and to the GPU for you, both for index creation and search.
当心:关于选用CUDA还是多线程CPU进行kdtree运算,很难给出一条通用的指导意见。最好是尝试一下,看看效果决定。
A word of caution: A general guideline for deciding whether to use the
CUDA kd tree or a (multi-threaded) CPU implementation is hard to give,
since it depends on the combination of CPU and GPU in each system and
the data sets. For example, on a system with a Phenom II 955 CPU and a
Geforce GTX 260 GPU, the maximum search speedup on a synthetic (random)
data set is a factor of about 8-9 vs the single-core CPU search, and
starts to be reached at about 100k search and query points. (This
includes transfer times.) Build time does not profit as much from the
GPU acceleration; here the benefit is about 2x at 200k points, but this
largely depends on the data set. The only way to know which
implementation is best suited is to try it.
高级用法:一些情况下,GPU里已经有了缓存的数据,这时可以直接用缓存,而不必再从系统缓存中拷贝。具体的实现方法是,把GPU的指针传给flann::Matrix
的输入,并且告诉索引按照GPU的指针去处理。
Advanced Usage: In some cases, you might already have your data in a
buffer on the GPU. In this case, you can re-use these buffers instead of
copying the buffers back to system RAM for index creation and search.
The way to do this is to pass GPU pointers via the flann::Matrix
inputs and tell the index via the index and search params to treat the
pointers as GPU pointers.
thrust::device_vector<float4> points_vector( n_points );
// ... fill vector here...
float* gpu_pointer = (float*)thrust::raw_pointer_cast(&points_vector[0]);
flann::Matrix<float> matrix_gpu(gpu_pointer,n_points,3, 4);
flann::KDTreeCuda3dIndexParams params;
params["input_is_gpu_float4"]=true;
flann::Index<flann::L2<float> > flannindex( matrix_gpu, params );
flannindex.buildIndex();
首先,创建并填充一个格式为float4的GPU缓存
First, a GPU buffer of float4 values is created and filled with points.[^2]
然后,把一个指向该缓存的GPU指针存放在一个FLANN的矩阵中,3列,步长为4(·float4的最后一个元素应当为0
Then, a GPU pointer to the buffer is stored in a flann matrix with 3 columns and a stride of 4 (the last element in the float4
should be zero).
最后,创建索引,input_is_gpu_float4
标识表明以GPU缓存的形式去处理矩阵。
Last, the index is created. The input_is_gpu_float4
flag tells the index to treat the matrix as a gpu buffer.
同样的,返回值为flann矩阵的搜索过程也可以用GPU缓存来处理,但返回值为向量的不可以。要实现这种效果,索引中的指针和距离矩阵一定要指向GPU缓存,并且cols
的值必须设为3(因为我们在处理三维点)。这里的步长值stride
可以任意设置。
Similarly, you can specify GPU buffers for the search routines that
return the result via flann matrices (but not for those that return them
via std::vector
s). To do this, the pointers in the index and dist
matrices have to point to GPU buffers and the cols
value has to be set
to 3 (as we are dealing with 3d points). Here, any value for stride
can be used.
flann::Matrix<int> indices_gpu(gpu_pointer_indices,n_points, knn, stride);
flann::Matrix<float> dists_gpu(gpu_pointer_dists,n_points, knn, stride);
flann::SearchParams params;
params.matrices_in_gpu_ram = true;
flannindex.knnSearch( queries_gpu ,indices_gpu,dists_gpu,knn, params);
注意:这里千万不能混用CPU里的矩阵信息。
Note that you cannot mix matrices in system and CPU ram here!
Search Parameters: 搜索参数
The search routines support three parameters:搜索支持以下参数:
eps
- used for approximate knn search. The maximum possible error
is
e
=
d
b
e
s
t
∗
e
p
s
e= d_{best} * eps
e=dbest∗eps, i.e. the distance of the returned neighbor
is at maximum
e
p
s
eps
eps times larger than the distance to the real best
neighbor.
用于大致k临近搜索。最大可能误差值是 e = d b e s t ∗ e p s e= d_{best} * eps e=dbest∗eps,也就是返回的临近点的距离最多也就比实际最短距离多出了 e e e
use_heap
- used in knn and radius search. If set to true, a heap
structure will be used in the search to keep track of the distance
to the farthest neighbor. Beneficial with about
k
>
64
k>64
k>64 elements.
knn和半径搜索中用的参数,堆结构。堆结构主要用来跟踪最远临近点,元素数多于64比较有益。
sorted
- if set to true, the results of the radius and knn
searches will be sorted in ascending order by their distance to the
query point.
设为true时,半径搜索和knn搜索会按照到查询点距离的升序排列
matrices_in_gpu_ram
- set to true to indicate that all (input and
output) matrices are located in GPU RAM.
设为true表示所有矩阵信息都是在GPU内存里的。