为什么scipy csr矩阵的行索引比numpy数组慢

华欣荣

2023-03-14

问题内容：

我不确定自己在做什么错，但csr_matrix与numpy数组相比，对scipy进行行索引似乎要慢大约2倍（请参见下面的代码）。

csr矩阵的行索引是否应该比稠密矩阵快，因为像下面的情况一样，只有很少的非零元素被提取出来？

是否有技巧使scipy csr矩阵的行索引更快？

import numpy as np
import timeit
from scipy.sparse import csr_matrix

# Generate random matrix
A = np.random.rand(5000, 1000)

# Make A sparse
A[:, 4:] =0

# Create sparse matrix
A_sparse = csr_matrix(A)

# Create row indexing functions
def index_A_dense():
    A[4]

def index_A_dense_copy():
    a = A[4].copy()

def index_A_sparse():
    A_sparse[4]

timeit.timeit(index_A_sparse, number=10000)
Out: 0.6668063667304978
timeit.timeit(index_A_dense, number=10000)
Out: 0.0029007720424942818
timeit.timeit(index_A_dense_copy, number=10000)
Out: 0.00979283193282754

提前致谢！

问题答案：

我在下面演示的简短答案是，构造新的稀疏矩阵非常昂贵。开销很大，不依赖于行数或特定行中非零元素的数量。

稀疏矩阵的数据表示形式与密集阵列的数据表示形式完全不同。数组将数据存储在一个连续的缓冲区中，并有效地使用shape和strides迭代选定的值。这些值加上索引定义了将在缓冲区中找到数据的确切位置。将这些N字节从一个位置复制到另一个是整个操作中相对较小的部分。

稀疏矩阵将数据存储在几个包含索引和数据的数组（或其他结构）中。然后，选择一行需要查找相关索引，并使用选定的索引和数据构建一个新的稀疏矩阵。稀疏包中有已编译的代码，但是底层代码不如numpy数组那么多。

为了说明这一点，我将制作一个小的矩阵，而不是那么密集，所以我们没有很多空行：

In [259]: A = (sparse.rand(5,5,.4,'csr')*20).floor()
In [260]: A
Out[260]: 
<5x5 sparse matrix of type '<class 'numpy.float64'>'
    with 10 stored elements in Compressed Sparse Row format>

密集等效项，以及一个行副本：

In [262]: Ad=A.A
In [263]: Ad
Out[263]: 
array([[  0.,   0.,   0.,   0.,  10.],
       [  0.,   0.,   0.,   0.,   0.],
       [ 17.,  16.,  14.,  19.,   6.],
       [  0.,   0.,   1.,   0.,   0.],
       [ 14.,   0.,   9.,   0.,   0.]])
In [264]: Ad[4,:]
Out[264]: array([ 14.,   0.,   9.,   0.,   0.])
In [265]: timeit Ad[4,:].copy()
100000 loops, best of 3: 4.58 µs per loop

矩阵行：

In [266]: A[4,:]
Out[266]: 
<1x5 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>

查看此csr矩阵（3个1d数组）的数据表示形式：

In [267]: A.data
Out[267]: array([  0.,  10.,  17.,  16.,  14.,  19.,   6.,   1.,  14.,   9.])  
In [268]: A.indices
Out[268]: array([3, 4, 0, 1, 2, 3, 4, 2, 0, 2], dtype=int32)
In [269]: A.indptr
Out[269]: array([ 0,  2,  2,  7,  8, 10], dtype=int32)

这是选择行的方式（但在已编译的代码中）：

In [270]: A.indices[A.indptr[4]:A.indptr[5]]
Out[270]: array([0, 2], dtype=int32)
In [271]: A.data[A.indptr[4]:A.indptr[5]]
Out[271]: array([ 14.,   9.])

“行”是另一个稀疏矩阵，具有相同类型的数据数组：

In [272]: A[4,:].indptr
Out[272]: array([0, 2])
In [273]: A[4,:].indices
Out[273]: array([0, 2])
In [274]: timeit A[4,:]

是的，稀疏矩阵的时序很慢。我不知道实际选择数据花费了多少时间，以及构造新矩阵花费了多少时间。

10000 loops, best of 3: 145 µs per loop
In [275]: timeit Ad[4,:].copy()
100000 loops, best of 3: 4.56 µs per loop

lil 格式可能更容易理解，因为数据和索引存储在子列表中，每行一个。

In [276]: Al=A.tolil()
In [277]: Al.data
Out[277]: array([[0.0, 10.0], [], [17.0, 16.0, 14.0, 19.0, 6.0], [1.0], [14.0, 9.0]], dtype=object)
In [278]: Al.rows
Out[278]: array([[3, 4], [], [0, 1, 2, 3, 4], [2], [0, 2]], dtype=object)
In [279]: Al[4,:].data
Out[279]: array([[14.0, 9.0]], dtype=object)
In [280]: Al[4,:].rows
Out[280]: array([[0, 2]], dtype=object)

这样的速度比较在处理紧密的编译代码时很有意义，在这种情况下，字节从内存的一部分到另一部分的移动是大量的时间消耗者。结合使用Python和已编译的代码numpy，scipy您不仅可以计算O(n)操作数。

============================

这是从中选择行所需的时间A，以及返回新的稀疏矩阵所需的时间：

只需获取数据：

In [292]: %%timeit
d1=A.data[A.indptr[4]:A.indptr[5]]
i1=A.indices[A.indptr[4]:A.indptr[5]]
   .....: 
100000 loops, best of 3: 4.92 µs per loop

加上制作矩阵所需的时间：

In [293]: %%timeit
d1=A.data[A.indptr[4]:A.indptr[5]]
i1=A.indices[A.indptr[4]:A.indptr[5]]
sparse.csr_matrix((d1,([0,0],i1)),shape=(1,5))
   .....: 
1000 loops, best of 3: 445 µs per loop

尝试更简单的coo矩阵

In [294]: %%timeit
d1=A.data[A.indptr[4]:A.indptr[5]]
i1=A.indices[A.indptr[4]:A.indptr[5]]
sparse.coo_matrix((d1,([0,0],i1)),shape=(1,5))
   .....: 
10000 loops, best of 3: 135 µs per loop

为什么scipy csr矩阵的行索引比numpy数组慢

相关阅读

相关文章

相关问答

相关工具

相关文档