Let’s say that you have a sparse matrix:

import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.array([[1, 0, 2, 0, 3], 
                         [0, 4, 0, 5, 0]]))
print(repr(x))


<2x5 sparse matrix of type '<class 'numpy.int64'>'
    with 5 stored elements in Compressed Sparse Row format>



One of the most common things you might want to do is make a conditional selection from the matrix and then set the selected elements to, say, zero. For example, we can take our matrix from above and set every element with a value less than three to zero. Naively, one could do:

x[x < 3] = 0



This works and is fine for small matrices. However, you’ll likely encounter a warning message such as the following:

/home/miniconda3/lib/python3.6/site-packages/scipy/sparse/compressed.py:282: SparseEfficiencyWarning: Comparing a sparse matrix with a scalar greater than zero using < is inefficient, try using >= instead.
  warn(bad_scalar_msg, SparseEfficiencyWarning)



The problem here is that a large sparse matrix is mostly zeros, and every one of those zeros satisfies the < 3 comparison, so the resulting boolean mask is nearly dense and the comparison becomes highly inefficient. Instead, you only want to compare against the nonzero elements of the matrix. This takes a little more work and a few more lines of code to accomplish the same thing, and we still want to avoid converting our sparse matrix into a costly dense array.
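To make the inefficiency concrete, here is a minimal sketch using the same example matrix. It compares the number of stored entries in x with the number of True entries in the mask produced by `x < 3`. Silencing the warning with `warnings.catch_warnings` is an addition for this demonstration only.

```python
import warnings

import numpy as np
from scipy.sparse import SparseEfficiencyWarning, csr_matrix

x = csr_matrix(np.array([[1, 0, 2, 0, 3],
                         [0, 4, 0, 5, 0]]))

# Silence the efficiency warning just for this demonstration.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", SparseEfficiencyWarning)
    mask = x < 3

# Every zero in x satisfies "< 3", so the mask has more True entries
# than x has stored values.
n_true = mask.count_nonzero()
print("stored entries in x:", x.nnz)     # 5
print("True entries in x < 3:", n_true)  # 7
```

On a realistically sparse matrix the gap is far larger, because nearly every position in the mask ends up True.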
First, we’ll create a boolean mask over the nonzero elements that marks which of them are less than 3. The mask is indexed relative to the list of nonzero elements, not to positions in the sparse or dense matrix.

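# x[x.nonzero()] yields a 1 x nnz matrix of the nonzero values; [0] flattens the boolean result to 1-D
nonzero_mask = np.array(x[x.nonzero()] < 3)[0]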



Next, with the appropriate nonzero mask, we can obtain the corresponding row and column indices (for the sparse matrix) for all of the nonzero elements that are less than 3.

rows = x.nonzero()[0][nonzero_mask]
cols = x.nonzero()[1][nonzero_mask]
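As a sketch of what these intermediate arrays contain for the example matrix, the snippet below calls `nonzero()` once and reuses the result instead of calling it three times. The names `nz_rows`, `nz_cols`, and `nz_values` are illustrative, and `np.asarray(...).ravel()` is used here in place of `np.array(...)[0]` to flatten the values. Both give a one-dimensional array for a `csr_matrix`.

```python
import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.array([[1, 0, 2, 0, 3],
                         [0, 4, 0, 5, 0]]))

# Row and column indices of every nonzero element, computed once.
nz_rows, nz_cols = x.nonzero()

# The nonzero values themselves, flattened to a 1-D array.
nz_values = np.asarray(x[nz_rows, nz_cols]).ravel()

# True where a nonzero value is below the threshold.
nonzero_mask = nz_values < 3

# Matrix coordinates of the elements to be zeroed.
rows = nz_rows[nonzero_mask]
cols = nz_cols[nonzero_mask]

print(nz_values)     # the values 1, 2, 3, 4, 5
print(nonzero_mask)  # True for the values 1 and 2 only
print(rows, cols)    # positions (0, 0) and (0, 2)
```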



Finally, with the proper sparse matrix indices in hand (that correspond to elements that are less than 3), we can set those elements’ value to zero:

x[rows, cols] = 0

print(x.todense())



And our matrix ends up as expected:

[[0 0 0 0 3]
 [0 4 0 5 0]]
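Putting the steps together, one possible way to package them is a small helper. The function name `zero_below` and its `threshold` parameter are hypothetical and not part of SciPy. This is a sketch that assumes a `csr_matrix` input and modifies it in place.

```python
import numpy as np
from scipy.sparse import csr_matrix


def zero_below(x, threshold):
    """Set nonzero entries of x that are below threshold to zero, in place.

    Hypothetical helper for illustration; assumes x is a csr_matrix.
    """
    nz_rows, nz_cols = x.nonzero()
    if nz_rows.size == 0:
        return x  # nothing stored, nothing to do

    # Compare only the nonzero values, never the implicit zeros.
    nz_values = np.asarray(x[nz_rows, nz_cols]).ravel()
    mask = nz_values < threshold

    if mask.any():
        x[nz_rows[mask], nz_cols[mask]] = 0
    return x


x = csr_matrix(np.array([[1, 0, 2, 0, 3],
                         [0, 4, 0, 5, 0]]))
zero_below(x, 3)
print(x.toarray())
```

The early return and the `mask.any()` check simply avoid indexing with empty arrays when there is nothing to change.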



The assignment above leaves explicitly stored zeros in the sparse structure, so for a sufficiently large sparse matrix we’d want to remove those zero elements:

x.eliminate_zeros()  # This happens in place
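As an aside, a commonly used alternative (not the approach walked through above) is to operate directly on the matrix's `data` attribute, which for CSR holds only the stored values. The sketch below assumes a CSR matrix. The implicit zeros need no handling because they are already zero.

```python
import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.array([[1, 0, 2, 0, 3],
                         [0, 4, 0, 5, 0]]))

# x.data holds only the stored values, so this comparison
# never touches the implicit zeros.
x.data[x.data < 3] = 0

# Drop the explicit zeros that were just written.
x.eliminate_zeros()

print(x.toarray())
print(x.nnz)  # 3
```

This avoids building index arrays altogether, at the cost of relying on the storage layout of the sparse format.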



I hope this helps when you need to manipulate your sparse matrices. Feel free to leave a comment below!


Published

Feb 27, 2019