
NumPy/SciPy Cheat Sheet

This cheat sheet is a quick reference for NumPy / SciPy beginners. It gives an overview of the most important NumPy and SciPy commands and functions that you might need when solving the exercise sheets on Linear Algebra in Information Retrieval. It does not claim to be complete and will be extended continuously. If you think that something important is missing or if you find any errors, please let us know.

General

What is NumPy?

A library for working with arrays and matrices in Python.

What is SciPy?

Another library, built on top of NumPy, that provides advanced linear algebra functionality (for example, sparse matrices).

Install

The routine to install NumPy and SciPy depends on your operating system.

Linux (Ubuntu, Debian)

apt-get install python3-numpy python3-scipy

Other systems (Windows, Mac, etc.)

For all other systems, see the installation instructions on the official SciPy website.

Matrix construction

We distinguish between dense matrices and sparse matrices (Note: This color code is used consistently throughout this cheat sheet).

Dense matrices store every entry in the matrix, while sparse matrices only store the non-zero entries (together with their row and column index). Dense matrices are more feature-rich, but may consume more memory space than sparse matrices (in particular if most of the entries in a matrix are zero).
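The memory difference can be made concrete with a small experiment. The following is a minimal sketch, assuming a 1000 x 1000 identity matrix as test data (our choice, not part of the exercise sheets) and the CSR format introduced further below:

```python
import numpy
import scipy.sparse

# A 1000 x 1000 matrix with only 1000 non-zero entries (the main diagonal).
dense = numpy.identity(1000, dtype=float)
sparse = scipy.sparse.csr_matrix(dense)

# The dense array stores one float64 (8 bytes) for every entry.
dense_bytes = dense.nbytes

# The CSR matrix stores three internal arrays: values, column indices, row pointers.
sparse_bytes = sparse.data.nbytes + sparse.indices.nbytes + sparse.indptr.nbytes

print(dense_bytes)                 # 8000000
print(sparse_bytes < dense_bytes)  # True
```

The exact size of the sparse representation depends on the index type that SciPy chooses, so only the comparison is printed.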

Dense matrices

In NumPy, there are two kinds of dense containers: matrices (numpy.matrix) and arrays (numpy.array). Matrices are strictly 2-dimensional, while arrays are n-dimensional (the term array is a bit misleading here, since a 2-dimensional array can represent a matrix as well).
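A short sketch of what "strictly 2-dimensional" means in practice; the variable names M, A and T are only illustrative:

```python
import numpy

M = numpy.matrix([[1, 2], [3, 4]])
A = numpy.array([[1, 2], [3, 4]])

# Selecting a row of a matrix yields another 2-dimensional matrix ...
print(M[0].shape)  # (1, 2)
# ... while selecting a row of an array yields a 1-dimensional array.
print(A[0].shape)  # (2,)

# Arrays can have any number of dimensions.
T = numpy.zeros((2, 3, 4))
print(T.ndim)      # 3
```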

Construct a matrix:

Dense

numpy.matrix(arg, dtype=None)  Reference

arg:
   The data to construct the matrix from, given as
     (1) a (nested) Python list or a NumPy array; or
     (2) a string with columns separated by commas or spaces and rows separated by semicolons.
dtype (str, optional):
   The type of the entries in the matrix (e.g., 'int', 'float', 'str').


Examples:

>>> numpy.matrix("1 2; 3 4")
[[1 2]
 [3 4]]

>>> numpy.matrix([[1, 2], [3, 4]], dtype='float')
[[1.0 2.0]
 [3.0 4.0]]

Construct an array:

Dense

numpy.array(arg, dtype=None, ndmin=0)  Reference

arg:
   The data to construct the array from, given as:
      (1) a (nested) Python list or another array; or
      (2) an object whose __array__ method returns an array.
dtype (str, optional):
   The type of the entries in the array (e.g., 'int', 'float', 'str').
ndmin (int, optional):
   The minimum number of dimensions that the array should have.


Examples:

>>> numpy.array([[1, 2], [3, 4]])
[[1 2]
 [3 4]]

>>> numpy.array([[1, 2], [3, 4]], dtype='float')
[[1.0 2.0]
 [3.0 4.0]]

>>> numpy.array([[1, 2], [3, 4]], ndmin=3)
[[[1 2]
  [3 4]]]

Sparse matrices

There are two principal formats of sparse matrices:

Compressed Sparse Row matrix (CSR matrix): entries are stored row by row (sorted by row index first)
Compressed Sparse Column matrix (CSC matrix): entries are stored column by column (sorted by column index first)

Construct a CSR/CSC matrix:

Sparse

scipy.sparse.csr_matrix(arg, shape=None, dtype=None)  Reference
scipy.sparse.csc_matrix(arg, shape=None, dtype=None)  Reference

arg:
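   The data to create the CSR/CSC matrix from, given as
     * a dense matrix; or
     * another sparse matrix; or
     * a tuple (m, n), to construct an empty matrix with shape (m, n); or
     * a tuple (data, (rows, cols)), to construct a matrix A where A[rows[k], cols[k]] = data[k]; or
     * a tuple (data, indices, indptr), the internal representation: for CSR, the column indices of row i are indices[indptr[i]:indptr[i+1]] and the corresponding values are data[indptr[i]:indptr[i+1]] (for CSC, swap the roles of rows and columns).
shape (tuple of two ints, optional):
   The dimensions (rows, columns) of the matrix to create. If omitted, they are inferred from arg.
dtype (str, optional):
   The type of the entries in the matrix (e.g., 'int', 'float').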


Examples:

>>> scipy.sparse.csr_matrix([[1, 2, 3], [0, 0, 1], [0, 1, 3]])
[[1 2 3]
 [0 0 1]
 [0 1 3]]  # (transformed to a dense matrix for visualization).

>>> scipy.sparse.csc_matrix([[1, 2, 3], [0, 0, 1], [0, 1, 3]])
[[1 2 3]
 [0 0 1]
 [0 1 3]]  # (transformed to a dense matrix for visualization).

>>> values = [1, 2, 3]
>>> rows   = [0, 0, 1]
>>> cols   = [0, 1, 3]
>>> scipy.sparse.csr_matrix((values, (rows, cols)), shape=(4, 4), dtype=int)
[[1 2 0 0]
 [0 0 0 3]
 [0 0 0 0]
 [0 0 0 0]]  # (transformed to a dense matrix for visualization).

>>> values = [1, 2, 3]
>>> rows   = [0, 0, 1]
>>> cols   = [0, 1, 3]
>>> scipy.sparse.csc_matrix((values, (rows, cols)), shape=(4, 4), dtype=int)
[[1 2 0 0]
 [0 0 0 3]
 [0 0 0 0]
 [0 0 0 0]]  # (transformed to a dense matrix for visualization).
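The (data, indices, indptr) form is the least obvious of the argument variants. The following sketch builds the same 4 x 4 matrix as in the examples above directly from its CSR representation; the concrete values are our own illustration:

```python
import scipy.sparse

# Target matrix:
# [[1 2 0 0]
#  [0 0 0 3]
#  [0 0 0 0]
#  [0 0 0 0]]
data    = [1, 2, 3]        # the non-zero values, row by row
indices = [0, 1, 3]        # the column index of each value
indptr  = [0, 2, 3, 3, 3]  # row i owns data[indptr[i]:indptr[i+1]]

A = scipy.sparse.csr_matrix((data, indices, indptr), shape=(4, 4), dtype=int)
print(A.toarray())
```

Rows 2 and 3 are empty, which is why the last three entries of indptr are identical.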

Special matrices

There are some utility functions to create special matrices/arrays:

(1) Construct an empty array, i.e. an array whose entries are not initialized and therefore hold arbitrary values:

Dense

numpy.empty(shape, dtype=float)  Reference

shape (int or sequence of ints):
   The dimensions of the array to create.
dtype (str, optional):
   The type of the entries in the array (e.g., 'int', 'float', 'str').


Examples:

>>> numpy.empty(3)
[6.95052181e-310 1.74512682e-316 1.58101007e-322]

>>> numpy.empty([3, 2], dtype='int')
[[140045355821992 140045355821992]
 [140045136216840 140045136244784]
 [140045125643544 140045153116544]]

(2) Construct an array filled with zeros:

Dense

numpy.zeros(shape, dtype=float)  Reference

shape (int or sequence of ints):
   The dimensions of the array to create.
dtype (str, optional):
   The type of the entries in the array (e.g., 'int', 'float', 'str').


Examples:

>>> numpy.zeros(3)
[0.0, 0.0, 0.0]

>>> numpy.zeros([3, 2], dtype='int')
[[0 0]
 [0 0]
 [0 0]]

(3) Construct an array filled with ones:

Dense

numpy.ones(shape, dtype=float)  Reference

shape (int or sequence of ints):
   The dimensions of the array to create.
dtype (str, optional):
   The type of the entries in the array (e.g., 'int', 'float', 'str').


Examples:

>>> numpy.ones(3)
[1.0, 1.0, 1.0]

>>> numpy.ones([3, 2], dtype='int')
[[1 1]
 [1 1]
 [1 1]]

(4) Construct a diagonal array, a (usually square) array in which all entries are 0, except on the main diagonal:

Dense

numpy.diag(arg, k=0)  Reference

arg (1-dim array):
   The entries of the diagonal.
k (int, optional):
   The diagonal in question. Use k > 0 for diagonals above the main diagonal, and k < 0 for diagonals below the main diagonal. 


Examples:

>>> numpy.diag([1, 2, 3])
[[1 0 0]
 [0 2 0]
 [0 0 3]]

>>> numpy.diag([1, 2, 3], k=1)
[[0 1 0 0]
 [0 0 2 0]
 [0 0 0 3]
 [0 0 0 0]]

>>> numpy.diag([1, 2, 3], k=-1)
[[0 0 0 0]
 [1 0 0 0]
 [0 2 0 0]
 [0 0 3 0]]

Sparse

scipy.sparse.diags(diagonals, offsets=0, dtype=None)  Reference

diagonals (sequence of arrays):
   The entries of the matrix diagonals.
offsets (sequence of ints or int, optional):
   The diagonals to set. An offset of 0 is the main diagonal; an offset k > 0 is the k-th upper diagonal; an offset k < 0 is the k-th lower diagonal.
dtype (str, optional):
   The type of the entries in the matrix (e.g., 'int', 'float').


Examples:

>>> scipy.sparse.diags([1, 2, 3])
[[1.0 0.0 0.0]
 [0.0 2.0 0.0]
 [0.0 0.0 3.0]]  # (transformed to a dense matrix for visualization).

>>> scipy.sparse.diags([[1, 2, 3], [4, 5, 6]], offsets=[0, 1])
[[1.0 4.0 0.0]
 [0.0 2.0 5.0]
 [0.0 0.0 3.0]]  # (transformed to a dense matrix for visualization).
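Negative offsets work the same way. As a minimal sketch, the following builds a tridiagonal 4 x 4 matrix; the values and the variable names main, upper and lower are only illustrative:

```python
import scipy.sparse

# Tridiagonal 4 x 4 matrix: 2 on the main diagonal, -1 on both neighbours.
main  = [2, 2, 2, 2]  # offset  0 has 4 entries
upper = [-1, -1, -1]  # offset  1 has 3 entries
lower = [-1, -1, -1]  # offset -1 has 3 entries

T = scipy.sparse.diags([lower, main, upper], offsets=[-1, 0, 1], format="csr")
print(T.toarray())
```

Passing format="csr" is optional; it is used here so that the result is in one of the two formats discussed above.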

(5) Construct an identity array, a square array in which all entries on the main diagonal are 1 and all other entries are 0:

Dense

numpy.identity(n, dtype=float)  Reference

n (int):
   The dimension of the array to create (the output is an n x n array).
dtype (str, optional):
   The type of the entries in the array (e.g., 'int', 'float').


Examples:

>>> numpy.identity(3)
[[1.0, 0.0, 0.0]
 [0.0, 1.0, 0.0]
 [0.0, 0.0, 1.0]]

>>> numpy.identity(3, dtype=int)
[[1, 0, 0]
 [0, 1, 0]
 [0, 0, 1]]

Sparse

scipy.sparse.identity(n, dtype=float, format=None)  Reference

n (int):
   The dimension of the matrix to create (the output is an n x n matrix).
dtype (str, optional):
   The type of the entries in the matrix (e.g., 'int', 'float').
format (str, optional):
   The sparse format of the matrix, e.g. "csr" or "csc". By default (None), SciPy chooses the format.


Examples:

>>> scipy.sparse.identity(3)
[[1.0, 0.0, 0.0]
 [0.0, 1.0, 0.0]
 [0.0, 0.0, 1.0]]  # (transformed to a dense matrix for visualization).

>>> scipy.sparse.identity(3, dtype=int)
[[1, 0, 0]
 [0, 1, 0]
 [0, 0, 1]]  # (transformed to a dense matrix for visualization).

(6) Construct a triangular array, a square array in which all entries below the main diagonal (upper triangular) or above the main diagonal (lower triangular) are zero:

Dense

numpy.triu(arg, k=0)  # Keep the upper triangle: zero all entries below the k-th diagonal.  Reference
numpy.tril(arg, k=0)  # Keep the lower triangle: zero all entries above the k-th diagonal.  Reference

arg (array):
   The original array.
k (int, optional):
   The diagonal below (triu) or above (tril) which to zero entries. k = 0 is the main diagonal, k < 0 is below it and k > 0 is above.


Examples:

>>> numpy.triu([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
[[1 2 3]
 [0 5 6]
 [0 0 9]]

>>> numpy.triu([[1, 2, 3], [4, 5, 6], [7, 8, 9]], k=1)
[[0 2 3]
 [0 0 6]
 [0 0 0]]

>>> numpy.tril([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
[[1 0 0]
 [4 5 0]
 [7 8 9]]

>>> numpy.tril([[1, 2, 3], [4, 5, 6], [7, 8, 9]], k=-1)
[[0 0 0]
 [4 0 0]
 [7 8 0]]

Sparse

scipy.sparse.triu(arg, k=0, format=None)  # Keep the upper triangle: zero all entries below the k-th diagonal.  Reference
scipy.sparse.tril(arg, k=0, format=None)  # Keep the lower triangle: zero all entries above the k-th diagonal.  Reference

arg (dense or sparse matrix):
   The original matrix.
k (int, optional):
   The diagonal below (triu) or above (tril) which to zero entries. k = 0 is the main diagonal, k < 0 is below it and k > 0 is above.
format (str, optional):
   The sparse format of the result, e.g. "csr" or "csc". By default (None), SciPy chooses the format.


Examples:

>>> scipy.sparse.triu([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
[[1 2 3]
 [0 5 6]
 [0 0 9]]  # (transformed to a dense matrix for visualization).

>>> scipy.sparse.triu([[1, 2, 3], [4, 5, 6], [7, 8, 9]], k=1)
[[0 2 3]
 [0 0 6]
 [0 0 0]]  # (transformed to a dense matrix for visualization).

>>> scipy.sparse.tril([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
[[1 0 0]
 [4 5 0]
 [7 8 9]]  # (transformed to a dense matrix for visualization).

>>> scipy.sparse.tril([[1, 2, 3], [4, 5, 6], [7, 8, 9]], k=-1)
[[0 0 0]
 [4 0 0]
 [7 8 0]]  # (transformed to a dense matrix for visualization).
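The parameter k makes it easy to split a matrix into its strictly lower part, its diagonal and its strictly upper part. A minimal sketch with dense arrays (the names L, D and U are only illustrative):

```python
import numpy

A = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

L = numpy.tril(A, k=-1)        # entries strictly below the main diagonal
D = numpy.diag(numpy.diag(A))  # main diagonal only
U = numpy.triu(A, k=1)         # entries strictly above the main diagonal

# The three parts add up to the original array again.
print(numpy.array_equal(L + D + U, A))  # True
```

Note that numpy.diag is used twice here: applied to a 2-dimensional array it extracts the diagonal, applied to a 1-dimensional array it constructs a diagonal array.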

Accessing elements

TODO: crazy element access magic, single elements, entire rows, sub-matrices
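A minimal sketch of the access patterns listed above (single elements, entire rows, sub-matrices), assuming standard NumPy indexing and slicing; the example matrix is our own choice:

```python
import numpy
import scipy.sparse

A = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(A[1, 2])      # single element (row 1, column 2): 6
print(A[0])         # entire first row: [1 2 3]
print(A[:, 1])      # entire second column: [2 5 8]
print(A[0:2, 1:3])  # sub-matrix of rows 0-1 and columns 1-2

# CSR matrices support the same index syntax.
S = scipy.sparse.csr_matrix(A)
print(S[1, 2])                # single element: 6
print(S[0:2, 1:3].toarray())  # a slice is again sparse; converted for display
```

Indices start at 0, and the end of a slice is exclusive, so 0:2 selects rows 0 and 1.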

Matrix operations

Adding a constant

The addition of a constant adds the constant to every element of a matrix (only available for dense matrices).

Dense

A + c

A (matrix or array):
   The matrix/array.
c (constant):
   The constant.

Examples:

>>> A = numpy.matrix([[2, 1], [3, 5]], dtype=float)
>>> A + 4
[[6 5]
 [7 9]]
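If the matrix is sparse, one possible workaround is to convert it to a dense array first. This is only a sketch and only sensible for small matrices, because the result no longer contains any zero entries and is therefore dense anyway:

```python
import scipy.sparse

A_sparse = scipy.sparse.csr_matrix([[2, 0], [0, 5]], dtype=float)

# Convert to a dense array, then add the constant to every entry.
B = A_sparse.toarray() + 4
print(B)
```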

Multiplying by a constant

Multiplying by a constant multiplies every element of a matrix by that constant (both for sparse and dense matrices).

Dense

A * c

A (matrix or array):
   The matrix/array.
c (constant):
   The constant.

Examples:

>>> A = numpy.matrix([[2, 1], [3, 5]], dtype=float)
>>> A * 4
[[8 4]
 [12 20]]

Sparse

A * c

A (sparse matrix):
   The matrix.
c (constant):
   The constant.

Examples:

>>> A = scipy.sparse.csr_matrix([[1, 0], [0, 1], [3, 2]], dtype=float)
>>> A * 4
[[4 0]
 [0 4]
 [12 8]] # (transformed to a dense matrix for visualization).
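Unlike adding a constant, multiplying by a constant keeps zero entries zero, so the result can stay sparse. A short sketch that checks this for the example above:

```python
import scipy.sparse

A = scipy.sparse.csr_matrix([[1, 0], [0, 1], [3, 2]], dtype=float)
B = A * 4

print(scipy.sparse.issparse(B))  # True
print(A.nnz, B.nnz)              # 4 4 (the number of stored entries is unchanged)
print(B.toarray())
```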

Multiplying two matrices

A * B
A.dot(B)   Reference

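A (numpy matrix, numpy array or sparse matrix):
   The first matrix.
B (numpy matrix, numpy array or sparse matrix):
   The second matrix.

Note: A * B is the matrix product only if A or B is a numpy.matrix or a scipy.sparse matrix. For two numpy.array objects, * multiplies element-wise; use A.dot(B) there.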

Examples:

>>> A_sparse = scipy.sparse.csr_matrix([[1, 0], [0, 1], [3, 2]], dtype=float)
>>> A_dense = A_sparse.todense()
>>> B_dense = numpy.matrix([[2, 1], [3, 4]], dtype=float)
>>> B_sparse = scipy.sparse.csr_matrix(B_dense)

## Sparse with sparse
>>> X = A_sparse * B_sparse  # multiplying 3x2 matrix with 2x2 matrix; the result is a sparse matrix.
>>> X.todense()  # transformed to a dense matrix for visualization.
[[  2.,   1.],
 [  3.,   4.],
 [ 12.,  11.]]

>>> X = A_sparse.dot(B_sparse)  # same as A_sparse * B_sparse; the result is a sparse matrix.
>>> X.todense()  # transformed to a dense matrix for visualization.
[[  2.,   1.],
 [  3.,   4.],
 [ 12.,  11.]]

## Sparse with dense
>>> A_sparse * B_dense  # multiplying 3x2 matrix with 2x2 matrix
[[  2.,   1.],
 [  3.,   4.],
 [ 12.,  11.]]  # result is a dense matrix.
>>> A_sparse.dot(B_dense)  #  same as A_sparse * B_dense
[[  2.,   1.],
 [  3.,   4.],
 [ 12.,  11.]]  # result is a dense matrix.

## Dense with sparse
>>> A_dense * B_sparse  # multiplying 3x2 matrix with 2x2 matrix
[[  2.,   1.],
 [  3.,   4.],
 [ 12.,  11.]]  # result is a dense matrix.

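>>> A_dense.dot(B_sparse)  # Caution: NumPy's dot does not handle sparse matrices; the result is an unusable matrix of objects.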
matrix([[ <2x2 sparse matrix of type '<class 'numpy.float64'>'
        with 4 stored elements in Compressed Sparse Row format>,
         <2x2 sparse matrix of type '<class 'numpy.float64'>'
        with 4 stored elements in Compressed Sparse Row format>],
        [ <2x2 sparse matrix of type '<class 'numpy.float64'>'
        with 4 stored elements in Compressed Sparse Row format>,
         <2x2 sparse matrix of type '<class 'numpy.float64'>'
        with 4 stored elements in Compressed Sparse Row format>],
        [ <2x2 sparse matrix of type '<class 'numpy.float64'>'
        with 4 stored elements in Compressed Sparse Row format>,
         <2x2 sparse matrix of type '<class 'numpy.float64'>'
        with 4 stored elements in Compressed Sparse Row format>]], dtype=object)

## Dense with dense
>>> A_dense * B_dense  # (normal matrix multiplication, 3x2 matrix with 2x2 matrix)
[[  2.,   1.],
 [  3.,   4.],
 [ 12.,  11.]]
>>> A_dense.dot(B_dense)  # (normal matrix multiplication, 3x2 matrix with 2x2 matrix)
[[  2.,   1.],
 [  3.,   4.],
 [ 12.,  11.]]
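All examples above use numpy.matrix or sparse matrices, for which * is the matrix product. For plain numpy.array objects the operator behaves differently, as the following sketch shows (the @ operator requires Python 3.5 or newer):

```python
import numpy

A = numpy.array([[1, 2], [3, 4]])
B = numpy.array([[2, 1], [3, 4]])

print(A * B)     # element-wise product
print(A @ B)     # matrix product
print(A.dot(B))  # matrix product, same as A @ B
```

When in doubt about the type of an operand, A.dot(B) or A @ B is the safer choice for dense data.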

Useful methods

TODO

Revision 130 as of 2018-12-11 14:19:17, editor: adpult (previous: revision 116 as of 2017-12-14 18:43:28).