#acl All:read

= NumPy/SciPy Cheat Sheet =

This cheat sheet is a quick reference for !NumPy / !SciPy beginners. It gives an overview of the most important commands and functions of !NumPy and !SciPy that you might need for solving the exercise sheets on Linear Algebra in Information Retrieval. It doesn't claim to be complete and will be extended continuously. If you think that something important is missing or if you find any errors, please let us know.

== General ==

=== What is NumPy? ===
A library for working with (multi-dimensional) arrays and matrices in Python.

=== What is SciPy? ===
A library built on top of !NumPy that provides more advanced functionality, e.g. linear algebra routines and sparse matrices.

== Install ==
The routine to install !NumPy and !SciPy depends on your operating system.

=== Linux (Ubuntu, Debian) ===
{{{
sudo apt-get install python3-numpy python3-scipy
}}}

=== Other systems (Windows, Mac, etc.) ===

For all other systems (Windows, Mac, etc.), see the instructions on the official [[https://scipy.org/install.html|SciPy website]].

------

== Matrix construction ==

We distinguish between '''dense matrices''' and '''sparse matrices'''. Dense matrices store every entry in the matrix, while sparse matrices only store the non-zero entries (together with their row and column index). Dense matrices are more feature-rich, but may consume more memory space than sparse matrices (in particular if most of the entries in a matrix are zero).

=== Dense matrices ===
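In !NumPy, there are two concepts of dense matrices: '''matrices''' and '''arrays'''. Matrices are strictly 2-dimensional, while arrays can have any number of dimensions (so the term ''array'' is a bit misleading here). Note that the !NumPy documentation no longer recommends {{{numpy.matrix}}}; plain arrays are preferred, even for linear algebra.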

Construct a matrix:
{{{
numpy.matrix(arg, dtype=None)

arg:
   The data to construct the matrix from, given as
     * a (nested) Python list; or
     * a string with columns separated by commas or spaces and rows separated by semicolons.
dtype (str, optional):
   The type of the entries in the matrix (e.g., 'int', 'float', 'str').

----------
Examples:

>>> numpy.matrix("1 2; 3 4")
matrix([[1, 2],
        [3, 4]])

>>> numpy.matrix([[1, 2], [3, 4]], dtype='float')
matrix([[1., 2.],
        [3., 4.]])
}}}

Construct an array:
{{{
numpy.array(arg, dtype=None, ndmin=0)

arg:
   The data to construct the array from, given as
      * a (nested) Python list; or
      * any object exposing the array interface (e.g., another array).
dtype (str, optional):
   The type of the entries in the array ('int', 'float', 'str', etc.).
ndmin (int, optional):
   The minimum number of dimensions that the array should have.

----------
Examples:

>>> numpy.array([[1, 2], [3, 4]])
array([[1, 2],
       [3, 4]])

>>> numpy.array([[1, 2], [3, 4]], dtype='float')
array([[1., 2.],
       [3., 4.]])

>>> numpy.array([[1, 2], [3, 4]], ndmin=3)
array([[[1, 2],
        [3, 4]]])
}}}

There are some utility functions to create special-structured arrays:

(1) Construct an array filled with zeros:
{{{
numpy.zeros(shape, dtype=float)

shape (int or sequence of ints):
   The dimensions of the array to create.
dtype (str, optional):
   The type of the entries in the array ('int', 'float', 'str', etc.).

----------
Examples:

>>> numpy.zeros(3)
array([0., 0., 0.])

>>> numpy.zeros([3, 2], dtype='int')
array([[0, 0],
       [0, 0],
       [0, 0]])
}}}

(2) Construct an array without initializing the entries (the array contains whatever happened to be in memory, so its values are arbitrary):
{{{
numpy.empty(shape, dtype=float)

shape (int or sequence of ints):
   The dimensions of the array to create.
dtype (str, optional):
   The type of the entries in the array ('int', 'float', 'str', etc.).

----------
Examples (the values will differ on your machine):

>>> numpy.empty(3)
array([6.95052181e-310, 1.74512682e-316, 1.58101007e-322])

>>> numpy.empty([3, 2], dtype='int')
array([[140045355821992, 140045355821992],
       [140045136216840, 140045136244784],
       [140045125643544, 140045153116544]])
}}}

=== Sparse matrices ===
Construct a ''Compressed Sparse Row matrix'':
{{{
scipy.sparse.csr_matrix(arg, shape=None, dtype=None, copy=False)

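arg:
   * A dense matrix or 2-D array; or
   * Another sparse matrix; or
   * A tuple (m, n), to construct an empty matrix with shape (m, n); or
   * A tuple (data, (rows, cols)), to construct a matrix A where A[rows[k], cols[k]] = data[k]; or
   * A tuple (data, indices, indptr), the standard CSR representation: the column indices of row i are
     stored in indices[indptr[i]:indptr[i+1]] and the corresponding values in data[indptr[i]:indptr[i+1]].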
}}}

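Examples:
{{{
>>> from scipy.sparse import csr_matrix
>>> csr_matrix((2, 3)).toarray()    # empty 2x3 matrix
array([[0., 0., 0.],
       [0., 0., 0.]])
>>> csr_matrix(([5, 7], ([0, 2], [1, 0])), shape=(3, 2)).toarray()
array([[0, 5],
       [0, 0],
       [7, 0]])
}}}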
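To make the (data, indices, indptr) form less abstract, here is a small sketch that converts a made-up dense array to CSR, inspects the three arrays through the {{{data}}}, {{{indices}}} and {{{indptr}}} attributes of {{{csr_matrix}}}, and then rebuilds the same matrix from them:

{{{
import numpy
from scipy.sparse import csr_matrix

# Made-up dense 3x3 matrix to be stored in CSR form.
dense = numpy.array([[1, 0, 2],
                     [0, 0, 3],
                     [4, 5, 6]])

A = csr_matrix(dense)
print(A.data)     # non-zero values, row by row: [1 2 3 4 5 6]
print(A.indices)  # column index of each value:   [0 2 2 0 1 2]
print(A.indptr)   # row i is data[indptr[i]:indptr[i+1]]: [0 2 3 6]

# Rebuilding the matrix directly from the three arrays gives the same matrix.
B = csr_matrix((A.data, A.indices, A.indptr), shape=(3, 3))
print((B.toarray() == dense).all())  # True
}}}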


== Accessing elements ==

TODO (Hannah): crazy element access magic, single elements, entire rows, sub-matrices
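A minimal sketch of the most common ways to access elements of a dense !NumPy array (single elements, entire rows or columns, sub-matrices); the matrix {{{A}}} is made up for illustration. The same slicing syntax also works on a {{{csr_matrix}}}, where the result is again sparse:

{{{
import numpy
from scipy.sparse import csr_matrix

A = numpy.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

print(A[1, 2])       # single element in row 1, column 2: 6
print(A[0])          # entire row 0: [1 2 3]
print(A[:, 1])       # entire column 1: [2 5 8]
print(A[0:2, 1:3])   # sub-matrix of rows 0-1 and columns 1-2: [[2, 3], [5, 6]]
print(A[[0, 2]])     # rows 0 and 2 (indexing with a list of indices)
print(A[A > 5])      # all entries greater than 5: [6 7 8 9]

# Slicing a sparse matrix returns a sparse matrix.
A_sparse = csr_matrix(A)
print(A_sparse[0:2, 1:3].toarray())  # same sub-matrix as above
}}}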

== Matrix operations ==

=== Constant addition ===
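Adding a constant adds it to every element of the matrix. This only works for dense matrices: !SciPy refuses to add a non-zero constant to a sparse matrix (it raises a {{{NotImplementedError}}}), since the result would no longer be sparse.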

{{{
>>> B_dense = numpy.matrix([[2, 1], [3, 4]], dtype=float)
>>> B_dense + 10
matrix([[ 12.,  11.],
        [ 13.,  14.]])
}}}
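A short sketch of what happens with a sparse matrix instead, and the obvious workaround of converting it to a dense array first (reasonable here, because the result would be dense anyway):

{{{
from scipy.sparse import csr_matrix

A_sparse = csr_matrix([[1, 0], [0, 1], [3, 2]], dtype=float)

# Adding a non-zero constant to a sparse matrix is refused by SciPy.
try:
    A_sparse + 10
    raised = False
except NotImplementedError:
    raised = True
print("NotImplementedError raised:", raised)

# Workaround: convert to a dense array first, then add the constant.
A_plus = A_sparse.toarray() + 10
print(A_plus)
}}}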

=== Multiplication by a constant ===
Multiplying by a constant multiplies every element of the matrix by that constant (for both sparse and dense matrices).

{{{
>>> from scipy.sparse import csr_matrix
>>> A_sparse = csr_matrix([[1, 0], [0, 1], [3, 2]], dtype=float)
>>> (A_sparse * 10).todense()
matrix([[ 10.,   0.],
        [  0.,  10.],
        [ 30.,  20.]])
}}}

=== Multiplication ===
{{{*}}} produces the '''normal''' matrix multiplication for {{{csr_matrix}}} (sparse) and {{{numpy.matrix}}} (dense) operands, in any combination.<<BR>>
{{{*}}} produces the '''element-wise''' multiplication for {{{numpy.array}}} objects (also dense). If the shapes of the operands differ, !NumPy broadcasts them (if possible).

{{{.dot()}}} produces the normal matrix multiplication for all combinations of {{{csr_matrix}}} and {{{numpy.matrix}}} '''except''' {{{dense.dot(sparse)}}}, which returns a useless matrix of sparse objects (see the example below); use {{{dense * sparse}}} in that case.

The result of a matrix multiplication between:
 * a sparse and a sparse matrix is sparse
 * a sparse and a dense matrix is dense
 * a dense and a dense matrix is dense

https://docs.scipy.org/doc/scipy/reference/sparse.html <<BR>>
http://www.scipy-lectures.org/intro/numpy/operations.html

{{{
>>> import numpy
>>> from scipy.sparse import csr_matrix
>>> A_sparse = csr_matrix([[1, 0], [0, 1], [3, 2]], dtype=float)
>>> B_dense = numpy.matrix([[2, 1], [3, 4]], dtype=float)

>>> A_dense = A_sparse.todense()
>>> B_sparse = csr_matrix(B_dense)


## Sparse with sparse
>>> C_sparse = A_sparse * B_sparse #(Normal Matrix multiplication, 3x2 matrix with 2x2 matrix)
>>> C_sparse.todense()
matrix([[  2.,   1.],
        [  3.,   4.],
        [ 12.,  11.]])

>>> C_sparse = A_sparse.dot(B_sparse) #(Normal Matrix multiplication, 3x2 matrix with 2x2 matrix)
>>> C_sparse.todense()
matrix([[  2.,   1.],
        [  3.,   4.],
        [ 12.,  11.]])


## Sparse with dense
>>> C_dense = A_sparse * B_dense #(Normal Matrix multiplication, 3x2 matrix with 2x2 matrix)
>>> C_dense
matrix([[  2.,   1.],
        [  3.,   4.],
        [ 12.,  11.]])
>>> C_dense = A_sparse.dot(B_dense) #(Normal Matrix multiplication, 3x2 matrix with 2x2 matrix)
>>> C_dense
matrix([[  2.,   1.],
        [  3.,   4.],
        [ 12.,  11.]])


## Dense with sparse
>>> C_dense = A_dense * B_sparse
>>> C_dense
matrix([[  2.,   1.],
        [  3.,   4.],
        [ 12.,  11.]])

>>> A_dense.dot(B_sparse)
matrix([[ <2x2 sparse matrix of type '<class 'numpy.float64'>'
 	with 4 stored elements in Compressed Sparse Row format>,
         <2x2 sparse matrix of type '<class 'numpy.float64'>'
 	with 4 stored elements in Compressed Sparse Row format>],
        [ <2x2 sparse matrix of type '<class 'numpy.float64'>'
 	with 4 stored elements in Compressed Sparse Row format>,
         <2x2 sparse matrix of type '<class 'numpy.float64'>'
 	with 4 stored elements in Compressed Sparse Row format>],
        [ <2x2 sparse matrix of type '<class 'numpy.float64'>'
 	with 4 stored elements in Compressed Sparse Row format>,
         <2x2 sparse matrix of type '<class 'numpy.float64'>'
 	with 4 stored elements in Compressed Sparse Row format>]], dtype=object)


## Dense with dense
>>> C_dense = A_dense.dot(B_dense) #(Normal Matrix multiplication, 3x2 matrix with 2x2 matrix)
>>> C_dense
matrix([[  2.,   1.],
        [  3.,   4.],
        [ 12.,  11.]])
>>> C_dense = A_dense * B_dense #(Normal Matrix multiplication, 3x2 matrix with 2x2 matrix)
>>> C_dense
matrix([[  2.,   1.],
        [  3.,   4.],
        [ 12.,  11.]])

}}}
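Since Python 3.5 there is also the {{{@}}} operator, which always denotes the normal matrix product and is supported by both !NumPy arrays and !SciPy sparse matrices, so it avoids the ambiguity of {{{*}}}. A minimal sketch with the matrices from above, using a plain {{{numpy.array}}} as the dense operand:

{{{
import numpy
from scipy.sparse import csr_matrix

A_sparse = csr_matrix([[1, 0], [0, 1], [3, 2]], dtype=float)
B_dense = numpy.array([[2, 1], [3, 4]], dtype=float)
B_sparse = csr_matrix(B_dense)

C1 = A_sparse @ B_sparse            # sparse @ sparse -> sparse
C2 = A_sparse @ B_dense             # sparse @ dense  -> dense numpy array
C3 = A_sparse.toarray() @ B_dense   # dense  @ dense  -> dense numpy array

print(C1.toarray())
print(C2)
print(C3)
}}}

All three print the same 3x2 product as in the examples above.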

{{{
## numpy.ndarray
>>> import numpy
>>> A_ndarray = numpy.array([[1, 0], [0, 1], [3, 2]])
>>> B_ndarray = numpy.array([[2, 1], [3, 4]])
>>> C_ndarray = numpy.array([2, 1])

>>> B_ndarray * B_ndarray #(Element-wise Matrix multiplication, 2x2 matrix with 2x2 matrix)
array([[ 4,  1],
       [ 9, 16]])
>>> B_ndarray.dot(B_ndarray) #(Normal Matrix multiplication, 2x2 matrix with 2x2 matrix)
array([[ 7,  6],
       [18, 19]])
>>> A_ndarray.dot(B_ndarray) #(Normal Matrix multiplication, 3x2 matrix with 2x2 matrix)
array([[ 2,  1],
       [ 3,  4],
       [12, 11]])
>>> C_ndarray * B_ndarray #(Broadcasting)
array([[4, 1],
       [6, 4]])
}}}

TODO (Claudius): Element-wise operations like taking log, sqrt. Multiplying two m*n matrices element-wise (for example, to square the entries in a matrix etc...)
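A sketch of common element-wise operations (the matrix {{{B}}} is made up). For dense arrays, !NumPy functions such as {{{numpy.sqrt}}} and {{{numpy.log}}} work entry by entry. For a {{{csr_matrix}}}, {{{*}}} is the matrix product, so the element-wise product is written {{{.multiply()}}}, and only zero-preserving functions such as {{{.sqrt()}}} are available as methods (there is no {{{.log()}}}, since log(0) is undefined):

{{{
import numpy
from scipy.sparse import csr_matrix

B = numpy.array([[1.0, 4.0], [9.0, 16.0]])

print(numpy.sqrt(B))         # element-wise square root
print(numpy.log(B))          # element-wise natural logarithm
print(B * B)                 # element-wise product of two arrays
print(numpy.multiply(B, B))  # the same, as an explicit function call
print(B ** 2)                # squares every entry

S = csr_matrix(B)
print(S.multiply(S).toarray())  # element-wise product of sparse matrices
print(S.sqrt().toarray())       # square root of the stored entries
}}}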

== Row- or column-wise operations ==

TODO (Claudius): summing of rows or columns, sorting rows / columns etc
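A minimal sketch of summing and sorting along rows or columns with the {{{axis}}} parameter ({{{axis=0}}} works down the columns, {{{axis=1}}} along the rows); the matrix {{{A}}} is made up for illustration. For a {{{csr_matrix}}}, {{{sum}}} returns a 1 x n (or m x 1) {{{numpy.matrix}}} instead of a 1-D array:

{{{
import numpy
from scipy.sparse import csr_matrix

A = numpy.array([[3, 1, 2],
                 [0, 5, 4]])

print(A.sum(axis=0))          # column sums: [3 6 6]
print(A.sum(axis=1))          # row sums: [6 9]
print(numpy.sort(A, axis=1))  # each row sorted: [[1, 2, 3], [0, 4, 5]]
print(numpy.sort(A, axis=0))  # each column sorted: [[0, 1, 2], [3, 5, 4]]

A_sparse = csr_matrix(A)
print(A_sparse.sum(axis=0))   # 1x3 numpy.matrix with the column sums
print(A_sparse.sum(axis=1))   # 2x1 numpy.matrix with the row sums
}}}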

== Useful methods ==

=== numpy.round ===
Takes an array and rounds its values to the given number of decimals. Note that values exactly halfway between two rounded values are rounded to the nearest even value (e.g., 0.5 becomes 0 and 1.5 becomes 2). [[https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.around.html|numpy.around]]

{{{
>>> numpy.round([1.98, 2.34, 4.76], 1)
[ 2.   2.3  4.8]
}}}
{{{
>>> numpy.round([1.5, 0.5, 3.5, 4.5], 0)
[ 2.  0.  4.  4.]
}}}

=== numpy.min ===
Takes an array and returns its minimum value. If an axis is specified, returns the minimum along the axis. [[https://docs.scipy.org/doc/numpy/reference/generated/numpy.amin.html|numpy.amin]]
{{{
>>> numpy.min([[5, 0, 1], [4, 3, 2]])
0
}}}
{{{
>>> numpy.min([[5, 0, 1], [4, 3, 2]], axis=0)
[4 0 1]
}}}

=== numpy.argmin ===
Takes an array and returns the index of the minimum value of the flattened array. If an axis is specified, returns the indices of the minimum values along the axis. [[https://docs.scipy.org/doc/numpy/reference/generated/numpy.argmin.html|numpy.argmin]]
{{{
>>> numpy.argmin([[5, 0, 1], [4, 3, 2]])
1
}}}
{{{
>>> numpy.argmin([[5, 0, 1], [4, 3, 2]], axis=0)
[1 0 0]
}}}
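Without an axis, {{{numpy.argmin}}} returns an index into the ''flattened'' array. A small sketch of turning that index back into a (row, column) pair with {{{numpy.unravel_index}}}, using the same matrix as above:

{{{
import numpy

A = numpy.array([[5, 0, 1], [4, 3, 2]])

flat_index = numpy.argmin(A)                          # 1 (index into the flattened array)
row, col = numpy.unravel_index(flat_index, A.shape)   # (0, 1)
print(flat_index, row, col, A[row, col])
}}}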

=== numpy.argsort ===
Takes an array ''a'' and returns an array of indices that would sort ''a''. Optionally, you can specify the axis along which ''a'' is sorted; by default, this is the last axis ({{{axis=-1}}}). [[https://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html|numpy.argsort]]

{{{
>>> numpy.argsort([[0, 4, 0], [4, 3, 2]], axis=0)
[[0 1 0]
 [1 0 1]]
}}}
{{{
>>> numpy.argsort([[0, 4, 0], [4, 3, 2]], axis=1)
[[0 2 1]
 [2 1 0]]
}}}
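A typical use in retrieval is finding the ''k'' highest-scoring items. Since {{{argsort}}} sorts in ascending order, one option is to negate the scores first; a sketch with made-up scores:

{{{
import numpy

scores = numpy.array([0.2, 0.9, 0.1, 0.7, 0.5])  # made-up scores
k = 3

# Negating the scores turns the ascending sort into a descending one.
top_k = numpy.argsort(-scores)[:k]
print(top_k)          # [1 3 4]
print(scores[top_k])  # the three highest scores, largest first
}}}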

=== numpy.where ===
Takes a condition and optionally two array-like objects x and y. If x and y are specified, returns an array that contains elements from x where condition is true and elements from y elsewhere. [[https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html|numpy.where]]
{{{
>>> x = numpy.array([[5, 4, 3], [2, 1, 0]])
>>> y = numpy.array([[0, 1, 2], [3, 4, 5]])
>>> numpy.where(x > 3, x, y)
[[5 4 2]
 [3 4 5]]
}}}
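If only the condition is given, {{{numpy.where}}} instead returns the indices of the entries where the condition holds, as a tuple with one index array per dimension. A sketch with the same {{{x}}} as above:

{{{
import numpy

x = numpy.array([[5, 4, 3], [2, 1, 0]])

rows, cols = numpy.where(x > 3)
print(rows)           # row indices:    [0 0]
print(cols)           # column indices: [0 1]
print(x[rows, cols])  # the matching entries: [5 4]
}}}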

== Special matrices ==

=== Diagonal matrix ===

A matrix (usually square) in which all entries outside the main diagonal are zero. Use [[https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.diag.html | numpy.diag]] either to create a diagonal matrix from a given main diagonal, or to extract the main diagonal (as a 1-D array) from a given matrix.

{{{
>>> numpy.diag([1,2,3])
array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])
}}}
{{{
>>> numpy.diag([[1, 5, 4],
                [7, 2, 4],
                [4, 7, 3]])
array([1, 2, 3])
}}}

For a sparse matrix, use [[https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.spdiags.html#scipy.sparse.spdiags|scipy.sparse.spdiags]].
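A small sketch of both directions for sparse matrices: {{{spdiags(data, diags, m, n)}}} places each row of {{{data}}} on the diagonal with the corresponding offset (0 is the main diagonal), and the {{{.diagonal()}}} method extracts the main diagonal of an existing sparse matrix:

{{{
import numpy
from scipy.sparse import csr_matrix, spdiags

# Values 1, 2, 3 on the main diagonal (offset 0) of a 3x3 sparse matrix.
D = spdiags([[1, 2, 3]], [0], 3, 3)
print(D.toarray())

# Extract the main diagonal of a sparse matrix.
A_sparse = csr_matrix([[1, 5, 4], [7, 2, 4], [4, 7, 3]])
print(A_sparse.diagonal())  # [1 2 3]
}}}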

=== Identity matrix ===

A special diagonal ''m''*''m'' matrix in which all elements on the main diagonal are 1. It can be thought of as the '1' of the matrix world: for example, an ''n''*''m'' matrix ''A'' multiplied by the ''m''*''m'' identity matrix yields ''A'' again. Use [[https://docs.scipy.org/doc/numpy/reference/generated/numpy.identity.html|numpy.identity(k)]] to create a ''k''*''k'' identity matrix.

{{{
>>> numpy.identity(4)
array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.]])
}}}

{{{
>>> numpy.array([[1, 2, 3],
                 [3, 4, 3]]).dot(numpy.identity(3))
array([[ 1.,  2.,  3.],
       [ 3.,  4.,  3.]])
}}}

For a sparse matrix, use [[https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.identity.html|scipy.sparse.identity]].
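The sparse counterpart of the dense example above, as a sketch; {{{format='csr'}}} asks for the identity in CSR format:

{{{
from scipy.sparse import csr_matrix, identity

A_sparse = csr_matrix([[1, 2, 3], [3, 4, 3]], dtype=float)
I = identity(3, format='csr')     # 3x3 sparse identity matrix

print((A_sparse @ I).toarray())   # same entries as A_sparse
}}}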

=== Triangular matrix ===

A (square) matrix in which all elements below the main diagonal (upper triangular matrix) or above it (lower triangular matrix) are zero.
[[https://docs.scipy.org/doc/numpy/reference/generated/numpy.triu.html|numpy.triu]] returns the upper ({{{u}}}) and [[https://docs.scipy.org/doc/numpy/reference/generated/numpy.tril.html|numpy.tril]] the lower ({{{l}}}) triangular part of a given matrix.

{{{
>>> numpy.triu([[1, 5, 4],
                [7, 2, 4],
                [4, 7, 3]])
array([[1, 5, 4],
       [0, 2, 4],
       [0, 0, 3]])
}}}


{{{
>>> numpy.tril([[1, 5, 4],
                [7, 2, 4],
                [4, 7, 3]])
array([[1, 0, 0],
       [7, 2, 0],
       [4, 7, 3]])
}}}

For a sparse matrix, use [[https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.triu.html|scipy.sparse.triu]] and [[https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.tril.html|scipy.sparse.tril]].
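The sparse versions take the same matrix as the dense examples above. As a sketch, the optional offset {{{k}}} shifts the diagonal, so {{{k=1}}} also zeroes the main diagonal (strictly upper triangular part):

{{{
from scipy.sparse import csr_matrix, triu, tril

A = csr_matrix([[1, 5, 4], [7, 2, 4], [4, 7, 3]])

print(triu(A).toarray())       # upper triangular part
print(tril(A).toarray())       # lower triangular part
print(triu(A, k=1).toarray())  # strictly upper triangular part
}}}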


== Matrix decomposition ==

=== Singular Value Decomposition (SVD) ===

Factorize a matrix ''A'' (''m''*''n'') into three matrices ''U'' (''m'' * ''r''), ''S'' (''r'' * ''r'') and ''V'' (''r'' * ''n'') such that ''A'' = ''U'' * ''S'' * ''V''. Here ''r'' is the [[https://en.wikipedia.org/wiki/Rank_(linear_algebra)| rank]] of ''A''.

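Use [[https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.svd.html|numpy.linalg.svd]] to compute a singular value decomposition of a dense matrix, and [[https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.svds.html|scipy.sparse.linalg.svds]] for sparse matrices ({{{svds}}} computes only the ''k'' largest singular values). Both return ''S'' as a 1-D array of singular values rather than as a diagonal matrix. Note also that {{{numpy.linalg.svd}}} returns min(''m'', ''n'') singular values (some of which may be zero) rather than exactly ''r''.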
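A minimal sketch for the dense case, using the matrix from the {{{svds}}} example as a dense array: {{{full_matrices=False}}} requests the thin SVD, {{{numpy.diag}}} turns the 1-D array of singular values into the diagonal matrix ''S'', and the last line checks that ''U'' * ''S'' * ''V'' reproduces ''A'' up to rounding errors. !NumPy's third return value (here {{{Vh}}}) already corresponds to ''V'' in the notation above:

{{{
import numpy

A = numpy.array([[1, 2, 3], [3, 4, 5], [5, 6, 4]], dtype=float)

# Thin SVD: U is m x min(m, n), Vh is min(m, n) x n.
U, s, Vh = numpy.linalg.svd(A, full_matrices=False)

# s holds the singular values in descending order; build the diagonal matrix S.
S = numpy.diag(s)
print(s)
print(numpy.allclose(U @ S @ Vh, A))  # True
}}}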

{{{
>>> from scipy.sparse import csr_matrix
>>> from scipy.sparse.linalg import svds
>>> Uk, Sk, Vk = svds(csr_matrix([[1, 2, 3], [3, 4, 5], [5, 6, 4]], dtype=float), 2)
>>> print("Uk:\n", Uk, "\nSk:\n", Sk, "\nVk:\n", Vk)
Uk:
 [[ 0.56475636 -0.30288472]
 [ 0.51457155 -0.59799935]
 [-0.64518709 -0.74206309]] 
Sk:
 [  2.13530566  11.67829513] 
Vk:
 [[-0.52332762 -0.32001209  0.78975975]
 [-0.49726421 -0.63794803 -0.58800563]]
}}}
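The truncated factors returned by {{{svds}}} can be multiplied back together to get a rank-''k'' approximation of ''A'' (as used, e.g., in latent semantic indexing). A sketch with the same matrix as above; by the Eckart-Young theorem, the Frobenius-norm error of this best rank-2 approximation equals the singular value that was dropped (here the smallest of the three):

{{{
import numpy
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

A = csr_matrix([[1, 2, 3], [3, 4, 5], [5, 6, 4]], dtype=float)
Uk, Sk, Vk = svds(A, k=2)

# Rank-2 approximation of A as a dense 3x3 array.
A_k = Uk @ numpy.diag(Sk) @ Vk
print(numpy.round(A_k, 2))

# Approximation error in the Frobenius norm (the default for 2-D arrays).
error = numpy.linalg.norm(A.toarray() - A_k)
print(error)
}}}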