NumPy/SciPy Cheat Sheet
This cheat sheet is a quick reference for NumPy / SciPy beginners. It gives an overview of the most important NumPy and SciPy commands and functions that you might need when solving the exercise sheets about Linear Algebra in Information Retrieval. It doesn't claim to be complete and will be extended continuously. If you think something important is missing or if you find any errors, please let us know.
General
What is NumPy?
A Python library that lets you work with arrays and matrices efficiently.
What is SciPy?
TODO
What is the difference between NumPy and SciPy?
TODO
Install
The routine to install NumPy and SciPy depends on your operating system.
Linux (Ubuntu, Debian)
apt-get install python-numpy python-scipy
Other systems (Windows, Mac, etc.)
For all other systems (Windows, Mac, etc.), see the instructions on the official SciPy website.
Matrix construction
Dense matrices
TODO: matrix vs. array
Sparse matrices
Construct a Compressed Sparse Row matrix:
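scipy.sparse.csr_matrix(arg, shape=None, dtype=None, copy=False)
arg:
* A dense matrix; or
* Another sparse matrix; or
* A tuple (m, n), to construct an empty matrix with shape (m, n); or
* A tuple (data, (rows, cols)), to construct a matrix A where A[rows[k], cols[k]] = data[k]; or
* A tuple (data, indices, indptr), the standard CSR representation, where the column indices of row i are indices[indptr[i]:indptr[i+1]] and the corresponding values are data[indptr[i]:indptr[i+1]]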
Examples:
from scipy.sparse import csr_matrix
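The two tuple forms listed above can be sketched as follows. This is a minimal illustration, not an official example; the variable names (A_coo_style, A_csr_style) and the sample values are made up for this sketch. Both calls are meant to build the same 3x2 matrix.

```python
import numpy
from scipy.sparse import csr_matrix

# Non-zero values of the matrix [[1, 0], [0, 1], [3, 2]], listed row by row.
data = [1.0, 1.0, 3.0, 2.0]

# Form (data, (rows, cols)): A[rows[k], cols[k]] = data[k]
rows = [0, 1, 2, 2]
cols = [0, 1, 0, 1]
A_coo_style = csr_matrix((data, (rows, cols)), shape=(3, 2))

# Form (data, indices, indptr): row i owns the slice indptr[i]:indptr[i+1]
# of both data (values) and indices (column numbers).
indices = [0, 1, 0, 1]
indptr = [0, 1, 2, 4]
A_csr_style = csr_matrix((data, indices, indptr), shape=(3, 2))

print(A_coo_style.toarray())
print(A_csr_style.toarray())
```

Passing shape explicitly avoids surprises when the last rows or columns contain only zeros.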
Accessing elements
TODO (Hannah): crazy element access magic, single elements, entire rows, sub-matrices
Matrix operations
Constant addition
Addition of a constant adds it to every element of the matrix (only for dense matrices)
>>> B_dense = numpy.matrix([[2, 1], [3, 4]], dtype=float)
>>> B_dense + 10
matrix([[ 12., 11.],
        [ 13., 14.]])
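Since constant addition is described above as working only for dense matrices, one possible workaround for a sparse matrix is to convert it to a dense array first. The sketch below assumes the matrix is small enough to fit in memory as a dense array; the name A_plus_10 is hypothetical.

```python
import numpy
from scipy.sparse import csr_matrix

A_sparse = csr_matrix([[1, 0], [0, 1], [3, 2]], dtype=float)

# Convert to a dense numpy array, then add the constant to every element.
# Note: the result is dense, so this only makes sense for small matrices.
A_plus_10 = A_sparse.toarray() + 10
print(A_plus_10)
```

Adding a non-zero constant would turn every zero entry into a stored value anyway, so the result could not stay sparse in any useful sense.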
Multiplication by a constant
Multiplication by a constant multiplies every element of the matrix by that constant (both for sparse and dense matrices)
>>> A_sparse = csr_matrix([[1, 0], [0, 1], [3, 2]], dtype=float)
>>> (A_sparse * 10).todense()
matrix([[ 10., 0.],
        [ 0., 10.],
        [ 30., 20.]])
Multiplication
The * operator performs normal matrix multiplication for csr_matrix (sparse) and numpy.matrix (dense) operands, in any combination.
For numpy arrays (numpy.ndarray, also dense), the * operator performs element-wise multiplication. In this case NumPy broadcasts the operands if their shapes differ.
The dot() method performs normal matrix multiplication for all of these types, except for dense.dot(sparse): there the sparse matrix is treated as a single object, and the result is a matrix of objects rather than the matrix product.
The result of a matrix multiplication between:
- a sparse and a sparse matrix is sparse
- a sparse and a dense matrix is dense
- a dense and a dense matrix is dense
https://docs.scipy.org/doc/scipy/reference/sparse.html
http://www.scipy-lectures.org/intro/numpy/operations.html
>>> A_sparse = csr_matrix([[1, 0], [0, 1], [3, 2]], dtype=float)
>>> B_dense = numpy.matrix([[2, 1], [3, 4]], dtype=float)
>>> A_dense = A_sparse.todense()
>>> B_sparse = csr_matrix(B_dense)

## Sparse with sparse
>>> C_sparse = A_sparse * B_sparse  # Normal matrix multiplication, 3x2 matrix with 2x2 matrix
>>> C_sparse.todense()
matrix([[ 2., 1.],
        [ 3., 4.],
        [ 12., 11.]])
>>> C_sparse = A_sparse.dot(B_sparse)  # Normal matrix multiplication, 3x2 matrix with 2x2 matrix
>>> C_sparse.todense()
matrix([[ 2., 1.],
        [ 3., 4.],
        [ 12., 11.]])

## Sparse with dense
>>> C_dense = A_sparse * B_dense  # Normal matrix multiplication, 3x2 matrix with 2x2 matrix
>>> C_dense
matrix([[ 2., 1.],
        [ 3., 4.],
        [ 12., 11.]])
>>> C_dense = A_sparse.dot(B_dense)  # Normal matrix multiplication, 3x2 matrix with 2x2 matrix
>>> C_dense
matrix([[ 2., 1.],
        [ 3., 4.],
        [ 12., 11.]])

## Dense with sparse
>>> C_dense = A_dense * B_sparse
>>> C_dense
matrix([[ 2., 1.],
        [ 3., 4.],
        [ 12., 11.]])
>>> A_dense.dot(B_sparse)  # Not a matrix product: every entry is the sparse matrix itself
matrix([[ <2x2 sparse matrix of type '<class 'numpy.float64'>' with 4 stored elements in Compressed Sparse Row format>,
          <2x2 sparse matrix of type '<class 'numpy.float64'>' with 4 stored elements in Compressed Sparse Row format>],
        [ <2x2 sparse matrix of type '<class 'numpy.float64'>' with 4 stored elements in Compressed Sparse Row format>,
          <2x2 sparse matrix of type '<class 'numpy.float64'>' with 4 stored elements in Compressed Sparse Row format>],
        [ <2x2 sparse matrix of type '<class 'numpy.float64'>' with 4 stored elements in Compressed Sparse Row format>,
          <2x2 sparse matrix of type '<class 'numpy.float64'>' with 4 stored elements in Compressed Sparse Row format>]], dtype=object)

## Dense with dense
>>> C_dense = A_dense.dot(B_dense)  # Normal matrix multiplication, 3x2 matrix with 2x2 matrix
>>> C_dense
matrix([[ 2., 1.],
        [ 3., 4.],
        [ 12., 11.]])
>>> C_dense = A_dense * B_dense  # Normal matrix multiplication, 3x2 matrix with 2x2 matrix
>>> C_dense
matrix([[ 2., 1.],
        [ 3., 4.],
        [ 12., 11.]])
## numpy.ndarray
>>> A_ndarray = numpy.array([[1, 0], [0, 1], [3, 2]])
>>> B_ndarray = numpy.array([[2, 1], [3, 4]])
>>> C_ndarray = numpy.array([2, 1])
>>> B_ndarray * B_ndarray  # Element-wise multiplication, 2x2 matrix with 2x2 matrix
array([[ 4, 1],
       [ 9, 16]])
>>> B_ndarray.dot(B_ndarray)  # Normal matrix multiplication, 2x2 matrix with 2x2 matrix
array([[ 7, 6],
       [18, 19]])
>>> A_ndarray.dot(B_ndarray)  # Normal matrix multiplication, 3x2 matrix with 2x2 matrix
array([[ 2, 1],
       [ 3, 4],
       [12, 11]])
>>> C_ndarray * B_ndarray  # Broadcasting
array([[4, 1],
       [6, 4]])
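Because * means different things for different types, it can help to use the @ operator, which is not covered in the text above. The sketch below assumes Python 3.5 or newer and a reasonably recent NumPy and SciPy; @ denotes matrix multiplication for numpy arrays and for sparse matrices alike. The variable names are chosen for this sketch, and scipy.sparse.issparse is used to check the result types listed earlier.

```python
import numpy
from scipy.sparse import csr_matrix, issparse

A = numpy.array([[1, 0], [0, 1], [3, 2]])
B = numpy.array([[2, 1], [3, 4]])

# dense @ dense: matrix product, result is a dense numpy array
dense_product = A @ B

# sparse @ sparse: matrix product, result stays sparse
sparse_product = csr_matrix(A) @ csr_matrix(B)

# sparse @ dense: matrix product, result is dense
mixed_product = csr_matrix(A) @ B

print(dense_product)
print(issparse(sparse_product), issparse(mixed_product))
```

Using @ consistently makes the intent (matrix product, not element-wise product) visible regardless of whether the operands are arrays, matrices, or sparse matrices.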
TODO (Claudius): Element-wise operations like taking log, sqrt. Multiplying two m*n matrices element-wise (for example, to square the entries in a matrix etc...)
Row- or column-wise operations
TODO (Claudius): summing of rows or columns, sorting rows / columns etc
Useful methods
numpy.round
Takes an array and rounds its values to the given number of decimals. Note that for values exactly halfway between rounded decimal values, NumPy rounds to the nearest even value. See numpy.around (https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.around.html).
>>> numpy.round([1.98, 2.34, 4.76], 1)
[ 2. 2.3 4.8]
>>> numpy.round([1.5, 0.5, 3.5, 4.5], 0)
[ 2. 0. 4. 4.]
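Rounding is handy when comparing floating-point results in tests. The following is a small sketch with made-up values; the names values, expected and matches are hypothetical, and the choice of 3 decimals is arbitrary.

```python
import numpy

# Floating-point arithmetic rarely yields exactly the "textbook" value:
# 0.1 + 0.2 is slightly larger than 0.3.
values = numpy.array([0.1 + 0.2, 1.0 / 3.0])
expected = numpy.array([0.3, 0.333])

# Round to 3 decimals before comparing against the expected values.
rounded = numpy.round(values, 3)
matches = numpy.array_equal(rounded, expected)
print(rounded, matches)
```

An alternative that avoids explicit rounding is numpy.allclose, which compares two arrays up to a tolerance.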
numpy.min
Takes an array and returns its minimum value. If an axis is specified, returns the minimum along that axis. See numpy.amin (https://docs.scipy.org/doc/numpy/reference/generated/numpy.amin.html).
>>> numpy.min([[5, 0, 1], [4, 3, 2]])
0
>>> numpy.min([[5, 0, 1], [4, 3, 2]], axis=0)
[4 0 1]
numpy.argmin
Takes an array and returns the index of the minimum value of the flattened array. If an axis is specified, returns the indices of the minimum values along that axis. See numpy.argmin (https://docs.scipy.org/doc/numpy/reference/generated/numpy.argmin.html).
>>> numpy.argmin([[5, 0, 1], [4, 3, 2]])
1
>>> numpy.argmin([[5, 0, 1], [4, 3, 2]], axis=0)
[1 0 0]
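Without an axis, numpy.argmin returns an index into the flattened array. If the row and column of the minimum are needed instead, numpy.unravel_index can convert the flat index back. A short sketch (the variable names are chosen for illustration):

```python
import numpy

m = numpy.array([[5, 0, 1], [4, 3, 2]])

# Index of the minimum in the flattened array [5, 0, 1, 4, 3, 2]
flat_index = numpy.argmin(m)

# Convert the flat index into a (row, column) pair for the shape of m
row, col = numpy.unravel_index(flat_index, m.shape)
print(flat_index, row, col, m[row, col])
```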
numpy.argsort
Takes an array a and returns an array of indices that sort a. Optionally, you can specify the axis along which a will be sorted. By default the axis is -1 (the last axis). See numpy.argsort (https://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html).
>>> numpy.argsort([[0, 4, 0], [4, 3, 2]], axis=0)
[[0 1 0]
 [1 0 1]]
>>> numpy.argsort([[0, 4, 0], [4, 3, 2]], axis=1)
[[0 2 1]
 [2 1 0]]
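A common use of numpy.argsort in retrieval-style exercises is ranking items by score. numpy.argsort sorts in ascending order, so one simple way to get a descending ranking is to sort the negated scores. The scores below are made up for this sketch.

```python
import numpy

# Hypothetical relevance scores for four documents (indices 0..3)
scores = numpy.array([0.2, 0.9, 0.1, 0.5])

# argsort is ascending, so negate the scores to rank from best to worst
order = numpy.argsort(-scores)

# Indices of the two highest-scoring documents
top2 = order[:2]
print(order, top2, scores[top2])
```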
numpy.where
Takes a condition and optionally two array-like objects x and y. If x and y are specified, returns an array that contains elements from x where the condition is true and elements from y elsewhere. See numpy.where (https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html).
>>> x = numpy.array([[5, 4, 3], [2, 1, 0]])
>>> y = numpy.array([[0, 1, 2], [3, 4, 5]])
>>> numpy.where(x > 3, x, y)
[[5 4 2]
 [3 4 5]]
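When x and y are omitted, numpy.where returns the indices at which the condition holds, as one index array per dimension. A minimal sketch of this condition-only form (the names rows and cols are chosen for illustration):

```python
import numpy

x = numpy.array([[5, 4, 3], [2, 1, 0]])

# Only a condition: returns a tuple of index arrays, one per dimension
rows, cols = numpy.where(x > 3)

# The index arrays can be used directly to select the matching elements
selected = x[rows, cols]
print(rows, cols, selected)
```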
Special matrices
Diagonal matrix
Matrix (usually square) in which all entries are zero, except on the main diagonal. Use numpy.diag to either create a diagonal matrix from a given main diagonal, or extract the main diagonal from a given matrix.
>>> numpy.diag([1,2,3])
array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])
>>> numpy.diag([[1, 5, 4], [7, 2, 4], [4, 7, 3]])
array([1, 2, 3])
For a sparse matrix, use scipy.sparse.spdiags.
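As a rough sketch of the sparse variant: scipy.sparse.spdiags takes the diagonals as rows of a data array, the offsets of those diagonals (0 is the main diagonal), and the matrix dimensions. The values and variable names below are chosen for illustration only.

```python
import numpy
from scipy.sparse import spdiags

# One row of data per diagonal; here only the main diagonal (offset 0)
data = numpy.array([[1, 2, 3]])
offsets = numpy.array([0])

# Build a sparse 3x3 matrix with 1, 2, 3 on the main diagonal
D = spdiags(data, offsets, 3, 3)
print(D.toarray())
```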
Identity matrix
Special diagonal m*m matrix where all elements on the main diagonal are 1. It plays the role of the number 1 in the world of matrices. For example, an n*m matrix A multiplied with an m*m identity matrix yields A again. Use numpy.identity(k) to create a k*k identity matrix.
>>> numpy.identity(4)
array([[ 1., 0., 0., 0.],
       [ 0., 1., 0., 0.],
       [ 0., 0., 1., 0.],
       [ 0., 0., 0., 1.]])
>>> numpy.array([[1, 2, 3], [3, 4, 3]]).dot(numpy.identity(3))
array([[ 1., 2., 3.],
       [ 3., 4., 3.]])
For a sparse matrix, use scipy.sparse.identity.
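The sparse counterpart behaves the same way. The sketch below multiplies a sparse 2x3 matrix with a sparse 3x3 identity matrix; the format="csr" argument and the variable names are choices made for this illustration.

```python
import numpy
from scipy.sparse import csr_matrix, identity

# Sparse 3x3 identity matrix, stored in CSR format
I3 = identity(3, format="csr")

A = csr_matrix([[1, 2, 3], [3, 4, 3]], dtype=float)

# Multiplying with the identity matrix yields A again
product = A.dot(I3)
print(product.toarray())
```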
Triangular matrix
A (square) matrix where all elements below (upper triangle) or above (lower triangle) the main diagonal are zero. numpy.triu creates the upper (u), numpy.tril the lower (l) triangular matrix from a given matrix.
>>> numpy.triu([[1, 5, 4], [7, 2, 4], [4, 7, 3]])
array([[1, 5, 4],
       [0, 2, 4],
       [0, 0, 3]])
>>> numpy.tril([[1, 5, 4], [7, 2, 4], [4, 7, 3]])
array([[1, 0, 0],
       [7, 2, 0],
       [4, 7, 3]])
For a sparse matrix, use scipy.sparse.triu and scipy.sparse.tril.
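A short sketch of the sparse versions, using the same example matrix as above. The results are converted to dense arrays only for printing; the names upper and lower are chosen for this illustration.

```python
import numpy
from scipy.sparse import csr_matrix, triu, tril

A = csr_matrix([[1, 5, 4], [7, 2, 4], [4, 7, 3]])

# Both functions return a sparse matrix; convert to dense for display
upper = triu(A).toarray()
lower = tril(A).toarray()
print(upper)
print(lower)
```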
Matrix decomposition
Singular Value Decomposition (SVD)
Factorize a matrix A (m*n) into three matrices U (m * r), S (r * r) and V (r * n) such that A = U * S * V. Here r is the rank of A.
Use numpy.linalg.svd to do a singular value decomposition for a dense matrix. Use scipy.sparse.linalg.svds for sparse matrices (computes the largest k singular values for a sparse matrix).
>>> from scipy.sparse import csr_matrix
>>> from scipy.sparse.linalg import svds
>>> Uk, Sk, Vk = svds(csr_matrix([[1, 2, 3], [3, 4, 5], [5, 6, 4]], dtype=float), 2)
>>> print("Uk:\n", Uk, "\nSk:\n", Sk, "\nVk:\n", Vk)
Uk:
 [[ 0.56475636 -0.30288472]
 [ 0.51457155 -0.59799935]
 [-0.64518709 -0.74206309]]
Sk:
 [ 2.13530566 11.67829513]
Vk:
 [[-0.52332762 -0.32001209 0.78975975]
 [-0.49726421 -0.63794803 -0.58800563]]
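For the dense case, the factorization A = U * S * V can be checked by multiplying the factors back together. The sketch below uses numpy.linalg.svd on the same example matrix; full_matrices=False and the name A_reconstructed are choices made for this illustration. Note that numpy.linalg.svd returns the singular values as a 1-D array, so numpy.diag is used to turn them into the matrix S.

```python
import numpy

A = numpy.array([[1, 2, 3], [3, 4, 5], [5, 6, 4]], dtype=float)

# U: left singular vectors, S: singular values (1-D array),
# V: right singular vectors, already arranged as rows
U, S, V = numpy.linalg.svd(A, full_matrices=False)

# Rebuild A from the three factors: A = U * diag(S) * V
A_reconstructed = U.dot(numpy.diag(S)).dot(V)

print(S)
print(numpy.allclose(A, A_reconstructed))
```

The singular values from numpy.linalg.svd come in descending order, whereas the svds output shown above lists them in ascending order.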