NumPy/SciPy Cheat Sheet
This cheat sheet is a quick reference for NumPy / SciPy beginners. It gives an overview of the most important commands and functions of NumPy and SciPy that you might need when solving the exercise sheets about Linear Algebra in Information Retrieval. It doesn't claim to be complete and will be extended continuously. If you think that something important is missing or if you find any errors, please let us know.
General
What is NumPy?
A library for working with arrays and matrices in Python.
What is SciPy?
Another library, built upon NumPy, that provides advanced linear algebra functionality (among other things, sparse matrices).
Install
How to install NumPy and SciPy depends on your operating system.
Linux (Ubuntu, Debian)
sudo apt-get install python3-numpy python3-scipy
Other systems (Windows, Mac, etc.)
For all other systems, see the instructions given on the official SciPy website.
Matrix construction
We distinguish between dense matrices and sparse matrices. Dense matrices store every entry in the matrix, while sparse matrices only store the non-zero entries (together with their row and column index). Dense matrices are more feature-rich, but may consume more memory space than sparse matrices (in particular if most of the entries in a matrix are zero).
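The memory difference can be made concrete with a small sketch. It assumes a 1000 x 1000 matrix with only three non-zero entries; the variable names are made up for illustration, and the CSR format used here is introduced further below.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A 1000 x 1000 matrix with only three non-zero entries.
dense = np.zeros((1000, 1000), dtype=np.float64)
dense[0, 0] = 1.0
dense[10, 500] = 2.0
dense[999, 999] = 3.0

# The same matrix in a sparse format.
sparse = csr_matrix(dense)

# Dense storage: one 8-byte float for every entry, zero or not.
dense_bytes = dense.nbytes

# Sparse (CSR) storage: the non-zero values plus two index arrays.
sparse_bytes = sparse.data.nbytes + sparse.indices.nbytes + sparse.indptr.nbytes

print(dense_bytes)                 # 8000000
print(sparse.nnz)                  # 3 (number of stored entries)
print(sparse_bytes < dense_bytes)  # True
```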
Dense matrices
In NumPy, there are two types of dense matrices: matrices and arrays. Matrices are strictly 2-dimensional, while arrays are n-dimensional (the term array is a bit misleading here, because an array is not limited to one dimension and can itself represent a matrix).
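The practical difference between the two shows up mainly in the * operator and in indexing. The following is a minimal sketch with made-up variable names. Note that the NumPy documentation recommends arrays over the matrix class for new code.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
m = np.matrix([[1, 2], [3, 4]])

elementwise = a * a      # arrays: * multiplies entry by entry
product = m * m          # matrices: * is the matrix product
same_product = a.dot(a)  # arrays: use dot() for the matrix product

print(elementwise.tolist())   # [[1, 4], [9, 16]]
print(product.tolist())       # [[7, 10], [15, 22]]
print(same_product.tolist())  # [[7, 10], [15, 22]]

print(a[0].shape)  # (2,)   a row of an array is 1-dimensional
print(m[0].shape)  # (1, 2) a row of a matrix stays 2-dimensional
```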
Construct a matrix:
numpy.matrix(arg, dtype=None)

arg:
  The data to construct the matrix from, given as
  * a (nested) Python list; or
  * a string with columns separated by commas or spaces and rows separated by semicolons.
dtype (str, optional):
  The type of the entries in the matrix (e.g., 'int', 'float', 'str').

----------
Examples:

>>> print(numpy.matrix("1 2; 3 4"))
[[1 2]
 [3 4]]

>>> print(numpy.matrix([[1, 2], [3, 4]], dtype='float'))
[[1. 2.]
 [3. 4.]]
Construct an array:
numpy.array(arg, dtype=None, ndmin=0)

arg:
  The data to construct the array from, given as
  * a (nested) Python list or another array; or
  * an object whose __array__ method returns an array.
dtype (str, optional):
  The type of the entries in the array (e.g., 'int', 'float', 'str').
ndmin (int, optional):
  The minimum number of dimensions that the array should have.

----------
Examples:

>>> print(numpy.array([[1, 2], [3, 4]]))
[[1 2]
 [3 4]]

>>> print(numpy.array([[1, 2], [3, 4]], dtype='float'))
[[1. 2.]
 [3. 4.]]

>>> print(numpy.array([[1, 2], [3, 4]], ndmin=3))
[[[1 2]
  [3 4]]]
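To check what the dtype and ndmin arguments actually did, it can help to inspect the shape, ndim and dtype attributes of the result. A minimal sketch (the variable names are only for illustration):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[1, 2], [3, 4]], dtype='float')
c = np.array([[1, 2], [3, 4]], ndmin=3)

print(a.shape, a.ndim)  # (2, 2) 2
print(b.dtype)          # float64
print(c.shape, c.ndim)  # (1, 2, 2) 3  (ndmin prepends dimensions of size 1)
```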
There are some utility functions to create arrays with a special structure:
(1) Construct an array filled with zeros:
numpy.zeros(shape, dtype=float)

shape (int or sequence of ints):
  The dimensions of the array to create.
dtype (str, optional):
  The type of the entries in the array (e.g., 'int', 'float', 'str').

----------
Examples:

>>> print(numpy.zeros(3))
[0. 0. 0.]

>>> print(numpy.zeros([3, 2], dtype='int'))
[[0 0]
 [0 0]
 [0 0]]
(2) Construct an array without initializing the entries (the entries hold arbitrary leftover values from memory, which are not random in any statistical sense):
numpy.empty(shape, dtype=float)

shape (int or sequence of ints):
  The dimensions of the array to create.
dtype (str, optional):
  The type of the entries in the array (e.g., 'int', 'float', 'str').

----------
Examples (the printed values are arbitrary and will differ from run to run):

>>> print(numpy.empty(3))
[6.95052181e-310 1.74512682e-316 1.58101007e-322]

>>> print(numpy.empty([3, 2], dtype='int'))
[[140045355821992 140045355821992]
 [140045136216840 140045136244784]
 [140045125643544 140045153116544]]
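A typical pattern is to use numpy.empty only when every entry is overwritten before it is read, and numpy.zeros when the initial value matters. The sketch below illustrates this with made-up variable names.

```python
import numpy as np

n = 4

# empty: the contents are arbitrary until we assign them.
squares = np.empty(n, dtype='int')
for i in range(n):
    squares[i] = i * i  # overwrite every entry before reading it

# zeros: safe to accumulate into, because every entry starts at 0.
counts = np.zeros(n, dtype='int')
counts[2] += 1

print(squares.tolist())  # [0, 1, 4, 9]
print(counts.tolist())   # [0, 0, 1, 0]
```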
Sparse matrices
SciPy offers several sparse matrix formats. The two principal ones are:
Compressed Sparse Row matrix (CSR matrix): entries are stored row by row (sorted by row index first)
Compressed Sparse Column matrix (CSC matrix): entries are stored column by column (sorted by column index first)
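The different storage order of the two formats can be seen by looking at the data attribute, which holds the stored non-zero values. A small sketch with an example matrix chosen for illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 0]])

csr = csr_matrix(dense)
csc = csc_matrix(dense)

# CSR lists the non-zero values row by row ...
print(csr.data.tolist())  # [1, 2, 3, 4, 5]

# ... while CSC lists the same values column by column.
print(csc.data.tolist())  # [1, 4, 5, 2, 3]
```

As a rule of thumb, CSR is the better choice when you mostly access rows, and CSC when you mostly access columns.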
Construct a CSR/CSC matrix:
scipy.sparse.csr_matrix(arg, shape=None, dtype=None)
scipy.sparse.csc_matrix(arg, shape=None, dtype=None)

arg:
  The data to create the CSR/CSC matrix from, given as
  * a dense matrix; or
  * another sparse matrix; or
  * a tuple (m, n), to construct an empty matrix with shape (m, n); or
  * a tuple (data, (rows, cols)), to construct a matrix A where A[rows[k], cols[k]] = data[k]; or
  * a tuple (data, indices, indptr), the internal compressed representation.
shape (tuple of two ints, optional):
  The dimensions of the matrix to create.
dtype (str, optional):
  The type of the entries in the matrix (e.g., 'int', 'float').

----------
Examples (the results are converted to dense arrays with toarray() for visualization):

>>> print(scipy.sparse.csr_matrix([[1, 2, 3], [0, 0, 1], [0, 1, 3]]).toarray())
[[1 2 3]
 [0 0 1]
 [0 1 3]]

>>> print(scipy.sparse.csc_matrix([[1, 2, 3], [0, 0, 1], [0, 1, 3]]).toarray())
[[1 2 3]
 [0 0 1]
 [0 1 3]]

>>> values = [1, 2, 3]
>>> rows = [0, 0, 1]
>>> cols = [0, 1, 3]
>>> print(scipy.sparse.csr_matrix((values, (rows, cols)), shape=(5, 5), dtype=int).toarray())
[[1 2 0 0 0]
 [0 0 0 3 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]

>>> values = [1, 2, 3]
>>> rows = [0, 0, 1]
>>> cols = [0, 1, 3]
>>> print(scipy.sparse.csc_matrix((values, (rows, cols)), shape=(5, 5), dtype=int).toarray())
[[1 2 0 0 0]
 [0 0 0 3 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
scipy.sparse.csr_matrix
scipy.sparse.csc_matrix
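The (data, indices, indptr) form is the least obvious of the constructor variants listed above, so here is a minimal sketch for a CSR matrix. The 3 x 4 example matrix and the variable names are chosen for illustration. For a CSR matrix, indices holds the column index of each value, and row i owns the slice data[indptr[i]:indptr[i + 1]].

```python
from scipy.sparse import csr_matrix

# Target matrix:
# [[1 2 0 0]
#  [0 0 0 3]
#  [0 0 0 0]]
data = [1, 2, 3]       # the non-zero values, row by row
indices = [0, 1, 3]    # the column index of each value
indptr = [0, 2, 3, 3]  # row 0 -> data[0:2], row 1 -> data[2:3], row 2 -> empty

A = csr_matrix((data, indices, indptr), shape=(3, 4))

# The same matrix built from coordinates, for comparison.
B = csr_matrix((data, ([0, 0, 1], [0, 1, 3])), shape=(3, 4))

print(A.toarray().tolist())  # [[1, 2, 0, 0], [0, 0, 0, 3], [0, 0, 0, 0]]
print((A != B).nnz)          # 0, i.e. both matrices are equal
```

For a CSC matrix the roles are swapped: indices holds row indices and indptr delimits the columns.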
... to be continued ...