Humanities Data Analysis “125-85018_Karsdrop_Humanities_ch01_3p” — 2020/8/19 — 11:01 — page 122 — #45 122 • Chapter 3
Figure 3.11: Visualization of the axis ordering in two-dimensional NumPy arrays.

    print(document_term_matrix.sum())

    5356076

NumPy provides many other aggregating functions, such as numpy.min() and numpy.max() to compute the minimum or maximum of an array or along an axis, and numpy.mean() to compute the arithmetic mean (cf. chapter 5). A more detailed discussion of these functions is beyond the scope of this brief introduction to NumPy; for further information, we refer the reader to NumPy’s excellent online documentation.

3.5.4 Array broadcasting

In the preceding sections, we briefly touched upon the concept of array arithmetic. We conclude this introduction to NumPy with a slightly more detailed account of this concept, and introduce the more advanced concept of “array broadcasting,” which refers to the way NumPy handles arrays with different shapes during arithmetic operations. Without broadcasting, array arithmetic would only be allowed when two arrays, for example a and b, have exactly the same shape. This is required because arithmetic operators are evaluated “element-wise,” meaning that the operation is performed on each item in array a and its corresponding item (i.e., the item with the same positional index) in array b. An example is given in the following code block, in which we multiply the numbers in array a with the numbers in array b:

    import numpy as np

    a = np.array([1, 2, 3])
    b = np.array([2, 4, 6])
    print(a * b)

    [ 2  8 18]

Figure 3.12: Visualization of NumPy’s broadcasting mechanism. Here the scalar is stretched into an array with the same size as the other array.

Essentially, array broadcasting provides the means to perform array arithmetic on arrays with different shapes by “stretching” the smaller array to match the shape of the larger array, thus making their shapes compatible. The observant
reader might have noticed that we already encountered an example of array broadcasting when the concept of vectorized arithmetic operations was explained. A similar example is given by:

    a = np.array([1, 2, 3])
    print(a * 2)

    [2 4 6]

In this example, the numbers in array a are multiplied by the scalar 2, which, strictly speaking, breaks the “rule” of two arrays having exactly the same shape. Yet the computation proceeds correctly, working as if we had multiplied a by np.array([2, 2, 2]). NumPy’s broadcasting mechanism stretches the number 2 into an array with the same shape as array a. This stretching process is illustrated by figure 3.12.

Broadcasting operations are “parsimonious” and avoid allocating intermediate arrays (e.g., np.array([2, 2, 2])) to perform the computation. However, conceptualizing array broadcasting as a stretching operation helps to better understand when broadcasting can be applied, and when it cannot. To determine whether or not array arithmetic can be applied to two arrays, NumPy assesses the compatibility of their dimensions. Two dimensions are compatible if and only if (i) they have the same size, or (ii) one of them has size 1. NumPy compares the shapes of two arrays element-wise, starting with the innermost (last) dimensions, and then working outwards.

Figure 3.13: Visualization of NumPy’s broadcasting mechanism in the context of multiplying a two-dimensional array with a one-dimensional array. Here the one-dimensional array [1, 2, 3] is stretched vertically to fit the dimensions of the other array.

Figure 3.14: Visualization of the inapplicability of NumPy’s broadcasting mechanism in the context of multiplying a two-dimensional array with a one-dimensional array whose size does not match the innermost (last) dimension of the two-dimensional array.

Consider figure 3.13, in
which the upper half visualizes the multiplication of an n × 3 array by a one-dimensional array with 3 items. Because the number of items of the one-dimensional array matches the size of the innermost dimension of the larger array (i.e., 3 and 3), the smaller 1 × 3 array can be broadcast across the larger n × 3 array so that their shapes match (cf. the lower half of the figure). Another example would be to multiply an n × 3 array by a 1 × 4 array. However, as visualized by figure 3.14, array broadcasting cannot be applied to this combination of arrays, because the innermost dimension of the left array (i.e., 3) is incompatible with the number of items of the one-dimensional array (i.e., 4). As a rule of thumb, one should remember that in order to multiply a two-dimensional array with a one-dimensional array, the number of items in the latter should match the size of the innermost (last) dimension of the former, that is, its number of columns.
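
The compatibility rule can be checked directly in NumPy. The following is a minimal sketch rather than code from the text itself: the arrays matrix, row, and mismatch are hypothetical, and the 4 × 3 shape is chosen purely for illustration. It shows a multiplication that broadcasts, one that fails because the last dimensions differ, and one way to make the failing case work by reshaping the one-dimensional array into a column.

```python
import numpy as np

# Hypothetical example data; the shapes are assumptions made for illustration.
matrix = np.arange(12).reshape(4, 3)   # two-dimensional array, shape (4, 3)
row = np.array([1, 2, 3])              # one-dimensional array, shape (3,)
mismatch = np.array([1, 2, 3, 4])      # one-dimensional array, shape (4,)

# Shapes (4, 3) and (3,): the last dimensions are both 3, so `row` is
# broadcast across each of the four rows of `matrix`.
product = matrix * row
print(product.shape)

# np.broadcast reports the broadcast shape without doing the arithmetic.
print(np.broadcast(matrix, row).shape)

# Shapes (4, 3) and (4,): the last dimensions are 3 and 4, and neither is 1,
# so NumPy raises a ValueError instead of broadcasting.
try:
    matrix * mismatch
    broadcast_failed = False
except ValueError as error:
    broadcast_failed = True
    print("Broadcasting failed:", error)

# Reshaping `mismatch` into a column of shape (4, 1) makes the shapes
# compatible: 3 against 1 (one of them is 1), then 4 against 4 (equal).
column_product = matrix * mismatch.reshape(4, 1)
print(column_product.shape)
```

Reshaping into a column is only one option, and it changes the meaning of the operation: each row of matrix is now scaled by a single value, instead of each column being scaled as in the first multiplication.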