Published by
World Scientific Publishing Co. Pte. Ltd.
Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

Library of Congress Cataloging-in-Publication Data
Names: Simovici, Dan A., author.
Title: Mathematical analysis for machine learning and data mining / by Dan Simovici (University of Massachusetts, Boston, USA).
Description: [Hackensack?] New Jersey : World Scientific, [2018] | Includes bibliographical references and index.
Identifiers: LCCN 2018008584 | ISBN 9789813229686 (hc : alk. paper)
Subjects: LCSH: Machine learning--Mathematics. | Data mining--Mathematics.
Classification: LCC Q325.5 .S57 2018 | DDC 006.3/101515--dc23
LC record available at https://lccn.loc.gov/2018008584

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

Copyright © 2018 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

For any available supplementary material, please visit http://www.worldscientific.com/worldscibooks/10.1142/10702#t=suppl

Desk Editors: V. Vishnu Mohan/Steven Patt
Typeset by Stallion Press
Email: enquiries@stallionpress.com
Printed in Singapore
May 2, 2018 11:28 Mathematical Analysis for Machine Learning 9in x 6in b3234-main page v

Making mathematics accessible to the educated layman, while keeping high scientific standards, has always been considered a treacherous navigation between the Scylla of professional contempt and the Charybdis of public misunderstanding.
Gian-Carlo Rota

Preface

Mathematical Analysis can be loosely described as the area of mathematics whose main object is the study of functions and of their behaviour with respect to limits. The term “function” refers to a broad collection of generalizations of real functions of real arguments, to functionals, operators, measures, etc.

There are several well-developed areas in mathematical analysis that present a special interest for machine learning: topology (with various flavors: point-set topology, combinatorial and algebraic topology), functional analysis on normed and inner product spaces (including Banach and Hilbert spaces), convex analysis, optimization, etc. Moreover, disciplines like measure and integration theory, which play a vital role in statistics, the other pillar of machine learning, are absent from the education of computer scientists. We aim to contribute to closing this gap, which is a serious handicap for people interested in research.

The machine learning and data mining literature is vast and embraces a diversity of approaches, from informal to sophisticated mathematical presentations. However, the mathematical background needed for approaching research topics is usually presented in a terse and unmotivated manner, or is simply absent. This volume contains knowledge that complements the usual presentations in machine learning and provides motivations (through its application chapters that discuss optimization, iterative algorithms, neural networks, regression, and support vector machines) for the study of mathematical aspects.

Each chapter ends with suggestions for further reading. Over 600 exercises and supplements are included; they form an integral part of the material. Some of the exercises are in reality supplemental material. For these, we include solutions.

The mathematical background required for making the best use of this volume consists of the typical sequence of calculus, linear algebra, and discrete mathematics, as it is taught to Computer Science students in US universities.

Special thanks are due to the librarians of the Joseph Healy Library at the University of Massachusetts Boston, whose diligence was essential in completing this project. I also wish to acknowledge the helpfulness and competent assistance of Steve Patt and D. Rajesh Babu of World Scientific. Lastly, I wish to thank my wife, Doina, a steady source of strength and loving support.

Dan A. Simovici
Boston and Brookline
January 2018

Contents

Preface

Part I Set-Theoretical and Algebraic Preliminaries

1 Preliminaries
1.1 Introduction
1.2 Sets and Collections
1.3 Relations and Functions
1.4 Sequences and Collections of Sets
1.5 Partially Ordered Sets
1.6 Closure and Interior Systems
1.7 Algebras and σ-Algebras of Sets
1.8 Dissimilarity and Metrics
1.9 Elementary Combinatorics
Exercises and Supplements
Bibliographical Comments

2 Linear Spaces
2.1 Introduction
2.2 Linear Spaces and Linear Independence
2.3 Linear Operators and Functionals
2.4 Linear Spaces with Inner Products
2.5 Seminorms and Norms
2.6 Linear Functionals in Inner Product Spaces
2.7
Hyperplanes
Exercises and Supplements
Bibliographical Comments

Index

A
absolutely convex set, 138
absolutely summable sequence, 308, 313
absorbing set, 139, 140
accumulation point, 179
active constraint, 826, 836
additivity property of signed measures, 456
affine basis, 125
affine combination, 120
affine mapping, 120
affine subspace, 121
affinely dependent set, 123
algebra of sets, 34
almost everywhere property, 401
almost surely true statement, 469
angle between vectors, 99
antimonotonic mapping, 27
ascent direction, 820
assumable set, 942
asymptotic density of a set, 587
Axiom of Choice, 12

B
Baire set, 451
Baire space, 170
balanced set, 138
Banach space
  reflexive, 614
barycentric coordinates, 144
Borel set, 388
Borel-Cantelli Lemma, 416
Fatou’s Lemma, 495

C
canonical form of a hyperplane relative to a sample, 927
Carathéodory outer measure, 453
carrier of a point in a simplex, 145
Cartesian product, 13
  projection of a, 13
Cauchy sequence in measure, 553
Chain Rule for Radon-Nikodym Derivatives, 532
characteristic function of a subset, 10
Chow parameters, 941
closed function, 221
closed interval in R^n, 390
closed segment, 117
closed set, 165
  generated by a subset, 31
closed sphere, 46
closed-open interval in R^n, 390
closed-open segment, 117
closure operator, 29
closure system on a set S, 28
cluster point, 179
cofinal set, 204
collection
  intersection of a,
  union of a,
collection of neighborhoods, 174
collection of sets
  restriction of a, 39
compact linear operator, 344
compact set, 189
complementary slack condition, 833
complete simplex, 148
complex measure
  total variation of a, 464
  variation of a, 464
complex numbers
  extended set of, 258
complex-valued function
  integrable, 505, 548
concave function, 748
  subgradient of a, 781
cone, 130
connected component of an element, 223
constraint qualification, 852
contingent cone, 370
continuity from above, 415
continuity from below, 415
continuous function, 213
continuous function at a point, 210
contracting sequence of sets, 16
contraction, 296, 639
contraction constant, 296, 639
convergence in measure, 552
convergent sequence of sets, 17
convex closure of a set, 124
convex combination, 120
convex function, 748
  closed, 751
  subgradient of a, 781
convex hull of a set, 124
convex set, 118
  face of a, 133
  Minkowski functional of a, 140
  proper face of a, 133
  relative border of a, 355
  relative interior of a, 352
  support function of a, 810
  supporting set of a, 373
core
  intrinsic of a set, 147
cosine squasher, 894
countable additivity property of measures, 398
covariance of random variables, 678
cover
  subcover of a, 189
critical point, 656, 664
cycle, 49
  trivial, 49

D
data fitting, 920
data sample, 926
decreasing mapping, 27
derived set, 179
descent direction, 820
design matrix, 909
dimension of an affine subspace, 122
directed set, 204
directional derivative of a function, 630
discontinuity point
  jump of a function in a, 203
  of first type, 203
  of second type, 203
dissimilarities
  definiteness of, 43
  evenness of, 43
dissimilarity, 43
  space, 43
dissimilarity space
  amplitude of a sequence in a, 43
distance, 44
  Hamming, 45
distribution
  marginal distributions of a, 577
doubly stochastic matrix, 48
dual optimization problem, 843
duality, 113

E
epigraph of a function, 237
equality constraints, 826
equicontinuity of a collection of functions, 315
essential infimum, 446
essential lower bound, 446
essential supremum, 446
essential upper bound, 446
event, 464
exposed faces, 134
extended dissimilarity on a set, 43
extended dissimilarity space, 43
extreme point, 132

F
F-closed subset of a set, 33
Fδ-set, 275
face
  k-, 133
feasible direction, 820
feasible region, 826
feature map, 932
feature space, 932
Fenchel conjugate of a function, 808
filter, 201
  basis, 201
  sub-basis, 201
finite additivity property of measures, 399
finite intersection property, 190
fixed point, 298
flat, 121
Fourier expansion, 107
function, 10
  a-strongly convex, 814
  absolutely continuous, 571
  affine, 750
  bijective, 11
  choice, 12
  coercive, 252
  continuous in a point, 265
  continuity modulus of a, 320
  continuously differentiable, 626
  differential of a, 625
  discontinuous in a point, 202
  discriminatory relative to a regular Borel measure, 896
  domain of a, 10
  Fermi logistic, 894
  gradient of a, 633
  Heaviside, 894
  Hessian matrix of a, 655
  injective, 11
  integrable, 500
  left inverse of a, 11
  left limit of a, 262
  left-continuous, 211
  limit of a, 261
  Lipschitz constant for a, 296
  locally Lipschitz, 296
  lower semicontinuous, 230
  measurable
    complex-valued, 398
    with complex values, 414
  measurable on a set, 483
  negative part of, 411
  onto, 11
  positive part of, 411
  pseudo-convex, 773
  ramp, 894
  range of a, 10
  restriction of a, 11
  right inverse of a, 12
  right limit of a, 262
  right-continuous, 211
  semicontinuous, 230
  sesquilinear, 86
  sigmoidal, 894
  squashing, 894
  strictly pseudo-convex, 773
  strongly convex, 814
  support of a, 215
  surjective, 11
  total, 11
  uniformly continuous, 265
  upper semicontinuous, 230
function of positive type, 721
functional, 134
functions equivalent in measure, 562

G
Gδ-set, 275
Gibbs distribution, 859
Gram-Schmidt algorithm, 100
greatest element, 19
greatest lower bound, 20

H
Hausdorff metric hyperspace, 302
Hilbert space, 677
hinge function, 794
homeomorphic topological spaces, 218
homeomorphism, 218
homogeneous polynomial, 631
hyperplane, 110
  vector normal to a, 112
  vertical, 112
hypograph of a function, 237

I
I-open subsets of a set, 34
identity
  complex polarization, 89
  real polarization, 90
image of a set, 14
inclusion-exclusion equality, 406
increasing mapping, 27
indefinite integral, 549
independent σ-algebras, 467
independent events, 466
indicator function of a subset, 10
inequality constraints, 826
inequality
  Bessel, 689
  Buneman, 43
  Cauchy-Schwarz, 558
  Chebyshev, 511
  Finite Bessel, 688
  Hölder, 557
  Markov, 511
infimum, 20
infimum of a function, 445
infinite arithmetic progression, 185
infinite sequence on a set, 13
inner product, 85
inner product space, 85
instance of the least square problem, 910
integrable function, 492
integral on a measurable subset, 499
interior of a set, 169
interior operator, 33
interior system, 33
intersection
  associativity of,
  commutativity of,
  idempotency of,
isolated point, 179
isometric metric spaces, 47
isometry, 47
iteration of a function, 297

J
Jacobi’s method, 885
Jacobian determinant of a function, 635
Jacobian matrix, 634

K
K-closed subsets of a set, 30
Kantorovich’s inequality, 860
Karush-Kuhn-Tucker conditions, 837
Karush-Kuhn-Tucker sufficient conditions, 842

L
Lagrange multipliers, 833
Lagrangian function
  Fritz John, 828
  Kuhn-Tucker, 828
lateral limits, 202
learning rate, 939
least element, 19
least upper bound of a set, 20
Lebesgue’s Lemma, 269
level set of a function, 238
limit of a function along a filter base, 202
limit of a sequence of sets, 17
limit point, 179
line
  determined by a point and a vector, 118
  determined by two points, 118
linear functional, 74, 134
linear operator, 74
  compact, 610
  eigenvalue of a, 84
  eigenvector of a, 84
  invariant subspace, 84
  resolvent of a, 84
linear operator associated to a matrix, 81
linear regression, 909
linear space
  basis of a, 69
  bounded set in a, 333
  endomorphism of a, 75
  Hamel basis of a, 69
  linear combination of a subset of a, 68
  linear operator on a, 75
  Minkowski sum of subsets of a, 329
  partially ordered, 132
  reflection in a, 77
  set spanning a, 69
  set that generates a, 69
  subspace of a, 69
  translation in a, 77
  trivial subspace of a, 70
  zero element of a, 66
linearly dependent set, 68
linearly independent set, 68
linearly separable, 357, 926
linearly separable set, 939
Lipschitz function, 296, 639
local basis of a point, 178
local extremum, 656, 820
local maximum, 656
local maximum of a functional, 819
local minimum, 656
local minimum of a functional, 819
locally convex, 119
locally convex linear space, 338, 339
logistic function, 894
lower bound, 19
lower limit of a sequence of sets, 17

M
μ-measurable set, 419
mapping
  bilinear, 113
matrix
  Hermitian conjugate, 66
  transpose, 67
matrix associated to a linear mapping, 80
maximal element, 21
measurable function, 393
  Borel, 394
measurable set
  Lebesgue, 423
measurable space, 385
measure, 398
  σ-finite, 400
  absolutely continuous measure with respect to another, 525
  Borel, 450
  complete, 401
  complex, 462
  counting, 399
  density of a, 551
  Dirac, 399
  finite, 400
  Fourier transform of a, 566
  image, 399
  inner regular, 450
  Lebesgue outer measure, 418
  Lebesgue-Stieltjes, 571
  locally finite, 450
  monotonicity of, 400
  outer regular, 450
  probability, 464
  Radon, 450
  semi-finite, 400
  tight, 452
measure space, 398
  complete, 401
  completion of a, 401, 404
  extension of a, 403
  Hahn decomposition of a, 459
  property of elements in a, 401
  regular set in a, 452
measures
  modularity property of, 400
  mutually singular, 459
mesh of a triangulation, 146
metric, 44
  discrete, 45
  Hausdorff, 302
  Minkowski, 95
  topology induced by a, 255
metric space, 44
  bounded set in a, 46
  complete, 276
  diameter of a, 46
  diameter of a subset of a, 46
  distance between an element and a set in a, 266
  precompact, 291
  separate sets in a, 268
  topological, 256
  totally bounded, 291
minimal element, 21
minimum bounding sphere, 847
monotonic mapping, 27
monotonicity of measures, 400
morphism of posets, 27
multinomial coefficient, 52
multinomial formula, 53

N
negative open half-space, 112
neighborhood
  of infinity, 262
  basis, 178
  item of a point, 174
net, 204
  anti-monotonic, 205
  Cauchy, 333
  clusters, 205
  convergent, 205
  eventually bounded, 333
  eventually in a set, 205
  finite, 291
  frequently in a set, 205
  monotonic, 205
  r-, 291
  subnet of a, 204
net is eventually in a set, 205
net is frequently in a set, 205
neural network, 895
neuron, 893
  activation function of a, 893
  threshold value of a, 893
  weights vector of a, 893
non-negative combination, 120
non-negative measurable function
  Lebesgue integral of a, 491
non-negative orthant, 118
norm
  Euclidean, 94
  metric induced by a, 95
  Minkowski, 94
  supremum, 286
  uniform convergence, 286
normal subspace, 664
normal to a surface, 664
normed linear space, 88
normed space
  dual of a, 612
notation
  big O, 626
  Landau, 626
  small o, 626
null set, 401

O
on-line learning, 938
one-class classification, 935
open function, 221
open interval in R^n, 390
open segment, 117
open set, 162
  in R
    size of, 240
open sphere, 46
open-closed interval in R^n, 61, 390
open-closed segment, 117
ordering cone, 132
orthogonal set of vectors, 100
orthogonal subspaces, 103
orthogonal vectors, 100
orthonormal set of vectors, 100
outer measure, 417
  regular, 426
outlier, 930

P
pair
  ordered,
  components of an,
parallel affine subspaces, 122
parallelepiped, 480
partial derivative, 630
partial order
  strict, 18
partially ordered set, 18
partition,
  blocks of a,
  decision function of a, 898
  measurable, 385
  related to a simple function, 393
partition topology, 185
perfect matching, 15
permutation, 47
  cyclic, 49
  cyclic decomposition of a, 50
  descent of a, 50
  even, 51
  inversion of a, 50
  odd, 51
permutation matrices, 48
permutation parity, 51
pointed cone, 130
points in a topological space, 162
pointwise convergence, 343
pointwise convergence of a sequence of linear operators, 344
polyhedron, 142
  boundary hyperplane of a, 142
polytope, 142
poset, 18
  chain in a, 25
  greatest element of a, 20
  least element of a, 20
  upward closed set in a, 163
positive combination, 120
positive linear functional, 540
positive open half-space, 112
pre-image of a set, 14
premeasure, 398
primal problem, 843
probability, 464
  conditional, 465
probability conditioned by an event, 465
probability space, 464
probability space induced by a random variable, 572
product measurable space of a family of measurable spaces, 386
product of the topologies, 226
product of topological spaces, 226
pseudo-metric, 44

Q
quadratic optimization problem, 846
quasi-concave function, 770
quasi-convex function, 770

R
Radon-Nikodym derivative, 531
random variable, 467
  σ-algebra generated by a, 467
  conditional expectation, 580
  discrete, 572
    point mass of a, 572
  distribution function of a, 573
  distribution of a, 572
  expectation of a, 572
  mean value, 572
  probability density function of a, 574
  variance of a, 572
random variables
  correlation coefficient of, 579
  independent, 468, 577
  uncorrelated, 579
  version of a, 468
random vector, 577
  distribution of a, 577
  non-degenerate n-dimensional, 579
real linear space
  n-dimensional, 71
  dimension of a, 71
rectangle on a set product, 387
reflexion homothety in a, 77
regressor, 910
regula falsi, 891
regular elements of an operator, 616
regular point, 664
relation
  antisymmetric,
  infix notation of a,
  irreflexive,
  on a pair of sets,
  partial order,
  reflexive,
  symmetric,
  transitive,
  transitive closure of a, 32
  transitive-reflexive closure of a, 33
relatively open set, 352
remainder of order n, 654
residual of a vector, 874
reverse Fatou Lemma, 589
Riesz’ Lemma, 347
ridge loss function, 916
ring of sets on S, 61

S
σ-algebra
  product, 386
  separable, 471
saddle point, 656, 852
  Fritz John, 828
Schauder basis, 314
section of a set, 533
semi-ring, 61
semimetric, 45
seminorm, 88
separating collection of seminorms, 339
separating hyperplane, 356
sequence, 13
  Cauchy, 276
  convergence from left, 261
  convergence of order p of a, 865
  convergent, 186, 260
  limit point of a, 186
  linear convergence of a, 865
  quadratic convergence of a, 865
  subsequence of a, 28
  superlinear convergence of a, 865
  convergent to a point, 186
  divergent to +∞, 187
  generated by conjugate directions, 876
  of functions
    pointwise convergence of a, 283
    uniform convergence of a, 283
sequence of sets
  contracting, 16
  expanding, 16
  monotone, 16, 30
series, 307
  absolutely convergent, 308
  convergent, 307
  divergent, 307
  in a normed space, 312
    absolutely convergent, 313
    partial sums of a, 312
    terms of a, 312
  partial sum of a, 307
  semiconvergent, 308
  sum of a, 307
  terms of a, 307
set
  bounded, 19
  complement of a,
  first category, 171
  gauge on a, 45
  index, 13
  measurable, 385
  nowhere dense, 171
  product,
  relation on a,
  saturated by a partition,
  second category, 171
  unbounded, 19
set bounded from below, 364
set of active constraints, 826, 836
set of continuous functions
  point separation by a, 286
set of permutations, 47
sets
  Cartesian product of,
  collection refinement of a,
  collection of,
  trace of,
  inclusion between,
  collection of hereditary,
  disjoint,
  equipollent, 941
  symmetric difference of,
entropy
  Shannon’s, 756
  β-, 798
signed measure, 456
  σ-finite, 461
  absolute value of a, 461
  finite, 461
  Jordan decomposition of a, 460
  negative set for a, 457
  null set for a, 457
  positive set for a, 457
  total variation of a, 461
signed measure space, 456
similarity, 46, 295
  ratio, 295
  space, 46
simple function, 392
  Lebesgue integral of a, 486
simplex, 142
  dimension of a, 142
  subsimplex of a, 145
  triangulation of a, 146
slack variables, 930
soft margin, 931
space of sequences ℓ^p, 99
spectrum, 616
  continuous, 616
  point, 616
  residual, 616
standard simplex, 145
star-shaped set, 119
stationary point, 656
step function, 560
stochastic matrix, 48
strict local maximum of a functional, 819
strict local minimum of a functional, 819
strict order, 18
strictly antimonotonic mapping, 27
strictly concave function, 748
strictly convex function, 748
strictly decreasing mapping, 27
strictly increasing mapping, 27
strictly monotonic mapping, 27
strictly quasi-convex function, 770
strictly separated sets, 357
subcollection,
subcover of an open cover, 184
subdivision of an interval, 514
subset closed under a set of operations, 33
subset of a linear space
  in general position, 124
subspace
  affine, 110
  orthogonal complement of a, 104
sum of two linear operators, 77
summable sequence of elements, 307
summable set, 942
support hyperplane, 134
support vector machines
  kernelization of, 933
support vectors, 927
supporting hyperplane, 357
supremum, 20
supremum of a function, 445
symmetric set, 138
system
  π-, 41
  Dynkin, 41
  irreducibly inconsistent, 367
  minimally inconsistent, 367
system of normal equations, 911

T
tangent hyperplane, 664
tangent subspace, 664
tangent vector, 370
Taylor’s Formula, 650
the Bolzano-Weierstrass property of compact spaces, 193
the class of functions C^k(X), 649
the least square method, 910
Theorem
  Almgren, 752
  Aronszajn, 729
  Arzelà-Ascoli, 317
  Baire, 278
  Banach Fixed Point, 298
  Banach-Steinhaus, 603
  Beppo Levi, 510
  Birkhoff-von Neumann, 135
  Bohnenblust-Karlin-Shapley, 779
  Bolzano, 305
  Bolzano-Weierstrass, 191, 193
  Borwein, 786
  Carathéodory, 131, 137
  Carathéodory Extension, 425
  Carathéodory Outer Measure, 420
  Chain Rule, 632
  Closed Graph, 605
  Completeness of Convergence in Measure, 555
  Complex Hahn-Banach, 83
  Complex Stone-Weierstrass, 290
  Cramér-Wold, 568
  Dedekind, 23
  Dini, 284
  Dominated Convergence, 508
  Fan-Glicksburg-Hoffman, 775
  Farkas, 366
  Fritz John’s Necessary Conditions, 832
  Fubini, 538
  Hall’s Perfect Matching, 15
  Heine, 295
  Gauss-Markov, 913
  Geometric Version of Hahn-Banach, 357
  Gordan, 366
  Heine-Borel, 293
  Helly, 154
  Implicit Function, 640
  Intermediate Value, 305
  Inverse Function, 659
  Inversion, 566
  Jensen, 753
  Jordan’s Decomposition, 460
  Kantorovich, 871
  Karush-Kuhn-Tucker, 837, 842
  Krein-Milman, 374
  Lax-Milgram, 739
  Lebesgue Decomposition, 533
  Leibniz, 308
  Lindelöf, 184
  Lyusternik, 610
  Measure Continuity, 415
  Mean Value, 499, 633
  Mercer, 734
  Monotone Collection, 39
  Monotone Convergence, 494
  Moreau-Rockafellar, 784
  Motzkin, 368
  Open Mapping, 604
  Partition of Unity, 236
  Prolongation, 354
  Pythagora, 104
  Radon-Nikodym, 531
  Real Hahn-Banach, 82
  Regularity Theorem for Measures on R^n, 447
  Riemann, 310
  Riesz, 553, 703
  Riesz-Fischer, 563
  Riesz-Markov-Kakutani, 546
  Schoenberg, 731
  Separation, 358
  Shapley-Folkman, 138
  Spectral Theorem for Compact Self-adjoint Operators, 711
  Stone, 127
  Stone-Weierstrass, 289
  Strong Duality, 851
  Tietze, 282
  Tonelli, 537
  Tychonoff, 229
  Uniqueness of Measures, 414
  Vitali, 448
  Weak Duality, 844
topological F-linear space, 329
topological linear space
  complete, 333
  quasi-complete, 333
  sequentially complete, 333
topological property, 220
topological set
  compact, 189
topological space, 162
  T0, 194
  T1, 194
  T2, 194
  T3, 194
  T4, 194
  arcwise connected, 250
  basis of a, 180
  boundary of a set in a, 172
  clopen set in a, 184
  closed cover in a, 184
  completely regular, 218
  connected, 222
  connected subset of a, 222
  continuous path in a, 250
  cover in a, 184
  dense set in a, 168
  disconnected, 222
  empty, 162
  first axiom of countability for a, 183
  Hausdorff, 194
  locally compact, 197
    compactification of a, 201
  metrizable, 256
  normal, 196
  open cover in a, 184
  regular, 196
  second axiom of countability for a, 183
  separable, 168
  separated sets in a, 242
  subspace of a, 165
  totally disconnected, 225
topologically equivalent metrics, 259
topology, 162
  Alexandrov, 163
  coarser, 165
  cofinite, 163
  discrete, 162
  Euclidean, 256
  finer, 165
  indiscrete, 162
  metrizable, 259
  pull-back, 239
  push-forward, 240
  stronger, 165
  sub-basis of a, 182
  usual, 162
  weak, 220
  weaker, 165
total order, 25
totally bounded set, 291
trace of a matrix, 86
translation, 330
transposition, 49
  standard, 50
tree metric, 44
triangular inequality, 43

U
ultrametric, 44
ultrametric inequality, 43
uniform convergence, 343
uniform dense set on compact set, 903
uniform distribution, 575
uniform equicontinuity, 316
union
  associativity of,
  commutativity of,
  idempotency of,
upper bound, 19
upper limit of a sequence of sets, 17
usual topology of the extended set of reals, 181
Uryson’s Lemma
  for normal spaces, 216
  for locally compact spaces, 235

V
variance, 511
variety, 121
Vitali cover, 448
volume, 427
  homogeneity, 435

W
Weierstrass M-test, 315

... upper bound for T, and in this case u cannot... −∞ if x < 0, and x · (−∞) = (−∞) · x = −∞ if... by taking ⋂∅ = S for the empty collection of subsets