
Statistics, Data Mining, and Machine Learning in Astronomy



DOCUMENT INFORMATION


Index

1/f process, 458
1/f² process, 458
α value, 150
absolute deviation, 79
ACF, see correlation functions, autocorrelation
active learning, 49
Advances in Machine Learning and Data Mining for Astronomy (WSAS), 10, 46, 47, 49, 275, 278, 280, 317
Akaike information criterion (AIC), 134, 352, 406, 432, 442
aliasing, 412
All of Nonparametric Statistics and All of Statistics: A Concise Course in Statistical Inference (Wass10), 9, 69, 85, 123, 128, 134, 199, 243, 251, 254
analytic function
angular frequency, 408
AR(1) process, 464
ARIMA, see autoregressive models
arithmetic mean, 78; clipped, 94; standard error of, 83
ARMA, see autoregressive models
Arnoldi decomposition, 309
associated set, 168
AstroML, 511; installing, 37
astronomical flux measurements, 15
atmospheric seeing, 410
autocorrelation function, see correlation functions, autocorrelation
autocovariance function, 459
Auton Lab, 11
autoregressive models, 461–463; ARIMA, 463, 465; ARMA, 463, 465; linear, 462
bagging, see bootstrap aggregating
band limited, 412
bandwidth, 378
Bar89, see Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences
Bayes classifier, 369
Bayes' rule, 73, 369
Bayes' theorem, 177, 368
Bayes, Thomas, 175
BayesCosmo, see Bayesian Methods in Cosmology
Bayesian blocks, 442, 451, 465
Bayesian inference, 175
—Bayes factor, 187, 225
—Bayes' theorem, see Bayes' theorem
—classifier, 369
—conjugate priors, 184
—consistency principle, 181
—credible region, 179, 185
—empirical methods, 184, 368
—flat prior, 181
—global likelihood, 187
—hierarchical model, 184
—hyperparameters, 184
—hypothesis testing, 180, 188
—improper prior, 181
—indifference principle, 181
—informative priors, 180
—Jeffreys' odds ratio scale, 187
—MAP estimate, 179
—marginal likelihood, 186
—marginal posterior pdf, 185
—marginalization, 179, 185
—Markov chain Monte Carlo, 230
—maximum entropy principle, 181
—model odds ratio, 186, 225
—model selection, 186, 223
—nonuniform priors, 191
—nuisance parameters, 177, 185
—numerical methods, 229
—Occam's razor, 189
—parameter estimation, 196; binomial distribution, 206; Cauchy distribution, 208; effects of binning, 215; Gaussian distribution, 196; outlier rejection, 219; signal and background, 213; uniform distribution, 211
—posterior mean, 179
—prior, 176, 177
—prior predictive probability, 178
—priors, 180
—scale-invariant prior, 181
—uninformative priors, 180
Bayesian information criterion (BIC), 134, 190, 352, 406, 432, 442
Bayesian Logical Data Analysis for the Physical Sciences (Greg05), 9, 105, 182, 184, 231, 232, 243, 408, 413
Bayesian method
Bayesian Methods in Cosmology (BayesCosmo), 10, 231
Bayesian models, 46
beam convolution, 410
Bernoulli's theorem, 105
Bessel's correction, 82
bias, 7, 82
BIC, see Bayesian information criterion (BIC)
"big O" notation, 44, 45
Bošković, Rudjer, 345
boosted decision tree, 394
boosting, 393, 398, 399
bootstrap, 140, 391
bootstrap aggregating (bagging), 144
Brownian motion, 458
burst signal, 453
c statistic, 147
CAR(1) process, 463, 464
Center for Astrostatistics at Penn State University, 11
central limit theorem, 105
Cepheid variables, 403, 426
chirp signal, 406, 453
class labels, 135
classification, 8, 145, 368; Benjamini and Hochberg method, 147; binomial logistic regression, 381; boundary, 146; c statistic, 147; comparison of methods, 397; completeness, 145; contamination, 145; decision tree, 399; discriminative, 367, 380, 385, 397; efficiency, 147; expectation maximization, see expectation maximization; false discovery rate, 147; Gaussian Bayes, 374; Gaussian naive Bayes, 372, 373, 395; generative, see generative classification; GMM Bayes classifier, 377, 398; k-nearest-neighbor, 399; logistic regression, 381, 398, 399; loss, see loss function; naive Bayes, 136, 371, 372, 399; nearest-neighbor, 378, 399; periodic light curves, 427, 443; RR Lyrae stars, 380; sensitivity, 147; simple, 145; supervised, 4, 365, 443; unsupervised, 3, 250, 365, 443
cluster finding, 249
clustering, 3, 270; K-means, 270; "friends-of-friends", 275; comparison of methods, 281; dendrogram, 274; hierarchical, 274; max-radius minimization, 271; mean shift, 271; minimum spanning tree, 275; unsupervised, 250
clusters of galaxies
Cochran's theorem, 200
cocktail party problem, 313
code management tools, 13; CVS, 13; Git, 13; GitHub, 13
color–magnitude diagram, 22
comparable set, 168
completeness, 368, 372, 395
completeness vs. purity
compressed sensing, 303
conditional density distribution, 379
conditional independence, 376
confidence estimation, 123
contamination, 368, 372
contingency table, 75
convolution, 407; convolving pattern, 410; of two functions, 409, 410; theorem, 409, 410, 419
coordinate gradient descent, 336
correlation coefficient, 109, 115; Kendall's, 116; Pearson's, 109, 115; population, 109; sample, 109; Spearman's, 116
correlation functions, 277, 456; autocorrelation, 407, 456–458, 460, 461; covariance, 460; cross-correlation, 460; discrete correlation, 460; Edelson and Krolik's discrete correlation function, 461; evenly sampled data, 460; n-point, 278; slot autocorrelation, 460; two-point, 277
cosine window, 416
cost function, 131
covariance, 46, 108, 456
covariance matrix, 294
credible region, 179
cross-matching, 47, 54
cross-validation, 144, 164, 352, 355, 379, 390, 392, 398
cross-validation error, 336
cross-validation score, 254
ctypes, see Python/wrapping compiled code
cumulative distribution function
curse of dimensionality, 59, 289
cython, see Python/wrapping compiled code
damped random walk, 463, 464
Data Analysis: A Bayesian Tutorial (Siv06), 9, 181, 182, 208
data cloning, 120, 264
data compression, 299
data mining, 3
data set tools, 14
—fetch_dr7_quasar, 23, 24, 396
—fetch_imaging_sample, 14, 18, 19, 269
—fetch_LINEAR_sample, 29, 440, 442, 443
—fetch_moving_objects, 30, 31, 34
—fetch_sdss_S82standards, 27, 28, 32, 33, 269
—fetch_sdss_specgals, 22, 23, 167, 280, 390, 392, 395
—fetch_sdss_spectrum, 19–21, 425, 426
—fetch_sdss_sspp, 25, 26, 34, 261, 272, 274, 396
—plotting, 31; all-sky distributions, 35; basemap, 37; contour, 32; density, 32; Hammer–Aitoff projection, 35, 36; HEALPix, 37; high dimension, 33; Lambert azimuthal equal-area projection, 36; Mercator projection, 35; Mollweide projection, 36
data sets
—LIGO "Big Dog" data, 16, 416, 417
—LINEAR, 27, 29, 403, 438, 440, 442, 443, 445, 446, 448, 449
—RR Lyrae stars, 365, 372, 374, 376–378, 380, 382, 384–388, 395, 396, 426
—SDSS galaxy data, 21, 23, 167, 280, 390, 392, 395
—SDSS imaging data, 16, 269
—SDSS moving objects, 30, 31, 34
—SDSS photometric redshift data, 394, 395
—SDSS quasar data, 23, 24, 366, 396
—SDSS spectroscopic data, 19, 21, 291, 298–300, 304, 425, 426
—SDSS stars, 25, 26, 32, 34, 425, 426
—SDSS stellar data, 261, 272, 274, 366, 396
—SDSS Stripe 82, 26; standard stars, 26, 28, 32, 33, 269, 365; simulated supernovas, 5, 325, 328
data smoothing, 249
data structures; B-tree, 51, 53; ball-tree, 60, 62; cone trees, 62; cosine trees, 62; cover trees, 62; kd-tree, 58, 60; maximum margin trees, 62; multidimensional tree, 53; oct-tree, 57; orthogonal search trees, 62; partition, 59; quad-tree, 57–59; trees, 47, 51, 386
data types, 43; categorical, 8, 43; circular variables, 43; continuous, 43; nominal, 43; ordinal, 43; ranked variables, 43
data whitening, 298
decision boundary, 370, 380, 386, 397
decision tree, 386, 388, 389, 398, 399
declination, 16, 18
deconvolution, 407; of noisy data, 410
degree of freedom, 98
δ Scu, 446
density estimation, 3, 249, 367, 371; Bayesian blocks, 259; comparison of methods, 281; deconvolution KDE, 256; extreme deconvolution, 264; Gaussian mixtures, 259; kernel (KDE), 48, 251; kernel cross-validation, 254; nearest-neighbor, 257; nonparametric, 250; number of components, 264; parametric, 259
descriptive statistics, 78
DFT, see Fourier analysis, discrete Fourier transform
Dickey–Fuller statistic, 463
differential distribution function
digital filtering, 421
Dijkstra algorithm, 311
dimensionality
dimensionality reduction, 289; comparison of methods, 316
discriminant function, 369, 375, 384, 395
discriminative classification, see classification
distance metrics, 61
distribution functions, 85
—χ², 96
—Bernoulli, 89, 381
—beta, 101
—binomial, 89
—bivariate, 108; Gaussian, 109
—Cauchy, 92, 459
—exponential, 95
—Fisher's F, 100
—gamma, 102
—Gauss error, 88
—Gaussian, 87; convolution, 88; Fourier transform, 88
—Hinkley, 94
—Laplace, 95
—Lilliefors, 158
—Lorentzian, 92
—multinomial, 90
—multivariate, 108; Gaussian, 372, 373
—normal, 87
—Poisson, 91
—Student's t, 99
—uniform, 85
—Weibull, 103
DR7 Quasar Catalog, 366
dynamic programming, 47, 228
Eddington–Malmquist bias, 191
Edgeworth series, 160
efficiency, 395
eigenspectra, 298
eigenvalue decomposition, 294
empirical Bayes, see Bayesian inference
empirical pdf, 6–8
ensemble learning, 391, 398
entropy, 389
Epanechnikov kernel, 255, 273
error bar
error distribution, 7
error rate, 367
estimator, 82
—asymptotically normal, 83
—bias of, 82
—consistent, 82
—efficiency, 83
—Huber, 345
—Landy–Szalay, 279
—luminosity function, 166
—Lynden-Bell's C−, 168
—maximum a posteriori (MAP), 179
—maximum likelihood, 124, 125; censored data, 129; confidence interval, 128; heteroscedastic Gaussian, 129; homoscedastic Gaussian, 126; properties, 127; truncated data, 129
—minimum variance unbiased, 83
—robust, 83
—Schmidt's 1/Vmax, 168
—unbiased, 82
—uncertainty, 82
—variance of, 82
Euler's formula, 409
expectation maximization (EM), 46, 136, 204, 223, 260, 374
expectation value, 78
exploratory data analysis, 4, 249
extreme deconvolution, 264
f2py, see Python/wrapping compiled code
false alarm probability, 437
false discovery rate, 147
false negative, 145, 368
false positive, 145, 368
false-positive rate, 405
FastICA, 315
FB2012, see Modern Statistical Methods for Astronomy With R Applications
FFT, see Fourier analysis, fast Fourier transform
fingerprint database, 418
finite sample size
Fisher's linear discriminant (FLD), 375
fitting
flicker noise, 458
Floyd–Warshall, 311
flux measurements, astronomical, 15
Fourier analysis, 406
—band limit, 521
—Bayesian viewpoint, 433
—discrete analog of PSD, 412
—discrete Fourier transform (DFT), 410, 521
—fast Fourier transform (FFT), 408, 415, 521; aliasing, 522; in Python, 500, 523; ordering of frequencies, 522
—Fourier integrals, 410
—Fourier terms, 465
—Fourier transform, 459; approximation via FFT, 521; inverse discrete Fourier transform, 411; inverse Fourier transform, 422; irregular sampling window, 414; regularly spaced Fourier transform, 414; RR Lyrae light curves, 406; transform of a pdf, 409; truncated Fourier series, 442; window function, 414
Freedman–Diaconis rule, 164
frequentist paradigm, 123
function transforms, 48
functions; beta, 100; characteristic, 105; correlation, see correlation functions; gamma, 97, 101; Gauss error, 88; Huber loss, 345; kernel, 251; likelihood, 125; marginal probability, 72; probability density, 71; regression, 334; selection, 166
GalaxyZoo, 367
Galton, Francis, 321
Gardner, Martin, 74
Gauss–Markov theorem, 332
Gaussian distribution, see distribution functions
Gaussian mixture model (GMM), 46, 259, 377
Gaussian mixtures, 134, 374, 400, 446, 447
Gaussian process regression, 48
generative classification, 367, 368, 397
geometric random walk, 462
Gini coefficient, 154, 389
GMM Bayes classification, see classification
goodness of fit, 132
Gram–Charlier series, 81, 160
graphical models, 46
Greg05, see Bayesian Logical Data Analysis for the Physical Sciences
Guttman–Kaiser criterion, 302
Hadoop, 44
Hanning, 416
hashing and hash functions, 51
Hertzsprung–Russell diagram, 25
Hess diagram, 32
heteroscedastic errors, 460, 465
hidden variables, 135
high-pass filtering, 424
histograms, 6, 163; Bayesian blocks, 228; comparison of methods, 226; errors, 165; Freedman–Diaconis rule, 164; Knuth's method, 225; optimal choice of bin size, 6; Scott's rule, 164
homoscedastic errors, 7, 460; Gaussian, 405, 427
HTF09, see The Elements of Statistical Learning: Data Mining, Inference, and Prediction
Hubble, Edwin, 365
hypersphere, 290
hypothesis testing, 77, 123, 144, 370, 404; multiple, 146
independent component analysis (ICA), 313
inference
—Bayesian, see Bayesian inference
—classical, 71
—statistical, 123; types of, 123
information content, 389
information gain, 389
Information Theory, Inference, and Learning Algorithms, 10
installing AstroML, 37
interpolation, 412, 501
interquartile range, 81
intrinsic dimension, 63
IsoMap, 311
isometric mapping, 311
IVOA (International Virtual Observatory Alliance), 11
jackknife, 140
Jay03, see Probability Theory: The Logic of Science
Jeffreys, Harold, 175
K nearest neighbors, see clustering
Kaiser's rule, 302
Kalman filters, 465
Karhunen–Loève transform, 292
Karpathia, 130
kernel density estimation, 49, see density estimation
kernel discriminant analysis, 377, 378, 398, 399
kernel regression, 48, 338, 379
knowledge discovery
Kullback–Leibler divergence, 183, 389
kurtosis, 79
Lagrangian multipliers, 182, 294
Landy–Szalay estimator, 279
Laplace smoothing, 372
Laplace, Pierre Simon, 175
Laser Interferometric Gravitational Observatory (LIGO), 16, 403, 415
LASSO regression, 48, 335
learning curves, 356
leptokurtic, 80
LEV diagram, 302
Levenberg–Marquardt algorithm, 341
light curves, 5, 404
LIGO, see Laser Interferometric Gravitational Observatory
likelihood, 125
LINEAR, 16
linear algebraic problems, 46
LINEAR data set, see data sets
linear discriminant analysis (LDA), 374, 376, 381, 398
locality, 47
locally linear embedding (LLE), 3, 307
locally linear regression, 339
location parameter, 78
logistic regression, see classification
loss function, 345, 367
lossy compression, 303
low signal-to-noise, 465
low-pass filters, 422
lowess method, 340
luminosity distribution
luminosity functions; 1/Vmax method, 168; C− method, 168; Bayesian approach, 172; estimation, 166
Lup93, see Statistics in Theory and Practice
Lutz–Kelker bias, 191
Lynden-Bell's C− method, 168
machine learning, 3, 4
magic functions, 51
magnitudes, 515; astronomical, 78; standard systems, 516
Mahalanobis distance, 374, 379
Malmquist bias, 191
manifold learning, 47, 306; weaknesses, 312
MAP, 429, 441
MapReduce, 49
Markov chain Monte Carlo (MCMC), 46, 231, 451, 453, 454; detailed balance condition, 231; emcee package, 235; Metropolis–Hastings algorithm, 231, 340; PyMC package, 233
Markov chains, 465
matched filters, 418, 452, 454, 465
maximum likelihood, see estimator
maximum likelihood estimation, 371
McGrayne, Sharon Bertsch, 175
mean, 46
mean deviation, 81
mean integrated square error (MISE), 131
median, 79; standard error, 84
memoization, 47
Miller, George, 365
minimum component filtering, 424
minimum detectable amplitude, 405
minimum variance bound, 83
misclassification rate, 367
mixtures of Gaussians, see Gaussian mixture model (GMM)
mode, 79
model comparison, 133
model parameters
model selection, 77, 398, 452
models; Bayesian, 46; Gaussian mixtures, see Gaussian mixture model (GMM); hierarchical Bayesian, 184; non-Gaussian mixtures, 140; state-space, 465
Modern Statistical Methods for Astronomy With R Applications (FB2012), 10, 437, 458, 463
Monte Carlo, 229; samples, 119
Monty Hall problem, 73
morphological classification of galaxies, 365
multidimensional color space
multidimensional scaling framework (MDS), 311
multiple harmonic model, 438
MythBusters, 74
N-body problems, 46, 53
Nadaraya–Watson regression, 338
naive Bayes, see Bayesian inference
nearest neighbor, 47, 49; all-nearest-neighbor search, 54; approximate methods, 63; bichromatic case, 54; monochromatic case, 54; nearest-neighbor distance, 57; nearest-neighbor search, 53
neural networks, 398–400
no free lunch theorem, 397
nonlinear regression, 340
nonnegative matrix factorization (NMF), 305
nonparametric bootstrap resampling, 437
nonparametric method
nonparametric models, 4
nonuniformly sampled data, 414
null hypothesis, 144
number of neighbors, 379
Numerical Recipes: The Art of Scientific Computing (NumRec), 8, 50, 120, 135, 141, 151, 156, 162, 408, 415, 418, 422, 424, 435, 436
NumRec, see Numerical Recipes: The Art of Scientific Computing
Nyquist; frequency, 415, 436, 522; limit, 422; Nyquist–Shannon theorem, 412; sampling theorem, 412, 521
O(N), 45
Occam's razor, 189
online learning, 48
optical curve, 448
optimization, 46, 501
Ornstein–Uhlenbeck process, 463
outliers, 80, 83
overfitting, 380, 391
p value, 144
parallel computing, 49
parallelism, 49
parameter estimation, 406, 452; deterministic models, 406
parametric methods, 6, 398
Pareto distribution, 459
Parseval's theorem, 409
Pattern Recognition and Machine Learning, 10
pdf
periodic models, 405
periodic time series, 426
periodic variability, 465
periodicity, 434
periodograms, 430, 441, 444, 448; definition of, 430; generalized Lomb–Scargle, 438; Lomb–Scargle periodogram, 426, 430, 434–436, 438, 442, 444, 449, 465; noise, 431
phased light curves, 441, 442
photometric redshifts, 366, 390
pink noise, 409, 458
platykurtic, 80
point estimation, 123
population pdf, 6
population statistics, 78
power spectrum, 407, 409, 430, 454; estimation, 415
Practical Statistics for Astronomers (WJ03), 9, 69, 424
precision, see efficiency
prediction
principal axes, 111
principal component analysis (PCA), 3, 49, 292, 444; missing data, 302
principal component regression, 337
probability, 69; axioms, 69; Cox, 71; Kolmogorov, 70; conditional, 70, 72; density function, 71; law of total, 71, 72; notation, 69; random variable, 5; sum rule, 70
probability density, 368
probability density functions, 5
probability distribution, 5, 43
probability mass function
Probability Theory: The Logic of Science (Jay03), 9, 71, 182
programming languages
—C, 507
—C++, 507
—Fortran, 37, 507
—IDL, 37
—Python, 12, 471
—R, 10
—SQL (Structured Query Language), 14–16, 44, 50, 53, 519; where, 17
projection pursuit, 3, 314
PSD, see power spectrum
Python
—AstroML, see AstroML
—further references, 508
—installation, 474
—introduction, 471
—IPython, 473, 486; documentation, 487; magic functions, 488
—Matplotlib, 473, 494
—NumPy, 472, 488, 498; efficient coding, 503
—scientific computing, 472
—SciPy, 472, 498
—tutorial, 474
—wrapping compiled code, 506
quadratic discriminant analysis (QDA), 375, 376, 398
quadratic programming, 383
quantile, 79; function, 6; standard error, 84
quartile, 81
quasar
quasar variability, 458, 460, 463, 464
quicksort, 51
random forests, 391, 398, 399
random number generation, 119
random walk, 449, 458, 462, 463
rank error, 63
Rayleigh test, 448
RDBMS, see Relational Database Management System
recall, see completeness, 368
recall rate, 147
receiver operating characteristic (ROC) curve, 147, 395
red noise, 409, 458
regression, 4, 321
—Bayesian outlier methods, 346
—comparison of methods, 361
—cross-validation, 355; K-fold, 360; leave-one-out, 360; random subset, 360; twofold, 360
—design matrix, 327
—formulation, 322
—Gaussian basis functions, 331
—Gaussian process, 349
—Gaussian vs. Poissonian likelihood, 215
—Kendall method, 345
—kernel, 338
—LASSO, 335
—learning curves, 356
—least absolute value, 345
—least angle, 336
—linear models, 325
—local polynomial, 340
—locally linear, 339
—M estimators, 345
—maximum likelihood solution, 327
—method of least squares, 326
—multivariate, 329
—nonlinear, 340
—overfitting, 352
—polynomial, 330
—principal component, 337
—regularization, 332
—ridge, 333
—robust to outliers, 344
—sigma clipping, 345
—Theil–Sen method, 345
—toward the mean, 150
—uncertainties in the data, 342
—underfitting, 352
regression function, 369
regularization, 332; LASSO regression, 335; ridge regression, 333; Tikhonov, 333
Relational Database Management System, 44
relative error, 63
resolution, 412
responsibility, 136
ridge regression, 333
ridge regularization, 384
right ascension, 16, 18
risk, 367
robustness, 80
runtime, 45
sample contamination, 405
sample selection
sample size
sample statistics, 78, 81
sampling, 49; window, 414; window function, 414
Savitzky–Golay filter, 424
scale parameter, 78
scatter
SciDB, 44
Scott's rule, 164
scree plot, 298
SDSS "Great Wall", 250, 255, 275
searching and sorting, 50, 51
SEGUE Stellar Parameters Catalog, 366
selection effects, 166
selection function
self-similar classes
sensitivity, see completeness
Shannon interpolation formula, 412
shape parameter, 78
Sheldon, Erin, 13
significance level, 144
Simon, Herbert, 365
sinc-shifting, 412
sine wave, 415
single harmonic model, 405, 427, 433, 435, 438, 465
single-valued quantity
singular value decomposition, 295, 337
singular vectors, 295
Siv06, see Data Analysis: A Bayesian Tutorial
skewness, 79
Sloan Digital Sky Survey (SDSS), 15, 250
—Catalog Archive Server (CAS), 15; CASJobs, 17; PhotoObjAll, 17; PhotoTag, 17; Schema Browser, 17
—Data Release 7, 15
—Data Release 8, 22
—Data Release 9, 25
—flags, 17
—magnitudes; model magnitudes, 17; Petrosian magnitudes, 22; PSF magnitudes, 17
—object types, 17
—SEGUE Stellar Parameters Pipeline, 25
—spectroscopic follow-up, 15
—Stripe 82, 15, 32, 372
Sobolev space, 163
software packages; AstroML, 12, 511; AstroPy, 13; AstroPython, 13; Chaco, 473; CosmoloPy, 13; esutil, 13; HealPy, 13; IPython, 14, 52, 473; Kapteyn, 13; Markov chain Monte Carlo, 13; Matplotlib, 12, 473; MayaVi, 473; NetworkX, 473; Numerical Python, 12, 18, 472; Pandas, 473; PyMC, 13; Python, 12, 471; Scientific Python, 12, 472; Scikit-learn, 12, 473; Scikits-image, 473; Statsmodels, 473; SymPy, 473; urllib2, 20
sorting, 51
specific flux, 515
spectral window function, 414
spherical coordinate systems, 35
spherical harmonics, 37
standard deviation, 7, 79
state-space models, 465
stationary signal, 452
statistically independent
Statistics in Theory and Practice (Lup93), 9, 37, 69, 81, 84, 85, 105, 117, 118, 127, 141–143, 176, 208
Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences (Bar89), 9, 69
stochastic programming, 48
stochastic time series, 458
stochastic variability, 455
streaming, 48
structure function, 458, 460
sufficient statistics, 199
sum of sinusoids, 406
supervised classification, see classification
supervised learning
support vector machines, 48, 382, 384, 398, 399
support vectors, 382
SWIG, see Python/wrapping compiled code
SX Phe, 446
telescope diffraction pattern, 410
temporal correlation, 404
tests; Anderson–Darling, 154, 157; F, 162; Fasano and Franceschini, 156; Kolmogorov–Smirnov, 151; Kuiper, 152; Mann–Whitney–Wilcoxon, 155; non-Gaussianity, 157; nonparametric, 151; parametric, 160; power, 145; Shapiro–Wilk, 158; t, 161; U, 155; Welch's t, 162; Wilcoxon rank-sum, 155; Wilcoxon signed-rank, 155
The Elements of Statistical Learning: Data Mining, Inference, and Prediction (HTF09), 9, 134, 136, 137, 141, 147, 181
"The magical number 7 ± 2", 365
The Visual Display of Quantitative Information, 31
time series, 403, 406, 458; comparison of methods, 465
top-hat, 416
total least squares, 343
training sample
tree traversal patterns, 378
tricubic kernel, 340
trigonometric basis functions, 417
Two Micron All Sky Survey (2MASS), 15
Type I and II errors, 145, 368
uncertainty distribution
uneven sampling, 465
unevenly sampled data, 460, 461
uniformly sampled data, 410
unsupervised classification, see classification
unsupervised clustering, see clustering
unsupervised learning
Utopia, 130
variability, 404
variable
—categorical, 371
—continuous, 372
—random, 71; continuous, 71; discrete, 71; independent, 71; independent identically distributed, 71; transformation, 77
variance, 46, 79; of a well-sampled time series, 405
variogram, 458
vectorized, 55
Voronoi tessellation, 379
vos Savant, Marilyn, 74
Wass10, see All of Nonparametric Statistics and All of Statistics: A Concise Course in Statistical Inference
wavelets, 418, 454; Daubechies, 418; discrete wavelet transform (DWT), 418; Haar, 418; Mexican hat, 418; Morlet, 418; PyWavelets, 418; wavelet PSD, 418, 419
weave, see Python/wrapping compiled code
Welch's method, 416
whitening, 298
Whittaker–Shannon, 412
width parameter, 78
Wiener filter, 422, 423
Wiener–Khinchin theorem, 457, 461, 463
WJ03, see Practical Statistics for Astronomers
WMAP cosmology, 170
WSAS, see Advances in Machine Learning and Data Mining for Astronomy
zero-one loss, 367

