Chemometrics in Food Chemistry

Data Handling in Science and Technology, Volume 28

Edited by
Federico Marini
Department of Chemistry, University of Rome “La Sapienza”, Rome, Italy

Elsevier
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK
Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands

First edition 2013

Copyright © 2013 Elsevier B.V. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher.

Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: permissions@elsevier.com. Alternatively, you can submit your request online by visiting the Elsevier web site at http://elsevier.com/locate/permissions and selecting “Obtaining permission to use Elsevier material”.

Notice
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.

ISBN: 978-0-444-59528-7
ISSN: 0922-3487

For information on all Elsevier publications visit our web site at store.elsevier.com

Printed and bound in Great Britain

Contents

Contributors
Preface

1. Introduction (Federico Marini)
   Another Book on the Wall
   Organisation of the Book
   References

Part I: Theory

2. Experimental Design (Riccardo Leardi)
   Introduction
   Full Factorial Design 2^k
   Plackett–Burman Designs
   Central Composite Design
   Doehlert Design
   D-Optimal Designs
   Qualitative Variables at More than Two Levels
   Mixture Designs
   Conclusions
   References

3. Exploratory Data Analysis (Mario Li Vigni, Caterina Durante, and Marina Cocchi)
   The Concept (Let Your Data Talk)
   Descriptive Statistics
     2.1 Frequency Histograms
     2.2 Box and Whisker Plots
   Projection Techniques
     3.1 Principal Component Analysis
     3.2 Other Projection Techniques
   Clustering Techniques
   Remarks
   References

4. Regression (Frank Westad, Marta Bevilacqua, and Federico Marini)
   Introduction
   Multivariate Calibration
   Theory
     3.1 Univariate Linear Regression: Introducing the Least Squares Concept
     3.2 Multivariate Generalization of the Ordinary Least Squares Approach
     3.3 Principal Component Regression
     3.4 PLS Regression
     3.5 Principal Covariate Regression
   Validation
     4.1 Test-Set Validation
     4.2 Cross-Validation
   Diagnostics and Error Measures
     5.1 Diagnostics
     5.2 Error Measures
   Model Interpretation
     6.1 Interpretation of the Structured Part
     6.2 Plots Used to Detect Outliers
   Variable Selection
     7.1 Using Model Parameters and Diagnostics
     7.2 Model-Based Variable Importance
     7.3 iPLS
     7.4 Genetic Algorithms
     7.5 Re-sampling Methods: Bootstrap, Jackknifing and Cross-Validation
     7.6 Cross Model Validation
   References

5. Classification and Class-Modelling (Marta Bevilacqua, Remo Bucci, Andrea D. Magrì, Antonio L. Magrì, Riccardo Nescatelli, and Federico Marini)
   Introduction
     1.1 Classification of Classification Methods
   Discriminant Classification Methods
     2.1 Linear and Quadratic Discriminant Analysis
     2.2 Extended Canonical Variates Analysis
     2.3 Partial Least Squares Discriminant Analysis
     2.4 k Nearest Neighbours
     2.5 Density-Based Methods (Potential Functions)
     2.6 Other Discriminant Classification Methods
   Class-Modelling Methods
     3.1 Soft Independent Modelling of Class Analogies
     3.2 Unequal Class-Modelling
     3.3 Potential Functions as Class-Modelling Methods
   Conclusions
   References

6. Multivariate Curve Resolution Methods for Food Chemistry (Anna de Juan and Sílvia Mas)
   Introduction
   MCR: The Basics
   MCR Applied to Qualitative and Quantitative Analysis of Compounds in Food Samples
   MCR and Food Fingerprinting
   MCR for Food Processes
   Conclusions
   References

7. Multiway Methods (José Manuel Amigo and Federico Marini)
   Introduction: Why Multiway Data Analysis?
   Nomenclature and General Notation
   Parallel Factor Analysis
     3.1 The General PARAFAC Model
     3.2 PARAFAC Iterations: Convergence to the Solution (Alternating Least Squares)
     3.3 Properties of the PARAFAC Model
     3.4 Model Validation: Selection of the Number of Factors
     3.5 Imposing Constraints to the Model
     3.6 PARAFAC in Practice
   Parallel Factor Analysis 2
     4.1 PARAFAC2 General Model
     4.2 Resemblances and Dissimilarities Between PARAFAC and PARAFAC2
     4.3 Application of PARAFAC2 in Food Research
   Tucker Models
     5.1 Mathematical Formulation of the Tucker3 Model
     5.2 Properties of the Tucker3 Model
     5.3 Other Tucker Models
     5.4 Some Considerations on the Core Array
     5.5 Calculating a Tucker3 Model
     5.6 Tucker3 in Practice
   Multiway Regression
     6.1 Multilinear PLS (N-PLS)
     6.2 Multiway Covariate Regression
   Future Perspectives
   References

8. Robust Methods in Analysis of Multivariate Food Chemistry Data (Ivana Stanimirova, Michał Daszykowski, and Beata Walczak)
   Introduction
   Basic Concepts in Robust Statistics
     2.1 Classic and Robust Estimators of Data Location and Scale
     2.2 Robust Estimates of Covariance and Multivariate Location and Scatter
   Robust Modelling of Data Variance
     3.1 Spherical Principal Component Analysis
     3.2 Robust PCA Using PP with the Qn Scale
     3.3 ROBPCA: A Robust Variant of PCA
   Classic and Robust Calibration
     4.1 Partial Robust M-Regression
     4.2 RSIMPLS and RSIMCD: Robust Variants of SIMPLS
     4.3 Spatial Sign Preprocessing and Robust PLS
     4.4 Identification of Outlying Samples Using a Robust Model
   Discrimination and Classification
     5.1 Classic and Robust Discrimination
     5.2 Classic and Robust Classification
   Dealing with Missing Elements in Data Containing Outliers
   Further Reading and Software
   References

Part II: Applications

9. Hyperspectral Imaging and Chemometrics: A Perfect Combination for the Analysis of Food Structure, Composition and Quality (José Manuel Amigo, Idoia Martí, and Aoife Gowen)
   Introduction
     1.1 Quality Assessment
     1.2 The Role of Hyperspectral Image in Food Quality Assessment
     1.3 The Need for Chemometrics
     1.4 Objective of the Book Chapter
   Structure of a Hyperspectral Image
   Hyperspectral Analysis and Chemometrics: Practical Examples
     3.1 Overview of HSI Data Analysis
     3.2 Pre-processing Methods
     3.3 Unsupervised Techniques to Explore the Image: PCA
     3.4 Supervised Techniques for Classification of Features
     3.5 Regression Modelling for Obtaining Quantitative Information from Hyperspectral Images
   Final Remarks
   References

10. The Impact of Chemometrics on Food Traceability (Lucia Bertacchini, Marina Cocchi, Mario Li Vigni, Andrea Marchetti, Elisa Salvatore, Simona Sighinolfi, Michele Silvestri, and Caterina Durante)
   Introduction
     1.1 Authenticity and Traceability: The European Union Point of View
     1.2 Authenticity and Traceability: A Scientific Point of View
   Food Traceability Applications
     2.1 Chemometrics Approaches for Soil Sampling Planning in Traceability Studies
     2.2 Geographical Traceability of Raw Materials for PDO and PGI Oenological Products
   Food Authenticity Applications
     3.1 Study of Grape Juice Heating Process in a Context of Quality Control of Food
     3.2 Study of Sensory and Compositional Profiles During the Ageing Process of ABTM
     3.3 Characterisation and Classification of Ligurian Extra Virgin Olive Oil
   References

11. NMR-Based Metabolomics in Food Quality Control (Alberta Tomassini, Giorgio Capuani, Maurizio Delfini, and Alfredo Miccheli)
   Introduction
   Methodology
     2.1 NMR Sample Preparation
     2.2 NMR Acquisition and Processing Parameters
     2.3 Targeted Analysis and Pattern Recognition
   NMR-Based Metabolomics Applications
     3.1 Food Quality Control
     3.2 Quality Control: Geographical Origin and Authentication
     3.3 Quality Control, Adulteration, and Safety

References (Chapter 12: i-Chemometrics)

[1] Gibney MJ, Walsh M, Brennan L, Roche HM, German B, van Ommen B. Metabolomics in human nutrition: opportunities and challenges. Am J Clin Nutr 2005;82:497–503.
[2] Savorani F, Rasmussen MA, Mikkelsen MS, Engelsen SB. A primer to nutritional metabolomics by NMR spectroscopy and chemometrics. Food Res Int 2013, in press. http://dx.doi.org/10.1016/j.foodres.2012.12.025 (http://www.sciencedirect.com/science/article/pii/S0963996912005480).
[3] Capozzi F, Placucci G. Preface. In: 1st International Conference in Foodomics, Cesena, Italy; 2009.
[4] Cifuentes A. Food analysis and foodomics (foreword). J Chromatogr A 2009;1216:7109.
[5] Johnels D, Edlund U, Grahn H, Hellberg S, Sjostrom M, Wold S, et al. Clustering of aryl C-13 nuclear magnetic-resonance substituent chemical-shifts: a multivariate data-analysis using principal components. J Chem Soc Perkin Trans 1983;2:863–71.
[6] Nicholson JK, Lindon JC, Holmes E. ‘Metabonomics’: understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica 1999;29:1181–9.
[7] Nørgaard L, Saudland A, Wagner J, Nielsen JP, Munck L, Engelsen SB. Interval partial least squares regression (iPLS): a comparative chemometric study with an example from near-infrared spectroscopy. Appl Spectrosc 2000;54:413–9.
[8] Larsen FH, van den Berg F, Engelsen SB. An exploratory chemometric study of H-1 NMR spectra of table wines. J Chemom 2006;20:198–208.
[9] Duarte I, Barros A, Belton PS, Righelato R, Spraul M, Humpfer E, et al. High-resolution nuclear magnetic resonance spectroscopy and multivariate analysis for the characterization of beer. J Agric Food Chem 2002;50:2475–81.
[10] Lopez-Rituerto E, Savorani F, Avenoza A, Busto JH, Peregrina JM, Engelsen SB. Investigations of La Rioja Terroir for wine production using H-1 NMR metabolomics. J Agric Food Chem 2012;60:3452–61.
[11] Pearce JTM, Athersuch TJ, Ebbels TMD, Lindon JC, Nicholson JK, Keun HC. Robust algorithms for automated chemical shift calibration of 1D H-1 NMR spectra of blood serum. Anal Chem 2008;80:7158–62.
[12] van den Berg F, Tomasi G, Viereck N. Warping: investigation of NMR pre-processing and correction. In: Engelsen SB, Belton PS, Jakobsen HJ, editors. Magnetic resonance in food science: the multivariate challenge. Cambridge: RSC Publishing; 2005. p. 131–8.
[13] Beckonert O, Keun HC, Ebbels TMD, Bundy JG, Holmes E, Lindon JC, et al. Metabolic profiling, metabolomic and metabonomic procedures for NMR spectroscopy of urine, plasma, serum and tissue extracts. Nat Protoc 2007;2:2692–703.
[14] Spraul M, Neidig P, Klauck U, Kessler P, Holmes E, Nicholson JK, et al. Automatic reduction of NMR spectroscopic data for statistical and pattern-recognition classification of samples. J Pharm Biomed Anal 1994;12:1215–25.
[15] Powers R. NMR metabolomics and drug discovery. Magn Reson Chem 2009;47:S2–S11.
[16] Craig A, Cloarec O, Holmes E, Nicholson JK, Lindon JC. Scaling and normalization effects in NMR spectroscopic metabonomic data sets. Anal Chem 2006;78:2262–7.
[17] Jellema R. Variable shift and alignment. In: Brown SD, Tauler R, Walczak B, editors. Comprehensive chemometrics. Amsterdam: Elsevier; 2009. p. 85–108.
[18] Hibbert DB. Genetic algorithms in chemistry. Chemom Intell Lab Syst 1993;19:277–93.
[19] Leardi R, Nørgaard L. Sequential application of backward interval partial least squares and genetic algorithms for the selection of relevant spectral regions. J Chemom 2004;18:486–97.
[20] Wehrens R, Putter H, Buydens LMC. The bootstrap: a tutorial. Chemom Intell Lab Syst 2000;54:35–52.
[21] Nielsen NPV, Carstensen JM, Smedsgaard J. Aligning of single and multiple wavelength chromatographic profiles for chemometric data analysis using correlation optimised warping. J Chromatogr A 1998;805:17–35.
[22] Tomasi G, van den Berg F, Andersson CA. Correlation optimized warping and dynamic time warping as pre-processing methods for chromatographic data. J Chemom 2004;18:1–11.
[23] Savorani F, Tomasi G, Engelsen SB. icoshift: a versatile tool for the rapid alignment of 1D NMR spectra. J Magn Reson 2010;202:190–202.
[24] Tomasi G, Savorani F, Engelsen SB. icoshift: an effective tool for the alignment of chromatographic data. J Chromatogr A 2011;1218:7832–40.
[25] Eilers PHC. Parametric time warping. Anal Chem 2004;76:404–11.
[26] Wong JWH, Durante C, Cartwright HM. Application of fast Fourier transform cross-correlation for the alignment of large chromatographic and spectral datasets. Anal Chem 2005;77:5655–61.
[27] Veselkov KA, Lindon JC, Ebbels TMD, Crockford D, Volynkin VV, Holmes E, et al. Recursive segment-wise peak alignment of biological H-1 NMR spectra for improved metabolic biomarker recovery. Anal Chem 2009;81:56–66.
[28] Picone G, Mezzetti B, Babini E, Capocasa F, Placucci G, Capozzi F. Unsupervised principal component analysis of NMR metabolic profiles for the assessment of substantial equivalence of transgenic grapes (Vitis vinifera). J Agric Food Chem 2011;59:9271–9.
[29] Wold S, Martens H, Wold H. The multivariate calibration problem in chemistry solved by the PLS method. Lect Notes Math 1983;973:286–93.
[30] Winning H, Roldan-Marin E, Dragsted LO, Viereck N, Poulsen M, Sanchez-Moreno C, et al. An exploratory NMR nutri-metabonomic investigation reveals dimethyl sulfone as a dietary biomarker for onion intake. Analyst 2009;134:2344–51.
[31] Kristensen M, Savorani F, Ravn-Haren G, Poulsen M, Markowski J, Larsen FH, et al. NMR and interval PLS as reliable methods for determination of cholesterol in rodent lipoprotein fractions. Metabolomics 2010;6:129–36.
[32] Savorani F, Kristensen M, Larsen FH, Astrup A, Engelsen SB. High throughput prediction of chylomicron triglycerides in human plasma by nuclear magnetic resonance and chemometrics. Nutr Metab 2010;7:43.
[33] Ståhle L, Wold S. Partial least squares analysis with cross-validation for the two-class problem: a Monte Carlo study. J Chemom 1987;1:185–96.
[34] Barker M, Rayens W. Partial least squares for discrimination. J Chemom 2003;17:166–73.
[35] Ferrari E, Foca G, Vignali M, Tassi L, Ulrici A. Adulteration of the anthocyanin content of red wines: perspectives for authentication by Fourier transform-near infrared and H-1 NMR spectroscopies. Anal Chim Acta 2011;701:139–51.
[36] Rasmussen LG, Winning H, Savorani F, Ritz C, Engelsen SB, Astrup A, et al. Assessment of dietary exposure related to dietary GI and fibre intake in a nutritional metabolomic study of human urine. Genes Nutr 2012;7:281–93.
[37] Larsen FH, Jorgensen H, Engelsen SB, Laerke HN. Metabolic profiling of lymph from pigs fed with beta-glucan by high-resolution H-1 NMR spectroscopy. Livest Sci 2010;133:38–41.
[38] Javidnia K, Parish M, Karimi S, Hemmateenejad B. Discrimination of edible oils and fats by combination of multivariate pattern recognition and FT-IR spectroscopy: a comparative study between different modeling methods. Spectrochim Acta A Mol Biomol Spectrosc 2013;104:175–81.
[39] Westerhuis JA, van Velzen EJJ, Hoefsloot HCJ, Smilde AK. Discriminant Q2 (DQ2) for improved discrimination in PLSDA models. Metabolomics 2008;4:293–6.
[40] Zweig MH, Campbell G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 1993;39:561–77.
[41] Nørgaard L, Bro R, Westad F, Engelsen SB. A modification of canonical variates analysis to handle highly collinear multivariate data. J Chemom 2006;20:425–35.
[42] Nørgaard L, Soletormos G, Harrit N, Albrechtsen M, Olsen O, Nielsen D, et al. Fluorescence spectroscopy and chemometrics for classification of breast cancer samples: a feasibility study using extended canonical variates analysis. J Chemom 2007;21:451–8.
[43] Picone G, Engelsen SB, Savorani F, Testi S, Badiani A, Capozzi F. Metabolomics as a powerful tool for molecular quality assessment of the fish Sparus aurata. Nutrients 2011;3:212–27.
[44] Savorani F, Picone G, Badiani A, Fagioli P, Capozzi F, Engelsen SB. Metabolic profiling and aquaculture differentiation of gilthead sea bream by 1H NMR metabonomics. Food Chem 2010;120:907–14.
[45] Næs T, Tomic O, Mevik BH, Martens H. Path modelling by sequential PLS regression. J Chemom 2011;25:28–40.
[46] Måge I, Menichelli E, Næs T. Preference mapping by PO-PLS: separating common and unique information in several data blocks. Food Qual Prefer 2012;24:8–16.
[47] Smilde AK, Westerhuis JA, de Jong S. A framework for sequential multiblock component methods. J Chemom 2003;17:323–37.
[48] Westerhuis JA, Kourti T, MacGregor JF. Analysis of multiblock and hierarchical PCA and PLS models. J Chemom 1998;12:301–21.
[49] Westerhuis JA, Hoefsloot HCJ, Smit S, Vis DJ, Smilde AK, van Velzen EJJ, et al. Assessment of PLSDA cross validation. Metabolomics 2008;4:81–9.
[50] Skov T, Engelsen SB. Chemometrics, mass spectrometry, and foodomics. In: Cifuentes A, editor. Foodomics: advanced mass spectrometry in modern food science and nutrition. New York: Wiley; 2013. p. 507–34.