Classification and Data Mining [Giusti, Ritter & Vichi 2012-12-17]

Studies in Classification, Data Analysis, and Knowledge Organization

Managing Editors: H.-H. Bock, Aachen; W. Gaul, Karlsruhe; M. Vichi, Rome; C. Weihs, Dortmund

Editorial Board: D. Baier, Cottbus; F. Critchley, Milton Keynes; R. Decker, Bielefeld; E. Diday, Paris; M. Greenacre, Barcelona; C.N. Lauro, Naples; J. Meulman, Leiden; P. Monari, Bologna; S. Nishisato, Toronto; N. Ohsumi, Tokyo; O. Opitz, Augsburg; G. Ritter, Passau; M. Schader, Mannheim

For further volumes: http://www.springer.com/series/1564

Antonio Giusti, Maurizio Vichi, Gunter Ritter
Editors

Classification and Data Mining

Editors
Prof. Antonio Giusti, Department of Statistics, University of Florence, Florence, Italy
Prof. Dr. Gunter Ritter, Faculty for Informatics and Mathematics, University of Passau, Passau, Germany
Prof. Maurizio Vichi, Department of Statistics, Probability and Applied Statistics, University of Rome “La Sapienza”, Rome, Italy

ISSN 1431-8814
ISBN 978-3-642-28893-7
ISBN 978-3-642-28894-4 (eBook)
DOI 10.1007/978-3-642-28894-4
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2012952267
© Springer-Verlag Berlin Heidelberg 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Following a biannual tradition of organizing joint meetings between classification societies, the Classification and Data Analysis Group of the Italian Statistical Society, CLADAG, has organized its international meeting together with the German Classification Society, GfKl, at Firenze, Italy, September 8–10, 2010. The Conference was originally conceived as a German-Italian event, but it counted the participation of researchers from several nations and especially from Austria, France, Germany, Great Britain, Italy, Korea, the Netherlands, Portugal, Slovenia, and Spain. The meeting has shown once more the vitality of data analysis and classification and served as a forum for presentation, discussion, and exchange of ideas between the most active scientists in the field. It has also shown the strong bonds between the two classification societies and has greatly helped to deepen relationships.

The conference program included Plenary, 12 Invited, and 31 Contributed Sessions. This book contains selected and peer-reviewed papers presented at the
meeting in the area of “Classification and Data Mining.” Browsing through the volume, the reader will see both methodological articles showing new original methods and articles on applications illustrating how new domain-specific knowledge can be made available from data by clever use of data analysis methods. According to the title, the book is divided into three parts:

Classification and Data Analysis
Data Mining
Applications

The methodologically oriented papers on classification and data analysis deal, among other things, with robustness, analysis of spatial data, and application of Markov chain Monte Carlo methods. Variable selection and clustering of variables play an increasing role in applications where there are substantially more variables than observations. Support vector machines offer models and methods for the analysis of complex data structures that go beyond classical ones. Topics given special attention are association patterns and correspondence analysis.

Automated methods in data mining, which support knowledge discovery in huge data structures such as those associated with new media (e.g., the Internet), digital images, or genomes in genetics, will continue to represent a big challenge for data analysis in the near future. Information is readily retrieved in these fields; however, interpreting it and identifying relevant results is not a straightforward task at all. Data produced by the Internet and by genetic studies on genomes and proteomes have a particular appeal as objects of analysis and are studied in this book. Furthermore, there are applications of Markov chain models to new kinds of problems, such as knowledge discovery on the Internet, the analysis of large biomedical data sets, and, more generally, sensor data. Moreover, the automatic online processing of data streams is becoming increasingly important. In sociology and market research, opinion mining on a large number of expressed preferences plays an important role. All these data typologies require algorithmic methods at the interface between statistics and computer science. Other contributions in the book focus on the application of the singular value decomposition to structural learning in Bayesian networks and on molecular simulation for drug design.

The last part of the book contains interesting applications to various fields of research such as sociology, market research, environment, geography, and music: estimation in demographic data, description of professional profiles, metropolitan studies such as income in municipalities, labor market research, environmental energy consumption, geographical data such as seismic time series, auditory models in speech and music, application of mixture models to multi-state data, and visualization techniques.

We hope that this short description stimulates the reader to take a closer look at some of the articles. Our thanks go to Andrea Giommi and his local organizing team (Bruno Bertaccini, Matilde Bini, Anna Gottard, Leonardo Grilli, Alessandra Mattei, Alessandra Petrucci, Carla Rampichini, Emilia Rocco), who have done a great job. We gratefully acknowledge the Faculty of Economics and the “Ente Cassa di Risparmio di Firenze” for financial support, and we express our special thanks to Chiara Bocci for her valuable contribution to the organization of the meeting and for her assistance in producing this book. Also on behalf of our colleagues, we may say that we very much enjoyed being their guests in Firenze. The dinner with a view of the Dome was excellent, and we appreciated it very much.

We wish to express our gratitude to the other members of the Scientific Programme Committee: Daniel Baier, Reinhold Decker, Filippo Domma, Luigi Fabbris, Christian Hennig, Carlo Lauro, Berthold Lausen, Hermann Locarek-Junge, Isabella Morlini, Lars Schmidt-Thieme, Gabriele Soffritti, Alfred Ultsch, Rosanna Verde, Donatella Vicari, and Claus Weihs. We also thank the section organizers for having put together
such strong sections. The Italian tradition of discussants and rejoinders has been a new experience for GfKl. Thanks go to the referees for their important job. Last but not least, we thank all speakers and all who came to listen and to discuss with them.

Florence, Italy: Antonio Giusti
Passau, Germany: Gunter Ritter
Rome, Italy: Maurizio Vichi

Contents

Part I Classification and Data Analysis

Robust Random Effects Models: A Diagnostic Approach Based on the Forward Search
  Bruno Bertaccini and Roberta Varriale
Joint Correspondence Analysis Versus Multiple Correspondence Analysis: A Solution to an Undetected Problem
  Sergio Camiz and Gastão Coelho Gomes
Inference on the CUB Model: An MCMC Approach
  Laura Deldossi and Roberta Paroli
Robustness Versus Consistency in Ill-Posed Classification and Regression Problems
  Robert Hable and Andreas Christmann
Issues on Clustering and Data Gridding
  Jukka Heikkonen, Domenico Perrotta, Marco Riani, and Francesca Torti
Dynamic Data Analysis of Evolving Association Patterns
  Alfonso Iodice D’Enza and Francesco Palumbo
Classification of Data Chunks Using Proximal Vector Machines and Singular Value Decomposition
  Antonio Irpino, Mario Rosario Guarracino, and Rosanna Verde
Correspondence Analysis in the Case of Outliers
  Anna Langovaya, Sonja Kuhnt, and Hamdi Chouikha
Variable Selection in Cluster Analysis: An Approach Based on a New Index
  Isabella Morlini and Sergio Zani
A Model for the Clustering of Variables Taking into Account External Data
  Karin Sahmer
Calibration with Spatial Data Constraints
  Ivan Arcangelo Sciascia

Part II Data Mining

Clustering Data Streams by On-Line Proximity Updating
  Antonio Balzanella, Yves Lechevallier, and Rosanna Verde
Summarizing and Detecting Structural Drifts from Multiple Data Streams
  Antonio Balzanella and Rosanna Verde
A Model-Based Approach for Qualitative Assessment in Opinion Mining
  Maria Iannario and Domenico Piccolo
An Evaluation Measure for Learning from Imbalanced Data Based on Asymmetric Beta Distribution
  Nguyen Thai-Nghe, Zeno Gantner, and Lars Schmidt-Thieme
Outlier Detection for Geostatistical Functional Data: An Application to Sensor Data
  Elvira Romano and Jorge Mateu
Graphical Models for Eliciting Structural Information
  Federico M. Stefanini
Adaptive Spectral Clustering in Molecular Simulation
  Marcus Weber

Part III Applications

Spatial Data Mining for Clustering: An Application to the Florentine Metropolitan Area Using RedCap
  Federico Benassi, Chiara Bocci, and Alessandra Petrucci
Misspecification Resistant Model Selection Using Information Complexity with Applications
  Hamparsum Bozdogan, J. Andrew Howe, Suman Katragadda, and Caterina Liberati
A Clusterwise Regression Method for the Prediction of the Disposal Income in Municipalities
  Paolo Chirico
A Continuous Time Mover-Stayer Model for Labor Market in a Northern Italian Area
  Fabrizio Cipollini, Camilla Ferretti, Piero Ganugi, and Mario Mezzanzanica
Model-Based Clustering of Multistate Data with Latent Change: An Application with DHS Data
  José G. Dias
An Approach to Forecasting Beanplot Time Series
  Carlo Drago and Germana Scepi
Shared Components Models in Joint Disease Mapping: A Comparison
  Emanuela Dreassi
Piano and Guitar Tone Distinction Based on Extended Feature Analysis
  Markus Eichhoff, Igor Vatolkin, and Claus Weihs
Auralization of Auditory Models
  Klaus Friedrichs and Claus Weihs
Visualisation and Analysis of Affiliation Networks as Tools to Describe Professional Profiles
  Cristiana Martini
Graduation by Adaptive Discrete Beta Kernels
  Angelo Mazza and Antonio Punzo
Modelling Spatial Variations of Fertility Rate in Italy
  Massimo Mucciardi and Pietro Bertuccelli
Visualisation of Cluster Analysis Results
  Hans-Joachim Mucha, Hans-Georg Bartel, and Carlos Morales-Merino
The Application of M-Function Analysis to the Geographical Distribution of Earthquake Sequence
  Eugenia Nissi, Annalina Sarra, Sergio Palermi, and Gaetano De Luca
Energy Consumption – Gross Domestic Product Causal Relationship in the Italian Regions
  Antonio Angelo Romano and Giuseppe Scandurra

Format
Pages	290
File size	4.68 MB