
Classification and Data Mining [Giusti, Ritter & Vichi 2012-12-17]


DOCUMENT INFORMATION

Structure

  • Classification and Data Mining

    • Preface

    • Contents

    • Contributors

    • Part I Classification and Data Analysis

      • Robust Random Effects Models: A Diagnostic Approach Based on the Forward Search

        • 1 Introduction

        • 2 The Random Effects Model

        • 3 Forward Search

          • 3.1 Step 1: Choice of the Initial Subset

          • 3.2 Step 2: Adding Observations During the Search

          • 3.3 Step 3: Monitoring the Search

        • References

      • Joint Correspondence Analysis Versus Multiple Correspondence Analysis: A Solution to an Undetected Problem

        • 1 Introduction

        • 2 Theoretical Framework

          • 2.1 Multiple Correspondence Analysis

          • 2.2 Joint Correspondence Analysis

        • 3 An Application

        • 4 Conclusion

        • References

      • Inference on the CUB Model: An MCMC Approach

        • 1 Introduction

        • 2 The CUB Model

        • 3 Bayesian Inference

        • 4 A Simulation Study

        • 5 An Application to Real Data

        • 6 Conclusion

        • References

      • Robustness Versus Consistency in Ill-Posed Classification and Regression Problems

        • 1 Introduction

        • 2 Ill-Posed Statistical Problems

        • 3 Robustness Versus Consistency

        • 4 Example: Support Vector Machines

        • 5 Conclusions

        • Appendix

        • References

      • Issues on Clustering and Data Gridding

        • 1 Introduction

        • 2 Gridding Approach

        • 3 Example Results

        • 4 Conclusions

        • References

      • Dynamic Data Analysis of Evolving Association Patterns

        • 1 Introduction

        • 2 Problem Statement

        • 3 The Procedure

        • 4 Example on Real Data

        • References

      • Classification of Data Chunks Using Proximal Vector Machines and Singular Value Decomposition

        • 1 Introduction

        • 2 ReGEC and Vector Machines Based Classification Methods

        • 3 ReGEC for Data Chunks: SVD Based Ensemble Classifier

        • 4 Experiments

        • 5 Conclusions

        • References

      • Correspondence Analysis in the Case of Outliers

        • 1 Introduction

        • 2 Notation

        • 3 Outliers

          • 3.1 Generation of Outliers

        • 4 Simulation Study

          • 4.1 CA-Coordinates

        • 5 Conclusion

        • References

      • Variable Selection in Cluster Analysis: An Approach Based on a New Index

        • 1 Introduction

        • 2 The Index and Its Properties

        • 3 The Complement of the Index

        • 4 Criteria for Variable Selection

        • 5 An Application to a Real Data Set

        • References

      • A Model for the Clustering of Variables Taking into Account External Data

        • 1 Introduction

        • 2 A Model for Preference Data

        • 3 Properties of CLV

          • 3.1 Properties of CLV Taking into Account External Data

          • 3.2 Properties of CLV Without External Data

          • 3.3 Conclusion

        • 4 Simulation Study

        • 5 Conclusion and Perspectives

        • References

      • Calibration with Spatial Data Constraints

        • 1 Introduction

        • 2 Constraints

          • 2.1 Constraint on Auxiliary Information

          • 2.2 Constraint on Missing Responses

          • 2.3 Constraint on the Density of the Units in the Subset Grid

        • 3 Simulation

        • 4 Concluding Remarks

        • References

    • Part II Data Mining

      • Clustering Data Streams by On-Line Proximity Updating

        • 1 Introduction

        • 2 State of the Art

        • 3 Clustering Data Streams Through the On-Line Clustering of Data Batches

          • 3.1 Dissimilarity Updating

          • 3.2 Off-Line Partitioning

        • 4 Main Results

        • 5 Conclusions

        • References

      • Summarizing and Detecting Structural Drifts from Multiple Data Streams

        • 1 Introduction

        • 2 Monitoring Proximity Relations Among Data Streams

        • 3 Evaluating the Evolution of Proximity Relations

        • 4 Main Results

        • 5 Conclusions and Perspectives

        • References

      • A Model-Based Approach for Qualitative Assessment in Opinion Mining

        • 1 Introduction

        • 2 Opinions Expressed by Ordinal Data

        • 3 Mixture Models for Ordinal Data

        • 4 A Multistage Ranking Model

        • 5 A Case Study on Political Opinions

        • 6 Concluding Remarks

        • References

      • An Evaluation Measure for Learning from Imbalanced Data Based on Asymmetric Beta Distribution

        • 1 Introduction

        • 2 The H Measure: A Replacement for the AUC

        • 3 B42: A New Evaluation Measure for Learning from Imbalanced Data

        • 4 Empirical Evaluation

          • 4.1 Data Sets

          • 4.2 B42 Versus AUC and H

        • 5 Conclusion

        • References

      • Outlier Detection for Geostatistical Functional Data: An Application to Sensor Data

        • 1 Introduction

        • 2 Geostatistical Functional Data

        • 3 Outlier Detection for Geostatistical Functional Data

        • 4 An Application to Sensor Data

        • 5 Conclusion and Perspectives

        • References

      • Graphical Models for Eliciting Structural Information

        • 1 Introduction

        • 2 Methods

          • 2.1 Structural Reference Features

          • 2.2 The Degree of Belief

          • 2.3 A Simple Case Study

        • 3 Discussion

        • References

      • Adaptive Spectral Clustering in Molecular Simulation

        • 1 Introduction

        • 2 Spectral Clustering and PCCA+

        • 3 Application of PCCA+ in Molecular Simulation

        • 4 Adaptive Decomposition

        • 5 Summary

        • References

    • Part III Applications

      • Spatial Data Mining for Clustering: An Application to the Florentine Metropolitan Area Using RedCap

        • 1 Introduction

        • 2 Regionalization Method

        • 3 Data and Results

        • 4 Concluding Remarks

        • References

      • Misspecification Resistant Model Selection Using Information Complexity with Applications

        • 1 Introduction

        • 2 Multivariate Regression Modeling with ICOMP

          • 2.1 Multivariate Gaussian Regression

          • 2.2 Robust Misspecification-Resistant Information Complexity Criteria

        • 3 Dimension Reduction with the Genetic Algorithm and Probabilistic Principal Components Analysis

          • 3.1 Genetic Algorithm

          • 3.2 Probabilistic Principal Components Analysis

        • 4 Numerical Results

        • 5 Concluding Remarks

        • References

      • A Clusterwise Regression Method for the Prediction of the Disposal Income in Municipalities

        • 1 Introduction

        • 2 The Basic Model

        • 3 From a Single Model to k Models

          • 3.1 The Clusterwise Regression

          • 3.2 Four Models for PCDI Prediction

          • 3.3 The Municipal PCDI Prediction

        • 4 Final Considerations

        • References

      • A Continuous Time Mover-Stayer Model for Labor Market in a Northern Italian Area

        • 1 Motivation

        • 2 The C.OBB Data and the Job States

        • 3 The Continuous Time Mover Stayer Model

        • 4 Bayesian Inference on the CTMS

        • 5 Main Results

        • 6 Conclusion

        • References

      • Model-Based Clustering of Multistate Data with Latent Change: An Application with DHS Data

        • 1 Introduction

        • 2 The Finite Mixture Model with Latent Change

        • 3 Application

        • 4 Conclusion

        • References

      • An Approach to Forecasting Beanplot Time Series

        • 1 Introduction

        • 2 Beanplot Time Series, Internal and External Modeling

        • 3 An Approach to Forecasting Beanplot Time Series

        • 4 Forecasting the Beanplot Time Series Related to the Dow Jones Market

          • 4.1 Diagnostic Models and Accuracy of the Forecasts

        • 5 Conclusions

        • References

      • Shared Components Models in Joint Disease Mapping: A Comparison

        • 1 Introduction

        • 2 Shared Components Models

          • 2.1 Shared Components Exchangeable Poisson Model

          • 2.2 Shared Components Multinomial Model

        • 3 Simulation Study

        • 4 Results

        • 5 Conclusion

        • References

      • Piano and Guitar Tone Distinction Based on Extended Feature Analysis

        • 1 Introduction

        • 2 First Study: Characterization by High Level Features

          • 2.1 Groups of Features

          • 2.2 Classification and Evaluation

          • 2.3 Interpretation of Selected Features

        • 3 Second Study: Classification with a Large Feature Set

        • 4 Conclusions

        • References

      • Auralization of Auditory Models

        • 1 Introduction

        • 2 Auralization by Using Statistical Methods

          • 2.1 Frequency Detection

          • 2.2 Power Estimation of Each Frequency

        • 3 Conclusion

        • References

      • Visualisation and Analysis of Affiliation Networks as Tools to Describe Professional Profiles

        • 1 The Analysis of Professional Profiles

        • 2 Jobs, Competencies and Activities as Affiliation Networks

        • 3 An Application to the R&D Field

        • 4 Final Remarks

        • References

      • Graduation by Adaptive Discrete Beta Kernels

        • 1 Introduction

        • 2 Discrete Beta Kernel Graduation

        • 3 An Adaptive Variant

        • 4 The Choice of h and s

        • 5 Simulation Results

        • 6 An Application to Italian Mortality Data

        • 7 Concluding Remarks

        • References

      • Modelling Spatial Variations of Fertility Rate in Italy

        • 1 Introduction

        • 2 The GWR Model

        • 3 Performance Comparison on Socio-demographic Data Set

        • 4 Conclusion

        • References

      • Visualisation of Cluster Analysis Results

        • 1 Introduction

        • 2 Cluster Analysis and Graphical Representation

        • 3 Validation of Cluster Analysis by Bootstrapping

        • 4 The “Big Grid” Spreadsheet Plotting Board of Excel

        • References

      • The Application of M-Function Analysis to the Geographical Distribution of Earthquake Sequence

        • 1 Introduction

        • 2 The M-Function: a Variant of Ripley's K-Function

        • 3 Details of Application

          • 3.1 Data Set

          • 3.2 Results and Discussion

        • 4 Concluding Remarks

        • References

      • Energy Consumption – Gross Domestic Product Causal Relationship in the Italian Regions

        • 1 Introduction and Background

        • 2 Data and Methods

        • 3 Main Results and Discussion

        • 4 Conclusion and Further Research

        • References

Content

Studies in Classification, Data Analysis, and Knowledge Organization

Managing Editors: H.-H. Bock, Aachen; W. Gaul, Karlsruhe; M. Vichi, Rome; C. Weihs, Dortmund

Editorial Board: D. Baier, Cottbus; F. Critchley, Milton Keynes; R. Decker, Bielefeld; E. Diday, Paris; M. Greenacre, Barcelona; C.N. Lauro, Naples; J. Meulman, Leiden; P. Monari, Bologna; S. Nishisato, Toronto; N. Ohsumi, Tokyo; O. Opitz, Augsburg; G. Ritter, Passau; M. Schader, Mannheim

For further volumes: http://www.springer.com/series/1564

Antonio Giusti, Gunter Ritter, Maurizio Vichi (Editors)
Classification and Data Mining

Editors:
Prof. Antonio Giusti, Department of Statistics, University of Florence, Florence, Italy
Prof. Dr. Gunter Ritter, Faculty for Informatics and Mathematics, University of Passau, Passau, Germany
Prof. Maurizio Vichi, Department of Statistics, Probability and Applied Statistics, University of Rome “La Sapienza”, Rome, Italy

ISSN 1431-8814
ISBN 978-3-642-28893-7
ISBN 978-3-642-28894-4 (eBook)
DOI 10.1007/978-3-642-28894-4
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2012952267

© Springer-Verlag Berlin Heidelberg 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned. This specifically includes the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis, and material supplied specifically for the purpose of being entered and executed on a computer system for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Following a biannual tradition of organizing joint meetings between classification societies, the Classification and Data Analysis Group of the Italian Statistical Society (CLADAG) organized its international meeting together with the German Classification Society (GfKl) in Firenze, Italy, on September 8–10, 2010. The conference was originally conceived as a German-Italian event. It nevertheless attracted researchers from several nations, especially Austria, France, Germany, Great Britain, Italy, Korea, the Netherlands, Portugal, Slovenia, and Spain. The meeting showed once more the vitality of data analysis and classification. It served as a forum for the presentation, discussion, and exchange of ideas among the most active scientists in the field. It also demonstrated the strong bonds between the two classification societies and greatly helped to deepen their relationship.

The conference program included Plenary, 12 Invited, and 31 Contributed Sessions.

This book contains selected and peer-reviewed papers presented at the meeting in the area of “Classification and Data Mining.” Browsing through the volume, the reader will find two kinds of articles. Methodological articles present new, original methods. Applied articles illustrate how new domain-specific knowledge can be extracted from data through clever use of data analysis methods. As the title suggests, the book is divided into three parts: Classification and Data Analysis, Data Mining, and Applications.

The methodologically oriented papers on classification and data analysis deal, among other things, with robustness, the analysis of spatial data, and the application of Markov chain Monte Carlo methods. Variable selection and clustering of variables play an increasing role in applications where there are substantially more variables than observations. Support vector machines offer models and methods for analyzing complex data structures that go beyond classical ones. Further topics discussed include association patterns and correspondence analysis.

Automated data mining methods enable knowledge discovery in huge data structures, such as those associated with new media (e.g., the Internet), digital images, or genomes. These methods will continue to pose a major challenge for data analysis in the near future. Information is readily retrieved in these fields; however, interpreting it and identifying relevant results is not at all a straightforward task. Data produced by the Internet, and by genetic studies of genomes and proteomes, are particularly appealing objects of analysis and are studied in this book. Furthermore, Markov chain models are applied to new kinds of problems, such as knowledge discovery on the Internet, the analysis of large biomedical data sets, and, more generally, sensor data. Moreover, the automatic online processing of data streams is becoming increasingly important. In sociology and market research, opinion mining on a large number of expressed preferences plays an important role. All these data typologies require algorithmic methods at the interface between statistics and computer science. Other contributions in the book focus on the application of the singular value decomposition to structural learning in Bayesian networks, and on molecular simulation for drug design.

The last part of the book contains applications to various fields of research, such as sociology, market research, environment, geography, and music. Specific topics include:
• estimation in demographic data
• description of professional profiles
• metropolitan studies, such as income in municipalities
• labor market research
• environmental energy consumption
• geographical data, such as seismic time series
• auditory models in speech and music
• the application of mixture models to multistate data
• visualization techniques

We hope that this short description stimulates the reader to take a closer look at some of the articles.

Our thanks go to Andrea Giommi and his local organizing team (Bruno Bertaccini, Matilde Bini, Anna Gottard, Leonardo Grilli, Alessandra Mattei, Alessandra Petrucci, Carla Rampichini, Emilia Rocco), who did a great job. We gratefully acknowledge the Faculty of Economics and the “Ente Cassa di Risparmio di Firenze” for financial support. We wish to express our special thanks to Chiara Bocci for her valuable contribution to the organization of the meeting and for her assistance in producing this book. Speaking also on behalf of our colleagues, we can say that we very much enjoyed being their guests in Firenze. The dinner with a view of the Dome was excellent, and we appreciated it very much.

We wish to express our gratitude to the other members of the Scientific Programme Committee: Daniel Baier, Reinhold Decker, Filippo Domma, Luigi Fabbris, Christian Hennig, Carlo Lauro, Berthold Lausen, Hermann Locarek-Junge, Isabella Morlini, Lars Schmidt-Thieme, Gabriele Soffritti, Alfred Ultsch, Rosanna Verde, Donatella Vicari, and Claus Weihs. We also thank the section organizers for putting together such strong sections. The Italian tradition of discussants and rejoinders was a new experience for GfKl. Thanks go to the referees for their important work. Last but not least, we thank all the speakers and everyone who came to listen and to discuss with them.

Florence, Italy: Antonio Giusti
Passau, Germany: Gunter Ritter
Rome, Italy: Maurizio Vichi

Contents

Part I Classification and Data Analysis

• Robust Random Effects Models: A Diagnostic Approach Based on the Forward Search
  Bruno Bertaccini and Roberta Varriale
• Joint Correspondence Analysis Versus Multiple Correspondence Analysis: A Solution to an Undetected Problem
  Sergio Camiz and Gastão Coelho Gomes
• Inference on the CUB Model: An MCMC Approach
  Laura Deldossi and Roberta Paroli
• Robustness Versus Consistency in Ill-Posed Classification and Regression Problems
  Robert Hable and Andreas Christmann
• Issues on Clustering and Data Gridding
  Jukka Heikkonen, Domenico Perrotta, Marco Riani, and Francesca Torti
• Dynamic Data Analysis of Evolving Association Patterns
  Alfonso Iodice D’Enza and Francesco Palumbo
• Classification of Data Chunks Using Proximal Vector Machines and Singular Value Decomposition
  Antonio Irpino, Mario Rosario Guarracino, and Rosanna Verde
• Correspondence Analysis in the Case of Outliers
  Anna Langovaya, Sonja Kuhnt, and Hamdi Chouikha
• Variable Selection in Cluster Analysis: An Approach Based on a New Index
  Isabella Morlini and Sergio Zani
• A Model for the Clustering of Variables Taking into Account External Data
  Karin Sahmer
• Calibration with Spatial Data Constraints
  Ivan Arcangelo Sciascia

Part II Data Mining

• Clustering Data Streams by On-Line Proximity Updating
  Antonio Balzanella, Yves Lechevallier, and Rosanna Verde
• Summarizing and Detecting Structural Drifts from Multiple Data Streams
  Antonio Balzanella and Rosanna Verde
• A Model-Based Approach for Qualitative Assessment in Opinion Mining
  Maria Iannario and Domenico Piccolo
• An Evaluation Measure for Learning from Imbalanced Data Based on Asymmetric Beta Distribution
  Nguyen Thai-Nghe, Zeno Gantner, and Lars Schmidt-Thieme
• Outlier Detection for Geostatistical Functional Data: An Application to Sensor Data
  Elvira Romano and Jorge Mateu
• Graphical Models for Eliciting Structural Information
  Federico M. Stefanini
• Adaptive Spectral Clustering in Molecular Simulation
  Marcus Weber

Part III Applications

• Spatial Data Mining for Clustering: An Application to the Florentine Metropolitan Area Using RedCap
  Federico Benassi, Chiara Bocci, and Alessandra Petrucci
• Misspecification Resistant Model Selection Using Information Complexity with Applications
  Hamparsum Bozdogan, J. Andrew Howe, Suman Katragadda, and Caterina Liberati
• A Clusterwise Regression Method for the Prediction of the Disposal Income in Municipalities
  Paolo Chirico
• A Continuous Time Mover-Stayer Model for Labor Market in a Northern Italian Area
  Fabrizio Cipollini, Camilla Ferretti, Piero Ganugi, and Mario Mezzanzanica
• Model-Based Clustering of Multistate Data with Latent Change: An Application with DHS Data
  José G. Dias
• An Approach to Forecasting Beanplot Time Series
  Carlo Drago and Germana Scepi
• Shared Components Models in Joint Disease Mapping: A Comparison
  Emanuela Dreassi
• Piano and Guitar Tone Distinction Based on Extended Feature Analysis
  Markus Eichhoff, Igor Vatolkin, and Claus Weihs
• Auralization of Auditory Models
  Klaus Friedrichs and Claus Weihs
• Visualisation and Analysis of Affiliation Networks as Tools to Describe Professional Profiles
  Cristiana Martini
• Graduation by Adaptive Discrete Beta Kernels
  Angelo Mazza and Antonio Punzo
• Modelling Spatial Variations of Fertility Rate in Italy
  Massimo Mucciardi and Pietro Bertuccelli
• Visualisation of Cluster Analysis Results
  Hans-Joachim Mucha, Hans-Georg Bartel, and Carlos Morales-Merino
• The Application of M-Function Analysis to the Geographical Distribution of Earthquake Sequence
  Eugenia Nissi, Annalina Sarra, Sergio Palermi, and Gaetano De Luca
• Energy Consumption – Gross Domestic Product Causal Relationship in the Italian Regions
  Antonio Angelo Romano and Giuseppe Scandurra

Posted: 17/04/2017, 08:36
