Machine learning in medicine

SpringerBriefs in Statistics

Ton J. Cleophas
Aeilko H. Zwinderman

Machine Learning in Medicine—Cookbook Three

More information about this series at http://www.springer.com/series/8921

Aeilko H. Zwinderman
Department of Biostatistics and Epidemiology
Academic Medical Center
Leiden, The Netherlands

Ton J. Cleophas
Department of Medicine
Albert Schweitzer Hospital
Sliedrecht, The Netherlands

Additional material to this book can be downloaded from http://extras.springer.com/

ISSN 2191-544X
ISSN 2191-5458 (electronic)
ISBN 978-3-319-12162-8
ISBN 978-3-319-12163-5 (eBook)
DOI 10.1007/978-3-319-12163-5
Library of Congress Control Number: 2013957369

Springer Cham Heidelberg New York Dordrecht London
© The Author(s) 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names,
trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

The amount of medical data is estimated to double every 20 months, and clinicians are at a loss to analyze them. Fortunately, user-friendly statistical software has been helpful for the past 30 years. However, traditional statistical methods have difficulty identifying outliers in large datasets and finding patterns in big data and in data with multiple exposure/outcome variables. In addition, analysis rules for surveys and questionnaires, which are currently common methods of medical data collection, are essentially missing. Fortunately, a new discipline, machine learning, is able to address all of these limitations. It involves computationally intensive methods like factor analysis, cluster analysis, and discriminant analysis. It is currently mainly the domain of computer scientists, and is already commonly used in social sciences, marketing research, operational research, and applied sciences. It is little used in medical research, probably due to the traditional belief of clinicians in clinical trials, where multiple variables even out through the randomization process and are not taken into account. In contrast, modern medical computer files often involve hundreds of variables like genes and other laboratory values, and computationally intensive methods are required.

In the past years we have completed a series of
three textbooks entitled Machine Learning in Medicine Part One, Two, and Three (edited by Springer, Heidelberg, Germany, 2012–2013). Also, we produced two 100-page cookbooks, entitled Machine Learning in Medicine—Cookbook One and Two. These cookbooks were (1) without background information and theoretical discussions, (2) highlighting technical details, (3) with data examples available at extras.springer.com for readers to perform their own analyses, (4) with references to the above textbooks for those wishing background information.

The current volume, entitled Machine Learning in Medicine—Cookbook Three, was written in a way very similar to that of the first two, and it reviews concise versions of the machine learning methods covered so far, like spectral plots, Bayesian networks, and support vector machines (Chaps. 9, 12, 13). Also, a first description is given of several new methods already employed by technical and market scientists, and of their suitability for clinical research, like ordinal scalings for inconsistent intervals, loglinear models for varying incident risks, and iteration methods for cross-validations (Chaps. 4–6, 16).

Additional new subjects are the following. Chapter 1 describes a novel method for data mining using visualization processes instead of calculus methods. Chapter 2 describes the use of trained clusters, a scientifically more appropriate alternative to traditional cluster analysis. Chapter 11 describes evolutionary operations (evops), and the evop calculators, already widely used in chemical and technical process improvements.

Similar to the first cookbook, the current work will assess in a nonmathematical way the stepwise analyses of 20 machine learning methods that are, likewise, based on three major machine learning methodologies: Cluster Methodologies (Chaps. 1, 2), Linear Methodologies (Chaps. 3–8), and Rules Methodologies (Chaps. 9–20). At extras.springer.com the data files of the examples (14 SPSS files) are given (both real and hypothesized data).
Furthermore, csv-type Excel files are available for data analysis in the Konstanz Information Miner, a widely approved free machine learning software package available on the Internet since 2006.

The current 100-page book, entitled Machine Learning in Medicine—Cookbook Three, and its complementary “Cookbooks One and Two” are written as training companions for 60 important machine learning methods relevant to medicine. We should emphasize that all of the methods described have been successfully applied in the authors’ own research.

Lyon, August 2014
Ton J. Cleophas
Aeilko H. Zwinderman

Contents of Previous Volumes

Machine Learning in Medicine—Cookbook One

Cluster Models
1 Hierarchical Clustering and K-means Clustering to Identify Subgroups in Surveys
2 Density-based Clustering to Identify Outlier Groups in Otherwise Homogeneous Data
3 Two-Step Clustering to Identify Subgroups and Predict Subgroup Memberships

Linear Models
4 Linear, Logistic, and Cox Regression for Outcome Prediction with Unpaired Data
5 Generalized Linear Models for Outcome Prediction with Paired Data
6 Generalized Linear Models for Predicting Event-Rates
7 Factor Analysis and Partial Least Squares (PLS) for Complex-Data Reduction
8 Optimal Scaling of High-sensitivity Analysis of Health Predictors
9 Discriminant Analysis for Making a Diagnosis from Multiple Outcomes
10 Weighted Least Squares for Adjusting Efficacy Data with Inconsistent Spread
11 Partial Correlations for Removing Interaction Effects from Efficacy Data
12 Canonical Regression for Overall Statistics of Multivariate Data

Rules Models
13 Neural Networks for Assessing Relationships that are Typically Nonlinear
14 Complex Samples Methodologies for Unbiased Sampling
15 Correspondence Analysis for Identifying the Best of Multiple Treatments in Multiple Groups
16 Decision Trees for Decision Analysis
17 Multidimensional Scaling for Visualizing Experienced Drug Efficacies
18 Stochastic Processes for Long Term Predictions from Short Term
Observations
19 Optimal Binning for Finding High Risk Cut-offs
20 Conjoint Analysis for Determining the Most Appreciated Properties of Medicines to be Developed

Machine Learning in Medicine—Cookbook Two

Cluster Models
1 Nearest Neighbors for Classifying New Medicines
2 Predicting High-Risk-Bin Memberships
3 Predicting Outlier Memberships

Linear Models
4 Polynomial Regression for Outcome Categories
5 Automatic Nonparametric Tests for Predictor Categories
6 Random Intercept Models for Both Outcome and Predictor Categories
7 Automatic Regression for Maximizing Linear Relationships
8 Simulation Models for Varying Predictors
9 Generalized Linear Mixed Models for Outcome Prediction from Mixed Data
10 Two Stage Least Squares for Linear Models with Problematic Predictors
11 Autoregressive Models for Longitudinal Data

Rules Models
12 Item Response Modeling for Analyzing Quality of Life with Better Precision
13 Survival Studies with Varying Risks of Dying
14 Fuzzy Logic for Improved Precision of Pharmacological Data Analysis
15 Automatic Data Mining for the Best Treatment of a Disease
16 Pareto Charts for Identifying the Main Factors of Multifactorial Outcomes
17 Radial Basis Neural Networks for Multidimensional Gaussian Data
18 Automatic Modeling for Drug Efficacy Prediction
19 Automatic Modeling for Clinical Event Prediction
20 Automatic Newton Modeling in Clinical Pharmacology

Contents

Part I Cluster Models

1 Data Mining for Visualization of Health Processes (150 Patients with Pneumonia)
1.1 General Purpose
1.2 Primary Scientific Question
1.3 Example
1.4 Knime Data Miner
1.5 Knime Workflow
1.6 Box and Whiskers Plots
1.7 Lift Chart
1.8 Histogram
1.9 Line Plot
1.10 Matrix of Scatter Plots
1.11 Parallel Coordinates
1.12 Hierarchical Cluster Analysis with SOTA (Self Organizing Tree Algorithm)
1.13 Conclusion

2 Training Decision Trees for a More Meaningful Accuracy (150 Patients with Pneumonia)
2.1 General Purpose
2.2 Primary Scientific Question
2.3 Example
2.4 Downloading the
Knime Data Miner
2.5 Knime Workflow
2.6 Conclusion

Format
Pages	132
Size	8.59 MB
Attachment	58. Machine Learning in Medicine.rar (7 MB)