Biomedical engineering systems and technologies

Nathalia Peixoto, Margarida Silveira, Hesham H Ali, Carlos Maciel, Egon L van den Broek (Eds.)

Communications in Computer and Information Science 881

Biomedical Engineering Systems and Technologies
10th International Joint Conference, BIOSTEC 2017
Porto, Portugal, February 21–23, 2017
Revised Selected Papers

Communications in Computer and Information Science
Commenced Publication in 2007
Founding and Former Series Editors: Alfredo Cuzzocrea, Xiaoyong Du, Orhun Kara, Ting Liu, Dominik Ślęzak, and Xiaokang Yang

Editorial Board
Simone Diniz Junqueira Barbosa: Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Rio de Janeiro, Brazil
Phoebe Chen: La Trobe University, Melbourne, Australia
Joaquim Filipe: Polytechnic Institute of Setúbal, Setúbal, Portugal
Igor Kotenko: St Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St Petersburg, Russia
Krishna M Sivalingam: Indian Institute of Technology Madras, Chennai, India
Takashi Washio: Osaka University, Osaka, Japan
Junsong Yuan: University at Buffalo, The State University of New York, Buffalo, USA
Lizhu Zhou: Tsinghua University, Beijing, China

More information about this series at http://www.springer.com/series/7899

Editors
Nathalia Peixoto: George Mason University, Fairfax, VA, USA
Margarida Silveira: Instituto Superior Técnico (IST), University of Lisbon, Lisbon, Portugal
Carlos Maciel: University of Sao Paulo, Sao Carlos, SP, Brazil
Egon L van den Broek: Utrecht University, Utrecht, The Netherlands
Hesham H Ali: University of Nebraska at Omaha, Omaha, NE, USA

ISSN 1865-0929, ISSN 1865-0937 (electronic)
Communications in Computer and Information Science
ISBN 978-3-319-94805-8, ISBN 978-3-319-94806-5 (eBook)
https://doi.org/10.1007/978-3-319-94806-5
Library of Congress Control Number: 2018947372
© Springer International Publishing AG, part of Springer Nature 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by the registered company Springer International Publishing AG, part of Springer Nature. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

The present book includes extended and revised versions of a set of selected papers from the 10th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2017), held in Porto, Portugal, during February 21–23, 2017. BIOSTEC is composed of five co-located conferences, each specialized in a different knowledge area, namely, BIODEVICES, BIOIMAGING, BIOINFORMATICS, BIOSIGNALS, and HEALTHINF.

BIOSTEC 2017 received 297 paper submissions from 56 countries, of which only 6% are included in this book. This reflects our care in selecting contributions. These papers were selected by the conference chairs on the basis of a number of criteria that include the classifications and comments provided by the Program Committee members, the session chairs’ assessment, and the program chairs’ meta-review of the papers that were included in the technical program. The authors of selected papers were invited to submit a revised, extended, and improved version of their conference paper, including at least 30% new material.

The purpose of the BIOSTEC joint conferences is to bring together researchers and practitioners, including engineers, biologists, health professionals, and informatics/computer scientists. Research presented at BIOSTEC included both theoretical advances and applications of information systems, artificial intelligence, signal processing, electronics, and other engineering tools in areas related to advancing biomedical research and improving health care.

The papers included in this book contribute to the understanding of relevant research trends in biomedical engineering systems and technologies. As such, they provide an overview of the field’s current state of the art. We thank the
authors for their contributions and Monica Saramago for process management. In particular, we express our gratitude to the reviewers, who helped to ensure the quality of this publication.

February 2017
Nathalia Peixoto
Margarida Silveira
Hesham H Ali
Carlos Maciel
Egon L van den Broek

Organization

Conference Co-chairs
Ana Fred: Instituto de Telecomunicações/IST, Portugal
Hugo Gamboa: LIBPHYS-UNL/FCT - New University of Lisbon, Portugal
Mário Vaz: INEGI LOME, FEUP, Portugal

Program Co-chairs

BIODEVICES
Nathalia Peixoto: Neural Engineering Lab, George Mason University, USA

BIOIMAGING
Margarida Silveira: Instituto Superior Técnico (IST), Portugal

BIOINFORMATICS
Hesham Ali: University of Nebraska at Omaha, USA

BIOSIGNALS
Carlos Maciel: University of São Paulo, Brazil

HEALTHINF
Egon L van den Broek: Utrecht University, The Netherlands

BIODEVICES Program Committee
Azam Ali: University of Otago, New Zealand
Mohammed Bakr: CCIT-AASTMT, Egypt
Steve Beeby: University of Southampton, UK
Hadar Ben-Yoav: Ben-Gurion University of the Negev, Israel
Dinesh Bhatia: North Eastern Hill University, India
Efrain Zenteno Bolos: Universidad Católica San Pablo, Peru
Luciano Boquete: Alcala University, Spain
Carlo Capelli: Norwegian School of Sport Sciences, Norway
Vítor Carvalho: IPCA and Algoritmi Research Centre, UM, Portugal
Hamid Charkhkar: Case Western Reserve University, USA
Wenxi Chen: The University of Aizu, Japan
Mireya Fernández Chimeno: Universitat Politècnica de Catalunya, Spain
Youngjae Chun: University of Pittsburgh, USA
James M Conrad: University of North Carolina at Charlotte, USA
Albert Cook: University of Alberta, Canada
Maeve Duffy: NUI Galway, Ireland
Paddy French: Delft University of Technology, The Netherlands
Juan Carlos Garcia: University of Alcala, Spain
Javier Garcia-Casado: Universitat Politècnica de València, Spain
Bryon Gomberg: Geyra Gassner IP Law, Israel
Miguel Angel García Gonzalez: Universitat Politècnica de Catalunya, Spain
Klas Hjort: Uppsala University, Sweden
Toshiyuki Horiuchi: Tokyo Denki University, Japan
Leonid Hrebien: Drexel University, USA
Sandeep K Jha: Indian Institute of Technology Delhi, India
Eyal Katz: Tel Aviv University, Israel
Michael Kraft: University of Liege, Belgium
Ondrej Krejcar: University of Hradec Kralove, Czech Republic
Ning Lan: Shanghai Jiao Tong University, China
Jung Chan Lee: Seoul National University, South Korea
Chwee Teck Lim: National University of Singapore, Singapore
Mai S Mabrouk: Misr University for Science and Technology, Egypt
Jordi Madrenas: Universitat Politècnica de Catalunya, Spain
Jarmo Malinen: Aalto University, Finland
Karen May-Newman: San Diego State University, USA
Joseph Mizrahi: Technion, Israel Institute of Technology, Israel
Raimes Moraes: Universidade Federal de Santa Catarina, Brazil
Umberto Morbiducci: Politecnico di Torino, Italy
Antoni Nowakowski: Gdansk University of Technology, Poland
Eoin O’Cearbhaill: University College Dublin, Ireland
Mónica Oliveira: University of Strathclyde, UK
Abraham Otero: Universidad San Pablo CEU, Spain
Gonzalo Pajares: Universidad Complutense de Madrid, Spain
Sofia Panteliou: University of Patras, Greece
Nancy Paris: British Columbia Institute of Technology, Canada
Lionel Pazart: CHU, France
Nathalia Peixoto: George Mason University, USA
Marek Penhaker: VŠB, Technical University of Ostrava, Czech Republic
Dmitry Rogatkin: Moscow Regional Research and Clinical Institute MONIKI, Russian Federation
Wim L C Rutten: University of Twente, The Netherlands
Seonghan Ryu: Hannam University, South Korea
Ashutosh Sabharwal: Rice University, USA
V V Raghavendra Sai: IIT Madras, India
Chutham Sawigun: Mahanakorn University of Technology, Thailand
Michael J Schöning: FH Aachen, Germany
Mauro Serpelloni: University of Brescia, Italy
Dong Ik Shin: Asan Medical Center, South Korea
Alcimar Barbosa Soares: Universidade Federal de Uberlândia, Brazil
Filomena Soares: Algoritmi Research Centre, UM, Portugal
Akihiro Takeuchi: Kitasato University School of Medicine, Japan
Gil Travish: Adaptix Ltd., UK
John Tudor: University of Southampton, UK
Renato Varoto: University of Campinas, Brazil
Pedro Vieira: Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Portugal
Bruno Wacogne: FEMTO-ST, France
Huikai Xie: University of Florida, USA
Sen Xu: Merck & Co., Inc., USA
Hakan Yavuz: Çukurova Üniversity, Turkey

BIOIMAGING Program Committee
Sameer K Antani: National Library of Medicine, National Institutes of Health, USA
Peter Balazs: University of Szeged, Hungary
Grégory Barbillon: EPF-Ecole d’Ingénieurs, France
Alpan Bek: Middle East Technical University, Turkey
Mads Sylvest Bergholt: Imperial College London, UK
Obara Boguslaw: University of Durham, UK
Alberto Bravin: European Synchrotron Radiation Facility, France
Tom Brown: University of St Andrews, UK
Enrico G Caiani: Politecnico di Milano, Italy
Rita Casadio: University of Bologna, Italy
Alessia Cedola: CNR, Institute of Nanotechnology, Italy
Heang-Ping Chan: University of Michigan, USA
James Chan: University of California Davis, USA
Dean Chapman: University of Saskatchewan, Canada
Guanying Chen: Harbin Institute of Technology and SUNY Buffalo, China/USA
Jyh-Cheng Chen: National Yang-Ming University, Taiwan
Christos E Constantinou: Stanford University, USA
Edite Maria Areias Figueiras: National Physical Laboratory, Portugal
Dimitrios Fotiadis: University of Ioannina, Greece
Patricia Haro Gonzalez: Universidad Autonoma Madrid, Spain
P Gopinath: Indian Institute of Technology Roorkee, India
Dimitris Gorpas: Technical University of Munich, Germany
Alberto Del Guerra: University of Pisa, Italy
Tzung-Pei Hong: National University of Kaohsiung, Taiwan
Kazuyuki Hyodo: High Energy Accelerator Research Organization, Japan
Shu Jia: Stony Brook University, USA
Ming Jiang: Peking University, China
Xiaoyi Jiang: University of Münster, Germany
Pluim Josien: Eindhoven University of Technology, The Netherlands
Patrice Koehl: University of California, USA
Adriaan A Lammertsma: VU University Medical Center Amsterdam, The Netherlands
Sang-Won Lee: Korea Research Institute of Standards and Science, South Korea
Rainer Leitgeb: Medical University Vienna, Austria
Ivan Lima: North Dakota State University, USA
Xiongbiao Luo: XMU-TMMU, China
Modat Marc: University College London, UK
David McGloin: University of Dundee, UK
Aidan D Meade: Centre for Radiation and Environmental Science, Dublin Institute of Technology, Ireland
Erik Meijering: Erasmus University Medical Center, The Netherlands
Israel Rocha Mendoza: Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Mexico
Kunal Mitra: Florida Institute of Technology, USA
Christophoros Nikou: University of Ioannina, Greece
Joanna Isabelle Olszewska: University of West Scotland, UK
Kalman Palagyi: University of Szeged, Hungary
Joao Papa: UNESP, Universidade Estadual Paulista, Brazil
Tae Jung Park: Chung-Ang University, South Korea
Gennaro Percannella: University of Salerno, Italy
Ales Prochazka: University of Chemistry and Technology, Czech Republic
Jia Qin: University of California/University of Washington, USA
Wan Qin: University of Washington, USA
Miroslav Radojevic: Erasmus MC, Biomedical Imaging Group Rotterdam, The Netherlands
Joseph Reinhardt: University of Iowa, USA
Giovanna Rizzo: Consiglio Nazionale delle Ricerche, Italy
Bart M ter Haar Romeny: Eindhoven University of Technology (TU/e), The Netherlands
Emanuele Schiavi: Universidad Rey Juan Carlos, Spain
Jan Schier: The Institute of Information Theory and Automation of the Czech Academy of Sciences, Czech Republic
Leonid Shvartsman: Hebrew University, Israel
Chikayoshi Sumi: Sophia University, Japan
Chi-Kuang Sun: National Taiwan University, Taiwan
Pablo Taboada: University of Santiago de Compostela, Spain
Xiaodong Tao: University of California, Santa Cruz, USA
Pécot Thierry: Medical University of South Carolina, France
Kenneth Tichauer: Illinois Institute of Technology, USA
Eric Tkaczyk: Vanderbilt University, USA
Arkadiusz Tomczyk: Lodz University of Technology, Poland

386 A Kabir et al
The Pearson correlation coefficient, R, is a measure of the linear dependence between X = {X1,…,Xn} and Z = {Z1,…,Zn}. It gives a value between −1 and +1, where −1 stands for total negative correlation, 0 for no correlation and +1 for total positive correlation. It can be defined as follows [36]:

\[ R = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Z_i - \bar{Z})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Z_i - \bar{Z})^2}} \tag{4} \]

where \bar{X} and \bar{Z} are the means of X and Z respectively. When measuring the correlation coefficient for a prediction task, a higher value of the coefficient indicates more similarity between the actual and predicted outputs, and hence better prediction performance.

Mean absolute error (MAE) and root mean squared error (RMSE) are both widely used in prediction tasks to measure the amount of deviation of the predicted values from the actual values. The two are defined in the following way:

\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |z_i - \hat{z}_i| \tag{5} \]

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( |z_i - \hat{z}_i| \right)^2} \tag{6} \]

where n is the number of predictions, z_1, …, z_n are the actual values and \hat{z}_1, …, \hat{z}_n are the predicted values [37]. Since MAE and RMSE measure differences between the actual and predicted outputs, lower values of these measures indicate better prediction performance.

3.3 Classification Algorithms and Parameters

The best classification method found in this study is a technique where a classification is done based on the regression results obtained from bagged M5 model trees (as described in Sect. 3.2). When the numeric prediction is obtained from this regression method, we round it to the nearest integer and assign the instance to the class corresponding to that integer. For example, a regression output of 0.35 is assigned to class “0” and an output of 3.83 is assigned to class “4”. We denote this approach here as classification via regression.
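The three regression measures of Eqs. (4) to (6) and the rounding step of classification via regression can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the Weka implementation used in the study: the function names are hypothetical, ties at .5 are rounded upward, and out-of-range outputs are clamped to the mRS classes 0 to 5. Both of those choices are assumptions, since the text only says "round to the nearest integer".

```python
import math


def pearson_r(x, z):
    """Pearson correlation coefficient R of two equal-length sequences (Eq. 4)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    cov = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_z = sum((zi - mean_z) ** 2 for zi in z)
    # Undefined (division by zero) if either sequence is constant.
    return cov / math.sqrt(ss_x * ss_z)


def mae(actual, predicted):
    """Mean absolute error (Eq. 5)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error (Eq. 6)."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def classify_via_regression(y_hat, lowest=0, highest=5):
    """Map a numeric regression output to the nearest integer class.

    Assumptions (not stated in the paper): ties at .5 round upward, and
    outputs outside [lowest, highest] are clamped to the nearest valid class.
    """
    nearest = math.floor(y_hat + 0.5)
    return max(lowest, min(highest, nearest))


if __name__ == "__main__":
    # The two examples given in the text: 0.35 -> class 0, 3.83 -> class 4.
    print(classify_via_regression(0.35), classify_via_regression(3.83))

    # Hypothetical toy values, for illustration only (not study data).
    actual = [0, 1, 2, 4, 5]
    predicted = [0.2, 1.4, 1.9, 3.3, 4.6]
    print(pearson_r(actual, predicted), mae(actual, predicted), rmse(actual, predicted))
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same predictions, and the gap between the two widens as the error magnitudes become more uneven.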
For comparison, we consider two more widely used classification algorithms: logistic regression [15] and the C4.5 decision tree [38]. The choice of logistic regression is motivated by the fact that it is the standard classification method used in clinical trial studies. As for the decision tree, it gives a good diagrammatic representation of the prediction process and has proved to be empirically successful in classification tasks. The machine learning tool Weka (version 3.8) [34] was used for our classification experiments as well. To reduce overfitting, the C4.5 decision trees were pruned by requiring at least 10 instances per leaf. For every other parameter, the default values in Weka (version 3.8) were applied.

Regression, Classification and Ensemble Machine Learning Approaches 387

Evaluation Criteria

The main evaluation criterion for the classification algorithms used in this study is accuracy: the percentage of cases where the actual and the predicted classes are the same. But since there are six different classes with subtle variations between two adjacent mRS-90 scores, we may consider predictions that are close enough to the actual score to be fairly accurate as well. We therefore define “near-accuracy” as the percentage of cases in which the classifier either makes an accurate prediction, or makes a wrong prediction that is one more or one less than the correct mRS score. For example, if the correct class is 3, only a prediction of 3 will be correct in terms of accuracy, but a prediction of 2, 3 or 4 will be acceptable in terms of near-accuracy. Once again, 10-fold cross validation [35] (using the process described in Sect. 3.2) was used to assess how well the models will generalize to an independent dataset.

3.4 Ordinal Regression

Our experiments on ordinal regression were run in R using the R package ordinal [39]. We experimented with the five different link functions discussed in Sect. 2.4. The evaluation criteria used for ordinal regression are different from
both classification and regression. This is because the ordinal scale leads to problems in defining an appropriate loss function [15]. The evaluation criteria used for conventional regression would not be applicable here because of the categorical nature of the target attribute. However, a simple hit-or-miss loss function does not reflect the ordering of the categories in the target attribute. We therefore use the Akaike Information Criterion (AIC), which offers an estimate of the relative information lost when a given model is used to represent the process that generates the data [40]. Smaller values of AIC indicate better fitted models. The other criterion we use is log-likelihood, a measure of the probability that the observed data is generated from a certain model. For this criterion, higher values indicate better models.

4 Results and Discussion

4.1 Regression Models to Predict mRS-90

We performed supervised regression on the stroke data to predict the patient outcome 90 days after stroke onset. The predictive and target attributes of the dataset are described in a table. We constructed models using M5 model tree and linear regression algorithms. We then applied bootstrap aggregating (bagging) using M5 model trees and linear regression models as respective base predictors. For comparison purposes, we also constructed the simple regression model whose prediction is always the average of the values of the dependent variable in the training set. As described in Sect. 3.2, we experimented with several parameters of the algorithms to enhance the performance of the models. In Table 4, we present the results of the best models achieved after experimentation with different parameter values. The comparison is in terms of the correlation coefficient (R), mean absolute error (MAE) and root mean squared error (RMSE).

Results show that bagging used in tandem with M5 model trees performs much better than all the other techniques. Even without bagging, the M5 model trees
perform better than linear regression. The improvement in performance is most pronounced when the mean absolute error is considered, and less so when we consider the root mean squared error. This leads us to an important point. Large errors have a relatively greater influence when the errors are squared. So, if the variance associated with the frequency distribution of the error magnitude increases, the difference between MAE and RMSE also increases [41]. From our observation of the relative MAE and RMSE values of the M5 model tree, we can conclude that there is a high variance in the prediction error of the M5 model tree. It therefore makes sense that a variance-reducing procedure like bagging should improve the performance of the model tree, as observed in Table 4. Note however that bagging does not have the same kind of effect in improving the performance of linear regression.

To see if any statistically significant improvement in performance is achieved, we performed paired t-tests in terms of correlation coefficient on each pair of the five methods considered. The difference between means for each pair is examined at a significance level of p = 0.05. The results of the tests are presented in Table 5. It shows that the performances of linear regression and M5 model trees (without bagging) are not statistically different from each other. When bagging is used with linear regression, it is unable to improve the performance significantly. However, when bagging is used with M5 model trees, the resulting regression model performs significantly better than the models of all the other methods.

Table 4. Comparison of different regression methods on stroke data in terms of R, MAE and RMSE. For R, higher values indicate better model fit, whereas for the MAE and RMSE metrics lower values are better. Table reused from [4].

Method                           R        MAE     RMSE
Average prediction               −0.136   1.235   1.461
Linear regression                0.779    0.654   0.916
M5 model tree                    0.785    0.577   0.905
Bagging with linear regression   0.783    0.649   0.908
Bagging with M5 model trees      0.822    0.537   0.832

Analysis of the Linear Regression Model

The linear regression model returns a linear equation delineating the amount of influence each predictive attribute has on the final outcome. Figure 4 shows the model obtained through linear regression. The attributes with a positive coefficient contribute to an increase in the mRS-90 score (worse outcome). The magnitude of the coefficient points to the degree of contribution. Note that the values of the continuous attributes age and NIHSS were scaled to a range between 0 and 1 before running linear regression. This allows comparison of their coefficients with those of all the other attributes.

From the linear regression model, we observe that older age, higher NIHSS at admission (more severe initial stroke) and higher mRS at discharge all have large positive coefficients. This implies that they are all predictive of a poor outcome. Alcohol also has a noticeable contribution towards a poorer outcome. All the stroke subtypes have a negative coefficient, so it is difficult to draw conclusions from these coefficients. Among the other attributes that have a negative coefficient, Perfusion is found to have a fairly high contribution towards a better outcome.

Table 5. Results of statistical significance analysis on correlation coefficient at p = 0.05. Each cell represents the result of the paired t-test between a pair of algorithms. “better” means the algorithm in the row is significantly better than the one in the column, “worse” means it is significantly worse, and “n.s.” indicates that there is no statistically significant difference. Table reused from [4].

                                 Average pred   Linear regression   M5 tree   Bagging lin reg   Bagging M5 trees
Average prediction               -              worse               worse     worse             worse
Linear regression                better         -                   n.s.      n.s.              worse
M5 model tree                    better         n.s.                -         n.s.              worse
Bagging with linear regression   better         n.s.                n.s.      -                 worse
Bagging with M5 model trees      better         better              better    better            -

Fig. 4. Linear regression model to predict 90-days outcome of stroke from
patients’ data.

Analysis of the M5 Model Tree

We investigate the model returned by the M5 model tree algorithm to find insights about stroke outcome. Figure 5 shows the model tree, where each leaf is a linear equation. The linear equations are shown alongside the tree. The sign and magnitude of the coefficient of each predictive attribute in the equations give an indication of how the output attribute responds to changes in the given input attribute. The continuous variables age and NIHSS at admission are scaled to the range between 0 and 1, so that the magnitudes of these attributes are within the [0,1] range.

From the model tree of Fig. 5, it is clear that the tree divides the input space based on the mRS score at discharge, and builds linear models for different ranges of that score. By following the decision branches of the tree, we can see that the linear models LM 1 and LM 2 correspond to mRS discharge scores of 0 and 1 respectively. LM 3 is associated with mRS discharge scores of 2 and 3, and LM 4 with scores of 4 and 5.

Let us now take a closer look at each of the linear models. In LM 1, the y-intercept is a very small value and there is no other attribute that has a large enough coefficient to change the prediction substantially. This means that the mRS-90 prediction for almost all patients reaching this point of the tree will be close to 0.

At LM 2, the mRS-disch value is 1 with a coefficient of 0.1265. Since the y-intercept is 0.3596, if all the other attributes are absent, the output is 0.3596 + 0.1265 = 0.4861. Let us call this the baseline prediction for this leaf. Other attributes will either increase or decrease this baseline prediction based on their coefficients. Older age, higher NIHSS at admission and antihypertensives contribute towards increasing the mRS-90 score. On the other hand, cardioembolic and cryptogenic strokes contribute significantly towards lowering the mRS-90 score.

At LM 3, the mRS score can be either 2 or 3. In the former case, the baseline prediction is 2*0.0951 − 0.3414 = −0.1512
and in the latter case, it is 3*0.0951 − 0.3414 = −0.0561. However, there are several attributes in this model that may have a major impact on the final prediction, notably age, NIHSS at admission, diabetes, large vessel stroke subtype and mRS before admission. Higher values for some or all of the above attributes will result in an increased mRS-90 score.

For LM 4, the baseline prediction is either 2.6762 (for mRS discharge = 4) or 4.1181 (for mRS discharge = 5). If a patient reaches this leaf, the output mRS-90 prediction is likely to be quite high. Only neurointervention has a major effect of lowering the mRS-90 score.

Fig. 5. The M5 model tree built on the stroke dataset with a minimum of 10 instances in each leaf. Each leaf is a linear model (LM1–LM4) predicting the target attribute mRS-90. The numbers under the leaves indicate how many instances are covered under that particular linear model. Each of the linear models is shown in detail alongside the tree.

Fig. 6. Model trees obtained from bootstraps 1–6 when using bagging with the M5 algorithm on the stroke data. Bootstrap numbers are shown alongside the tree for reference.

Fig. 7. Model trees obtained from bootstraps 7–10 when using bagging with the M5 algorithm on the stroke data. Bootstrap numbers are shown alongside the tree for reference.

Analysis of the Bagged M5 Model Trees

In bootstrap aggregating, a number of bootstraps are created by sampling from the training data. Each bootstrap is used to build a model, which in our case is an M5 model tree. Since we used 10 bags, we obtained 10 model trees, as shown in Figs. 6 and 7. Because of the huge number of linear models created in the trees, it is not possible to delineate all of them. So, we limit our discussion to the observations made from the structure of the trees and the internal nodes that were created. The first glaring observation is the confirmation that mRS at discharge is the most influential attribute
in the trees. In all 10 trees, it is the root of the tree, and in all but two trees it also acts as at least one of the children of the root. In several trees, the first linear model is formed by making a branch corresponding to mRS at discharge of 0. These linear models always evaluate to a very low mRS-90 score. Tree 7 and two other trees are almost identical to the single M5 tree we found using M5 model trees without bagging. Apart from mRS at discharge, the other important attributes that occurred several times in the trees are age, diabetes, antiplatelet, NIHSS at admission and neurointervention.

4.2 Classification Models to Predict mRS-90

We now consider the mRS-90 attribute as discrete multinomial (i.e., consisting of individual classes 0, 1, …, 5) instead of a continuous numeric attribute, and construct classification models to predict this discrete attribute. We used a classification via regression approach where the regression outputs from the bagged M5 model trees were discretized and treated as classification outputs. For comparison, we also applied two traditional multinomial classification techniques: C4.5 decision tree and logistic regression. We chose these two because the classification models induced by both of these algorithms are very expressive in nature. Moreover, logistic regression is extensively used for multivariate analysis in medical literature. As the evaluation metrics, we chose accuracy and near-accuracy as described in Sect. 3.3.

Table 6 shows a comparison of the performance of classification via regression with those of multi-class classification using logistic regression and C4.5 decision trees. For comparison purposes, we also include the majority class classifier, which classifies any test instance with the mRS-90 value that appears most frequently in the training set. We experimented with different pruning parameters of the C4.5 decision trees, and show here the accuracies of the model that
has the best generalization performance. Table 6 shows that the classification via regression method performs better in terms of both accuracy and near-accuracy. To check whether this improvement is statistically significant, we performed paired t-tests on the classification accuracy for the four algorithms. The results, given in Table 7, show that classification via regression performs significantly better than logistic regression, but not significantly better than the C4.5 decision tree at a level of p = 0.05. The fact that we obtained significantly better results than logistic regression is noteworthy, since logistic regression is the default analysis method used in medical literature.

Table 6. Comparison of logistic regression, C4.5 and classification via regression (bagging with M5 model trees) on the stroke dataset in terms of accuracy and near-accuracy. Table reused from [4].

Method                          Accuracy   Near-accuracy
Majority class                  46.9%      64.4%
Logistic Regression             54.2%      83.6%
C4.5 (with pruning)             56.7%      86.8%
Classification via regression   59.7%      90.0%

In Table 8, we show the confusion matrix obtained from the best-performing classification model, i.e., the classification via bagged model tree regression. The diagonal of the matrix represents the correct predictions. One observation from the misclassifications is that there is a central tendency of prediction. For mRS-90 scores 0–2, the misclassifications tend to be on the higher side. That is, more predicted scores are overestimates rather than underestimates of the actual score. The opposite is true for mRS-90 scores of 3–5, where the predictions tend to err more by underestimating than overestimating.

Table 7. Results of statistical significance analysis on classification accuracy at p = 0.05. Each cell represents the result of the paired t-test between a pair of algorithms. “better” means the algorithm in the row is significantly better than the one in the column, “worse” means it is significantly worse, and “n.s.” indicates that there is no statistically significant difference. Table reused from [4].

                                Majority class   Logistic regression   C4.5 tree   Classification via regression
Majority class                  -                worse                 worse       worse
Logistic regression             better           -                     n.s.        worse
C4.5 tree                       better           n.s.                  -           n.s.
Classification via regression   better           better                n.s.        -

Table 8. Confusion matrix for the method of supervised classification via regression using bagging with M5 model trees. The rows show the actual mRS scores while the columns show the ones predicted by the model. The diagonals (in darker gray) are the correct predictions. The cells adjacent to the diagonals (in lighter gray) are near-correct predictions missing the actual score by 1. Table adapted from [4].

Actual Predicted 159 36 11 0 10 40 19 0 2 15 31 14 19 21 10

Analysis of the C4.5 Decision Tree

For classification of mRS-90 values, we created a decision tree using the C4.5 algorithm. In order to avoid overfitting, we pruned the trees by requiring a minimum of 10 instances per leaf. The decision tree we obtained is shown in Fig. 8. The structure of the classification tree is similar to that of the regression trees we discussed before. The difference is that the leaf nodes of the classification tree are mRS-90 scores (0,…,5) rather than linear models to calculate the scores. Like the trees in our regression models, this tree also has mRS at discharge as the primary factor. Patients with mRS at discharge = are predicted to have mRS-90 = The same is true for patients with mRS at discharge = unless they have a non-cryptogenic stroke and use antihypertensives, in which case the mRS-90 prediction is Patients having mRS at discharge of or usually are predicted to end up with mRS-90 of There are, however, exceptions in two cases: if the patients have a low (0 or 1) mRS before admission, have a history of alcohol consumption and are younger than 87, they are predicted to do better and have mRS-90 of 1. On the other hand, if mRS before admission is
higher than and NIHSS at admission is higher than 5, they are predicted to worse and have mRS-90 of Patients having mRS at discharge of have predicted mRS-90 of or Patients having mRS at discharge of are predicted to have mRS-90 of or Fig C4.5 Decision tree constructed from the stroke data by treating the mRS-90 scores as multinomial categories Analysis of the Logistic Regression Model Among the six categories of mRS-90, we chose the last category (mRS-90 = 5) as the reference category for multinomial logistic regression Table shows the coefficients that were obtained for each attribute/category combination When analyzing this model, we need to consider the fact that all the coefficients are with respect to the reference category So positive coefficients imply higher probability of being classified in that category than the reference category A negative coefficient would imply the opposite As a concrete example, diabetes has negative coefficients for all the categories, so a diabetic patient is less likely to have mRS scores 0–4 than mRS score of We inspected the sign and magnitude of each attribute, and compared their coefficients across categories The following attributes were found to have very high negative coefficients in all categories: age, NIHSS score at admission, and hypertension In addition, several other attributes such as gender, diabetes and alcohol have negative coefficient across all categories This means that all these attributes have a lower probability of being in the lower mRS scores (lower probability of better outcome) As for the different treatment methods, only perfusion has positive coefficients in all categories Antidiabetics and antiplatelets also have positive coefficients for mRS scores of 0–3 This indicates that these methods of treatment improve the probability of going to a lower mRS score (better outcome) 396 4.3 A Kabir et al Ordinal Regression to Predict mRS-90 We created ordinal regression models with different link functions, 
and recorded the Akaike Information Criterion (AIC) and log-likelihood for model comparison. Table 10 summarizes the results obtained. Since ordinal regression with the logit link function yields the best result (lowest AIC and highest log-likelihood), we examine this model in more detail. Table 11 shows the coefficients and p-values obtained for that model. In ordinal regression with the logit link function, the logit (or log of odds) of the final outcome is a linear combination of the independent attributes. Positive coefficients contribute to a worse outcome. Two attributes, age and mRS at discharge, have high positive coefficients and are the only ones that have statistically significant

Table 9. Logistic regression coefficients for each category of the mRS-90 attribute for the stroke data. Here mRS-90 = 5 is used as the reference category.

Attribute                      Coefficients for each category (mRS-90)
                               0         1         2         3         4
Subtype = Large vessel         −0.886    1.652     −1.080    −13.28    −2.426
Subtype = Small vessel         −0.858    0.162     −2.960    −14.45    −4.619
Subtype = Cardioembolic        2.725     4.685     0.987     −12.24    −1.222
Subtype = Cryptogenic          −1.310    0.798     −1.758    −14.46    −4.119
Gender (female)                −3.200    −3.233    −3.076    −4.001    −3.098
Age                            −19.60    −18.64    −16.43    −18.81    −13.25
NIHSS score at admission       −9.696    −9.376    −8.096    −6.668    −8.224
Hypertension                   −29.73    −30.99    −31.28    −31.05    −30.68
Hyperlipidemia                 2.869     2.598     2.882     2.735     2.670
Diabetes                       −4.155    −4.010    −2.869    −3.472    −3.672
Smoking                        4.538     4.680     4.029     4.353     2.530
Alcohol problem                −4.463    −5.363    −3.866    −4.179    −1.675
Previous history of stroke     −2.316    −2.018    −1.045    −1.331    −0.469
Atrial fibrillation            0.810     1.133     2.010     3.047     2.038
Carotid artery disease         −0.038    0.889     0.487     1.468     2.168
Congestive heart failure       −0.162    −0.007    −0.656    −1.405    −1.294
Peripheral artery disease      −1.909    −1.444    −0.274    −1.458    −2.282
Hemorrhagic conversion         −0.327    −1.130    −1.952    −1.029    −1.737
tPA                            −0.641    −0.860    −0.565    −1.374    −1.289
Statins                        −2.597    −2.323    −2.715    −2.057    −2.641
Antihypertensives              −1.595    −0.875    −1.052    −0.222    −1.340
Antidiabetics                  0.030     0.658     0.478     1.194     −0.246
Antiplatelets                  1.397     0.515     0.182     0.129     −0.244
Anticoagulants                 −1.725    −1.295    −1.897    −1.797    −2.735
Perfusion                      4.584     3.587     2.132     3.163     2.693
Neurointervention              −5.795    −6.002    −5.304    −6.513    −5.680
mRS before admission           −1.994    −1.222    −1.274    −0.636    −1.109
mRS at discharge               −3.754    −1.954    −1.133    −0.561    0.042

Table 10. Performance of ordinal regression on stroke data for different link functions. For AIC, smaller values indicate better performance, whereas for log-likelihood, larger values (lower magnitude of negative values) indicate better performance.

Link function   AIC      Log-likelihood
Logit           651.96   −292.98
Probit          654.17   −294.09
Cloglog         663.71   −298.85
Cauchit         685.53   −309.77

Table 11. Ordinal regression (using the logit link function) coefficients for each attribute of the stroke data. The p-values are shown in the rightmost column, and the attributes with p < 0.05 are marked with asterisks.

Attribute                      Coefficient   p-value
Subtype = Large vessel         0.438         0.461
Subtype = Small vessel         −0.259        0.738
Subtype = Cardioembolic        −0.062        0.918
Subtype = Cryptogenic          0.356         0.550
Gender                         0.343         0.235
Age                            4.164         0.001*
NIHSS score at admission       0.280         0.723
Hypertension                   −0.545        0.201
Hyperlipidemia                 0.118         0.730
Diabetes                       0.426         0.289
Smoking                        −0.396        0.199
Alcohol problem                0.407         0.297
Previous history of stroke     0.610         0.073
Atrial fibrillation            0.242         0.526
Carotid artery disease         0.378         0.319
Congestive heart failure       −0.037        0.939
Peripheral artery disease      0.553         0.253
Hemorrhagic conversion         −0.123        0.739
tPA                            0.101         0.747
Statins                        −0.096        0.776
Antihypertensives              0.388         0.289
Antidiabetics                  −0.096        0.832
Antiplatelets                  −0.302        0.164
Anticoagulants                 0.347         0.406
Perfusion                      −0.767        0.096
Neurointervention              −0.552        0.153
mRS before admission           0.269         0.089
mRS at discharge               2.002
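
The AIC values in Table 10 can be cross-checked against the reported log-likelihoods using the standard definition AIC = 2k − 2 ln L. The sketch below assumes k = 33 estimated parameters (the 28 attribute coefficients listed in Table 11 plus 5 thresholds separating the six ordered mRS-90 categories). This count is an inference from the tables, not a figure stated in the text, and the function and variable names are illustrative only.

```python
# Cross-check of Table 10 using AIC = 2k - 2*logL.
# ASSUMPTION: k = 33 (28 coefficients + 5 thresholds); inferred, not stated in the text.

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion for a fitted model."""
    return 2 * n_params - 2 * log_likelihood

# Log-likelihood and AIC values as reported in Table 10.
reported = {
    "logit":   {"loglik": -292.98, "aic": 651.96},
    "probit":  {"loglik": -294.09, "aic": 654.17},
    "cloglog": {"loglik": -298.85, "aic": 663.71},
    "cauchit": {"loglik": -309.77, "aic": 685.53},
}

N_PARAMS = 33  # assumed parameter count, see note above

# Recompute AIC for every link function and pick the smallest.
recomputed = {name: aic(v["loglik"], N_PARAMS) for name, v in reported.items()}
best_link = min(recomputed, key=recomputed.get)

for name, value in recomputed.items():
    print(f"{name:8s} recomputed AIC = {value:.2f} (reported {reported[name]['aic']:.2f})")
print("lowest AIC:", best_link)
```

Under this assumption every model has the same number of parameters, so ranking by AIC and ranking by log-likelihood give the same order, which is consistent with the logit link being best on both criteria.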

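Accuracy and near-accuracy, as used above to compare the classifiers, can both be read off a confusion matrix: accuracy counts the diagonal cells, while near-accuracy also counts the cells adjacent to the diagonal. The following minimal sketch uses a small hypothetical 4×4 matrix (not the data of Table 8), with rows as actual scores and columns as predicted scores. The function name and the `tolerance` parameter are assumptions made for illustration.

```python
def accuracy_within(confusion, tolerance=0):
    """Fraction of instances whose predicted score lies within
    `tolerance` of the actual score.

    confusion[i][j] = number of instances with actual score i
    that were predicted as score j.
    """
    total = sum(sum(row) for row in confusion)
    hits = sum(
        count
        for i, row in enumerate(confusion)
        for j, count in enumerate(row)
        if abs(i - j) <= tolerance
    )
    return hits / total

# HYPOTHETICAL counts, for illustration only (not taken from the paper).
toy_matrix = [
    [8, 2, 0, 0],
    [1, 5, 3, 1],
    [0, 2, 4, 2],
    [1, 0, 1, 10],
]

acc = accuracy_within(toy_matrix, tolerance=0)   # diagonal cells only
near = accuracy_within(toy_matrix, tolerance=1)  # diagonal plus adjacent cells

print(f"accuracy      = {acc:.3f}")
print(f"near-accuracy = {near:.3f}")
```

Near-accuracy is never lower than accuracy, since the cells it counts include the diagonal.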
Date posted: 02/03/2019, 10:56

Nguồn tham khảo

Tài liệu tham khảo

Loại

Chi tiết

26. Schumann, A., Akimova, L.: Syllogistic system for the propagation of parasites.The Case of Schistosomatidae (Trematoda: Digenea). Stud. Log. Gramm. Rhetor.40(1), 303–319 (2015)

Sách, tạp chí

Tiêu đề:	Schistosomatidae

3. Beni, G., Wang, J.: Swarm intelligence in cellular robotic systems. In: Dario, P., Sandini, G., Aebischer, P. (eds.) Robots and Biological Systems: Towards a New Bionics. NATO ASI Series, pp. 703–712. Springer, Heidelberg (1993). https://doi.org/10.1007/978-3-642-58069-7 38

Link

13. Jones, J.D.: Towards lateral inhibition and collective perception in unorganised non-neural systems. In: Pancerz, K., Zaitseva, E. (eds.) Computational Intelligence, Medicine and Biology. SCI, vol. 600, pp. 103–122. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16844-9 6

Link

1. Abrahams, M., Colgan, P.: Risk of predation, hydrodynamic eﬃciency, and their inﬂuence on school structure. Environ. Biol. Fishes 13(3), 195–202 (1985) 2. Adamatzky, A., Erokhin, V., Grube, M., Schubert, T., Schumann, A.: Physarumchip project: growing computers from slime mould. Int. J. Unconv. Comput. 8(4), 319–323 (2012)

Khác

4. Blumer, H.: Symbolic Interactionism; Perspective and Method. Prentice-Hall, Englewood Cliﬀs (1969)

Khác

5. Bull, R.A., Segerberg, K.: Basic modal logic. In: The Handbook of Philosophical Logic, vol. 2, pp. 1–88. Kluwer (1984)

Khác

6. Costerton, J.W., Lewandowski, Z., Caldwell, D.E., Korber, D.R., et al.: Microbial bioﬁlms. Annu. Rev. Microbiol. 49, 711–745 (1995)

Khác

7. Duﬀy, J.E.: The ecology and evolution of eusociality in sponge-dwelling shrimp.In: Kikuchi, T. (ed.) Genes, Behavior, and Evolution in Social Insects, pp. 1–38.University of Hokkaido Press, Sapporo (2002)

Khác

8. Helbing, D., Keltsch, J., Molnar, P.: Modelling the evolution of human trail systems. Nature 388, 47–50 (1997)

Khác

9. Helbing, D., Farkas, I., Vicsek, T.: Simulating dynamical features of escape panic.Nature 407(6803), 487–490 (2000)

Khác

10. Jacobs, D.S., et al.: The colony structure and dominance hierarchy of the Dama- raland mole-rat, Cryptomys damarensis (Rodentia: Bathyergidae) from Namibia.J. Zool. 224(4), 553–576 (1991)

Khác

11. Jarvis, J.: Eusociality in a mammal: cooperative breeding in naked mole-rat colonies. Science 212(4494), 571–573 (1981)

Khác

12. Jarvis, J.U.M., Bennett, N.C.: Eusociality has evolved independently in two genera of bathyergid mole-rats but occurs in no other subterranean mammal. Behav. Ecol.Sociobiol. 33(4), 253–360 (1993)

Khác

14. Kearns, D.B.: A ﬁeld guide to bacterial swarming motility. Nat. Rev. Microbiol.8(9), 634–644 (2010)

Khác

15. Kr¨ utzen, M., Mann, J., Heithaus, M.R., Connor, R.C., Bejder, L., Sherwin, W.B.:Cultural transmission of tool use in bottlenose dolphins. PNAS 102(25), 8939–8943 (2005)

Khác

16. Michener, C.D.: Comparative social behavior of bees. Annu. Rev. Entomol. 14, 299–342 (1969)

Khác

19. Parsons, T.: Social Systems and The Evolution of Action Theory. The Free Press, New York (1975)

Khác

20. Riesenhuber, M., Poggio, T.: Neural mechanisms of object recognition. Curr. Opin.Neurobiol. 12(2), 162–168 (2002)

Khác

21. Sakiyama, T., Gunji, Y.-P.: The M¨ uller-Lyer illusion in ant foraging. In: Hemmi, J.M. (ed.) PLoS ONE, vol. 8, no. 12, p. e81714 (2013)

Khác

22. Schadschneider, A., Klingsch, W., Klpfel, H., Kretz, T., Rogsch, C., Seyfried, A.: Evacuation dynamics: empirical results, modeling and applications. In: Mey- ers, R.A. (ed.) Encyclopedia of Complexity and Systems Science, pp. 3142–3176

Khác