
Medical report: "Time series analysis as input for clinical predictive "


Theoretical Biology and Medical Modelling

This Provisional PDF corresponds to the article as it appeared upon acceptance. Fully formatted PDF and full text (HTML) versions will be made available soon.

Time series analysis as input for clinical predictive modeling: Modeling cardiac arrest in a pediatric ICU

Theoretical Biology and Medical Modelling 2011, 8:40
doi:10.1186/1742-4682-8-40

Curtis E Kennedy (cekenned@texaschildrenshospital.org)
James P Turley (James.P.Turley@uth.tmc.edu)

ISSN: 1742-4682
Article type: Research
Submission date: 22 November 2010
Acceptance date: 24 October 2011
Publication date: 24 October 2011
Article URL: http://www.tbiomed.com/content/8/1/40

This peer-reviewed article was published immediately upon acceptance. It can be downloaded, printed and distributed freely for any purposes (see copyright notice below). Articles in TBioMed are listed in PubMed and archived at PubMed Central. For information about publishing your research in TBioMed or any BioMed Central journal, go to http://www.tbiomed.com/authors/instructions/ For information about other BioMed Central publications go to http://www.biomedcentral.com/

© 2011 Kennedy and Turley; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Curtis E Kennedy 1§, James P Turley 2

1 Department of Pediatrics, Baylor College of Medicine, 6621 Fannin, WT 6-006, Houston, TX 77030, USA
2 The University of Texas School of Biomedical Informatics, 7000 Fannin, Suite 600, Houston, TX 77030, USA
§ Corresponding Author

Email addresses:
CEK: cekenned@texaschildrenshospital.org
JPT: James.P.Turley@uth.tmc.edu

Abstract

Background: Thousands of children experience cardiac arrest
events every year in pediatric intensive care units. Most of these children die. Cardiac arrest prediction tools are used as part of medical emergency team evaluations to identify patients in standard hospital beds that are at high risk for cardiac arrest. There are no models to predict cardiac arrest in pediatric intensive care units, though, where the risk of an arrest is 10 times higher than for standard hospital beds. Current tools are based on a multivariable approach that does not characterize deterioration, which often precedes cardiac arrests. Characterizing deterioration requires a time series approach. The purpose of this study is to propose a method that will allow for time series data to be used in clinical prediction models. Successful implementation of these methods has the potential to bring arrest prediction to the pediatric intensive care environment, possibly allowing for interventions that can save lives and prevent disabilities.

Methods: We reviewed prediction models from nonclinical domains that employ time series data, and identified the steps that are necessary for building predictive models using time series clinical data. We illustrate the method by applying it to the specific case of building a predictive model for cardiac arrest in a pediatric intensive care unit.

Results: Time course analysis studies from genomic analysis provided a modeling template that was compatible with the steps required to develop a model from clinical time series data. The steps include: 1) selecting candidate variables; 2) specifying measurement parameters; 3) defining data format; 4) defining time window duration and resolution; 5) calculating latent variables for candidate variables not directly measured; 6) calculating time series features as latent variables; 7) creating data subsets to measure model performance effects attributable to various classes of candidate variables; 8) reducing the number of candidate features; 9) training models for various data subsets; and
10) measuring model performance characteristics in unseen data to estimate their external validity.

Conclusions: We have proposed a ten step process that results in data sets that contain time series features and are suitable for predictive modeling by a number of methods. We illustrated the process through an example of cardiac arrest prediction in a pediatric intensive care setting.

Background

Roughly 1-6% of children being cared for in an ICU will experience a cardiac arrest while in the ICU.(1,2) Many of these arrests occur because their vital signs deteriorate to the point where they enter a state of progressive shock.(3-5) These arrests happen despite the fact that the children are being continuously monitored by ECG, pulse oximetry, and frequent blood pressure measurements. While there are tools that help identify patients in a non-intensive care setting who are at risk of arrest or have deteriorated to the point where they need to transfer to an ICU(6-21), there are no equivalent tools to identify which patients are likely to arrest in an intensive care setting. That is not to say that ICU environments are devoid of useful tools that provide decision support: systems such as VISICU and BioSign/Visensia(22-26) provide an added level of safety to the ICU environment by enabling remote monitoring and providing automated rule checking and high-specificity alerting for deteriorations that occur across multiple channels. While these tools are excellent for detecting deteriorations, they still are largely lacking in their ability to predict specific adverse outcomes. The goal of this study is to develop a framework for building prediction models that use time series data and can serve as the foundation for tools that can evaluate for specific consequences of a deterioration, with the ultimate goal of augmenting existing systems with the ability to answer questions like “Who is most likely to arrest?” in an ICU environment.

There are over 13,000 tools available to help clinicians
interpret the data they are presented with(27). Almost all of these tools have been designed so that they can be used manually. A tool’s adoption typically depends on a balance between how easy it is to use and what the information content of the tool is(28,29), so tools that are built to be used manually are constrained to a relatively small number of variables in order to achieve adequate simplicity. As a result, input variables have typically been restricted to a multivariable data paradigm where each variable is represented by a single value. A consequence of this strategy is that useful trend information cannot be incorporated into a model unless it is explicitly encoded as a variable. Doing this would add complexity to the task, so it is rarely done.

As healthcare is transitioning from manual processes to electronic ones, it is becoming increasingly easy to automate the processes of data collection and analysis. In an automated system, there is no longer a need to remain constrained to a multivariable data paradigm in order to achieve simplicity at the user level. Clinical studies using time series analysis have been undertaken in a number of settings(30-41), but thus far have been relatively limited in scope, tending to focus on interpretation of a single analytic method rather than incorporating multiple analytic methods into a more robust modeling paradigm.

The purpose of this article is to describe a method for developing clinical prediction models based on time-series data elements. The model development process that we are presenting is novel to clinical medicine, but the individual steps comprising the process are not. Our intention is to provide not only the description of the method, but the theoretical basis of the steps involved. We are demonstrating the application of this process in an example of cardiac arrest prediction in a pediatric intensive care unit. It is our hope that we describe the steps of the process and their theoretical basis
clearly enough that the methods can be extended to other domains where predictions based on time-series data are needed.

Introduction

In order to ensure that the concepts in this article can be understood by clinician and nonclinician alike, we will provide four brief overviews of the core concepts that form the foundation of this article. First, we will describe how the growth of data has impacted medicine and some of the strategies that have evolved to manage this growth. Second, we will review a few relevant concepts that relate to statistical analysis and modeling, with special focus on multivariable versus time series data paradigms. Third, we will specifically discuss clinical prediction models: their utilities, their limitations, and considerations for improvements. Finally, we will review the rationale behind selecting cardiac arrest in a pediatric ICU (PICU) as the example to illustrate the process, and we will provide a brief overview of the physiologic principles that serve as the theoretical basis for our prediction model.

Data in Medicine:

Medical care has existed since long before diseases were understood at a scientific level. Early medical care was characterized more by art and religion than by science as we know it today(42). The transition of medicine from an art to a science is based on the accumulation of data, and the information, knowledge, and wisdom that have been derived from it. While this transition has improved outcomes, there is a side effect of the data: information overload(43-46). Currently, the amount of data in the medical field is so extensive and is growing so fast that it is impossible for any single person to utilize it all effectively. In order to utilize data, it must be interpreted in context (transforming it into information) and evaluated by the user(47). This process requires substantial cognitive resources and is time consuming. In an attempt to address this problem, at least two strategies have been employed: specialization and
computerized support(43,48). Specialization allows clinicians to focus their efforts on a narrow field where they become expert in a relatively small group of related diseases. In doing so, they reduce their educational burden to a point where they can “afford” the cost of training and staying current in their specialty. In fact, some specialties have even reached the point where subspecialization is required in order to stay abreast of the latest trends(49). There is a fundamental limitation to specialization as a means to cope with excessive amounts of data or information: a more robust solution to the problem is needed. Ideal properties of the new solution should include: scalability(50) (it can continue to grow indefinitely), flexibility(50,51) (it can be used for a number of purposes), explicitness and accuracy(51) (it relies on objective parameters), and automaticity(52) (it functions independent of frequent supervision). Computer technology possesses these characteristics, and the field of informatics has been born out of the effort to utilize computer based solutions to automate the transformation of data to information in the healthcare setting(53,54). These solutions come in many forms, ranging from aggregating knowledge available on a given disease to informing clinicians when tests or treatments violate parameters deemed to be unsafe(55,56).

One of the fundamental goals of this article is to describe a method that can be automated as a computer based solution to help inform clinicians of a patient’s risk of cardiac arrest using trend information that would otherwise require manual interpretation. Since clinicians cannot continuously check the risk of cardiac arrest for all patients they are caring for, we are attempting to leverage information from data that would otherwise be left unanalyzed in the current “intermittent check” paradigm.

Statistical Analysis and Modeling:

Of course, medicine is not the only field where data has become so abundant that it is impossible to
understand it all. Compared to fields such as physics and astronomy, medicine is in a relative state of adolescence. When presented with an abundance of data, the first priority is to understand what the data represent. This process of gaining an understanding is based on statistical analysis(57-59). Depending on the information needs, data can be analyzed in a number of ways to provide a range of understandings. For instance, a univariable analysis(60) of “heart rate” provides an understanding of what the most common heart rate is, the range of heart rates, and how the range is distributed. A multivariable analysis(61) that includes “heart rate” as a variable can provide an understanding of how heart rate relates to temperature or blood pressure. A time series analysis(62) of “heart rate” can provide an understanding of how the heart rate changes at different times of the day.

The statistical methods for analyzing the data differ fundamentally for time series data, since a single variable is represented by multiple values that vary depending on the time they represent. Univariable and multivariable statistics, on the other hand, rely on a single value per variable for each case. Also, time series data elements are assumed to correlate to adjacent data elements(62), whereas this type of correlation can interfere with univariable and multivariable analysis(63,64).

Whereas univariable and multivariable data analysis informs the user of the distribution of a variable across a population and how the variable relates to other variables, time series analysis informs the user of how a variable relates to itself. In particular, time series analysis provides two types of information about a variable of interest: trends and seasonality(65). The distinction between the two approaches is that univariable and multivariable analyses aim to describe the static properties of a variable, whereas the aim of a time series analysis is to describe its dynamic properties over time. Knowing an airplane is 10 feet
off the ground, with the nose angled up and at full throttle, describes static variables that suggest the plane is taking off. However, knowing that over the last five seconds the elevation was 150 feet off the ground, then 140, then 120, then 90, and then finally 60 feet off the ground changes the interpretation of the multivariable data to suggest that the plane is about to crash. The addition of the trend features for the altitude significantly changes the interpretation of the static data about height, pitch and thrust.

Statistical analysis provides a systematic and standardized process of characterizing data so that it can be understood in the context in which it is being analyzed. Modeling endeavors also require a systematic approach, but the range of options is more varied than in statistical analysis(66,67), since the products of analyses are often used as “building blocks” for a model. It is not uncommon for models to draw on elements from more than one type of analysis in making a prediction. One example of this hybrid technique is the time-course approach to microarray analysis(68,69). As an example of this approach, the expression levels of twenty different genes are measured to determine their activity in two classes of cancer. If it were to stop here, this would be a basic multivariable model. However, the expression levels of these same twenty genes are measured repeatedly under different conditions and at different points in time. Under the standard multivariable model that used baseline expression levels of the twenty genes, it is impossible to tell which genes determine cancer class. However, by adding the behavior over time in the different nutrient environments, the different classes of cancer can be distinguished from one another. This is a well established technique for genomic modeling. The technique is based on a paradigm that utilizes time series data elements in a multivariable data format.

In multivariable statistical analysis, a high degree of correlation
between independent variables (known as multicollinearity, an inherent feature of time series data) can invalidate the results of the analysis by invalidating the calculations relating to the analysis of the independent variables as unique components(63,64). However, when modeling is focused on the relationship between the dependent variable and the aggregate of all independent variables (without trying to measure the effects of the independent variables themselves), this multicollinearity is permissible(70).

Clinical Prediction Models:

For centuries, models have been used to demonstrate our knowledge about the world in which we live. They help us share our understandings about the observations we make, and they help us anticipate what is to come. In medicine, scoring tools are a class of models that combine multiple data elements, weight them according to their correlation with the outcome of interest, and output a score that can be used in a number of ways. Individual scores can be used to make predictions that can help guide treatment decisions and communications with patients and families. As an example, medical emergency teams use scoring tools to identify high risk patients that merit transfer to a higher acuity unit(6-9,13-21). Grouping scores allows standardized comparisons between two or more entities by providing a risk-based adjustment to the outcome of interest(10-12,71,72).

Almost all clinical models are built on multivariable regression or a regression-like approach that evaluates a number of candidate input features (variables) and measures their individual correlation with the outcome of interest. The strength of the correlation is used to assign points for each of the included variables, with more points being assigned for highly correlated variables and for greater deviation from the variable’s normal value. Finally, points attributable to each feature are summed together to provide the composite score that provides an estimate of the net effect of all the
features combined. To illustrate, the Pediatric Risk of Mortality (PRISM) score(11,12) assigns a child who has a heart rate of >150 beats per minute (bpm) points for the abnormal heart rate. Heart rate is not the strongest predictor of death, though; plenty of children admitted to the PICU have heart rates >150 bpm during the first 24 hours and survive. However, if the child’s pupils are fixed and dilated (evidence of severe brain dysfunction), they get 10 ...

... remote monitoring of intensive care patients with mortality, complications, and length of stay. JAMA 2009;302:2671-2678.
27. Iyengar MS, Svirbely JR: The medical algorithms project. arXiv:0908.0932. 2009. Available online at: http://arxiv.org/abs/0908.0932 Accessed June 15, 2010.
28. Adams DA, Nelson RR, Todd PA: Perceived usefulness, ease of use, and usage of information technology: A replication. MIS Quarterly 1992;16:227-247.
29. Takata MN, Benumof JL, Mazzei WJ: The preoperative evaluation form: Assessment of quality from one hundred thirty-eight institutions and recommendations for a high-quality form. J Clin Anesth 2001;13:345-352.
30. Buchman TG, Stein PK, Goldstein B: Heart rate variability in critical illness and critical care. Curr Opin Crit Care 2002;8:311-315.
31. Chen W-L, Tsai T-H, Huang C-C, Chen J-H, Kuo C-D: Heart rate variability predicts short-term outcome for successfully resuscitated patients with out-of-hospital cardiac arrest. Resuscitation 2009;80:1114-1118.
32. Papaioannou VE, Maglaveras N, Houvarda I, Antoniadou E, Vretzakis G: Investigation of altered heart rate variability, nonlinear properties of heart rate signals, and organ dysfunction longitudinally over time in intensive care unit patients. J Crit Care 2006;21:95-103; discussion 103-4.
33. Goldstein B: Longitudinal changes in heart rate variability: laying the groundwork for the next generation in clinical monitoring. J Crit Care 2006;21:103-104.
34. Goldstein B, Fiser DH, Kelly MM, Mickelsen D, Ruttimann U, Pollack MM: Decomplexification in critical illness and
injury: relationship between heart rate variability, severity of illness, and outcome. Crit Care Med 1998;26:352-357.
35. Tibby SM, Frndova H, Durward A, Cox PN: Novel method to quantify loss of heart rate variability in pediatric multiple organ failure. Crit Care Med 2003;31:2059-2067.
36. Heintz E, Brodtkorb TH, Nelson N, Levin LA: The long-term cost-effectiveness of fetal monitoring during labour: A comparison of cardiotocography complemented with ST analysis versus cardiotocography alone. BJOG 2008;115:1676-1687.
37. Osorio I, Frei MG, Wilkinson SB: Real-time automated detection and quantitative analysis of seizures and short-term prediction of clinical onset. Epilepsia 1998;39:615-627.
38. Bigger Jr JT, Fleiss JL, Steinman RC, Rolnitzky LM, Kleiger RE, Rottman JN: Frequency domain measures of heart period variability and mortality after myocardial infarction. Circulation 1992;85:164-171.
39. Hayano J, Sakakibara Y, Yamada M, Ohte N, Fujinami T, Yokoyama K, Watanabe Y, Takata K: Decreased magnitude of heart rate spectral components in coronary artery disease. Its relation to angiographic severity. Circulation 1990;81:1217-1224.
40. Woo MA, Stevenson WG, Moser DK, Trelease RB, Harper RM: Patterns of beat-to-beat heart rate variability in advanced heart failure. Am Heart J 1992;123:704-710.
41. Stein PK, Barzilay JI, Chaves PH, Mistretta SQ, Domitrovich PP, Gottdiener JS, Rich MW, Kleiger RE: Novel measures of heart rate variability predict cardiovascular mortality in older adults independent of traditional cardiovascular risk factors: The cardiovascular health study (CHS). J Cardiovasc Electrophysiol 2008;19:1169-1174.
42. Cruse J: History of medicine: the metamorphosis of scientific medicine in the ever-present past. Am J Med Sci 1999;318:171-180.
43. de Meis L, Leta J: Modern science and the explosion of new knowledge. Biophys Chem 1997;68:243-253.
44. Imhoff M, Webb A, Goldschmidt A: Health informatics. Intensive Care Med 2001;27:179-186.
45. Rebitzer JB, Rege M, Shepard C: Influence,
information overload, and information technology in health care. Adv Health Econ Health Serv Res 2008;19:43-69.
46. Hall A, Walton G: Information overload within the health care system: A literature review. Health Information & Libraries Journal 2004;21:102-108.
47. Bernstam EV, Smith JW, Johnson TR: What is biomedical informatics? J Biomed Inform 2010;43:104-110.
48. Clayton PD, Hripcsak G: Decision support in healthcare. Int J Biomed Comput 1995;39:59-66.
49. Ebrahim S: Demographic shifts and medical training. BMJ 1999;319:1358-1360.
50. Chau PYK, Tam KY: Factors affecting the adoption of open systems: An exploratory study. MIS Quarterly 1997;21:1-24.
51. Power DJ, Sharda R: Decision support systems. In: Springer Handbook of Automation. Edited by Nof SY. New York, NY, Springer Publishing Company, Inc. 2009.
52. March ST, Hevner AR: Integrated decision support systems: A data warehousing perspective. Decis Support Syst 2007;43:1031-1043.
53. Patel VL, Kaufman DR: Medical informatics and the science of cognition. JAMIA 1998;5:493-502.
54. Cesnik B: History of health informatics. In: Health Informatics: An Overview. Edited by Hovenga E, Kidd M, Cesnik B. Melbourne, Australia, Churchill Livingstone 1996.
55. Sujansky W: Heterogeneous database integration in biomedicine. J Biomed Inform 2001;34:285-298.
56. Gardner RM: Computerized clinical decision-support in respiratory care. Respir Care 2004;49:378-86.
57. Ott L, Longnecker M: An Introduction to Statistical Methods and Data Analysis (Fifth edition). Belmont, CA, Cengage Learning, Inc. 2006.
58. Glantz SA: Primer of biostatistics (Fourth edition). New York, NY, McGraw-Hill Inc. 1997.
59. Norusis MJ: SPSS 10.0 guide to data analysis. Upper Saddle River, NJ, Prentice-Hall, Inc. 2000.
60. Cook A, Netuveli G, Sheikh A: Basic Skills in Statistics: A Guide for Healthcare Professionals. London, GB, Class Publishing 2004.
61. Harris R: A Primer of Multivariate Statistics (Third Edition). Mahwah, NJ, Lawrence Erlbaum Associates, Inc. 2001.
62. Hamilton JD: Time Series
Analysis. Princeton, NJ, Princeton University Press 1994.
63. Tabachnick BG, Fidell LS, Osterlind SJ: Using multivariate statistics. Boston, MA, Allyn and Bacon 2001.
64. Pedhazur EJ, Schmelkin LP: Measurement, Design, and Analysis: An Integrated Approach. Hillsdale, NJ, Lawrence Erlbaum Associates Inc. 1991.
65. Hill T, Lewicki P: STATISTICS: Methods and Applications. Tulsa, OK, Statsoft 2007.
66. Berry MJA, Linoff G: Data mining techniques: For marketing, sales, and customer relationship management (Second Edition). Indianapolis, IN, Wiley 2004.
67. Dunham MH: Data mining: Introductory and advanced topics. Upper Saddle River, NJ, Prentice-Hall, Inc. 2002.
68. Ebert BL, Golub TR: Genomic approaches to hematologic malignancies. Blood 2004;104:923-932.
69. Murphy D: Gene expression studies using microarrays: Principles, problems, and prospects. Adv Physiol Educ 2002;26:256-270.
70. Agresti A, Finlay B: Statistical methods for the social sciences (Fourth Edition). Upper Saddle River, NJ, Prentice-Hall, Inc. 2009.
71. Woodhouse D, Berg M, van der Putten J, Houtepen J: Will benchmarking ICUs improve outcome?
Curr Opin Crit Care 2009;15:450-455.
72. Duke G, Santamaria J, Shann F, Stow P: Outcome-based clinical indicators for intensive care medicine. Anaesth Intensive Care 2005;33:303-310.
73. Tsien CL: Event discovery in medical time-series data. Proc AMIA Symp 2000:858-862.
74. Stacey M, McGregor C: Temporal abstraction in intelligent clinical data analysis: A survey. Artif Intell Med 2007;39:1-24.
75. Gan X, Liew AWC, Yan H: Microarray missing data imputation based on a set theoretic framework and biological knowledge. Nucleic Acids Res 2006;34:1608-1619.
76. Duan Q, Ajami NK, Gao X, Sorooshian S: Multi-model ensemble hydrologic prediction using bayesian model averaging. Adv Water Resour 2007;30:1371-1386.
77. Kuttner KN: Estimating potential output as a latent variable. Journal of Business & Economic Statistics 1994;12:361-368.
78. Jacobs I, Nadkarni V, Bahr J, Berg RA, Billi JE, Bossaert L, Cassan P, Coovadia A, D'Este K, Finn J, Halperin H, Handley A, Herlitz J, Hickey R, Idris A, Kloeck W, Larkin GL, Mancini ME, Mason P, Mears G, Monsieurs K, Montgomery W, Morley P, Nichol G, Nolan J, Okada K, Perlman J, Shuster M, Steen PA, Sterz F, Tibballs J, Timerman S, Truitt T, Zideman D; International Liaison Committee on Resuscitation: Cardiac arrest and cardiopulmonary resuscitation outcome reports: update and simplification of the Utstein templates for resuscitation registries. A statement for healthcare professionals from a task force of the international liaison committee on resuscitation (American Heart Association, European Resuscitation Council, Australian Resuscitation Council, New Zealand Resuscitation Council, Heart and Stroke Foundation of Canada, InterAmerican Heart Foundation, Resuscitation Council of Southern Africa). Resuscitation 2004;63:233-429.
79. Mort TC: Unplanned tracheal extubation outside the operating room: A quality improvement audit of hemodynamic and tracheal airway complications associated with emergency tracheal reintubation. Anesth Analg 1998;86:1171-1176.
80. Fuhrman BP, Zimmerman JJ: Pediatric critical care (Second Edition). St Louis, MO, Mosby, Inc. 1998.
81. Harley A, Starmer CF, Greenfield Jr JC: Pressure-flow studies in man. An evaluation of the duration of the phases of systole. J Clin Invest 1969;48:895-905.
82. Jensen BN, Jensen FS, Madsen SN, Lolssl K: Accuracy of digital tympanic, oral, axillary, and rectal thermometers compared with standard rectal mercury thermometers. Eur J Surg 2000;166:848-851.
83. Goel G, Chou IC, Voit EO: Biological systems modeling and analysis: a biomolecular technique of the twenty-first century. J Biomol Tech 2006;17:252-269.
84. Panniers TL, Feuerbach RD, Soeken KL: Methods in informatics: using data derived from a systematic review of health care texts to develop a concept map for use in the neonatal intensive care setting. J Biomed Inform 2003;36:232-239.
85. Engels JM, Diehr P: Imputation of missing longitudinal data: a comparison of methods. J Clin Epidemiol 2003;56:968-976.
86. Myers WR: Handling missing data in clinical trials: an overview. Drug Information Journal 2000;34:525-533.
87. Shao J, Jordan DC, Pritchett YL: Baseline observation carry forward: reasoning, properties, and practical issues. J Biopharm Stat 2009;19:672-684.
88. Birkhahn RH, Gaeta TJ, Terry D, Bove JJ, Tloczkowski J: Shock index in diagnosing early acute hypovolemia. Am J Emerg Med 2005;23:323-326.
89. Kaufman BS, Rackow EC, Falk JL: The relationship between oxygen delivery and consumption during fluid resuscitation of hypovolemic and septic shock. Chest 1984;85:336-340.
90. Tropsha A, Gramatica P, VK: The importance of being earnest: Validation is the absolute essential for successful application and interpretation of QSPR models. QSAR & Combinatorial Science 2003;22:69-77.
91. Han J, Kamber M: Data mining: Concepts and techniques. San Francisco, CA, Morgan Kaufmann Publishers 2006.
92. Witten IH, Frank E: Data mining: Practical machine learning tools and techniques. Amsterdam, Netherlands, Morgan Kaufmann Publishers 2005.
93. Stein R: Benchmarking default prediction models: Pitfalls and remedies in model validation. Journal of Risk Model Validation 2007;1:77-113.
94. Altman DG, Royston P: What do we mean by validating a prognostic model? Stat Med 2000;19:453-473.
95. Dreiseitl S, Ohno-Machado L: Logistic regression and artificial neural network classification models: A methodology review. J Biomed Inform 2002;35:352-359.
96. Guyon I, Elisseeff A: An introduction to variable and feature selection. The Journal of Machine Learning Research 2003;3:1157-1182.
97. Hsu C, Chang C, Lin C: A practical guide to support vector classification. Available online at: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf Accessed June 15, 2010.
98. Kohavi R: A study of cross-validation and bootstrap for accuracy estimation and model selection. Proc 14th Intl Joint Conference on Artificial Intelligence 1995;2:1137-43.
99. Lim T, Loh W, Shih Y: A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning 2000;40:203-228.
100. Byvatov E, Schneider G: Support vector machine applications in bioinformatics. Appl Bioinformatics 2003;2:67-77.
101. Lobo JM, Jimenez-Valverde A, Real R: AUC: A misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography 2008;17:145-151.

Figures

Figure 1 - Overview of shock
Figure 1: Determinants of shock include variables that represent supply, demand, measures of end organ function, and modulators of the metabolic environment. Clinically useful concepts that are not explicitly measured but directly relate to supply and demand of oxygen are represented as explicitly calculated latent variables.

Figure 2 - Determinants of shock: variable source and data type
Figure 2: Physiologic variables selected as determinants of shock. Each variable has been matched to a source that contains the raw measurement. For measurements with more than one source, the preferred source was used for the match.

Figure 3 - Data management in the prearrest timeframe
Figure 3: Multivariable variables were always assigned as the most recent measurement taken before the reference point: the event in this illustration. For continuously measured variables, a multivariable representative was assigned according to the multivariable parameters. The remaining elements were considered time series elements: 59 minute by minute measurements in the one hour preceding the event, and hour by hour measurements in the 12 hours preceding the event.

Figure 4 - Data classes
Figure 4: Latent variables were derived from the raw physiologic Multivariable and Time Series data sets. Clinical Latent Variables were based on calculations used in clinical medicine in assessing for shock. Trend Analysis Latent Variables were based on slopes, intercepts, means, and the ratios of these features for 5, 10, 15, and 60 minute windows that preceded the arrest event.

Figure 5 - Data subsets obtained by combining data classes
Figure 5: Five candidate modeling subsets of data were created to determine the impact of time series and trend analysis latent features (separately) to baseline multivariable model accuracy. Clinical latent variables were compared to multivariable + time series features to determine their relative impact to model accuracy. Finally, all candidate features were combined to determine the net impact of time series + clinical latent + time series latent features on model accuracy.

... prediction were measured by multiple means. Heart rate can be measured by ECG signals or by pulse oximetry. Blood pressures can be measured continuously by arterial lines or intermittently by blood pressure ...
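
The Figure 4 caption describes trend analysis latent variables built from slopes, intercepts, and means over 5, 10, 15, and 60 minute windows preceding the reference point. The following is a minimal sketch of how such features might be computed from a minute-by-minute series. It is not the authors' implementation: the function names, the feature naming scheme, the use of an ordinary least-squares line, and the particular ratio shown (short-window mean divided by longest-window mean) are all assumptions made for illustration, and the heart rate values are synthetic.

```python
from statistics import fmean

WINDOWS_MIN = (5, 10, 15, 60)  # window lengths named in the Figure 4 caption


def fit_line(values):
    """Ordinary least-squares line through equally spaced samples.

    x is the sample index in minutes (0 = oldest sample in the window).
    Returns (slope per minute, intercept at x = 0).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    x_mean = (n - 1) / 2
    y_mean = fmean(values)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def trend_features(series, windows=WINDOWS_MIN):
    """Slope, intercept, and mean for each trailing window of a per-minute series.

    `series` is ordered oldest to newest; the last element is the sample
    closest to the reference point.
    """
    features = {}
    for w in windows:
        window = series[-w:]  # fewer samples are used if the series is shorter than w
        slope, intercept = fit_line(window)
        features[f"slope_{w}"] = slope
        features[f"intercept_{w}"] = intercept
        features[f"mean_{w}"] = fmean(window)
    # Hypothetical ratio feature: each shorter-window mean relative to the longest window.
    longest = max(windows)
    base = features[f"mean_{longest}"]
    for w in windows:
        if w != longest and base != 0:
            features[f"mean_ratio_{w}_{longest}"] = features[f"mean_{w}"] / base
    return features


# Synthetic example: heart rate steady at 120 bpm for 50 minutes,
# then rising 2 bpm per minute over the final 10 minutes.
heart_rate = [120.0] * 50 + [120.0 + 2.0 * i for i in range(1, 11)]
features = trend_features(heart_rate)
```

In this synthetic series the short windows capture the recent rise (a slope of about 2 bpm per minute), while the 60 minute window dilutes it, which is the kind of contrast a ratio between window features is meant to expose. Each resulting feature occupies one column, so the time series information fits the multivariable data format described earlier.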

Date posted: 13/08/2014, 16:20
