
Neuroproteomics: Methods and Protocols, Second Edition (Methods in Molecular Biology, vol. 1598)


Methods in Molecular Biology 1598

Neuroproteomics: Methods and Protocols, Second Edition

Edited by:
Firas H. Kobeissy, Department of Psychiatry, University of Florida McKnight Brain Institute, Gainesville, FL, USA
Stanley M. Stevens, Jr., Department of Cell Biology, Microbiology, & Molecular Biology, University of South Florida, Tampa, FL, USA

Series Editor: John M. Walker, School of Life and Medical Sciences, University of Hertfordshire, Hatfield, Hertfordshire, AL10 9AB, UK. For further volumes: http://www.springer.com/series/7651

ISSN 1064-3745; ISSN 1940-6029 (electronic)
ISBN 978-1-4939-6950-0; ISBN 978-1-4939-6952-4 (eBook)
DOI 10.1007/978-1-4939-6952-4
Library of Congress Control Number: 2017935482

© Springer Science+Business Media LLC 2017. This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cover illustration: "Art of expression", 2016, acrylic on canvas, 30x40 inches, by the artist Iman Karout, MSc (karouteman@yahoo.com); located at Dr. Elie El-Chaer's office, American University of Beirut.

Printed on acid-free paper. This Humana Press imprint is published by Springer Nature. The registered company is Springer Science+Business Media LLC; its address is 233 Spring Street, New York, NY 10013, U.S.A.

Dedication

To my brother Rabih Moshourab, and to those who have faith in science. (Firas H. Kobeissy)

To the memory of my mother, Mary E. Stevens, and sister, Donna Stevens Miraoui. (Stanley M. Stevens, Jr.)
Foreword I

Major health care challenges remain in the diagnosis and treatment of stroke and traumatic brain injury (TBI). An improved understanding of the neurotrauma-related biological attributes of proteins and peptides is expected to enable a better understanding of the molecular changes prompted by brain conditions, and such understanding will substantially improve patient care. Neuroproteomics is an emerging and dynamic area of research that is deservedly drawing immense attention. This book, edited by Stevens and Kobeissy, is timely and provides a concise set of articles that capture recent advancements in neuroproteomics and the clinical application of this dynamic area of research to understand the molecular protein changes that are directly related to the development and progression of many central nervous system diseases, stroke, and traumatic brain injury.

The book is divided into three parts covering a wide spectrum of neuroproteomics. The first part, entitled "Current Reviews of Neuroproteomics Approaches and Applications," encompasses six chapters that review and highlight advances in a wide range of research activities pertaining to neuroproteomics, including an exhaustive review of the advantages and challenges associated with neuroproteomics, the potential of imaging mass spectrometry to study TBI and other central nervous system conditions, advances in degradomics and proteomics to study TBI, a systems biology approach to study PTSD, and neuroproteomics in Alzheimer's disease. An attractive feature of this book that makes it useful for new and advanced researchers is the breadth of topics covered, not only in this part but throughout the book. Another attractive feature is that it includes chapters that review current state-of-the-art areas of research as well as chapters that discuss and describe experimental methods.

The second part of the book is dedicated to discussing and describing experimental methods of neuroproteomics. This part constitutes the heart of the book and includes ten chapters that describe experimental methods related to photoaffinity labeling, quantitative phosphoproteomics of brain tissue, glycoprotein enrichment in the CNS, 2-DE proteomics, neuroproteomic CSF profiling by multiplexed affinity arrays, brain proteomics by IMS and parafilm-assisted microdissection-based LC-MS/MS, SILAC of primary microglia, and TBI neuroproteomics by 2-DE and Western blotting.

The comprehensiveness of the book is evident in the dedication of the third part to "Bioinformatics and Computational Methods." This is an important area of research, and the success of all the activities described in Parts I and II hinges on the development and implementation of bioinformatics and computational tools that are critical for the automated interpretation and quantitation of the "big data" generated by neuroproteomic analytical approaches. This part includes five chapters. The first chapter describes an algorithm capable of degradomics prediction; this chapter is aligned with the review of degradomics (Chapter 4). A systems biology and bioinformatics approach to the effect of secondhand tobacco smoke on the nitration of the brain proteome is the subject of the second chapter. An advanced "omic" approach to identify co-regulated clusters and transcription regulation networks with the AGCT and SHOE methods is discussed in the third chapter. AutoDock and AutoDock Tools for protein-ligand docking and an integration of decision trees and visual analysis to analyze intracranial pressure are the subjects of the fourth and fifth chapters, respectively.
Stevens and Kobeissy should be commended for the fine job they have done editing this book. The collection of topics and the quality of the chapters are excellent and a perfect fit for an edited book on neuroproteomics. The book is timely, and the breadth and depth of topics are outstanding. It will be an excellent resource for the new and expert researcher alike; students and researchers will benefit from reading it and keeping a copy handy.

Yehia Mechref, Ph.D.
A world-renowned expert in biomolecular mass spectrometry, proteomics/glycoproteomics, and glycomics, Lubbock, TX, USA

Foreword II

Over the past few centuries, a number of technological advances have uncovered new horizons for the scientific study of the nervous system. From uncovering the electrical excitability of neurons and the invention of the microscope to modern imaging techniques capable of visualizing molecules in a functional brain, we have come a long way in refining our speculations about brain function. Today, it is possible to correlate the molecular dynamics of neuronal circuits with the mechanisms of sensorimotor transformations in the brain and to connect them all with observable behavior. With every new technique, the excitement for novelty and the promise of discovery had to be disciplined with a word of caution: a reminder that the brain is different from other organs and studying it requires vigilance against overindulgence in interpreting results. When Dr. Firas Kobeissy first mentioned to me that he was about to write the second edition of this book, I said to myself: here's a much-needed revision of Neuroproteomics waiting to be written!
I have known Firas for many years, through which he has been focused on the use of proteomics in the study of disease and injury, including brain injury. His passion for proteomics is rivaled only by his interest in the mechanisms of brain injury. In the first edition, Neuroproteomics presented a number of experimental proteomic approaches to the study of the central nervous system (CNS) and its dysfunction in trauma and disease. In four contiguous sections, it covered animal models used in neuroproteomics research, methods for separating and analyzing subcomponents of the neuroproteome, and wide-ranging approaches for proteome characterization and quantification in the CNS, in addition to other methods to translate neuroproteomic results clinically.

This second edition offers more updated and novel protocols that encompass both brain-wide and targeted neuroproteomic topics. It includes exploration of advanced methods used for neuroproteomics research, including protein quantitation by mass spectrometry, characterization of post-translational modifications, as well as bioinformatics and computational approaches. Methodology chapters follow a well-organized presentation of their respective topics, starting with an introduction, followed by a list of materials and reagents, step-by-step reproducible protocols, and instructions on troubleshooting and addressing potential pitfalls. It is a cookbook for established and new scientists looking for molecular and biochemical markers of brain function and disease.

I have studied the brain and its mechanisms for nearly three decades using neurophysiology, neuroanatomy, neuropharmacology, molecular, behavioral, and imaging techniques, and I have taught the same over the same period. My work spanned the fields of discovery and translational sciences, with clinical applications in a couple of instances. If anything, my neurotrek has taught me one important lesson about the brain: it functions more like a Jeep than a Ferrari, and it constantly adapts to changing circumstances. This makes the outcomes of reductionist neuroscience techniques, be they physiological, cellular, molecular, or proteomic, too precise and limited to the experimental question at hand, reflecting mere snapshots of the brain state at a given point in time; fleeting moments that vary with changing conditions.

Reconstructing behavioral and cognitive states from these snapshots requires more integrated conceptual questions that put together the observations of many disciplines and push them far beyond what a single technique can offer. Along those lines, an amazing unification within the biological sciences has taken place over the past few decades, and it has set the stage for addressing this challenge. Genomics and proteomics have unmasked surprising similarities among proteins, their functions, and their mechanisms of action throughout the body, including the nervous system. This has resulted in a common conceptual framework for all cell biology, including the neuron. However, the more daunting challenge remains a unification between the many disciplines of biology to explain the neural basis of behavior. This final unification requires an admission, by reductionists, of the impossibility of a bottom-up reconstruction of biological systems, and an integrationist approach that does not deny or ignore the validity and results of successful reduction. This book is a step in the right direction towards unifying cellular and molecular methodologies in the study of neurons. Hopefully, it will be followed by similarly successful steps towards a general biological unification.
Elie D. Al-Chaer, Ph.D., J.D.
Professor and Chairperson, Department of Anatomy, Cell Biology and Physiological Sciences, Faculty of Medicine; Professor and Chairman, Interfaculty Neuroscience Graduate Program, American University of Beirut, Bliss Street, Beirut, Lebanon

Preface

The application of proteomics to the study of the central nervous system (CNS) has greatly enhanced our understanding of fundamental neurobiological processes and has enabled the identification of proteins and pathways related to the complex molecular mechanisms underlying various diseases of the CNS. This field, termed neuroproteomics, has facilitated scientific discovery through major technological and methodological advances in recent years. As part of the Methods in Molecular Biology series, this new edition includes several exciting areas of advanced methods used for neuroproteomics research, including relative and absolute protein quantitation by mass spectrometry, characterization of post-translational modifications, as well as bioinformatics and computational approaches.

In the introductory part of the book (Current Reviews of Neuroproteomic Approaches and Applications), we have six timely reviews of various neuroproteomic approaches such as neuroproteomics genesis, degradomics, proteomic analysis for the identification of biofluid biomarkers, mass spectrometry-based imaging, and computational methods. In addition to methodology, the application of neuroproteomic approaches to understand CNS disorders such as posttraumatic stress disorder and Alzheimer's disease is also reviewed.

The second part of the book focuses on experimental methods in neuroproteomics. We are excited to present updated approaches for the global-scale analysis of post-translational modifications, including phosphorylation, glycosylation, as well as proteolytic cleavage. In addition, several chapters detail procedures for quantitation of protein expression using both label-free and novel stable isotope labeling approaches. In terms of label-free quantitation, both mass spectrometry and multiplexed affinity arrays are described in relation to protein profiling in cerebrospinal fluid and in microvesicles and exosomes derived from neuronal cells. In relation to stable isotope labeling methods in neuroproteomics, two chapters detail stable isotope labeling by amino acids in cell culture (SILAC) approaches for the analysis of primary or ex vivo microglia. The SILAC chapters are focused on a single CNS cell type; however, the approach can potentially be applied to other CNS cell types after appropriate optimization. Moreover, specialized method chapters are presented, including proteomic approaches for identification of allosteric ligand binding sites, matrix-assisted laser desorption/ionization-based imaging, and targeted analysis of protein expression in a tissue-specific approach related to neuroendocrine response.

In addition to experimental protocol chapters, we present five chapters in the last part of the book that are related to bioinformatic and computational approaches in neuroproteomics. These chapters include a novel degradomics prediction algorithm as well as systems biology and bioinformatics approaches to characterize the global-scale effects of protein nitration and to determine transcriptional regulation networks in the context of the CNS.
Specialized protocols are also presented that describe methods for computational assessment of protein-ligand interactions as well as a detailed decision tree for the analysis of intracranial pressure. Overall, this new edition provides updated and novel protocols of neuroproteomics methods that encompass both global-scale as well as targeted and specialized topics.

Visual Analysis to Analyze Intracranial Pressure (Soo-Yeon Ji et al.)

A wavelet transform with different frequency ranges is applied to isolate frequency components into certain subbands. This process isolates small changes in the data, mainly in the high-frequency subband, and several important features are analyzed to determine rapid changes. The discrete wavelet transform (DWT) is particularly good for local analysis in representing fast time-varying data. The merits of using DWT are (a) capturing the nonstationary nature of the data in the time-frequency domain, (b) detecting any rapid changes in the data, and (c) revealing important information in the data. DWT decomposes data into different levels by calculating its correlation with a set of chosen wavelet basis functions. Wavelets are obtained from a mother wavelet by dilation and shifting [22–24]. Among the various wavelets, the Daubechies (or Haar) mother wavelet is applied to each region, decomposed to a single level. For the 2D wavelet transform of any image Ii given as an n × n matrix, the 1D wavelet transform is first applied to the columns of the image and then to the rows. Only the diagonal detail coefficients are used to extract DWT texture features from each region. The DWT feature is defined as the sum of the absolute values of the diagonal detail coefficients divided by the total pixel count of the diagonal coefficients [6, 22]; a short code sketch of this computation is given at the end of this section.

3.3 Predictive Model Generation

CART, designed by L. Breiman [25], applies information-theoretic concepts to create a decision tree. It allows capturing rather complex patterns in data and expressing them in the form of transparent grammatical rules [26]. One of the major advantages of using CART is that it deals with multiple attribute types, such as numerical and categorical variables [25, 27]. For categorical variables, CART simply uses substitution values, defined as patterns similar to the best split values in the node [25]. In addition, it supports an exhaustive search over all variables and split values to find the optimal splitting rule for each node. The splitting stops at a pure node containing the fewest examples. Tenfold cross-validation is commonly used for fair comparison; however, sixfold cross-validation is also considered here due to the small sample size of the data.
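As a minimal sketch of the DWT texture feature described above, the following Python snippet (using the PyWavelets library; the image region and its size are hypothetical stand-ins) performs a single-level 2D Haar decomposition and reduces the diagonal detail subband to one scalar:

```python
import numpy as np
import pywt

def dwt_texture_feature(region: np.ndarray) -> float:
    """Single-level 2D Haar DWT; returns sum(|diagonal detail|) / #coefficients."""
    # dwt2 applies the 1D transform along one axis and then the other,
    # yielding the approximation (cA) and detail (cH, cV, cD) subbands.
    _cA, (_cH, _cV, cD) = pywt.dwt2(region.astype(float), "haar")
    return float(np.abs(cD).sum() / cD.size)

# Hypothetical usage on a 64x64 CT image region
region = np.random.rand(64, 64)
print(dwt_texture_feature(region))
```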
3.4 Interactive Visual Analysis

A visual analytics approach is utilized to gain a better understanding of the relationships among the features. In particular, a known visual analytics system (called iPCA [16]) is used to conduct an interactive factor analysis. When conducting an interactive visual analysis, all input variables are considered. For the ICP values, the exact ICP values are used instead of the grouped values (i.e., Low and High). With iPCA, factor analysis and outlier detection are performed. iPCA is designed to represent the results of PCA using multiple coordinated views and a rich set of user interactions to support an interactive analysis of multivariate datasets. Within iPCA, the user can select patients' data in one coordinate space and immediately see the corresponding data highlighted in the other coordinate space, helping the user understand the relationship between the two. It is important to note that whenever data modification is applied, by removing data or adjusting dimension contributions in iPCA, a recomputation of PCA is performed.

Mathematically, PCA is defined as an orthogonal linear transformation that assumes all basis vectors form an orthonormal matrix [28]. It involves the calculation of a covariance matrix of a dataset to minimize redundancy and maximize variance. Since it determines the eigenvectors and eigenvalues of the input data (i.e., a matrix), it is broadly applied to factor and trend analysis, exploratory data analysis, and dimension reduction. PCA determines the eigenvectors and eigenvalues from the covariance matrix of the input data. The covariance matrix measures how much the variables (i.e., features or dimensions) vary from the mean with respect to each other. The covariance of two random variables is their tendency to vary together:

$\mathrm{cov}(X,Y) = E\big[(X - E[X])\,(Y - E[Y])\big]$

where $E[X]$ and $E[Y]$ denote the expected values of X and Y, respectively. For a sampled dataset, this can be represented as

$\mathrm{cov}(X,Y) = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})$

with $\bar{x} = \mathrm{mean}(X)$ and $\bar{y} = \mathrm{mean}(Y)$, where N is the number of data points. The covariance matrix is a matrix A with elements $A_{i,j} = \mathrm{cov}(i, j)$, computed after centering the dataset by subtracting the mean of each column vector. In the covariance matrix, the exact value is not as important as its sign (i.e., positive or negative). If the value is positive, both dimensions increase together: as the value of dimension X increases, so does dimension Y. If the value is negative, then as one dimension increases the other decreases, and the dimensions end up with opposite trends. In the final case, where the covariance is zero, the two dimensions are independent of each other. Because of commutativity, the covariance of X and Y, cov(X, Y), is equal to the covariance of Y and X, cov(Y, X).

The eigenvectors are unit eigenvectors (their lengths are 1). Once the eigenvectors and eigenvalues are calculated, the eigenvalues are sorted in descending order, which gives the components in order of significance. The eigenvector with the highest eigenvalue is the most dominant principal component of the dataset (PC1); it expresses the most significant relationship between the data dimensions. Principal components are therefore calculated by multiplying each row of the eigenvectors with the sorted eigenvalues. As mentioned above, PCA is used as a dimension reduction method by finding the principal components of the input data.
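A compact numpy sketch of the computation just outlined (centering, covariance, eigendecomposition, sorting, projection); the data matrix here is a hypothetical stand-in for the patient feature table:

```python
import numpy as np

def pca(data: np.ndarray, k: int = 2):
    """Project an (n_samples, n_dims) matrix onto its top-k principal components."""
    centered = data - data.mean(axis=0)        # subtract each column mean
    cov = np.cov(centered, rowvar=False)       # covariance matrix of the dimensions
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigendecomposition (symmetric matrix)
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return centered @ eigvecs[:, :k], eigvals  # scores on PC1..PCk, sorted eigenvalues

# Hypothetical usage: 50 patients x 9 features
scores, eigvals = pca(np.random.rand(50, 9), k=2)
```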
space, the Visual Analysis to Analyze Intracranial Pressure 411 best low-dimensional space has to be determined by the eigenvectors of the covariance matrix The best low-dimensional space is defined as having the minimal error between the input dataset and the principal components by using the following criterion: K ål i =1 N i ål i =1 i > q ( e.g.,q is 0.9 or 0.95) where K is the selected dimension from the original matrix dimension N, θ is a threshold, and λ is an eigenvalue Based on this criterion, the N × N matrix is linearly transformed to an N × K matrix Even though the number of dimensions is decreased through the PCA calculation, the difference between the input and the output matrix is minor A K value of or is often used to map the dataset into a 2D or 3D coordinate system By default, data will be projected with principal components (PC1 and PC2) in iPCA. Since a common method for finding eigenvectors and eigenvalues in nonsquare matrices is Singular Value Decomposition (SVD) [29], iPCA uses an approximation method based on SVD called Online SVD to maintain real-time user interactions when interacting with large-scale datasets [30] A detailed explanation about iPCA can be found in [16]) 4  Experiment Results A comparative understanding is performed to see the distribution of the data in considering lower and higher ICP values For this, mean deviation is computed and compared among the features (see Fig. 2) Data normalization scales the values of each continuous attribute into a well-proportioned range so that one attribute cannot affect others Several attributes in each ICP group (Low and High) feature set have large variances among them Data normalization needs to be applied to remove such large variances For this, mean range normalization technique (i.e., min-max normalization) is applied because it is an effective and relatively inexpensive technique It simply performs the normalization after identifying the minimum and maximum values of given attributes As explained above, a predictive model is generated using CART Overall, six trees are generated since sixfold cross-­validation is performed Figure 3 represents one of the trees Sensitivity and specificity are measured to understand how well the model predicts low and high ICP cases effectively The sensitivity and specificity are determined as 80 and 85.7%, respectively 412 Soo-Yeon Ji et al Fig A feature comparison of the patients who has a high and low ICP < 31 Age < 69.05 High ISS >= 31 >= 69.05 < 25.5 Mean of Variance < 1224 Low Low Vent Days >= 25.5 < 1551.5 >= 1224 < 7536 High FT feature >= 7536 Low BloodCount >= 1551.5 BloodCount Low < 5402.5 High >= 5402.5 Low Fig A tree example generated with CART Figure indicates that ISS is determined as the most important feature to represent the ICP level (i.e., High or Low) It is important to note that each tree identifies a different feature as the most important feature By observing the generated six trees based on the sixfold cross-validation, the features including ISS, blood count, ventilation days (i.e., Vent Days), and DWT feature are identified as important features positioned on the top of the generated trees Visual Analysis to Analyze Intracranial Pressure 413 5  Visual Analysis As mentioned above, a visual analytics approach is utilized to perform an interactive visual analysis on the ICP data Visual analytics has been known as a new research area that focuses on performing analytical reasoning with interactive visual interfaces [31] In here, an extended version of iPCA 
5 Visual Analysis

As mentioned above, a visual analytics approach is utilized to perform an interactive visual analysis on the ICP data. Visual analytics is a research area that focuses on performing analytical reasoning through interactive visual interfaces [31]. Here, an extended version of iPCA is used to conduct an interactive factor analysis.

5.1 Factor Analysis

The TBI dataset consisting of nine features (i.e., variables) is used. For the ICP, exact ICP values are utilized instead of the categorized ICPs. Patients' outcome status (Rehab, Nursing Home, Transfer, or Death) is used to represent each patient with a distinctive color. iPCA supports changing the dimension contributions by moving slider bars for each feature, which allows a nonlinear analysis of the data. When applying a dimension contribution change, it is extremely important for the user to maintain an awareness of the change, since the projection of the data will be modified; the user can easily become disoriented if the meaning of the changes is unclear. For this dimension contribution change in iPCA, there is a clear mathematical precedent in the use of dimension contributions: in Weighted Principal Component Analysis (WPCA), different variables can have different weights s1, s2, …, sn [28]. WPCA assumes that data are not always linearly increasing or decreasing and that there may be reason to allow different observations to have different weights. Based on this assumption, WPCA is adopted by researchers when analyzing complex data to set different weights for each variable, to handle missing data by giving zero weight to possibly missing entries, and to create a nonlinear multivariate data analysis (a small sketch of this weighting idea is given after this subsection's figure captions).

Depending on the existence of a skull fracture, the data spread out to form two distinctive linear patterns, as shown in Fig. 4a. The linear pattern on the left side of the projection space represents cases with skull fractures, whereas the pattern on the right side represents cases with no skull fracture. When the dimension contribution of the skull fracture feature is changed from 100 to 0%, the clearly separated pattern is destroyed, as shown in Fig. 4d; 0% indicates that the selected variable does not contribute to the final PCA computation. The dimension contribution sliders are applied to all variables to identify the relationship between the variables and the principal components (Fig. 5e). From the analysis without the skull fracture feature, two features (mean of variance and FT feature) are identified as insignificant. In addition, it has been found that the patient data with the outcome "death" form clear regions (Fig. 5a–c) separated from the other patients' data.

Fig. 4 Dimension contribution is applied by using a slider bar on the Skull variable, from 100 to 0%. The trails show how the patients' data move in PCA space in response to the change. Patient outcome status is mapped to different color attributes: rehab (green), nursing home, transfer (yellow), and death (blue). (a) 100% dimension contribution (initial state). (b) 66% dimension contribution. (c) 29% dimension contribution. (d) 0% dimension contribution

Fig. 5 The three highlighted regions (a–c) indicate the patient outcome status "death". The patients' data are presented in parallel coordinates (d). They are well separated from the other patients' outcome statuses after the dimension contribution changes on the variables mean of variance, FT feature, and skull. (a) Four data items (death) with a possible outlier (nursing home). (b) Six centrally positioned data items (death). (c) Six isolated data items (death). (d) Data representation in parallel coordinates. (e) Dimension contribution change with slider bars
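The dimension-contribution slider can be read as per-variable weighting applied before PCA. A minimal sketch of that idea, reusing the pca-style steps from the earlier snippet with a hypothetical weight vector (this illustrates WPCA-style weighting, not the actual iPCA implementation):

```python
import numpy as np

def weighted_pca(data: np.ndarray, weights: np.ndarray, k: int = 2):
    """Scale each variable by its contribution weight (0..1) before running PCA."""
    centered = data - data.mean(axis=0)
    weighted = centered * weights            # a 0.0 weight removes a variable entirely
    cov = np.cov(weighted, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return weighted @ eigvecs[:, order[:k]]

# Hypothetical usage: drop a skull-fracture column (index 8) from 100% to 0%
X = np.random.rand(50, 9)
w = np.ones(9)
w[8] = 0.0
scores = weighted_pca(X, w)
```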
Although the regions need to be validated by applying statistical methods or clustering algorithms, the interactive technique of changing the dimension contributions provides good grounds for further extended analysis. In region A (Fig. 5a), possible outliers (nursing home) are identified. Since outlier detection and analysis are important to validate possible outliers, we performed an outlier analysis in this study; a detailed explanation is provided in the following section.

5.2 Outlier Detection and Analysis

Outlier detection is important in data analysis, since outliers may carry significant information. It is often helpful to remove outliers because they are numerically distant from the rest of the data and often represent errors in the data. On the other hand, outliers can be important results, in which case their relation to the rest of the data should be studied in detail. With an outlier skewing calculations, the data cannot be fully analyzed or can give a misleading understanding. Detecting outliers can be difficult, and there has been much research on automated outlier detection; PCA calculation is one of the methods used to detect outliers in medical domains [32, 33]. In iPCA, outlier detection is performed empirically by evaluating dimension contributions or scatterplots representing the relationship between two variables.

As discussed above, possible outliers have been observed (Fig. 5a). With these possible outliers, it is important to analyze why the patient data (nursing home) are positioned near other patients' data (death). Figure 6 shows how the outlier analysis is conducted and how a reason is found: after selecting the patients' data in Fig. 6a, an outlier analysis is conducted by adjusting dimension contributions. From the analysis (see Fig. 6b), it has been found that the distinction between the possible outliers (nursing home) and the death data is clearly separable with the skull fracture feature. Depending on the existence of a skull fracture, the patient data (nursing home) may become outliers.

Since the Pearson correlation coefficients and the pairwise relationships (scatterplots) between features are represented in iPCA, outlier analysis and trend analysis can be performed. In this view (Fig. 7a), the diagonal displays the names of all features. The bottom triangle shows the coefficient value between two features with different colors indicating positive (red), neutral (white), and negative (blue) correlations. The top triangle contains cells of scatterplots in which all data are projected onto the two intersecting features; within this view, each scatterplot can be selected. In Fig. 7, the user identifies and selects a scatterplot that has a positive correlation coefficient (γ = 0.72). The scatterplot (Fig. 7b) between FT feature and mean of variance represents all patients' data and shows possible outliers. After selecting the possible outliers that appear in Fig. 7c, the user performs a deletion operation to remove the selected patients' data items.
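The pairwise-correlation view just described can be approximated with numpy: compute the Pearson coefficient for every feature pair (the bottom-triangle values) and pick the most positively correlated pair to inspect as a scatterplot. The feature names and matrix below are hypothetical stand-ins:

```python
import numpy as np

# Hypothetical TBI feature matrix: 60 patients x 9 features
names = ["BloodCount", "MeanVar", "FT", "DWT", "Age", "ISS", "VentDays", "Skull", "ICP"]
X = np.random.rand(60, 9)

corr = np.corrcoef(X, rowvar=False)   # 9x9 Pearson correlation matrix

# Scan the lower triangle for the strongest positive pair (as in Fig. 7a)
i, j = max(((a, b) for a in range(9) for b in range(a)), key=lambda p: corr[p])
print(f"highest positive correlation: {names[j]} vs {names[i]} (r = {corr[i, j]:.2f})")
```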
Fig. 6 Outlier analysis is performed to determine the factor that makes the data element (nursing home) become an outlier. (a) Selection of five data items for outlier analysis. (b) Analysis of the selected data items

Fig. 7 An example analysis identifying possible outliers. (a) Analysis of correlation coefficients. (b) Scatterplot view of the highest correlation coefficient. (c) Selection and removal of possible outliers. (d) Identification of an additional possible outlier

Since these items are not the primary consideration of the analysis (they represent "rehab"), their removal helps the user observe the changes in the scatterplot representation. After the removal, the scatterplot (Fig. 7d) reveals a possible outlier (i.e., death) that was not visible previously. With the outlier that appears in the scatterplot in Fig. 7d, an extended analysis is performed to identify statistically similar patients. For this, four similarity measures are used: cosine similarity, Euclidean similarity, Pearson correlation coefficient, and Extended Jaccard coefficient; each is sketched in code below.

Cosine similarity measures the angle between two data items:

$\mathrm{sim}(u,v) = \frac{u \cdot v}{\lVert u\rVert\,\lVert v\rVert} < 0.05$

Euclidean similarity measures the Euclidean distance between two data items and determines similar items based on the inverse of the distance:

$\mathrm{sim}(u,v) = \frac{1}{\lVert u - v\rVert + 1} < 0.05$

The Pearson correlation coefficient is the most widely used correlation measure; it computes the strength and direction of the linear relationship between two data items:

$\mathrm{sim}(u,v) = \frac{\mathrm{corr}(u,v)}{\sqrt{\dfrac{1-\mathrm{corr}(u,v)^2}{N-2}}} < 0.05$

where N is the number of variants. The Extended Jaccard coefficient measures similarity by comparing the size of the overlap against the size of the two data items:

$\mathrm{sim}(u,v) = \frac{EJ(u,v)}{\sqrt{\dfrac{1-EJ(u,v)^2}{N-2}}} < 0.05$

where N is the number of variants. Among the four similarity measures, the Pearson correlation coefficient has been found to be the best method for analyzing and identifying similar patterns. After identifying a possible outlier in Fig. 7, two patients' data items are determined to be similar using the Pearson correlation coefficient similarity measure (p < 0.05). The three patients' data are highlighted in Fig. 8. Although the similarity among the patients' data is not clear in the scatterplot (Fig. 8a), the data appear nearby in the PCA projection space (Fig. 8b).
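A minimal Python sketch of the four similarity measures as reconstructed above (the original formulas were garbled in extraction, so these follow the standard definitions; the vectors and the count N are hypothetical, and EJ denotes the extended Jaccard coefficient):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean_sim(u, v):
    # inverse-distance similarity: closer vectors score nearer to 1
    return float(1.0 / (np.linalg.norm(u - v) + 1.0))

def pearson_stat(u, v):
    # correlation scaled by its standard error, as in the text's criterion
    n = len(u)
    r = float(np.corrcoef(u, v)[0, 1])
    return r / np.sqrt((1.0 - r**2) / (n - 2))

def extended_jaccard_stat(u, v):
    ej = float(u @ v / (u @ u + v @ v - u @ v))  # extended Jaccard coefficient
    n = len(u)
    return ej / np.sqrt((1.0 - ej**2) / (n - 2))

# Hypothetical usage on two 9-feature patient vectors
u, v = np.random.rand(9), np.random.rand(9)
print(cosine(u, v), euclidean_sim(u, v), pearson_stat(u, v), extended_jaccard_stat(u, v))
```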
Since the data and the calculated eigenvectors are displayed in parallel coordinates at the bottom and the right, respectively (see Fig. 8), it can be observed that the three patients' data exhibit similar patterns. Since eigenvectors are linear combinations of the data dimensions, identifying the dimension contributions to the eigenvectors is a key consideration in data analysis with iPCA, for comprehending how the coordinate spaces relate to each other. As shown in the bottom parallel coordinates, there are noticeable differences in the data within the first four features (from the left: blood count, mean of variance, FT feature, and DWT feature), but it is apparent that the patients' data maintain some similarity in the right parallel coordinates.

6 Conclusion

A computational method to noninvasively predict intracranial pressure from CT images and demographic data, using image processing and a decision tree algorithm, has been introduced. This method is designed to replace invasive catheter-based monitoring systems for some patients and therefore avoid further complications. The method predicts the range of ICP using CART and shows 80% or higher sensitivity and specificity. From the texture analysis performed to extract features, it has been found that CT images contain vital information that may not be visible to human eyes. To enhance the ability to analyze the exact ICP value and patients' outcome status, a visual analytics approach is used to conduct a factor analysis and outlier detection.

Fig. 8 Based on the possible outlier identified in the previous analysis, statistically similar data items are determined with the Pearson correlation coefficient similarity measure (p < 0.05)
