DOCUMENT INFORMATION
Structure
Preface
List of Contributors
Contents
Introduction
Introduction
Why perform microarray experiments?
What is a microarray?
Microarray production
Where can I obtain microarrays?
Extracting and labeling the RNA sample
RNA extraction from scarce tissue samples
Hybridization
Scanning
Typical research applications of microarrays
Experimental design and controls
Suggested reading
Affymetrix Genechip system
Affymetrix technology
Single Array analysis
Detection p-value
Detection call
Signal algorithm
Analysis tips
Comparison analysis
Normalization
Change p-value
Change call
Signal Log Ratio Algorithm
Genotyping systems
Introduction
Methodologies
Genotype calls
Suggested reading
Overview of data analysis
cDNA microarray data analysis
Affymetrix data analysis
Data analysis pipeline
Experimental design
Why do we need to consider experimental design?
Choosing and using controls
Choosing and using replicates
Choosing a technology platform
Gene clustering v. gene classification
Conclusions
Suggested reading
Basic statistics
Why statistics are needed
Basic concepts
Variables
Constants
Distribution
Errors
Simple statistics
Number of subjects
Mean (m)
Trimmed mean
Median
Percentile
Range
Variance and the standard deviation
Coefficient of variation
Effect statistics
Scatter plot
Correlation (r)
Linear regression
Frequency distributions
Normal distribution
t-distribution
Skewed distribution
Checking the distribution of the data
Transformation
Log2-transformation
Outliers
Missing values and imputation
Statistical testing
Basics of statistical testing
Choosing a test
Threshold for p-value
Hypothesis pair
Calculation of test statistic and degrees of freedom
Critical values table
Drawing conclusions
Multiple testing
Analysis of variance
Basics of ANOVA
Completely randomized experiment
Statistics using GeneSpring
Simple statistics
Transformations
Scatter plot and histogram
Correlation
Linear regression
One-sample t-test
Independent samples t-test and ANOVA
Suggested reading
Analysis
Preprocessing of data
Rationale for preprocessing
Missing values
Checking the background reading
Calculation of expression change
Intensity ratio
Log ratio
Fold change
Handling of replicates
Types of replicates
Time series
Case-control studies
Power analysis
Averaging replicates
Checking the quality of replicates
Quality check of replicate chips
Quality check of replicate spots
Excluding bad replicates
Outliers
Filtering bad data
Filtering uninteresting data
Simple statistics
Mean and median
Standard deviation
Variance
Skewness and normality
Linearity
Spatial effects
Normalization
Similarity of dynamic range, mean and variance
Examples using GeneSpring
Importing data
Background subtraction
Calculation of expression change
Replicates
Checking linearity
Normality
Filtering
Suggested reading
Normalization
What is normalization?
Sources of systematic bias
Dye effect
Scanner malfunction
Uneven hybridization
Printing tip
Plate and reporter effects
Batch effect and array design
Experimenter issues
What might help to track the sources of bias?
Normalization terminology
Normalization, standardization and centralization
Per-chip and per-gene normalization
Global and local normalization
Performing normalization
Choice of the method
Basic idea
Control genes
Linearity of data matters
Basic normalization schemes for linear data
Special situations
Mathematical calculations
Mean centering
Median centering
Trimmed mean centering
Standardization
Lowess smoothing
Ratio statistics
Analysis of variance
Spiked controls
Dye-swap experiments
Some caution is needed
Graphical example
Example of calculations
Using GeneSpring for normalization
Suggested reading
Finding differentially expressed genes
Identifying over- and underexpressed genes
Filtering by absolute expression change
Statistical single chip methods
Noise envelope
Sapir and Churchill's single slide method
Chen's single slide method
Newton's single slide method
What about the confidence?
Only some treatments have replicates
All the treatments have replicates: two-sample t-test
All the treatments have replicates: one-sample t-test
GeneSpring examples
Suggested reading
Cluster analysis of microarray information
Basic concept of clustering
Principles of clustering
Hierarchical clustering
Self-organizing map
K-means clustering
Principal component analysis
Pros and cons of clustering
Visualization
Programs for clustering and visualization
Function prediction
GeneSpring and clustering
Clustering tool
Principal components analysis tool
Predict parameter value tool
Suggested reading
Data mining
Gene regulatory networks
What are gene regulatory networks?
Fundamentals
Bayesian network
Calculating Bayesian network parameters
Searching Bayesian network structure
Conclusion
Suggested reading
Data mining for promoter sequences
Introduction
Introduction
Finding promoter region sequences
Using EnsMart to retrieve promoter regions
Comparison of EnsMart and UCSC searches
Pattern search without prior knowledge
Summary
GeneSpring and promoter analysis
Suggested reading
Annotations and article mining
Retrieving annotations from public databases
Retrieving annotations using BLAST
Article mining
Annotation and gene ontologies using GeneSpring
Annotations
Ontologies
Tools and data management
Reporting results
Why the results should be reported
What details should be reported: the MIAME standard
How the data should be presented: the MAGE standard
MAGE-OM
MAGE-ML; an XML-translation of MAGE-OM
MAGE-STK
Where and how to submit your data
ArrayExpress and GEO
MIAMExpress
GEO
Other options and aspects
MIAME-compliant sample attributes in GeneSpring
Suggested reading
Software issues
Data format conversion problems
A standard file format
Programming
Perl
Awk
R
Freeware software packages
Cluster and TreeView
Expression profiler
ArrayViewer
MAExplorer
Bioconductor
Commercial software packages
VisualGene
GeneSpring
Kensington
J-Express
Expression Nti
Rosetta Resolver
Spotfire
Index
Content
DNA Microarray Data Analysis

TOMI PASANEN, JANNA SAARELA, ILANA SAARIKKO, TEEMU TOIVANEN, MARTTI TOLVANEN, MAUNO VIHINEN AND GARRY WONG

EDITORS JARNO TUIMALA AND M. MINNA LAINE

CSC, the Finnish IT center for Science

CSC – Scientific Computing Ltd. is a non-profit organization for high-performance computing and networking in Finland. CSC is owned by the Ministry of Education. CSC runs a national large-scale facility for computational science and engineering and supports the university and research community. CSC is also responsible for the operations of the Finnish University and Research Network (Funet).

All rights reserved. The PDF version of this book or parts of it can be used in Finnish universities as course material, provided that this copyright notice is included. However, this publication may not be sold or included as part of other publications without permission of the publisher.

© The authors and CSC – Scientific Computing Ltd. 2003
ISBN 952-9821-89-1
http://www.csc.fi/oppaat/siru/
Printed at Picaset Oy, Helsinki 2003

Preface

This is the first edition of the DNA microarray data analysis guidebook. Although invented in the mid-90s, DNA microarrays are still novelties as biomedical research tools. DNA microarrays generate large amounts of numerical data, which should be analyzed effectively. In this book, we hope to offer a broad view of the basic theory and techniques behind DNA microarray data analysis. Our aim was not to be comprehensive, but rather to cover the basics, which are unlikely to change much over the years. We hope that researchers who are just starting their data analysis will especially benefit from the book.

The text emphasizes gene expression analysis. Other topics, such as genotyping, are discussed briefly. This book does not cover wet-lab practices, such as sample preparation or hybridization. Rather, we start when the microarrays have been scanned and the resulting images analyzed. In other words, we take the files with signal intensities, which usually raise questions such as: “How is the data normalized?” or “How do I identify the genes which are upregulated?”. We provide some simple solutions to these specific questions and many others.

Each chapter has a section on suggested reading, which introduces some of the relevant literature. Several chapters also include data analysis examples using GeneSpring software.

This edition of the book was written by M. Minna Laine (chapters 4, 8 and 14), Tomi Pasanen (chapter 11), Janna Saarela (chapters 2 and 3), Ilana Saarikko (chapter 8), Teemu Toivanen (chapter 14), Martti Tolvanen (chapter 12), Jarno Tuimala (chapters 4, 6, 7, 8, 9, 10, 13 and 15), Mauno Vihinen (chapters 10, 11 and 12), and Garry Wong (chapters 1 and 5).

Juha Haataja and Leena Jukka are warmly acknowledged for their support during the production of this book.

We are very interested in receiving feedback about this publication. In particular, if you feel that some essential technique has been missed, let us know. Please send your comments to the e-mail address Jarno.Tuimala@csc.fi.

Espoo, 19th May 2003
The authors

List of Contributors

M. Minna Laine, CSC, the Finnish IT center for Science, Tekniikantie 15 a D, 02101 Espoo, Finland
Tomi Pasanen, Institute of Medical Technology, Lenkkeilijänkatu 8, 33520 Tampere, Finland
Janna Saarela, Biomedicum Biochip Center, Haartmaninkatu 8, 00290 Helsinki, Finland
Ilana Saarikko, Centre for Biotechnology, Tykistökatu 6, 20521 Turku, Finland
Teemu Toivanen, Centre for Biotechnology, Tykistökatu 6, 20521 Turku, Finland
Martti Tolvanen, Institute of Medical Technology, Lenkkeilijänkatu 8, 33520 Tampere, Finland
Jarno Tuimala, CSC, the Finnish IT center for Science, Tekniikantie 15 a D, 02101 Espoo, Finland
Mauno Vihinen, Institute of Medical Technology, Lenkkeilijänkatu 8, 33520 Tampere, Finland
Garry Wong, A. I. Virtanen Institute, University of Kuopio, 70211 Kuopio, Finland

Contents

Preface
List of Contributors

I Introduction

1 Introduction
1.1 Why perform microarray experiments?
1.2 What is a microarray?
1.3 Microarray production
1.4 Where can I obtain microarrays?
1.5 Extracting and labeling the RNA sample
1.6 RNA extraction from scarce tissue samples
1.7 Hybridization
1.8 Scanning
1.9 Typical research applications of microarrays
1.10 Experimental design and controls
1.11 Suggested reading

2 Affymetrix Genechip system
2.1 Affymetrix technology
2.2 Single Array analysis
2.3 Detection p-value
2.4 Detection call
2.5 Signal algorithm
2.6 Analysis tips
2.7 Comparison analysis
2.8 Normalization
2.9 Change p-value
2.10 Change call
2.11 Signal Log Ratio Algorithm

3 Genotyping systems
3.1 Introduction
3.2 Methodologies
3.3 Genotype calls
3.4 Suggested reading

4 Overview of data analysis
4.1 cDNA microarray data analysis
4.2 Affymetrix data analysis
4.3 Data analysis pipeline

5 Experimental design
5.1 Why do we need to consider experimental design?
5.2 Choosing and using controls
5.3 Choosing and using replicates
5.4 Choosing a technology platform
5.5 Gene clustering v. gene classification
5.6 Conclusions
5.7 Suggested reading

6 Basic statistics
6.1 Why statistics are needed
6.2 Basic concepts
6.2.1 Variables
6.2.2 Constants
6.2.3 Distribution
6.2.4 Errors
6.3 Simple statistics
6.3.1 Number of subjects
6.3.2 Mean (m)
6.3.3 Trimmed mean
6.3.4 Median
6.3.5 Percentile
6.3.6 Range
6.3.7 Variance and the standard deviation
6.3.8 Coefficient of variation
6.4 Effect statistics
6.4.1 Scatter plot
6.4.2 Correlation (r)
6.4.3 Linear regression
6.5 Frequency distributions
6.5.1 Normal distribution
6.5.2 t-distribution
6.5.3 Skewed distribution
6.5.4 Checking the distribution of the data
6.6 Transformation
6.6.1 Log2-transformation
6.7 Outliers
6.8 Missing values and imputation
6.9 Statistical testing
6.9.1 Basics of statistical testing
6.9.2 Choosing a test
6.9.3 Threshold for p-value
6.9.4 Hypothesis pair
6.9.5 Calculation of test statistic and degrees of freedom
6.9.6 Critical values table
6.9.7 Drawing conclusions
6.9.8 Multiple testing
6.10 Analysis of variance
6.10.1 Basics of ANOVA
6.10.2 Completely randomized experiment
6.11 Statistics using GeneSpring
6.11.1 Simple statistics
6.11.2 Transformations
6.11.3 Scatter plot and histogram
6.11.4 Correlation
6.11.5 Linear regression
6.11.6 One-sample t-test
6.11.7 Independent samples t-test and ANOVA
6.12 Suggested reading

II Analysis

7 Preprocessing of data
7.1 Rationale for preprocessing
7.2 Missing values
7.3 Checking the background reading
7.4 Calculation of expression change
7.4.1 Intensity ratio
7.4.2 Log ratio
7.4.3 Fold change
7.5 Handling of replicates
7.5.1 Types of replicates
7.5.2 Time series
7.5.3 Case-control studies
7.5.4 Power analysis
7.5.5 Averaging replicates
7.6 Checking the quality of replicates

[...]
expression data. These aspects are discussed in Chapter 14. Data file manipulation and analysis tools are introduced in Chapter 15.

4.2 Affymetrix data analysis

Putting model-based methods aside, exploratory data analysis using Affymetrix chips is very similar to cDNA microarray data analysis. The biggest difference is normalization. If comparison analysis (see 2.7) is conducted, the Affymetrix data can be...

This chapter was written by Janna Saarela.

4 Overview of data analysis

In this book, we focus on microarray data analysis after the microarrays have been hybridized and scanned, and the images have been analyzed with image analysis software. Before any experiments in the laboratory are initiated, the experiment and its analysis should be planned carefully. Chapter...

ordered series of samples (DNA, RNA, protein, tissue). The type of microarray depends upon the material placed onto the slide: DNA, DNA microarray; RNA, RNA microarray; protein, protein microarray; tissue, tissue microarray. Since the samples are arranged in an ordered fashion, data obtained from the microarray can be traced back to any of the samples. This means that genes on the microarray are addressable...

outlines the basic analysis steps of exploratory data analysis.

4.1 cDNA microarray data analysis

We start data analysis from the results of scanned images. At this point, images have been evaluated, bad spots have been investigated, and the spots have preferably been scored with flags indicating whether the spot was good, bad, or borderline. This is crucial, because in the later stages of the analysis the visual...
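The flag-based scoring of spots mentioned in section 4.1 could be sketched roughly as follows. This is a minimal illustration, not the book's method: the flag labels, the `Spot` record, and the `keep_usable` helper are all hypothetical names chosen for this example, and real image analysis software will use its own flag codes.

```python
from dataclasses import dataclass

# Hypothetical flag labels; actual image analysis packages define their own codes.
GOOD, BORDERLINE, BAD = "good", "borderline", "bad"


@dataclass
class Spot:
    gene: str      # reporter or gene identifier
    signal: float  # measured signal intensity
    flag: str      # quality flag assigned during image analysis


def keep_usable(spots, allow_borderline=True):
    """Return the spots whose quality flag passes the filter."""
    allowed = {GOOD, BORDERLINE} if allow_borderline else {GOOD}
    return [s for s in spots if s.flag in allowed]


# Made-up example values, for illustration only.
spots = [
    Spot("geneA", 1520.0, GOOD),
    Spot("geneB", 310.5, BORDERLINE),
    Spot("geneC", 45.0, BAD),
]

strict = keep_usable(spots, allow_borderline=False)
lenient = keep_usable(spots)
print([s.gene for s in strict])
print([s.gene for s in lenient])
```

Keeping the flag next to each measurement, rather than deleting bad spots outright, is one way to let later analysis steps decide how strict the filter should be.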
Finnish DNA Microarray Center, Norwegian Microarray Consortium, Ontario Cancer Institute
www.helsinki.fi/biochipcenter, microarrays.btk.utu.fi, www.med.uio.no/dnr/microarray/, www.microarrays.ca

Figure 1.2: Workflow of a typical expression microarray experiment.

1.5 Extracting and labeling the RNA sample

A typical workflow of a microarray experiment is summarized in Figure 1.2. Once microarrays...

facility, microarrays can be produced at your own convenience. Once made, microarrays store well in dark, desiccated plastic slide boxes. Some manufacturers suggest storage at −20 °C, while others find room temperature adequate. The shelf life of microarrays has been claimed to be up to 6 months, although this has not been empirically tested.

Table 1.1: Places where microarrays...

aldehydes or primary amines) that help to stabilize the DNA onto the slide, either by covalent bonds or electrostatic interactions. An alternative technology allows the DNA to be synthesized directly onto the slide itself by a photolithographic process. This process has been commercialized and is widely available.

DNA microarrays are used to determine

1. The expression levels...

beginning of the challenging part of the data analysis. Next we need to link the observations to biological data, to regulation of genes, and to annotations of functions and biological processes. This part, data mining, is described in Chapters 11, 12 and 13.

With an enormous amount of data, we need standardized systems and tools for data management in order to publish the...
Metspalu, A., Peltonen, L., Syvanen, A-C. (1997) Minisequencing: a specific tool for DNA analysis and diagnostics on oligonucleotide arrays. Genome Res. 7, 606-614.
10. Schena, M., Shalon, D., Davis, R. W., and Brown, P. O. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467-470.

This chapter was written by Garry Wong.

2 Affymetrix...

corrected cDNA data. If single array analysis is performed, the basic normalization scheme can be similar to the one presented in section 8.9.

4.3 Data analysis pipeline

To get an overview of the data analysis pipeline, consult Figure 4.1. It covers the basic methods introduced in this book. The flow chart also helps to choose the right method for the situation. Table 4.1 contains a short list of analysis...