DNA Microarray Data Analysis


Structure

  • Preface

  • List of Contributors

  • Contents

  • Introduction

    • Introduction

      • Why perform microarray experiments?

      • What is a microarray?

      • Microarray production

      • Where can I obtain microarrays?

      • Extracting and labeling the RNA sample

      • RNA extraction from scarce tissue samples

      • Hybridization

      • Scanning

      • Typical research applications of microarrays

      • Experimental design and controls

      • Suggested reading

    • Affymetrix Genechip system

      • Affymetrix technology

      • Single Array analysis

      • Detection p-value

      • Detection call

      • Signal algorithm

      • Analysis tips

      • Comparison analysis

      • Normalization

      • Change p-value

      • Change call

      • Signal Log Ratio Algorithm

    • Genotyping systems

      • Introduction

      • Methodologies

      • Genotype calls

      • Suggested reading

    • Overview of data analysis

      • cDNA microarray data analysis

      • Affymetrix data analysis

      • Data analysis pipeline

    • Experimental design

      • Why do we need to consider experimental design?

      • Choosing and using controls

      • Choosing and using replicates

      • Choosing a technology platform

      • Gene clustering v. gene classification

      • Conclusions

      • Suggested reading

    • Basic statistics

      • Why statistics are needed

      • Basic concepts

        • Variables

        • Constants

        • Distribution

        • Errors

      • Simple statistics

        • Number of subjects

        • Mean (m)

        • Trimmed mean

        • Median

        • Percentile

        • Range

        • Variance and the standard deviation

        • Coefficient of variation

      • Effect statistics

        • Scatter plot

        • Correlation (r)

        • Linear regression

      • Frequency distributions

        • Normal distribution

        • t-distribution

        • Skewed distribution

        • Checking the distribution of the data

      • Transformation

        • Log2-transformation

      • Outliers

      • Missing values and imputation

      • Statistical testing

        • Basics of statistical testing

        • Choosing a test

        • Threshold for p-value

        • Hypothesis pair

        • Calculation of test statistic and degrees of freedom

        • Critical values table

        • Drawing conclusions

        • Multiple testing

      • Analysis of variance

        • Basics of ANOVA

        • Completely randomized experiment

      • Statistics using GeneSpring

        • Simple statistics

        • Transformations

        • Scatter plot and histogram

        • Correlation

        • Linear regression

        • One-sample t-test

        • Independent samples t-test and ANOVA

      • Suggested reading

  • Analysis

    • Preprocessing of data

      • Rationale for preprocessing

      • Missing values

      • Checking the background reading

      • Calculation of expression change

        • Intensity ratio

        • Log ratio

        • Fold change

      • Handling of replicates

        • Types of replicates

        • Time series

        • Case-control studies

        • Power analysis

        • Averaging replicates

      • Checking the quality of replicates

        • Quality check of replicate chips

        • Quality check of replicate spots

        • Excluding bad replicates

      • Outliers

      • Filtering bad data

      • Filtering uninteresting data

      • Simple statistics

        • Mean and median

        • Standard deviation

        • Variance

      • Skewness and normality

        • Linearity

      • Spatial effects

      • Normalization

      • Similarity of dynamic range, mean and variance

      • Examples using GeneSpring

        • Importing data

        • Background subtraction

        • Calculation of expression change

        • Replicates

        • Checking linearity

        • Normality

        • Filtering

      • Suggested reading

    • Normalization

      • What is normalization?

      • Sources of systematic bias

        • Dye effect

        • Scanner malfunction

        • Uneven hybridization

        • Printing tip

        • Plate and reporter effects

        • Batch effect and array design

        • Experimenter issues

        • What might help to track the sources of bias?

      • Normalization terminology

        • Normalization, standardization and centralization

        • Per-chip and per-gene normalization

        • Global and local normalization

      • Performing normalization

        • Choice of the method

        • Basic idea

        • Control genes

        • Linearity of data matters

        • Basic normalization schemes for linear data

        • Special situations

      • Mathematical calculations

        • Mean centering

        • Median centering

        • Trimmed mean centering

        • Standardization

        • Lowess smoothing

        • Ratio statistics

        • Analysis of variance

        • Spiked controls

        • Dye-swap experiments

      • Some caution is needed

      • Graphical example

      • Example of calculations

      • Using GeneSpring for normalization

      • Suggested reading

    • Finding differentially expressed genes

      • Identifying over- and underexpressed genes

        • Filtering by absolute expression change

        • Statistical single chip methods

        • Noise envelope

        • Sapir and Churchill's single slide method

        • Chen's single slide method

        • Newton's single slide method

      • What about the confidence?

        • Only some treatments have replicates

        • All the treatments have replicates: two-sample t-test

        • All the treatments have replicates: one-sample t-test

      • GeneSpring examples

      • Suggested reading

    • Cluster analysis of microarray information

      • Basic concept of clustering

      • Principles of clustering

      • Hierarchical clustering

      • Self-organizing map

      • K-means clustering

      • Principal component analysis

      • Pros and cons of clustering

      • Visualization

      • Programs for clustering and visualization

      • Function prediction

      • GeneSpring and clustering

        • Clustering tool

        • Principal components analysis tool

        • Predict parameter value tool

      • Suggested reading

  • Data mining

    • Gene regulatory networks

      • What are gene regulatory networks?

      • Fundamentals

      • Bayesian network

      • Calculating Bayesian network parameters

      • Searching Bayesian network structure

      • Conclusion

      • Suggested reading

    • Data mining for promoter sequences

      • Introduction

      • Finding promoter region sequences

      • Using EnsMart to retrieve promoter regions

      • Comparison of EnsMart and UCSC searches

      • Pattern search without prior knowledge

      • Summary

      • GeneSpring and promoter analysis

      • Suggested reading

    • Annotations and article mining

      • Retrieving annotations from public databases

      • Retrieving annotations using BLAST

      • Article mining

      • Annotation and gene ontologies using GeneSpring

        • Annotations

        • Ontologies

  • Tools and data management

    • Reporting results

      • Why the results should be reported

      • What details should be reported: the MIAME standard

      • How the data should be presented: the MAGE standard

        • MAGE-OM

        • MAGE-ML: an XML translation of MAGE-OM

        • MAGE-STK

      • Where and how to submit your data

        • ArrayExpress and GEO

        • MIAMExpress

        • GEO

        • Other options and aspects

      • MIAME-compliant sample attributes in GeneSpring

      • Suggested reading

    • Software issues

      • Data format conversion problems

      • A standard file format

      • Programming

        • Perl

        • Awk

        • R

      • Freeware software packages

        • Cluster and treeview

        • Expression profiler

        • ArrayViewer

        • MAExplorer

        • Bioconductor

      • Commercial software packages

        • VisualGene

        • GeneSpring

        • Kensington

        • J-Express

        • Expression Nti

        • Rosetta Resolver

        • Spotfire

  • Index

Content

DNA Microarray Data Analysis

Tomi Pasanen, Janna Saarela, Ilana Saarikko, Teemu Toivanen, Martti Tolvanen, Mauno Vihinen and Garry Wong

Editors: Jarno Tuimala and M. Minna Laine

CSC, the Finnish IT center for Science

CSC – Scientific Computing Ltd. is a non-profit organization for high-performance computing and networking in Finland. CSC is owned by the Ministry of Education. CSC runs a national large-scale facility for computational science and engineering and supports the university and research community. CSC is also responsible for the operations of the Finnish University and Research Network (Funet).

All rights reserved. The PDF version of this book or parts of it can be used in Finnish universities as course material, provided that this copyright notice is included. However, this publication may not be sold or included as part of other publications without permission of the publisher.

© The authors and CSC – Scientific Computing Ltd. 2003
ISBN 952-9821-89-1
http://www.csc.fi/oppaat/siru/
Printed at Picaset Oy, Helsinki 2003

Preface

This is the first edition of the DNA microarray data analysis guidebook. Although invented in the mid-90s, DNA microarrays are still novelties as biomedical research tools. DNA microarrays generate large amounts of numerical data, which should be analyzed effectively. In this book, we hope to offer a broad view of the basic theory and techniques behind DNA microarray data analysis. Our aim was not to be comprehensive, but rather to cover the basics, which are unlikely to change much over the years. We hope that researchers who are just starting their data analysis will benefit from the book in particular.

The text emphasizes gene expression analysis. Other topics, such as genotyping, are discussed briefly. This book does not cover wet-lab practices, such as sample preparation or hybridization.
Rather, we start when the microarrays have been scanned and the resulting images analyzed. In other words, we take the files with signal intensities, which usually generate questions such as “How is the data normalized?” or “How do I identify the genes which are upregulated?”. We provide some simple solutions to these specific questions and many others.

Each chapter has a section on suggested reading, which introduces some of the relevant literature. Several chapters also include data analysis examples using GeneSpring software.

This edition of the book was written by M. Minna Laine (chapters 4, 8 and 14), Tomi Pasanen (chapter 11), Janna Saarela (chapters 2 and 3), Ilana Saarikko (chapter 8), Teemu Toivanen (chapter 14), Martti Tolvanen (chapter 12), Jarno Tuimala (chapters 4, 6, 7, 8, 9, 10, 13 and 15), Mauno Vihinen (chapters 10, 11 and 12), and Garry Wong (chapters 1 and 5).

Juha Haataja and Leena Jukka are warmly acknowledged for their support during the production of this book.

We are very interested in receiving feedback about this publication. In particular, if you feel that some essential technique has been missed, let us know. Please send your comments to the e-mail address Jarno.Tuimala@csc.fi.

Espoo, 19th May 2003
The authors

List of Contributors

M. Minna Laine, CSC, the Finnish IT center for Science, Tekniikantie 15 a D, 02101 Espoo, Finland
Tomi Pasanen, Institute of Medical Technology, Lenkkeilijänkatu 8, 33520 Tampere, Finland
Janna Saarela, Biomedicum Biochip Center, Haartmaninkatu 8, 00290 Helsinki, Finland
Ilana Saarikko, Centre for Biotechnology, Tykistökatu 6, 20521 Turku, Finland
Teemu Toivanen, Centre for Biotechnology, Tykistökatu 6, 20521 Turku, Finland
Martti Tolvanen, Institute of Medical Technology, Lenkkeilijänkatu 8, 33520 Tampere, Finland
Jarno Tuimala, CSC, the Finnish IT center for Science, Tekniikantie 15 a D, 02101 Espoo, Finland
Mauno Vihinen, Institute of Medical Technology, Lenkkeilijänkatu 8, 33520 Tampere, Finland
Garry Wong, A. I. Virtanen Institute, University of Kuopio, 70211 Kuopio, Finland

Contents

Preface
List of Contributors

I Introduction

1 Introduction
  1.1 Why perform microarray experiments?
  1.2 What is a microarray?
  1.3 Microarray production
  1.4 Where can I obtain microarrays?
  1.5 Extracting and labeling the RNA sample
  1.6 RNA extraction from scarce tissue samples
  1.7 Hybridization
  1.8 Scanning
  1.9 Typical research applications of microarrays
  1.10 Experimental design and controls
  1.11 Suggested reading
2 Affymetrix Genechip system
  2.1 Affymetrix technology
  2.2 Single Array analysis
  2.3 Detection p-value
  2.4 Detection call
  2.5 Signal algorithm
  2.6 Analysis tips
  2.7 Comparison analysis
  2.8 Normalization
  2.9 Change p-value
  2.10 Change call
  2.11 Signal Log Ratio Algorithm
3 Genotyping systems
  3.1 Introduction
  3.2 Methodologies
  3.3 Genotype calls
  3.4 Suggested reading
4 Overview of data analysis
  4.1 cDNA microarray data analysis
  4.2 Affymetrix data analysis
  4.3 Data analysis pipeline
5 Experimental design
  5.1 Why do we need to consider experimental design?
  5.2 Choosing and using controls
  5.3 Choosing and using replicates
  5.4 Choosing a technology platform
  5.5 Gene clustering v. gene classification
  5.6 Conclusions
  5.7 Suggested reading
6 Basic statistics
  6.1 Why statistics are needed
  6.2 Basic concepts
    6.2.1 Variables
    6.2.2 Constants
    6.2.3 Distribution
    6.2.4 Errors
  6.3 Simple statistics
    6.3.1 Number of subjects
    6.3.2 Mean (m)
    6.3.3 Trimmed mean
    6.3.4 Median
    6.3.5 Percentile
    6.3.6 Range
    6.3.7 Variance and the standard deviation
    6.3.8 Coefficient of variation
  6.4 Effect statistics
    6.4.1 Scatter plot
    6.4.2 Correlation (r)
    6.4.3 Linear regression
  6.5 Frequency distributions
    6.5.1 Normal distribution
    6.5.2 t-distribution
    6.5.3 Skewed distribution
    6.5.4 Checking the distribution of the data
  6.6 Transformation
    6.6.1 Log2-transformation
  6.7 Outliers
  6.8 Missing values and imputation
  6.9 Statistical testing
    6.9.1 Basics of statistical testing
    6.9.2 Choosing a test
    6.9.3 Threshold for p-value
    6.9.4 Hypothesis pair
    6.9.5 Calculation of test statistic and degrees of freedom
    6.9.6 Critical values table
    6.9.7 Drawing conclusions
    6.9.8 Multiple testing
  6.10 Analysis of variance
    6.10.1 Basics of ANOVA
    6.10.2 Completely randomized experiment
  6.11 Statistics using GeneSpring
    6.11.1 Simple statistics
    6.11.2 Transformations
    6.11.3 Scatter plot and histogram
    6.11.4 Correlation
    6.11.5 Linear regression
    6.11.6 One-sample t-test
    6.11.7 Independent samples t-test and ANOVA
  6.12 Suggested reading

II Analysis

7 Preprocessing of data
  7.1 Rationale for preprocessing
  7.2 Missing values
  7.3 Checking the background reading
  7.4 Calculation of expression change
    7.4.1 Intensity ratio
    7.4.2 Log ratio
    7.4.3 Fold change
  7.5 Handling of replicates
    7.5.1 Types of replicates
    7.5.2 Time series
    7.5.3 Case-control studies
    7.5.4 Power analysis
    7.5.5 Averaging replicates
  7.6 Checking the quality of replicates
  [...]
...expression data. These aspects are discussed in Chapter 14. Data file manipulation and analysis tools are introduced in Chapter 15.

4.2 Affymetrix data analysis

Putting model-based methods aside, exploratory data analysis using Affymetrix chips is very similar to cDNA microarray data analysis. The biggest difference is normalization. If comparison analysis (see 2.7) is conducted, the Affymetrix data can be...

This chapter was written by Janna Saarela.

4 Overview of data analysis

In this book, we emphasize microarray data analysis after the microarrays have been hybridized and scanned, and the images have been analyzed with image analysis software. Before any experiments in the laboratory are initiated, the experiment and its analysis should be planned carefully. Chapter...

...ordered series of samples (DNA, RNA, protein, tissue). The type of microarray depends upon the material placed onto the slide: DNA, DNA microarray; RNA, RNA microarray; protein, protein microarray; tissue, tissue microarray. Since the samples are arranged in an ordered fashion, data obtained from the microarray can be traced back to any of the samples. This means that genes on the microarray are addressable...

...outlines the basic analysis steps of exploratory data analysis.

4.1 cDNA microarray data analysis

We start data analysis from the results of scanned images. At this point, images have been evaluated, bad spots have been investigated, and the spots have preferably been scored with flags indicating whether the spot was good, bad, or borderline. This is crucial, because in the later stages of the analysis the visual...
...Finnish DNA microarray Center, Norwegian Microarray Consortium, Ontario Cancer Institute: www.helsinki.fi/biochipcenter, microarrays.btk.utu.fi, www.med.uio.no/dnr/microarray/, www.microarrays.ca

Figure 1.2: Workflow of a typical expression microarray experiment.

1.5 Extracting and labeling the RNA sample

A typical workflow of the microarray experiment is summarized in Figure 1.2. Once microarrays...

...facility, microarrays can be produced at your own convenience. Once made, microarrays store well in dark, desiccated plastic slide boxes. Some manufacturers suggest storage at −20 °C, while others find room temperature adequate. The shelf life of microarrays has been claimed to be up to 6 months, although this has not been empirically tested.

Table 1.1: Places where microarrays...

...aldehydes or primary amines) that help to stabilize the DNA onto the slide, either by covalent bonds or electrostatic interactions. An alternative technology allows the DNA to be synthesized directly onto the slide itself by a photolithographic process. This process has been commercialized and is widely available. DNA microarrays are used to determine: 1. The expression levels...

...beginning of the challenging part of the data analysis. Next we need to link the observations to biological data, to regulation of genes, and to annotations of functions and biological processes. This part, data mining, is described in Chapters 11, 12 and 13. With an enormous amount of data, we need standardized systems and tools for data management in order to publish the...
...Metspalu, A., Peltonen, L., Syvanen, A-C. (1997) Minisequencing: a specific tool for DNA analysis and diagnostics on oligonucleotide arrays. Genome Res. 7, 606-614.

10. Schena, M., Shalon, D., Davis, R. W., and Brown, P. O. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467-470.

This chapter was written by Garry Wong.

2 Affymetrix...

...corrected cDNA data. If single array analysis is performed, the basic normalization scheme can be similar to the one presented in section 8.9.

4.3 Data analysis pipeline

To get an overview of the data analysis pipeline, consult Figure 4.1. It covers the basic methods introduced in this book. The flow chart also helps to choose the right method for the situation. Table 4.1 contains a short list of analysis...

Date posted: 11/04/2014, 09:40
