16S Metagenomics Analysis of Endometrial Microbiota in Vietnamese Woman (graduation thesis)

VIETNAM NATIONAL UNIVERSITY OF AGRICULTURE
FACULTY OF BIOTECHNOLOGY

GRADUATION THESIS

"16S METAGENOMICS ANALYSIS OF ENDOMETRIAL MICROBIOTA IN VIETNAMESE WOMAN"

Student's name: Le Doan Quoc
ID: 610760
Class: K61CNSHE
Supervisors: Dr Pham Dinh Minh, Prof Phan Huu Ton

HA NOI, 2021

STATEMENT OF ORIGINAL AUTHORSHIP

The work contained in this thesis has not been previously submitted to meet requirements for an award at this or any other educational institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.

Signature:
Date:

ACKNOWLEDGEMENTS

This thesis, like any other, would not have been possible without the involvement and support of many people. They have all earned my deepest gratitude, even if it has been poorly expressed at times.

First and foremost, my sincere thanks go to my principal supervisors, Dr Pham Dinh Minh and Prof Phan Huu Ton, for their ongoing support and suggestions throughout my graduation thesis. Without them, this project would not have been possible.

I would also like to thank all the staff at Gentis Company, who had to put up with me going through all kinds of emotional swings. They were all very friendly and helpful, and I was lucky to study in such a great working environment.

Last but not least, I thank my family and friends for all their help and encouragement, and for supporting me through all these years.

TABLE OF CONTENTS

STATEMENT OF ORIGINAL AUTHORSHIP
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF TABLES
LIST OF FIGURES
PART ONE: INTRODUCTION
1.1 Introduction
1.2 Aims and requirements
1.2.1 Aims
1.2.2 Requirements
PART TWO: LITERATURE REVIEW
2.1 The importance of 16S metagenomics
2.2 Choosing the Appropriate Target Region
2.3 Taxonomic Classification: The Growth of 16S rRNA Databanks
2.4 Bioinformatics pipelines for microbial 16S rRNA amplicon sequencing
2.5 Human Endometrial Microbiota
2.5.1 Relevant research worldwide
2.5.2 Asia
2.5.3 ASEAN
2.5.4 Vietnam
PART THREE: MATERIALS AND METHODS
3.1 Location and duration of the study
3.2 Materials
3.2.1 DNA Extraction
3.2.1.1 Samples
3.2.1.2 DNA Purification from Body Fluids
3.2.2 Amplification of the V3-V4 region of microbial 16S genes
3.2.3 Library Preparation
3.2.3.1 PCR Clean-up
3.2.3.2 Index PCR
3.2.3.3 PCR Clean-up
3.2.3.4 Library Quantification, Normalization, and Pooling
3.3 Methods
3.3.1 Data analysis
3.3.1.1 Amplicon bioinformatics: from raw reads to tables
3.3.1.2 Filter and trim
3.3.1.3 Learn the Error Rates
3.3.1.4 Merge paired reads
3.3.1.5 Construct sequence table
3.3.1.6 Remove chimeras
3.3.1.7 Assign taxonomy
PART IV RESULTS AND DISCUSSION
4.1 Bioinformatics pipeline
4.1.1 Quality score of sequences
4.1.2 Filter and Trim
4.1.3 Finding true sequence variants
4.1.4 Merging Paired Reads
4.1.5 Constructing ASV table and removing chimeras
4.1.6 Assign taxonomy
4.1.7 Visualizing alpha-diversity
4.1.8 Abundance bar plot
4.2 Discussion
PART FIVE REFERENCES

LIST OF ABBREVIATIONS

µL      microlitre
bp      Base pair
CP      Common region
CR      Common region
DNA     Deoxyribonucleic acid
dNTP    Deoxynucleotide triphosphate
dsDNA   Double-stranded DNA
mL      milliliter
PCR     Polymerase Chain Reaction
RCA     Rolling circle amplification
RE      Restriction enzyme
REP     Replication protein
rpm     revolutions per minute
NGS     Next Generation Sequencing
ssDNA   Single-stranded DNA
TAE     Tris – acetate – EDTA
Taq     Thermus aquaticus

ABSTRACT

The detection of bacteria with NGS methods has facilitated research on low-biomass microbiomes in tissues and organs previously considered sterile, such as the endometrium. An abnormal endometrial microbiota has been linked to implantation failure, pregnancy loss, and other gynecological and obstetrical conditions. Further investigation of the endometrial microbiota could give greater insight into the role of bacterial communities in both physiology and pathophysiology, with the hope of improving pregnancy rates and supporting healthy pregnancies. 16S rRNA gene sequencing, or simply 16S sequencing, uses PCR to target and amplify portions of the hypervariable regions (V1-V9) of the bacterial 16S rRNA gene [1]. In this project, sequences of the V3-V4 region were generated on an Illumina MiSeq. After sequencing, the raw data are analyzed with a bioinformatics pipeline that includes trimming, error correction, and comparison to a 16S reference database. In this study, the method is applied to a small
sample of three Vietnamese women, in the hope of making a small contribution to research on the endometrial microbiota and to studies that use a 16S metagenomics pipeline.

PART ONE: INTRODUCTION

1.1 Introduction

Most microbial cells that can be seen under a microscope and shown to be alive by various staining procedures cannot be induced to form colonies on Petri plates. In fact, only 0.1 to [...] percent of the living bacteria in soil can be cultured under standard conditions. In aquatic environments, the culturable fraction can be a thousand times lower. Moderate- and high-throughput cultivation approaches with different nutrients have been tried to address this problem, but the success rate of cultivating microbes in isolation remains low. The problem is compounded because many of these bacteria need a community in order to grow. Laboratory cultivation conditions always favor the organisms that grow best on the particular nutrients supplied, and these are not necessarily the microbes that are dominant or most influential in the native environment.

These limitations have given rise to culture-independent methods for identifying and enumerating the microbes in a community. Over the decades, these methods have gradually played a larger role in the study of bacteria. The most widely used technique is rRNA phylotyping, which targets rRNA genes that are present in all microbes. This [...]

3.3.1.3 Learn the Error Rates

- The DADA2 algorithm makes use of a parametric error model (err), and every amplicon dataset has a different set of error rates. The learnErrors method learns this error model from the data by alternating estimation of the error rates and inference of sample composition until they converge on a jointly consistent solution. As in many machine-learning problems, the algorithm must begin with an initial guess, for which the maximum possible error rates in this data are used (the error rates if only the
most abundant sequence is correct and all the rest are errors).

- The error rates for each possible transition (A→C, A→G, …) are shown. Points are the observed error rates for each consensus quality score. The black line shows the estimated error rates after convergence of the machine-learning algorithm. The red line shows the error rates expected under the nominal definition of the Q-score. Here the estimated error rates (black line) are a good fit to the observed rates (points), and the error rates drop with increased quality as expected. Everything looks reasonable and we proceed with confidence.

3.3.1.4 Merge paired reads

- The forward and reverse reads are merged to obtain the full denoised sequences. Merging is performed by aligning the denoised forward reads with the reverse-complement of the corresponding denoised reverse reads, and then constructing the merged "contig" sequences. By default, merged sequences are only output if the forward and reverse reads overlap by at least 12 bases and are identical to each other in the overlap region (but these conditions can be changed via function arguments).

3.3.1.5 Construct sequence table

- An amplicon sequence variant (ASV) table is constructed, a higher-resolution version of the OTU table produced by traditional methods.

3.3.1.6 Remove chimeras

- The core dada method corrects substitution and indel errors, but chimeras remain. Fortunately, the accuracy of sequence variants after denoising makes identifying chimeric ASVs simpler than when dealing with fuzzy OTUs. Chimeric sequences are identified if they can be exactly reconstructed by combining a left-segment and a right-segment from two more abundant "parent" sequences.

3.3.1.7 Assign taxonomy

- It is common at this point, especially in 16S/18S/ITS amplicon sequencing, to assign taxonomy to the sequence variants. The DADA2 package provides a native implementation of the naive Bayesian classifier method for this purpose. The assignTaxonomy function takes as input
a set of sequences to be classified and a training set of reference sequences with known taxonomy, and outputs taxonomic assignments with at least minBoot bootstrap confidence.

PART IV RESULTS AND DISCUSSION

4.1 Bioinformatics pipeline

The DADA2 R package implements the full amplicon workflow: filtering, dereplication, sample inference, chimera identification, and merging of paired-end reads.

4.1.1 Quality score of sequences

The sequencing quality score of a given base, Q, is defined by the following equation:

$Q = -10 \log_{10}(e)$

where e is the estimated probability of the base call being wrong. Higher Q scores indicate a smaller probability of error. Lower Q scores can result in a significant portion of the reads being unusable. They may also lead to increased false-positive variant calls, resulting in inaccurate conclusions.

Relationship Between Sequencing Quality Score and Base Call Accuracy

Quality Score    Probability of Incorrect Base Call    Inferred Base Call Accuracy
10 (Q10)         1 in 10                               90%
20 (Q20)         1 in 100                              99%
30 (Q30)         1 in 1000                             99.9%

The figure below shows the quality profiles of the forward and reverse reads. As the figure shows, the reads are at their best quality above Q30, the level specified for the Illumina MiSeq sequencer. Most Illumina sequencing data show a trend of decreasing average quality towards the end of the reads. Based on the quality profiles, the parameters for the next step in the pipeline are chosen according to the positions at which read quality declines. In this experiment, the first nucleotides of each read are trimmed to ensure the quality of the output.

4.1.2 Filter and Trim

In this case, the quality decreases toward the end of some reads in the Y-45 sample, but this is acceptable. At the beginning of the reads in every sample, however, the quality is reduced, so these positions should be removed.

The number of reads after removing low-quality reads

4.1.3 Finding true sequence variants

Sequence variants are inferred from the unique sequences. These are generated by collapsing the repeated reads in
each sample. There were 56 sequence variants inferred from 2584 input unique sequences.

Numbers of reads and unique sequences in each sample (one row per sample)

Forward reads                             Reverse reads
16798 reads in 2584 unique sequences      16798 reads in 2858 unique sequences
11898 reads in 1902 unique sequences      11898 reads in 2414 unique sequences
5155 reads in 756 unique sequences        5155 reads in 1622 unique sequences
2769 reads in 656 unique sequences        2769 reads in 1136 unique sequences

4.1.4 Merging paired reads

In this step, the reads were merged without requiring an overlap, because this suits the design of the experiment: the amplicons were sequenced from both ends toward the center of the target sequence, and the V3-V4 region is around 443 bp long.

4.1.5 Constructing ASV table and removing chimeras

From the samples, 190 ASVs were obtained. Of these 190 input sequences, 71 were identified as bimeras, leaving 119 non-chimeric ASVs. When the abundances of those variants are taken into account, however, the bimeras make up only about 6% of the merged sequence reads.

4.1.6 Assign taxonomy

In this step, a training set of reference sequences with known taxonomy was used. The Silva and RDP 16S databases provided the training FASTA files for species assignment.

4.1.7 Visualizing alpha-diversity

According to the alpha-diversity measures table, one sample showed much greater diversity than the rest. Interestingly, the negative control sample (NEG) also had relatively high diversity. This could be due to contamination; however, further experiments are needed to confirm the cause.

4.1.8 Abundance Bar Plot

- 119 ASV sequences with a length of 280 bp were obtained.

The composition of microbiota in samples. Only the 20 most abundant taxa out of 119 are shown, because of the limited display size.

Bacterial families listed for samples Y45, NEG, ERA2-03 and ERA302: Lactobacillaceae, Carnobacteriaceae, Brevibacteriaceae, Moraxellaceae, Clostridiaceae, Flavobacteriaceae, Intrasporangiaceae, Nocardioidaceae, Rhizobiaceae, Sphingomonadaceae.

Several studies confirm that a vaginal microbiota rich in Lactobacillus spp. without bacterial vaginosis, either clinical or subclinical, leads to more positive outcomes with ART (Babu et al., 2017; Eckert et al., 2003; Mangot-Bertrand et al., 2013; Moore et al., 2000). Haahr et al. (2016a) studied the microbiota of 84 women undergoing IVF and found a strong relationship between microbiota composition and pregnancy. In addition, Moraxellaceae was reportedly linked with endometriosis in women [8]. A large number of papers have been published with increasing evidence of a correlation between the endometrial microbiota and gynecological and fertility health. This small project could be a modest contribution for further research that takes advantage of Illumina NGS technology.

The contamination in the negative control sample could be explained by the reported finding that contamination in the QIA kit was relatively diverse in comparison to the other kits, and included higher proportions of Flavobacteriaceae than the other kits [7].

4.2 Discussion

The metagenomics method used in this study successfully identified a large number of bacteria. To our knowledge, this is the first study aimed at identifying the endometrial microbiota in a small number of Vietnamese women using Next Generation Sequencing technology. This method gives a more detailed view of the composition of the endometrial microbiota and its environment, with high-precision taxonomic classification based on thousands of reads. In this project, the choice of kit could have affected the accuracy of the microbiota profile, so further experiments should be carried out to reduce contamination and obtain a more accurate picture. For deeper resolution at the subspecies level, the V3–V5 fragments
were recommended [20]. In terms of sensitivity and specificity, both indicators were higher at all three taxonomic levels (subspecies, species and genus) when the largest internal fragment (V3–V5) was used, compared to V3–V4. The results also show that internal fragments lost specificity at the deepest taxonomic levels (i.e., species and subspecies). Moreover, V3–V5 fragments registered the lowest proportion of false positive results, followed by V3–V4 fragments. Finally, the freely available DADA2 pipeline makes the downstream analysis of 16S metagenomics data more precise and flexible, and allows deeper analysis, because researchers can freely customize the software, unlike licensed software such as MiSeq Reporter from Illumina.

PART FIVE REFERENCES

1. Laudadio I, Fulci V, Stronati L, Carissimi C. Next-Generation Metagenomics: Methodological Challenges and Opportunities. OMICS: A Journal of Integrative Biology. 2019; 23(7): 327-333.
2. Sharon G, Segal D, Ringo JM, Hefetz A, Zilber-Rosenberg I, Rosenberg E. Commensal bacteria play a role in mating preference of Drosophila melanogaster. Proc Natl Acad Sci USA. 2010; 107:2005.
3. Jost T, Lacroix C, Braegger CP, Chassard C. New insights in gut microbiota establishment in healthy breast-fed neonates. PLoS One. 2012.
4. Prodan, A., Tremaroli, V., Brolin, H., Zwinderman, A. H., Nieuwdorp, M., & Levin, E. (2020). Comparing bioinformatic pipelines for microbial 16S rRNA amplicon sequencing. PLOS ONE, 15(1), e0227434. doi:10.1371/journal.pone.0227434
5. Benner, M., Ferwerda, G., Joosten, I., & Van der Molen, R. G. (2018). How uterine microbiota might be responsible for a receptive, fertile endometrium. Human Reproduction Update, 24(4), 393-415. doi:10.1093/humupd/dmy012
6. 16S Metagenomic Sequencing Library Preparation. Available online: https://support.illumina.com/documents/documentation/chemistry_documentation/16s/16s-metagenomic-library-prep-guide-15044223-b.pdf (accessed on 22 February 2021).
7. Salzberg, S. (2014). Faculty Opinions recommendation of Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature. https://doi.org/10.3410/f.725233607.793501834
8. Khan, K. N., Fujishita, A., Masumoto, H., Muto, H., Kitajima, M., Masuzaki, H., & Kitawaki, J. (2016). Molecular detection of intrauterine microbial colonization in women with endometriosis. European Journal of Obstetrics & Gynecology and Reproductive Biology, 199, 69-75. https://doi.org/10.1016/j.ejogrb.2016.01.040
9. Prodan, A., Tremaroli, V., Brolin, H., Zwinderman, A. H., Nieuwdorp, M., & Levin, E. (2020). Comparing bioinformatic pipelines for microbial 16S rRNA amplicon sequencing. PLOS ONE, 15(1), e0227434. https://doi.org/10.1371/journal.pone.0227434
10. Kolbert CP, Persing DH (1999). Ribosomal DNA sequencing as a tool for identification of bacterial pathogens. Current Opinion in Microbiology (3), 299–305.
11. Leoni C, Ceci O, Manzari C, Fosso B, Volpicella M, Ferrari A, Fiorella P, Pesole G, Cicinelli E, Ceci LR. Human Endometrial Microbiota at Term of Normal Pregnancies.
12. Baker JM, Chase DM, Herbst-Kralovetz MM. Uterine Microbiota: Residents, Tourists, or Invaders?
13. Mallik, S., Akashi, H., & Kundu, S. (2015). Assembly constraints drive coevolution among ribosomal constituents. Nucleic Acids Research, 43(11), 5352-5363.
14. Benner M, Ferwerda G, Joosten I, van der Molen RG. How uterine microbiota might be responsible for a receptive, fertile endometrium.
15. Lloyd-Price J, Mahurkar A, Rahnavard G, et al. Strains, functions and dynamics in the expanded Human Microbiome Project. Nature. 2017.
16. The microbiota continuum along the female reproductive tract and its relation to uterine-related diseases (https://www.nature.com/articles/s41467-017-00901-0).
17. Vaginal microbiome profiles of pregnant women in Korea using a 16S metagenomics approach (https://onlinelibrary.wiley.com/doi/10.1111/aji.13124).
18. Metagenomic analysis of bacterial community structure and diversity of lignocellulolytic bacteria in Vietnamese native goat rumen (https://pubmed.ncbi.nlm.nih.gov/28920414/).
19. Dietary coconut water vinegar for improvement of obesity-associated inflammation in high-fat-diet-treated mice (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5642190/).
20. Martínez-Porchas, M., Villalpando-Canchola, E., & Vargas-Albores, F. (2016). Significant loss of sensitivity and specificity in the taxonomic classification occurs when short 16S rRNA gene sequences are used. Heliyon, 2(9), e00170. https://doi.org/10.1016/j.heliyon.2016.e00170
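
Worked example for Section 4.1.1. The quality-score definition given there, Q = -10 log10(e), can be checked numerically against the Q10/Q20/Q30 table. The following is a minimal Python sketch, not part of the DADA2 workflow used in this thesis; the function names are hypothetical and chosen only for illustration.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Invert Q = -10 * log10(e): return e, the probability of a wrong base call."""
    return 10 ** (-q / 10)


def error_probability_to_phred(e: float) -> float:
    """Apply Q = -10 * log10(e) to an error probability in the range (0, 1]."""
    if not 0 < e <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(e)


def base_call_accuracy(q: float) -> float:
    """Inferred base call accuracy (as a fraction) for quality score Q."""
    return 1 - phred_to_error_probability(q)


if __name__ == "__main__":
    # Print one line per row of the table in Section 4.1.1.
    for q in (10, 20, 30):
        e = phred_to_error_probability(q)
        print(f"Q{q}: 1 in {round(1 / e)} incorrect, accuracy {base_call_accuracy(q):.1%}")
```

Because the scale is logarithmic, each increase of 10 in Q corresponds to a tenfold drop in the probability of an incorrect base call, which is why trimming low-quality positions has a large effect on the error rate of the retained reads.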
