ORTHOGONAL PARTIAL LEAST SQUARES
DISCRIMINANT ANALYSIS IN METABOLOMICS
FOR KIDNEY AND CATARACT DISEASE
CHARACTERIZATION
CHEW AI PING
(B.Sc. (Hons.), NUS)
A THESIS SUBMITTED
FOR THE DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF CHEMISTRY
NATIONAL UNIVERSITY OF SINGAPORE
2012
Declaration
I hereby declare that this thesis is my original work and
it has been written by me in its entirety.
I have duly acknowledged all the sources of information
which have been used in the thesis.
This thesis has also not been submitted for any degree
in any university previously.
__________________________
Chew Ai Ping
25 June 2012
Acknowledgements
It is my honour to thank the following who have made this thesis possible.
Firstly, I thank Professor Sam Li, my main supervisor, for the support and
patient guidance these few years, from the start of the project to the end of
the write-up for this thesis.
I also thank Dr. Ong Eng Shi for being the co-supervisor for this project, and
for starting me on this project with the kind and thoughtful help in obtaining
and running the samples.
I also thank Professor Ong Choon Nam, NUS, for kindly agreeing to release
the samples, and for his prompt replies to my questions and for offering
assistance in every way possible.
I thank my lab mates for their support in my studies, research, and also for
giving valuable advice where needed. They are Drs. Lau Hiu Fung, Law Wai
Siang, Tok Junie, Zuo Xinbing, Wu Huanan, Liu Feng, Grace Birungi, Jiang
Zhangjian, Ms Elaine Tay, Ms Fang Guihua, Ms Gan Peipei, Ms Lü Min, Ms
Huang Yan, Mr Jon Ashley, Mr Chen Baisheng, and Mr Lin Junyu. I also
thank Mr Ting Aik Leong, whose help in running the samples has also made
this thesis possible.
I thank the National University of Singapore for giving me the financial support
and the chance to take up this degree under the Research Scholarship
programme.
I sincerely thank the pastors, full-time staff, elders, leaders, and brothers and
sisters of the Tabernacle Church and Missions, Singapore, for loving,
teaching, guiding, and spurring me towards completing my thesis.
I sincerely thank my family for their unfailing support and love given these few
years while I undertook studies for my Master’s degree.
Finally, all thanks and glory be to God, who has made all things possible
through Him and in Him.
Table of Contents
Declaration
Acknowledgements
Table of Contents
Summary
List of Tables
List of Figures
List of Abbreviations
List of Symbols
Chapter 1 Introduction
1.1 Metabolomics
1.1.1 Overview
1.1.2 Metabolomics in Disease Diagnosis
1.1.3 Non-targeted and Targeted Approaches in Metabolomics
1.1.4 Using Urine for Metabolomic Analysis
1.2 Analytical and Separation Techniques in Metabolomics
1.2.1 Nuclear Magnetic Resonance
1.2.2 Mass Spectrometric Techniques in Metabolomics
1.2.3 Separation Techniques in Metabolomics
1.2.3.1 Overview
1.2.3.2 Gas Chromatography
1.2.3.3 High Performance Liquid Chromatography
1.3 Chemometrics in Metabolomics
1.3.1 Overview
1.3.2 Principal Component Analysis
1.3.3 Partial Least Squares/ Projection to Latent Structures
1.3.4 Orthogonal Partial Least Squares Discriminant Analysis
1.3.5 Pre-treatment of Data for Chemometric Analysis
1.4 Chronic Kidney Disease
1.4.1 Overview of Chronic Kidney Disease
1.4.2 Diagnosis of Chronic Kidney Disease
1.4.3 Metabolomics and Chemometrics for Chronic Kidney Disease
1.5 Cataract Disease
1.5.1 Overview of Cataract Disease
1.5.2 Diagnosis of Cataract Disease
1.5.3 Metabolomics and Chemometrics for Cataract Disease
1.6 Approach and Scope of Study
Chapter 2 Materials and Methods
2.1 Materials
2.2 Urine Sample Collection
2.3 Equipment and Procedure for HPLC-MS/MS
2.4 Extraction and Normalization of Chromatogram Peak Areas
2.5 Chemometric Analysis
2.6 Statistical Analysis
Chapter 3 Results and Discussion for Chronic Kidney Disease
3.1 Results for Chronic Kidney Disease
3.1.1 Results for Control vs. Chronic Kidney Disease ESI+ Dataset
3.1.2 Results for Control vs. Chronic Kidney Disease ESI- Dataset
3.1.3 Results for Combined ESI+ and ESI- Dataset
3.2 Discussion for Chronic Kidney Disease
3.3 Summary
Chapter 4 Results and Discussion for Cataract Disease
4.1 Results for Cataract Disease
4.1.1 Results for Control vs. Cataract Disease ESI+ Dataset
4.1.2 Results for Control vs. Cataract Disease ESI- Dataset
4.1.3 Results for Combined ESI+ and ESI- Dataset
4.2 Discussion for Cataract Disease
4.3 Summary
Chapter 5 Conclusion and Future Work
References
Summary
This thesis shows how metabolomics and multivariate statistical methods
such as Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA)
can be used to study and enhance understanding of two diseases.
The study utilizes univariate and multivariate statistical techniques to
determine the differences in a targeted set of metabolites between healthy
controls and two groups of diseased persons. Urine samples were collected from
healthy controls and patients suffering from chronic kidney disease (CKD).
High performance liquid chromatography-tandem mass spectrometry analysis
was performed on each sample, and chromatographic and mass
spectrometric data were obtained. After pre-treatment of the data through
normalization and scaling, principal component analysis and OPLS-DA were
used to visualize the differences between these two classes. Further statistical
analysis was employed to determine fluctuations in target metabolites to
understand disease pathology, and also to identify potential biomarker
candidates for CKD. This same method was also employed for a separate
group of patients suffering from cataract disease for further validation.
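As an illustration of the scaling and principal component analysis steps mentioned above, the following is a minimal NumPy sketch. It assumes a peak-area matrix with samples as rows and metabolites as columns that has already been normalized, applies unit-variance (auto)scaling, and obtains PCA scores and loadings from a singular value decomposition. The data are synthetic and the function names are hypothetical; this is not the software workflow used for the models in this thesis.

```python
import numpy as np

def autoscale(X):
    """Mean-centre each column and scale it to unit variance (autoscaling)."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0               # constant columns are centred but not scaled
    return (X - X.mean(axis=0)) / std

def pca(X, n_components=2):
    """PCA via SVD of the autoscaled matrix: returns scores T, loadings P and
    the fraction of total variance explained by each retained component."""
    Xs = autoscale(X)
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]    # scores   T = U S
    P = Vt[:n_components].T                        # loadings P = V
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return T, P, explained

# Synthetic example: 10 "controls" and 10 "patients", 30 hypothetical metabolites
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 30))
X[10:, :5] += 3.0          # made-up shift in five metabolites for the second group
T, P, r2x = pca(X)
```

Plotting the first column of `T` against the second gives a two-dimensional scores plot; `r2x` holds the fraction of X variance explained by each of the two components.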
The thesis is then concluded with a summary of the main findings, a
discussion on the challenges faced, and suggestions for future work in
metabolomic studies of CKD and cataract disease.
List of Tables
Table 1 Description of stages of chronic kidney disease (adapted from [50, 117, 128], originally from [131])
Table 2 Metabolites identified in human urine samples for Controls and CKD patients in ESI+ mode
Table 3 Metabolites identified in human urine samples for Controls and CKD patients in ESI- mode
Table 4 Metabolites identified in human urine samples for Controls and cataract disease patients in ESI+ mode
Table 5 Metabolites identified in human urine samples for Controls and cataract disease patients in ESI- mode
List of Figures
Figure 1 Representative TICs (ESI+) of (A) Control, and (B) Patient with CKD.
Figure 2 (A) PCA scores plot for Control ESI+ data; (B) DModX scores plot for Control ESI+ data; (C) PCA scores plot for CKD ESI+ data.
Figure 3 PCA scores plot for Control and CKD ESI+ dataset
Figure 4 OPLS-DA scores plot for Control against CKD ESI+ dataset
Figure 5 Cross-validation scores plot for Control and CKD ESI+ dataset
Figure 6 Random permutation test scores plot for Control and CKD ESI+ dataset
Figure 7 (A) VIP and (B) Loadings plot for Control-CKD ESI+ dataset. Interval bars denote the jack-knife confidence intervals for each metabolite.
Figure 8 Representative TICs (ESI-) of (A) Control, and (B) Patient with Chronic Kidney Disease.
Figure 9 (A) PCA scores plot for Control ESI- data; (B) DModX scores plot for Control ESI- data; (C) PCA scores plot for CKD ESI- data.
Figure 10 PCA scores plot for Control and CKD ESI- dataset
Figure 11 OPLS-DA scores plot for Control against CKD ESI- dataset
Figure 12 Cross-validation scores plot for Control and CKD ESI- dataset
Figure 13 Random permutation test scores plot for Control and CKD ESI- dataset
Figure 14 (A) VIP and (B) Loadings plot for Control-CKD ESI- dataset. Interval bars denote the jack-knife confidence intervals for each metabolite.
Figure 15 (A) PCA scores plot for Control combined ESI+/ESI- data; (B) DModX scores plot for Control combined ESI+/ESI- data; (C) PCA scores plot for CKD combined ESI+/ESI- data
Figure 16 PCA scores plot for Control and CKD combined dataset
Figure 17 OPLS-DA scores plot for Control against CKD combined dataset
Figure 18 Cross-validation scores plot for Control and CKD combined dataset
Figure 19 Random permutation test scores plot for Control and CKD combined dataset
Figure 20 (A) VIP and (B) Loadings plot for Control and CKD combined dataset. Interval bars denote the jack-knife confidence intervals for each metabolite.
Figure 21 Representative TICs (ESI+) of (A) Healthy Control, and (B) Patient with Cataract Disease
Figure 22 (A) PCA scores plot and (B) DModX scores plot for Cataract Disease ESI+ data
Figure 23 PCA scores plot for Control and Cataract Disease ESI+ dataset
Figure 24 OPLS-DA scores plot for Control against Cataract Disease ESI+ dataset
Figure 25 Cross-validation scores plot for Control and Cataract Disease ESI+ dataset
Figure 26 Random permutation test scores plot for Control and Cataract Disease ESI+ dataset
Figure 27 (A) VIP and (B) Loadings plot for Control-Cataract Disease ESI+ dataset. Interval bars denote the jack-knife confidence intervals for each metabolite.
Figure 28 Representative TICs (ESI-) of (A) Healthy control, and (B) Patient with Cataract Disease
Figure 29 (A) PCA scores plot for Cataract Disease ESI- data; (B) DModX scores plot for Cataract Disease ESI- dataset
Figure 30 PCA scores plot for Control and Cataract Disease ESI- dataset
Figure 31 OPLS-DA scores plot for Control against Cataract Disease ESI- dataset
Figure 32 Cross-validation scores plot for Control and Cataract Disease ESI- dataset
Figure 33 Random permutation test scores plot for Control and Cataract Disease ESI- dataset
Figure 34 (A) VIP and (B) Loadings plot for Control-Cataract Disease ESI- dataset. Interval bars denote the jack-knife confidence intervals for each metabolite.
Figure 35 (A) PCA scores plot and (B) DModX scores plot for Cataract Disease combined ESI+/ESI- data
Figure 36 PCA scores plot for Control and Cataract Disease combined ESI+/ESI- dataset
Figure 37 OPLS-DA scores plot for Control and Cataract Disease combined dataset
Figure 38 Cross-validation scores plot for Control and Cataract Disease combined dataset
Figure 39 Random permutation test scores plot for Control and Cataract Disease combined dataset
Figure 40 (A) VIP and (B) Loadings plot for Control and Cataract disease combined datasets. Interval bars denote the jack-knife confidence intervals for each metabolite.
List of Abbreviations
ANN       Artificial neural network
CKD       Chronic kidney disease
CV        Cross validation
DA        Discriminant analysis
EIC       Extracted ion chromatogram
ESI+/-    Electrospray ionization (positive/ negative mode)
GC        Gas chromatography
GFR       Glomerular filtration rate
HPLC      High performance liquid chromatography
LC        Liquid chromatography
MS        Mass spectrometry
MS/MS     Tandem mass spectrometry
NMR       Nuclear magnetic resonance
OPLS      Orthogonal projections to latent structures/ orthogonal partial least squares
PCA       Principal component analysis
PLS       Partial least squares/ Projection to latent structures
RT        Retention time
SIMCA     Soft independent modelling of class analogy
SPE       Solid-phase extraction
TIC       Total ion chromatogram
UPLC      Ultra Performance Liquid Chromatography
VIP       Variable importance plot/ Variable influence on projection
List of Symbols
D-crit      Critical distance
DModX       Distance to the model in X-space
m/z         Mass-to-charge ratio
Q2X(cum)    Cross-validation parameter representing the predictability of the model
Q2Y(cum)    Cross-validation parameter showing the cumulative predicted variation in the Y matrix, representing the predictive ability of the model
R2X(cum)    Cumulative modelled variation in the X matrix, representing the total explained variance in the model
R2Y(cum)    Coefficient of determination of OPLS-DA model, showing the cumulative modelled variation in the Y matrix, and representing the goodness of fit of the model in explaining the variation by the components in the model
t[a]        X-score of component a in the model
tcv         Cross-validated X-score of component a in the model
to          Orthogonal X-scores of (uncorrelated) component in the OPLS-DA model, also representing within class variation
tp          Predictive component in the OPLS-DA model, also representing between-class variation
w[a]        Loading vector of component a
X           Matrix of predictor variables
Y           Matrix of response variables
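For reference, the goodness-of-fit and predictive-ability statistics listed above are conventionally defined, for a single response y, as follows. Here ŷ_i is the fitted value for observation i, ŷ_i,cv is the value predicted for observation i by a model built with that observation left out during cross-validation, and ȳ is the mean response. This is the textbook single-stage form; how the "(cum)" values are accumulated over model components depends on the software and is not reproduced here.

```latex
R^2Y = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},
\qquad
Q^2Y = 1 - \frac{\mathrm{PRESS}}{\sum_i \left(y_i - \bar{y}\right)^2},
\qquad
\mathrm{PRESS} = \sum_i \left(y_i - \hat{y}_{i,\mathrm{cv}}\right)^2
```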
Chapter 1 Introduction
1.1 Metabolomics
1.1.1 Overview
Metabolomics is the area of study that is concerned with the metabolome,
which comprises small molecule components (of size less than 1 kDa [1])
associated with the biochemical processes of a given organism [2]. Examples
of such small molecules include simple sugars, fatty acid amides, and amino
acids. The presence and quantity of these metabolites reflect what goes on
within and outside the cell. The goal of metabolomics is not only
to determine the disease pathology, the role of metabolites in the biochemistry
of the organism, and potential biomarkers, but also ultimately to determine the
molecular structure of these biomarkers [3]. Overall, metabolomic studies
greatly aid our understanding of the biology of an organism at a systems level.
Nicholson et al. in their landmark paper have defined metabolomics as the
study of the “dynamic multiparametric metabolic response of living systems to
pathophysiological stimuli or genetic modification” [4], helping us to
understand how living systems actually work. Specifically, when organisms
are under a state of stress as a result of disease (“pathophysiological stimuli”)
or perturbations to the genetic content of the cells (“genetic modification”),
metabolomics as a discipline becomes useful [5]. Knowledge of the cellular
response under such conditions helps researchers identify potential
therapeutic targets. In this manner, therapy does not end with symptomatic
treatment that merely addresses the metabolic flux, but has the
end-goal of a total cure in mind.
Metabolomics, or metabonomics, has been gaining ground in the field of
systems-level “omics” research [6], alongside genomics, transcriptomics, and
proteomics. It is a relatively new area of study compared to its sister
disciplines [7] and complements these fields [8]. The advantage of
metabolomics over its counterparts is that the metabolome is much more
closely correlated to the actual cellular response than the genome or
proteome [3, 7, 8]. Also, less data is generated, owing to the smaller
number of metabolites compared to the number of genes or proteins [2, 7]. In
addition, each metabolite may be involved in one or several pathways,
contributing to the complex expressed phenotype of the organism. It is this set
of downstream biochemical pathways, and not just a single pathway, that
metabolomics aims to map out [2, 8]. This allows us to obtain a more timely
and accurate understanding of cellular and systemic processes.
1.1.2 Metabolomics in Disease Diagnosis
Metabolomics is increasingly becoming a valuable tool to study disease
pathology, and to screen, diagnose and determine the effect of treatment on
diseases as well. A wide variety of diseases is being studied, with the
analytical methods used varying with the disease and the aim of the study.
For example, Jung et al. have successfully used proton NMR (1H-NMR) and
targeted metabolic profiling with multivariate analysis to distinguish patients
with cerebral infarctions from healthy controls by analysis of urine and plasma
[9]. Also, Kim et al. have combined toxicology with metabolomics to determine
urinary biomarkers for human gastric cancer using a mouse model [10]. In a
recent study, Bao et al. have also devised a novel method of measuring the
systemic effects of various drug treatments on type 2 diabetes mellitus (T2DM)
instead of just obtaining the conventional glucose measurement for T2DM [11].
Further, in an attempt to obtain a comprehensive understanding of and to
diagnose renal cell carcinoma, Kind et al. have successfully utilized various
separation techniques coupled with mass spectrometry and subsequent
multivariate analysis to analyze and discriminate patient urine from healthy
controls in a small pilot study [3]. Given the increasing number of parameters
in metabolomic analysis, there is an even greater need for reliable and
informative multivariate techniques to analyse this data.
The combination of multivariate statistical tools with metabolomics has been
shown to be powerful for disease screening involving non-targeted
determinations. One such study of interest is that by Michell et al. In their
metabolomic analysis of Parkinson’s disease patient serum and urine
samples, they were able to separate female Parkinson’s patients from their
age-matched controls using partial least squares discriminant analysis
(PLS-DA) based on the urine data, despite not finding strong individual
biomarkers responsible for this separation. They surmise that Parkinson’s
disease has a unique metabolic pattern to which certain metabolites
contribute [12]. Also, in a separate study, Kemperman et al. observed that
while multivariate
statistical analysis was able to show discriminatory peptide peaks, univariate
analysis failed to show these as discriminatory due to “a very large biological
variation among the proteinuric patient group” [13]. These studies show the
necessity of multivariate techniques in view of the nature of samples and data
obtained.
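To make the PLS-DA idea in these studies concrete, the following is a minimal NumPy-only sketch of a two-class PLS-DA. Class membership is coded as 0/1, a PLS1 model is fitted by the NIPALS algorithm on autoscaled data, and predictive ability is summarized by a cross-validated Q2 = 1 - PRESS/SS. The synthetic data, the five interleaved cross-validation folds and the 0.5 classification threshold are assumptions for illustration, not the settings of the cited studies. The last line refits after randomly permuting the class labels, which is the idea behind a random permutation test.

```python
import numpy as np

def pls1_fit(X, y, n_components=2):
    """Fit a PLS1 model by NIPALS on unit-variance-scaled X and centred y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    x_std = X.std(axis=0, ddof=1)
    x_std[x_std == 0] = 1.0
    y_mean = y.mean()
    E = (X - x_mean) / x_std     # X residual matrix, deflated after each component
    f = y - y_mean               # y residual vector
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)    # X weight vector
        t = E @ w                    # X scores
        tt = t @ t
        p = E.T @ t / tt             # X loadings
        q = (f @ t) / tt             # y loading
        E = E - np.outer(t, p)
        f = f - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.array(Q)
    b = W @ np.linalg.solve(P.T @ W, Q)   # regression coefficients for scaled X
    return x_mean, x_std, y_mean, b

def pls1_predict(model, X):
    x_mean, x_std, y_mean, b = model
    return ((np.asarray(X, dtype=float) - x_mean) / x_std) @ b + y_mean

def q2_cv(X, y, n_components=2, n_folds=5):
    """Cross-validated Q2 = 1 - PRESS/SS using interleaved folds."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    folds = np.arange(len(y)) % n_folds
    y_cv = np.empty(len(y))
    for k in range(n_folds):
        held_out = folds == k
        model = pls1_fit(X[~held_out], y[~held_out], n_components)
        y_cv[held_out] = pls1_predict(model, X[held_out])
    press = np.sum((y - y_cv) ** 2)
    ss = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss

# Synthetic example: 20 "controls" (y = 0) and 20 "patients" (y = 1)
rng = np.random.default_rng(1)
y = np.repeat([0.0, 1.0], 20)
X = rng.normal(size=(40, 30))
X[y == 1, :10] += 4.0                    # hypothetical shift in 10 of 30 variables
model = pls1_fit(X, y)
accuracy = np.mean((pls1_predict(model, X) > 0.5) == (y == 1))
q2 = q2_cv(X, y)
q2_perm = q2_cv(X, rng.permutation(y))   # same X, randomly permuted class labels
```

A model that genuinely separates the classes should give a clearly higher Q2 than the same model refitted on permuted labels.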
1.1.3 Non-targeted and Targeted Approaches in Metabolomics
There are two general approaches to metabolomic studies: non-targeted and
targeted. Non-targeted or global profiling approaches in
metabolomics aim to capture as many features of an organism’s metabolic
profile as possible. This approach allows researchers to obtain a holistic
picture of the types and concentrations (relative or absolute) of the
metabolites, so that comparisons can be made between study groups in order
to determine patterns of changes which are useful for diagnosis [14].
Non-targeted approaches such as metabolic fingerprinting may not identify
the specific metabolites involved in disease pathology, but instead consider
the analytes and their concentrations in totality [15]. This approach
allows for the “simultaneous analysis of multiple end products”, allowing for a
“more powerful and robust means by which to stratify disease severity,
progression and to assess drug efficacy than the analysis of any single
marker over a patient population” [16]. For example, Vallejo et al. have used
capillary electrophoresis coupled with ultraviolet detection and subsequent
metabolic fingerprinting to distinguish between normal rats and diabetic rats
on antioxidant treatment [17]. Issaq et al. have also successfully utilized
metabolomic profiling with high performance liquid chromatography-mass
spectrometry (HPLC-MS) to detect bladder cancer using urine samples in
their proof-of-concept study. Their study does not use the traditional
techniques, which are either less sensitive towards low-grade tumours (e.g.
urine cytology) or more invasive (e.g. cystoscopy) [5].
Novel biomarkers may also be identified in non-targeted approaches, e.g. by
structural studies through NMR or tandem mass spectrometry.
Given the knowledge of metabolites and their interactions in specific
biochemical pathways, one can also capitalise on targeted approaches to
study specific metabolites or groups of metabolites [18] using reference
spectra for analysis [19]. Post-acquisition data processing and metabolite
identification also take less time [19]. Metabolite and
pathway databases and search engines such as the Human Metabolome
Database [20], Kyoto Encyclopaedia of Genes and Genomes database [21,
22] and the METLIN Metabolite Database [23] are useful resources in this
area of pathway analysis. Researchers can also make use of targeted
analysis to determine how the concentrations of particular metabolites in a
system vary with concentration changes of other metabolites. For example,
Grison et al. have successfully used targeted profiling to determine a
metabolic signature for chronic caesium exposure [24]. Also, Wu et al. have
compared the metabolite profiles of salt-tolerant and salt-intolerant soybean
plants, and through multivariate analysis, have found that secondary
metabolites such as isoflavones and saponins distinguished these two
varieties [25]. One limitation of the targeted approach is that, since only
known metabolites can be identified and quantified, novel compounds cannot
be discovered as biomarkers [18]. Yet, the numerous successes using this
approach show that such targeted studies are both needed and useful.
1.1.4 Using Urine for Metabolomic Analysis
While many types of body fluids (biofluids) have been used for metabolomic
studies, the choice of biofluid is highly dependent on the disease being
studied. The choices of biofluid include blood serum [12, 26-29], plasma [27,
30-32], cerebrospinal fluid [33], urine [3, 5, 7, 12, 29, 34-43], saliva [44, 45],
tears [46], and even vitreous humour [15]. Urine has the advantage of being
easily obtained in volumes large enough for multiple analyses [1, 18, 47]. It is
also one of the least invasive body fluids to collect from patients [10], allowing
for multiple collections at different times [18] with minimal
discomfort to study subjects [1]. Furthermore, urine is the biofluid through
which the majority of metabolic waste products are excreted from most organ
systems in the body and therefore can provide much information about the
body’s biochemical processes as a whole system [3], since it is not subject to
strict homeostatic regulation as is serum [48]. In addition, obtaining or
preparing urine samples is usually more straightforward than for other
biofluids such as blood [49], serum [18], plasma [1], or tears.
Moreover, the concentrations of metabolites are often higher in urine [47],
which makes them easier to detect and determine. It has also been found
that in the study of renal diseases, measurements of kidney function are
generally more accurate when using urine measurements than plasma,
provided sufficient and accurate volumes of urine samples can be obtained
[50]. Metabolic changes that take place at the cellular level are easily reflected
in the urine as, apart from blood, it is the biofluid to which most of the kidney is
exposed [2, 3].
Urine as a biofluid for analysis, however, also has its disadvantages. There
may be large variations in urine volume and therefore in the degree of
dilution of metabolites [17], resulting in a very wide dynamic range [1] and
concentration differences of five-thousand-fold or more [17, 51]. These
differences represent natural variation, and may be exacerbated under
conditions of disease [1]. In addition, as with other body fluids, the
concentrations of metabolites may not correspond to their importance in
disease pathology [17]. Also, xenobiotics may be present [1], and these may
or may not be directly related to the organism’s core metabolism; if so, they
may provide valuable information on the varied interactions of the organism
with its environment. The analysis method chosen must therefore be able to
deal with these problems associated with metabolic studies involving urine, in
addition to being reliable and reproducible [14].
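One common way of handling the variable dilution of urine before multivariate analysis is to normalize each sample and then compress the dynamic range. The sketch below uses probabilistic quotient normalization (total-area normalization, then division of each sample by the median of its ratios to a median reference profile) followed by a log transform. Both are named here as widely used, swapped-in techniques and are not necessarily the normalization applied in this thesis; the peak areas are made up.

```python
import numpy as np

def pqn_normalize(X):
    """Probabilistic quotient normalization of a (samples x features) peak-area table."""
    X = np.asarray(X, dtype=float)
    X = X / X.sum(axis=1, keepdims=True) * 100.0    # total-area normalization
    reference = np.median(X, axis=0)                 # median reference profile
    keep = reference > 0
    quotients = X[:, keep] / reference[keep]         # feature-wise ratios to the reference
    dilution = np.median(quotients, axis=1)          # most probable dilution factor per sample
    return X / dilution[:, None]

def log_transform(X, offset=1.0):
    """Compress a wide dynamic range; the offset keeps zero peak areas finite."""
    return np.log10(np.asarray(X, dtype=float) + offset)

# Made-up peak areas: sample 2 is sample 1 at double concentration (less diluted urine)
a = np.array([10.0, 200.0, 5.0, 40.0])
X = np.vstack([a, 2 * a, [12.0, 180.0, 9.0, 30.0]])
Xn = pqn_normalize(X)
X_log = log_transform(Xn)
```

After normalization the first two samples, which differ only by dilution, become identical, while the third keeps its genuinely different profile.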
Despite these limitations in terms of variation of urine volume, metabolite
concentration differences, and the presence of xenobiotics, urine has been
one of the preferred biofluids for metabolomic studies. For the current study,
the advantages of using urine, as discussed above, far outweigh the
disadvantages, and urine is therefore the body fluid of choice for this study
of chronic kidney disease (CKD).
1.2 Analytical and Separation Techniques in Metabolomics
1.2.1 Nuclear Magnetic Resonance
The expansion of research in metabolomics can also be attributed to
improvements in technologies that allow sensitive, specific, and
reproducible studies to be carried out. Traditionally, NMR was, and still is, a
major analytical technique employed for mapping the
metabolome [18, 52]. The main advantages of NMR are its
reproducibility [53] across different runs and instruments [2], and
its ability to detect a wide range of metabolites [19], allowing for the building of
compound libraries [46]. Furthermore, sample preparation is usually minimal
[2, 19], the analysis is non-destructive [8], and analysis times are short [8]. NMR
is also able to analyse intact tissue through high-resolution magic-angle
spinning [54].
In addition, NMR allows for the molecular structures of biomarkers to be
discerned in two-dimensional structural studies [55]. It allows researchers to
determine metabolite profile patterns through metabolic fingerprinting to
classify groups of subjects without actual identification of the molecules
involved [53]. For example, Brindle et al. have used 1H-NMR to successfully
profile human serum for the accurate diagnosis of coronary heart disease [56],
while Keun et al. have successfully used 13C-NMR to investigate urine in
metabolomic studies [57]. Further, Kang et al. have also successfully used
NMR with orthogonal partial least squares discriminant analysis (OPLS-DA),
a multivariate statistical tool, to discriminate between Korean and Chinese
herbal medicines [58]. As the number of variables being analysed increases, it
is apparent that multivariate tools become necessary in order to obtain a more
complete understanding of the systems being studied. These multivariate
tools must also allow for a logical and systematic way of handling the
information obtained. It is with this in mind that multivariate statistical
techniques feature in our study; they are reviewed further in this
chapter.
However, a main drawback of NMR is its inherent lack of analytical sensitivity [2, 8, 46], which prevents the detection of metabolites at concentrations lower than 5 µM [18]. Spin-spin coupling also complicates data interpretation [19]. Recent advances in NMR technology include microprobes and miniature probe coils for smaller sample volumes [53] and cryoprobes for better sensitivity and shorter acquisition times [53, 59]. However, high-throughput profiling does not seem possible until the issues of complicated spectra and difficult compound identification are resolved [19]. In addition, the high cost and space requirements of the equipment [48] mean that not all laboratories will be appropriately equipped. Therefore, while NMR is a very useful and powerful technology for metabolite profiling, it does not allow high-throughput studies of metabolites at very low concentrations. In view of these considerations, other analytical methods such as mass spectrometry and chromatographic separation techniques need to be considered.
1.2.2 Mass Spectrometric Techniques in Metabolomics
MS is a useful and often necessary tool for the identification and quantification of metabolites in metabolomics investigations [2], through their molecular fragmentation patterns [18]. It also has a higher sample salt tolerance than NMR and has improved to reach picomole detection levels [2]. In addition, because MS is inherently more sensitive than NMR, it is usually used for targeted studies [8]. It must, however, be noted that MS and NMR techniques are complementary, as each has its own limitations and advantages.
For MS, the choice of ionisation technique is very important, as it needs to suit the type of metabolites under study. Commonly used ionization techniques include electron ionization (EI), chemical ionization (CI), and atmospheric pressure ionization. EI is the most commonly used ionization technique with gas chromatography (GC), but it fails to produce molecular ions for some compounds [60]. A common alternative is therefore CI, which produces molecular ions for compounds that do not yield them under EI [60].
The most commonly used atmospheric pressure ionization technique, electrospray ionization (ESI) [61], is suitable for the metabolomic analysis of urine, as urinary metabolites are usually polar or ionic. Because ESI causes little fragmentation, molecular ions can be detected with high sensitivity [1]. ESI can also be operated in both positive- and negative-ion modes, allowing wider coverage of metabolites [1]. ESI is therefore the ionisation technique of choice for this study.
The choice of mass analyser also depends on the type of metabolic analysis being carried out. For “global, untargeted metabolic profiling studies”, high-resolution mass analysers such as time-of-flight (TOF) and quadrupole-TOF (Q-TOF) instruments are suitable for resolving “co-eluting metabolites having the same nominal mass” [1]. On the other hand, for targeted studies of a select group of metabolites, low-resolution mass analysers such as single quadrupole, triple quadrupole, and ion trap instruments are sufficient for detection and quantification [1]. Triple quadrupole instruments are usually chosen over single quadrupole mass analysers because of their higher sensitivity and their capability for selected reaction monitoring [62]. As this study utilises a targeted approach, a triple quadrupole mass analyser is sufficient for detection and quantification of the metabolites under study.
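To illustrate why resolution matters for co-eluting metabolites with the same nominal mass, consider two hypothetical ions at m/z 200.000 and 200.010 (values assumed purely for illustration). Using the common definition of resolving power as R = m/Δm, where Δm is the smallest mass difference that can be distinguished:

```latex
R = \frac{m}{\Delta m} = \frac{200.000}{200.010 - 200.000} = \frac{200.000}{0.010} = 2.0 \times 10^{4}
```

An analyser operating at unit resolution, as quadrupoles typically do, cannot distinguish these two ions, whereas an instrument with a resolving power of at least about 20,000 could; whether a particular TOF or Q-TOF instrument reaches this depends on its specifications.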
As the composition of biofluids is highly complex, it is advantageous for metabolomic studies to use a separation technique prior to MS analysis. Direct infusion techniques have been used in metabolomics to determine metabolic profiles in both plants [63] and animals with high sensitivity and selectivity [53], but data quality is usually adversely affected by matrix effects [18, 46], incomplete ionization [18], or ion suppression or enhancement [64]. Although direct injection techniques such as desorption ESI and extractive ESI could be used to counter matrix effects encountered in urine analysis [18], coupling MS with an orthogonal separation method allows a more complete and accurate measurement of the metabolites present [1]. In light of these considerations and the complexity of the samples obtained, it was decided that a separation step prior to ionization was necessary to improve the quality of the data obtained.
1.2.3 Separation Techniques in Metabolomics
1.2.3.1 Overview
The advancement of metabolomics has also been largely supported by
developments in separation and mass spectrometric technologies. As the
number of variables under investigation increases, there is a need to
distinguish and analyse each metabolite separately so that a more accurate
understanding of the condition being studied can be obtained [2]. Separation
technologies include chromatographic methods such as HPLC and GC, as well as capillary electrophoresis. Sample preparation methods such as solid-phase extraction have also found favour in metabolomics [65]. The use of GC and HPLC in metabolomics is discussed in more detail in this sub-section.
1.2.3.2 Gas Chromatography
GC, specifically capillary GC, has been widely used in metabolomics in conjunction with mass spectrometry because of the reproducible, high-quality spectra obtained [66]. As Kind et al. note, GC-MS has been used extensively since the 1970s [3]. Reference compound libraries can be compiled in-house, obtained from commercial sources, or imported from external sources such as the National Institute of Standards and Technology (NIST) [61]. The extensive libraries available make GC-MS a tool of choice for the identification of metabolites [46].
GC-MS has been widely used in metabolomic studies involving multivariate
determination of the diseased state through analysis of urine. Halket et al.
have explored a method of determining urinary organic acids using GC-MS
with pattern recognition techniques to identify metabolic disorders [67]. Zhang
et al. have also successfully used multivariate OPLS-DA modelling to
determine 40 differentiating metabolites for osteosarcoma in GC-MS analysis
of serum and urine, as well as discern the energy metabolism disruptions
through their targeted analysis [29]. More recently, Pasikanti et al. have
devised and validated a method where GC-MS was coupled with principal
component analysis (PCA) and OPLS-DA to differentiate between genders
based on a global metabolomic analysis on urine samples [49]. These studies
show the utility of GC-MS coupled with multivariate statistical tools in
metabolomic studies.
However, for GC, sample derivatization is necessary to obtain volatile forms of non-volatile analytes [18]. In the study by Pasikanti et al., for example, derivatization with BSTFA and the presence of co-eluting compounds made metabolite identification difficult, especially for low-abundance metabolites [18]. Sample pre-treatment to remove interfering molecules and to enhance the concentration of the desired metabolites is therefore one main limitation associated with techniques such as GC-MS [18], as well as HPLC-MS. Many metabolites in urine are non-volatile and polar or ionic, so tedious derivatization is required prior to GC analysis [1] to decrease their polarity and increase their volatility [66]. Thermal degradation of metabolites may also occur at the high temperatures used in GC [46], further warranting sample derivatization [66]. In the case of urine, urease treatment is necessary to protect the column and enhance the quality of the spectra obtained [18, 66]. These preparation steps complicate the analysis and lengthen the total time needed, and unwanted artefacts may also be introduced.
1.2.3.3 High Performance Liquid Chromatography
Like GC, high performance liquid chromatography (HPLC) has been used extensively in research, as reviewed by Kind et al. [3]. HPLC is also frequently coupled with MS, and combining these orthogonal techniques improves the separation and identification of metabolites [3].
Once thought of as only a potentially powerful tool for metabolomics [68],
HPLC-MS has proven to be very useful in this field. For example, Jia et al.
have used HPLC-MS to determine the plasma phospholipid profiles of mice
with immunoglobulin A nephropathy, “the most common form of
glomerulonephritis” [69]. They found that the combination of HPLC-MS
with multivariate modelling by PCA and partial least-squares discriminant
analysis (PLS-DA) is successful in differentiating healthy mice from their
diseased counterparts and identifying relevant biomarkers [69]. The inherent
sensitivity, specificity and efficiency of MS [69] coupled with the high peak
capacity of HPLC have made it possible to accurately determine large
numbers of metabolites in a short time [46]. In other work, Plumb et al. have used HPLC-MS to screen rat urine in drug development and to detect drug metabolites in biological fluids [31, 70]. Idborg-Björkman et al.
have also used HPLC-MS and two-way data analysis to screen for biomarkers
in rat urine [71].
Similarly, Yang et al. have used HPLC-based metabonomics in the diagnosis of liver cancer to decrease the false-positive rate [72], and have further built on this work by exploring strategies for HPLC-based metabonomics research [73]. Furthermore, in their targeted analysis of urine metabolites, Chen et al. utilised Rapid Resolution Liquid Chromatography (RRLC) and multivariate analysis, and leveraged metabolic correlation networks to determine potential biomarkers of breast cancer and to gain a greater understanding of the interactions between the putative biomarkers [74]. Yin et al. have also used a similar method to study liver cirrhosis and hepatocellular carcinoma [75]. HPLC has therefore proven to be a necessary and powerful tool for studies involving disease screening and diagnosis.
In addition, urine, which is the choice of biofluid for this study, is particularly
suitable for analysis by reversed-phase HPLC-MS [1]. As mentioned
previously, urine contains many dissolved non-volatile analytes with various
degrees of polarity. Urine can be injected directly into the column either in a
diluted or neat form [14]. Apart from the removal of particulates and appropriate dilution, minimal sample preparation is needed for HPLC-MS analysis of low-molecular-weight metabolites [1, 14]. Although compound identification is more difficult than in GC-MS owing to the lack of standard reference spectra libraries [61], the use of reference standards together with metabolite databases containing reference spectra alleviates this difficulty.
Therefore, comparing GC-MS and HPLC-MS, HPLC-MS was judged to be more advantageous given the nature of the target metabolites in this study. Unlike GC-MS, HPLC-MS does not require derivatization of analytes [76], which reduces the likelihood of errors being introduced in the sample preparation step and increases the likelihood of accurately identifying novel compounds [48]. The overall time needed for sample analysis and post-processing is also shorter, as no time is needed for derivatization.
1.3 Chemometrics in Metabolomics
1.3.1 Overview
The trend towards having many variables (such as chromatographic and
spectroscopic information) describing one observation (each sample) has
been fuelled by the advances in the above-mentioned analysis technologies,
as well as a need to determine disease pathology not in terms of only one
metabolite but as a combination of metabolite responses in various
physiological states. Moreover, there is a large dynamic range of metabolite
concentrations, and the most important metabolites may not be the most
abundant ones [77]. Appropriate multivariate chemometrics tools therefore
have to be employed in order to summarise, interpret, and visualise this
wealth of data generated from metabolomic experiments [77].
Chemometrics techniques are useful tools to help the researcher understand
the acquired data. The underlying principle of chemometrics is that mathematical operations and transformations are used to determine whether there are underlying patterns or trends, known as latent variables, in multi- or megavariate data [78]. Because the measured variables are usually highly collinear, a small number of these latent variables can summarise the variation in the data [78]. These latent structures need not correspond to any of the measured variables themselves.
Alternatively, a priori information about the samples can be used in the analysis, and the data summarised to show whether the measured variables are correlated with the known information. Multivariate analysis methods are therefore necessary, as they help researchers to determine underlying patterns across large data sets, and because disease causation is seldom due to a single metabolite but is usually multifactorial in origin [2].
Many chemometrics techniques have been used in the field of metabolomics.
These multi- and megavariate techniques can be broadly divided into two categories, namely supervised and unsupervised methods, and include hierarchical classification analysis [79], PCA [80], linear discriminant analysis, PLS methods [81], OPLS [82], soft independent modelling of class analogy [83], support vector machines [84] and artificial neural networks (ANNs) [85]. The tool of choice largely depends on two factors: the nature of the data obtained, and the nature of the information needed about those data. Even with these tools, the researcher must still make sound decisions based on careful experimental and study designs, and the researcher’s assumptions will also affect the kind of data processing performed with these chemometrics tools.
1.3.2 Principal Component Analysis
The tool of choice for preliminary visualisation of underlying trends in data is
PCA [86]. PCA has been described as the “workhorse of chemometrics” [87].
Indeed, PCA is useful because it not only allows researchers to visualise data in reduced dimensionality but also reveals any groupings or clustering in the samples, which represent similarities between samples [2, 34]. This explains its use in what is known as exploratory data analysis [86], where it gives the researcher an overview of the data [17].
In addition, PCA can reveal differences in the data by calculating orthogonal components that successively capture the maximum variance [5], which appear as separation between groups or clusters along these components [2, 5]. Outliers are also easily recognised in the score plots, and these inform researchers whether samples need to be removed [12, 86]. If classification information about the samples is unavailable, PCA can be used to reveal trends and patterns that allow the samples to be classified accordingly for further analysis. Examination of the loadings plots will usually reveal the variable or variables most responsible for the groupings observed [48]. PCA is thus an unsupervised technique, as class information is not used in the analysis, unlike in supervised models. PCA can also show whether between-class variation is large enough to outweigh within-class variation.
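A minimal sketch of PCA, assuming a samples × metabolites matrix and computing the decomposition by singular value decomposition of the mean-centred data (one standard route). The synthetic data and the function name `pca` are illustrative assumptions, not the software or data used in this study:

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of a samples x variables matrix via SVD of the column-centred data.

    Returns the scores (T), the loadings (P) and the fraction of total
    variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)                          # mean-centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # T = U S, sample coordinates
    loadings = Vt[:n_components].T                   # P, variable weights per PC
    explained = s**2 / np.sum(s**2)                  # variance captured by each PC
    return scores, loadings, explained[:n_components]

# Hypothetical data: 20 samples, 5 metabolites; the first 10 samples
# have an elevated level of metabolite 0.
rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(20, 5))
X[:10, 0] += 2.0

scores, loadings, explained = pca(X)
print("PC1 explained variance:", round(float(explained[0]), 3))
print("Variable with largest PC1 loading:", int(np.argmax(np.abs(loadings[:, 0]))))
```

In a score plot of `scores[:, 0]` against `scores[:, 1]`, the two hypothetical groups fall on opposite sides of PC1, and the PC1 loading vector points to metabolite 0 as the variable responsible for the grouping, mirroring the score and loadings interpretation described above.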
However, as stated, PCA is only a preliminary visualisation method. Being unsupervised, it may fail to separate samples when large chemical noise and other irrelevant, distracting sources of variation, such as instrumental drift and artefacts, dominate [3]. There may also be cases where the intra- or inter-subject variation is too large for clustering to take place. There is therefore a need for supervised chemometric tools that help researchers focus on the relevant sources of variation being studied [77].
1.3.3 Partial Least Squares/ Projection to Latent Structures
PLS, on the other hand, is an example of a supervised classification tool
which utilises known class information in data analysis. PLS is a powerful
form of multivariate analysis as it is able to handle data which are “strongly
collinear, noisy, and [contain] numerous X-variables” [88]. For metabolomics
studies, there are often two or more groups of samples being studied – these
could be control and diseased, different phases of disease, or different types
of treatment. These dependent variables are collectively termed the Y matrix, and may be discrete or continuous [77]. Discriminant analysis may also be applied where the Y variables denote class membership. When combined with discriminant analysis, PLS (PLS-DA) maximises class separation and builds prediction models based on the class information given [2, 17].
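The PLS-DA idea can be sketched with the NIPALS algorithm for a single dummy-coded Y variable (PLS1): class membership is coded as 0 or 1, a regression model is built, and new samples are assigned by thresholding the prediction at 0.5. This is a minimal illustration on assumed synthetic data; the function names and the 0.5 threshold are illustrative choices rather than the settings of any particular chemometrics package:

```python
import numpy as np

def pls1_fit(X, y, n_components=2):
    """Fit a PLS1 model by NIPALS; y is a single (here dummy 0/1) response."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean           # centred X-block and y residuals
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)              # weight: direction of max covariance with y
        t = E @ w                           # scores
        tt = t @ t
        p = E.T @ t / tt                    # X loadings
        c = (f @ t) / tt                    # y loading
        E = E - np.outer(t, p)              # deflate X
        f = f - c * t                       # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)     # regression coefficients B = W (P'W)^-1 q
    return x_mean, y_mean, B

def pls1_predict(model, X):
    x_mean, y_mean, B = model
    return (X - x_mean) @ B + y_mean

# Hypothetical two-class data: 20 controls (y = 0) and 20 cases (y = 1)
rng = np.random.default_rng(1)
y = np.repeat([0.0, 1.0], 20)
X = rng.normal(scale=0.2, size=(40, 10))
X[:, 0] += 3.0 * y                          # metabolite 0 differs between classes

model = pls1_fit(X, y, n_components=2)
predicted_class = pls1_predict(model, X) > 0.5   # assign class by 0.5 threshold
print("Training accuracy:", np.mean(predicted_class == y.astype(bool)))
```

Training accuracy alone says little about predictive ability; in practice such a model would be assessed by cross-validation.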
1.3.4 Orthogonal Partial Least Squares Discriminant Analysis
OPLS is also a supervised classification tool that is a relatively new extension
of PLS [82]. Where class information is known about the samples, OPLS
becomes a very powerful dimension-reduction and visualisation tool. It affords
better interpretability and transparency compared to PLS, as the PLS model is
rotated [89] so that the variation in the data is separated into two components
– those that are related to the Y matrix (and therefore class separation [77]),
and those that are unrelated (orthogonal) to the Y matrix [77, 90-92]. Such
separation of components is important, as it allows researchers to understand the main causes of variation that separate the two classes of samples by relating them to known variables [34].
Like PLS, OPLS can also be combined with discriminant analysis in metabonomics studies [93]. In this case, the values in the descriptive Y matrix are dummy variables used solely to assign class membership [94]. OPLS-DA therefore improves class separation, interpretation, and identification of the metabolite information [34, 91]. For example, Whelehan et al. have used it to detect ovarian cancer through analysis of the proteomic profiles of 191 subjects [91], while Qiu et al. have used it to diagnose human colorectal cancer [26]. These improvements make OPLS-DA suitable as a chemometric tool for disease diagnosis.
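The separation of Y-related and Y-orthogonal variation can be sketched for a single dummy-coded Y as follows: the Y-orthogonal part of the loading is used to build an orthogonal component, that component is removed from X, and the predictive score is then computed from the filtered X. This is a minimal sketch of single-response OPLS filtering as we understand it, on assumed synthetic data with a hypothetical function name; it omits the cross-validation used to choose the number of orthogonal components:

```python
import numpy as np

def opls_da(X, y, n_orthogonal=1):
    """Single-response OPLS: remove Y-orthogonal components, then compute the
    predictive score. Returns (t_p, T_o): predictive and orthogonal scores."""
    Xr = X - X.mean(axis=0)                 # centred X, progressively filtered
    yc = y - y.mean()                       # centred dummy class vector
    w = Xr.T @ yc
    w /= np.linalg.norm(w)                  # predictive weight (unchanged by filtering)
    T_o = []
    for _ in range(n_orthogonal):
        t = Xr @ w                          # current predictive score
        p = Xr.T @ t / (t @ t)              # its loading
        w_o = p - (w @ p) * w               # part of the loading orthogonal to w
        w_o /= np.linalg.norm(w_o)
        t_o = Xr @ w_o                      # Y-orthogonal score
        p_o = Xr.T @ t_o / (t_o @ t_o)
        Xr = Xr - np.outer(t_o, p_o)        # filter the orthogonal variation out of X
        T_o.append(t_o)
    t_p = Xr @ w                            # predictive score from filtered X
    return t_p, np.array(T_o).T

# Hypothetical data: 15 controls and 15 cases, 6 metabolites
rng = np.random.default_rng(2)
y = np.repeat([0.0, 1.0], 15)
X = rng.normal(scale=0.3, size=(30, 6))
X[:, 0] += 2.0 * y                          # class-related variation
z = rng.normal(scale=3.0, size=30)          # large structured variation unrelated to class
X[:, 1] += z
X[:, 2] += z

t_p, T_o = opls_da(X, y, n_orthogonal=1)
print("Mean predictive score, class 0 vs class 1:",
      round(float(t_p[y == 0].mean()), 2), round(float(t_p[y == 1].mean()), 2))
```

Because each orthogonal score is constructed to be uncorrelated with the centred class vector, it has the same mean in both classes, and the between-class difference is carried by the predictive score alone; this is what makes an OPLS-DA score plot easier to read than one in which both kinds of variation are mixed.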
For our investigation, we have chosen to use PCA and OPLS-DA as they are linear methods and produce models that are more easily interpreted than those of non-linear methods such as ANNs [95, 96]. Also, as stated by Wiklund et al., “multivariate models such as PLS and OPLS include both statistical significance based on cross-validation and confidence intervals based on jack-knifing estimations as well as magnitude and reliability of the data provided by good visualization” [77]. These multivariate models are more powerful than univariate tests [77], as the latter do not show how groups of biomarkers can be more powerful than the individual biomarkers themselves [97].
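Cross-validation, mentioned in the quotation above, is commonly summarised by Q² = 1 − PRESS/SS, where PRESS is the sum of squared prediction errors for samples left out of model fitting and SS is the total sum of squares of y about its mean. A minimal sketch, assuming a one-component PLS model and a simple interleaved k-fold split (both are illustrative choices, and the function names and data are hypothetical):

```python
import numpy as np

def fit_pls1_one_component(X, y):
    """One-component PLS1 fit; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc
    w /= np.linalg.norm(w)                 # weight vector
    t = Xc @ w                             # scores
    b = (t @ yc) / (t @ t)                 # inner regression of y on t
    return x_mean, y_mean, w * b           # y_hat = (x - x_mean) @ (w b) + y_mean

def q_squared(X, y, n_folds=5):
    """Q^2 = 1 - PRESS / SS using interleaved k-fold cross-validation."""
    folds = np.arange(len(y)) % n_folds    # deterministic fold assignment
    press = 0.0
    for k in range(n_folds):
        held_out = folds == k
        x_mean, y_mean, coef = fit_pls1_one_component(X[~held_out], y[~held_out])
        y_hat = (X[held_out] - x_mean) @ coef + y_mean
        press += np.sum((y[held_out] - y_hat) ** 2)
    ss = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss

rng = np.random.default_rng(3)
y = np.repeat([0.0, 1.0], 20)
X = rng.normal(scale=0.3, size=(40, 8))
X[:, 0] += 2.0 * y                         # genuine class-related signal

q2_real = q_squared(X, y)
q2_permuted = q_squared(X, rng.permutation(y))   # labels with no relation to X
print(round(float(q2_real), 2), round(float(q2_permuted), 2))
```

Comparing Q² for the real labels with Q² for permuted labels is a simple way to check that a model's apparent class separation is not an artefact of fitting noise.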
1.3.5 Pre-treatment of Data for Chemometric Analysis
The data used in such pattern recognition techniques must be properly pre-processed before these techniques are applied; otherwise, spurious correlations and patterns might be mistakenly identified.
For example, in the case of HPLC-MS data, peak retention times between
different runs must be aligned in the pre-processing analysis as retention
times and baselines may deviate from run to run [46]. Also, data reduction in
the form of peak-picking is necessary such that only the true analytical peaks
remain, reducing the noise in the spectra and correlations to structures
unrelated to class information [62].
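Retention time alignment can be illustrated, in a deliberately simplified form, by estimating a single global shift between a sample chromatogram and a reference chromatogram from their cross-correlation. Real alignment tools generally use more flexible, non-linear warping. The function names, the Gaussian test peaks and the `max_shift` window below are assumptions for illustration only:

```python
import numpy as np

def estimate_shift(reference, sample, max_shift):
    """Return the lag (in scans) that best aligns `sample` to `reference`.

    A positive lag means the sample elutes later than the reference.
    The lag maximising the cross-correlation (dot product) within
    [-max_shift, max_shift] is chosen.
    """
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_shift, max_shift + 1):
        score = float(np.dot(reference, np.roll(sample, -lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def align(reference, sample, max_shift=20):
    """Shift `sample` onto the reference time axis.

    np.roll wraps values around the ends, so this assumes the chromatogram
    edges contain only baseline.
    """
    lag = estimate_shift(reference, sample, max_shift)
    return np.roll(sample, -lag), lag

scans = np.arange(300)
reference = np.exp(-0.5 * ((scans - 100) / 3.0) ** 2)   # peak at scan 100
sample = np.exp(-0.5 * ((scans - 107) / 3.0) ** 2)      # same peak drifted to scan 107

aligned, lag = align(reference, sample)
print("Estimated drift (scans):", lag)
print("Aligned peak apex:", int(np.argmax(aligned)))
```

A single global shift cannot correct drift that varies along the run, which is why it is only a starting point compared with the alignment tools mentioned above.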
Furthermore, it is imperative for the researcher to understand clearly the
nature of the data obtained so that the appropriate peak alignment tools and
scaling methods can be employed. Projection-based multivariate methods are sensitive to the scaling of the data [77, 92], and the effect of scaling in turn depends on the data acquired for all samples. Scaling methods that could be used include autoscaling, mean-centring, Pareto scaling, and level scaling [17]. In their review of metabolomics in kidney disease, Weiss and Kim also note that metabolomic data need to be suitably scaled and transformed to attain symmetry and normality [2].
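The scaling methods listed above can be sketched as column-wise operations on a samples × metabolites matrix. This assumes the usual definitions (autoscaling divides the mean-centred data by the standard deviation, Pareto scaling by its square root, level scaling by the mean); the function names and the small data matrix are illustrative, and variables with zero variance or zero mean would have to be removed first:

```python
import numpy as np

def mean_centre(X):
    """Subtract each variable's (column's) mean."""
    return X - X.mean(axis=0)

def autoscale(X):
    """Mean-centre, then divide by the standard deviation (unit variance)."""
    return mean_centre(X) / X.std(axis=0, ddof=1)

def pareto_scale(X):
    """Mean-centre, then divide by the square root of the standard deviation."""
    return mean_centre(X) / np.sqrt(X.std(axis=0, ddof=1))

def level_scale(X):
    """Mean-centre, then divide by the mean (changes relative to average level)."""
    return mean_centre(X) / X.mean(axis=0)

# Hypothetical peak areas: 3 samples x 2 metabolites of very different abundance
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 600.0]])
for scale in (mean_centre, autoscale, pareto_scale, level_scale):
    print(scale.__name__, np.round(scale(X), 3).tolist())
```

With mean-centring alone, the abundant second metabolite dominates the variance; autoscaling gives both metabolites equal weight, and Pareto scaling lies between the two. This is why the choice of scaling changes the resulting projection models.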
In metabolomics, normalization is carried out in order to reduce systematic
errors in the data so that biologically significant changes in metabolite
concentrations may be discovered [98, 99]. Normalization also helps to make
the data obtained across samples comparable in size, such as correcting for
urine dilution effects [100]. There are generally six different methods of
normalization for mass spectra: (1) with reference to mean, (2) with reference
to median, (3) linear rescaling according to the largest and smallest values, (4)
with reference to total ion count, (5) with reference to the peak of maximum
intensity, and (6) with reference to an internal standard ([101, 102], cited in
[103]).
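The six normalization references listed above can be sketched as operations on a single spectrum (a vector of peak intensities). The function name, the method labels and the position of the internal standard are hypothetical; in practice each sample's spectrum would be normalized in turn:

```python
import numpy as np

def normalise_spectrum(x, method, internal_standard_index=None):
    """Normalise one spectrum of peak intensities by the chosen reference."""
    x = np.asarray(x, dtype=float)
    if method == "mean":                    # (1) divide by the mean intensity
        return x / x.mean()
    if method == "median":                  # (2) divide by the median intensity
        return x / np.median(x)
    if method == "range":                   # (3) rescale linearly to [0, 1]
        return (x - x.min()) / (x.max() - x.min())
    if method == "total_ion_count":         # (4) divide by the summed intensity
        return x / x.sum()
    if method == "max_peak":                # (5) divide by the most intense peak
        return x / x.max()
    if method == "internal_standard":       # (6) divide by the internal standard peak
        return x / x[internal_standard_index]
    raise ValueError(f"unknown method: {method}")

# Hypothetical spectrum of five peaks
spectrum = np.array([50.0, 120.0, 30.0, 400.0, 80.0])
diluted = 0.5 * spectrum                    # same urine sample, diluted two-fold
for m in ("mean", "median", "range", "total_ion_count", "max_peak"):
    same = np.allclose(normalise_spectrum(spectrum, m), normalise_spectrum(diluted, m))
    print(m, "dilution-invariant:", same)
```

Methods (1) to (5) use a reference computed from the sample itself, so a uniformly diluted urine sample gives the same normalized profile, which is the dilution correction mentioned above. An internal standard spiked at a fixed concentration is not diluted with the sample, so method (6) corrects mainly for instrumental variation rather than for urine dilution.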
For studies using urine as the biofluid, there are also the options of
normalization to total urine volume, urine osmolality, range of peak intensities,
or to an endogenous compound such as creatinine [17, 18, 99]. Depending on
the nature of the dataset, the choice of normalization method is important as it
also affects the identity and ranking of biomarkers found [103]. It is known that
urine samples have a wide dynamic range in terms of the metabolite
concentrations. Therefore, even though it has been found that there is usually
a high level of consistency among methods in terms of the important
compounds identified [103], there is still a need to choose the best method of
normalization.
22
Normalization to total urine volume is usually not ideal as there are several
limitations in using this as the reference. This method tends to introduce
errors such as those due to inaccurate sample collection by controls and
patients, whether for 24-hour collections (especially for children [104]) or for
spot collections. It is also known that urine volume can be affected by
hydration level ([105, 106], cited in [34]). Also, as metabolite concentrations change with total urine volume, normalization to total urine volume may result in spurious correlations between metabolite concentrations and disease state. Therefore, urine volume is not usually the reference of choice, as it varies too widely for meaningful comparisons to be made, whether within or between subjects [99].
In addition, it is not recommended to use a single compound such as creatinine as the reference [13], since its level may vary widely across individuals, especially in kidney disease [18, 47]. Although urinary creatinine excretion is relatively stable for each person, creatinine is not a good reference, as many factors can affect its concentration in urine. Saude et al. found significant changes in urinary creatinine within a so-called healthy population [47], and creatinine fold change in the wider population has also been shown to vary greatly [104]. Further, urinary excretion of creatinine has been found to be affected in kidney disease in particular, owing to its degradation in the body [18, 50]. These findings support the decision not to use urinary creatinine as the internal reference for normalizing metabolite concentrations.
As the correct choice of normalization can improve differentiation between study groups, normalization to a form of total ion count was considered appropriate. This is supported by a study by Warrack et al. [99], which showed improved discrimination between dose groups. They recommend that urine samples be normalized to both the ‘mass spectrometry total useful signal’ (MSTUS) and osmolality. In their study, normalizing to total urine volume or to creatinine levels actually made the group separation unclear, hence the recommendation to use osmolality and the MSTUS instead [99]. This also supports the approach in the current work of normalizing to the 40 targeted metabolites and four m/z regions, instead of to creatinine or urine volume.
1.4 Chronic Kidney Disease
1.4.1 Overview of Chronic Kidney Disease
CKD is a “life-threatening condition characterized by progressive and
irreversible loss of renal function” [107]. The term itself is non-specific and represents the declining kidney function that arises from various diseases [108]. In terms of pathophysiology, kidney disease manifests itself as changes in the “glomerular filtration rate (GFR), glomerular permeability, tubular function, tubular damage, urinary reflux, obstruction to urinary flow, and deposition of collagen” [109]. According to Eknoyan et al., the disease affects 5-10 % of the world population [110]. Sabanayagam has also found that Singapore has a relatively high prevalence of CKD compared with the rest of the world, at 10.8% ([111], cited in [112]). Despite its high occurrence, there is a general lack of awareness of the prevalence of CKD [113]. This underscores the need to investigate possible methods of early detection in local subjects.
In addition, the disease itself is asymptomatic in its early stages, possibly until only 25% of GFR remains [109], which makes it difficult to treat as patients’ conditions worsen [114]. Any symptoms that appear are also generic and may not be a cause for alarm until quite late in the progression of the disease [115].
Also, the rate of progression of disease can be
unpredictable as there is no one ‘standard’ rate at which the disease
progresses [50, 108, 109], and the rate of progression may also be dependent
on the individual, the underlying disease and other risk factors [50]. There is
therefore a need to understand the mechanisms of CKD so that earlier
diagnosis and effective treatments can be carried out.
Risk factors that increase the incidence of CKD include existing medical conditions such as hypertension and diabetes, lifestyle choices such as tobacco use, a family history of CKD, increased age, and prenatal conditions such as low birth weight [116]. In the United States, approximately a quarter of non-diabetic chronic kidney failure cases are caused by hypertension alone [117]. It has also been found that Asians with a family history of kidney disease are at high risk of developing non-diabetic kidney disease [117]. In Singapore, there is an increasing rate of new patients with end stage renal disease (ESRD), the stage at which renal replacement therapy (dialysis or transplantation) is required for survival [118]. The incidence rose from 194 cases per million population in 1999 to 254 per million population in 2009 [118]. There has also been an increasing trend of diabetes-related ESRD, with diabetics accounting for 58.8% of new ESRD cases [118]. There is therefore a need for methods to detect the early onset of CKD before ESRD sets in.
Besides diabetes, CKD has been found to be positively associated with other
debilitating or life-threatening conditions such as cardiovascular disease,
hypertension and other vascular diseases [116]. It has also been found that
inflammation, oxidative stress, insulin resistance and endothelial dysfunction
increase as the disease progresses, even from the early stages [119]. In
addition, when kidney function decreases, patients may suffer from other physiological disturbances as metabolites, toxins, water, electrolytes, acid-base balance and endocrine function deviate from the norm [120]. Excess retention of a number of solutes results in the uremic syndrome, which is a state of increased oxidative stress [119] and an indicator of poor prognosis. Also, as the disease progresses, many bodily systems have to compensate for the accumulation of metabolites in the blood, and patients have to adjust their lifestyle, especially their diet, as kidney function decreases [119]. Clearly, many metabolic pathways are disrupted in kidney disease patients [119], and there is a need for successful early screening before the onset of more serious conditions.
It has also been found that most types of CKD lead to a common phenotype
[50, 107], and if untreated lead to chronic renal failure (CRF) [120-123]. CRF
has already reached epidemic levels [120, 124-127]. Current treatments are ineffective, as the mechanism of progression is not fully understood [108, 120], and these treatments only address symptoms to delay the onset of CRF. Furthermore, when CKD has progressed to a substantial degree, the underlying causes may be obscured, making it difficult to determine the root cause of CKD [115]. This makes it even more important to diagnose CKD in its early stages.
In the study of CKD, urine is arguably one of the best choices of body fluids to
be used [2, 7] as the kidneys are part of the urinary tract [7]. Hence, the
pathophysiological disturbances to the kidney in CKD would be closely and
clearly reflected in the metabolite composition of the urine. Other system-wide biological effects will also be reflected in the urine, as urine is not a homeostatically controlled biofluid [1]. Urine is therefore the biofluid of choice for this CKD study.
1.4.2 Diagnosis of Chronic Kidney Disease
Diagnosis of CKD is accomplished by detection of abnormally high levels of
urinary protein, abnormal urinary sediments, abnormal results from imaging
tests or biopsies, and measurement of the GFR [117]. The GFR can be estimated from the levels of endogenous markers that correlate highly with the progression of the disease, such as serum and urinary creatinine, blood urea nitrogen and urate [109, 128]. The serum level of a low molecular weight endogenous protein, cystatin C, has been found to be a more accurate early indicator of kidney dysfunction than serum creatinine [115, 129]. Alternatively, exogenous markers such as ⁵¹Cr-ethylenediaminetetraacetic acid (⁵¹Cr-EDTA), iohexol, inulin, and iothalamate can be used [109, 117, 128, 130]. Stages of the disease are assigned based on the GFR, regardless of the aetiology. It has been advised that early
diagnosis and management of CKD (even before symptoms appear) are
important factors towards a better clinical outcome [113, 114, 131, 132].
Indeed, an accurate determination of the GFR is crucial as it best represents
the remaining amount of kidney function present in a person [50].
A diagnosis of CKD is made when the GFR falls below 60 mL/min/1.73 m² body surface area for three or more months, with or without kidney damage [131]. The five stages of progression of CKD are briefly described in Table 1. As the GFR decreases, the severity of CKD correspondingly increases [50]. When a patient reaches stage 5, renal replacement therapy is necessary to sustain life, and preparation for this therapy usually commences in stage 4 [131].
Table 1 Description of stages of chronic kidney disease
(adapted from [50, 117, 128], originally from [131])
Stage | Description | GFR (mL/min/1.73 m²)
1 | Kidney damage with normal or increased GFR | ≥ 90
2 | Kidney damage with mild decrease in GFR | 60-89
3 | Moderate decrease in GFR | 30-59
4 | Severe decrease in GFR | 15-29
5 | Kidney failure | < 15 or on dialysis
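The staging rule in Table 1 is a simple lookup on GFR bands, and the Python sketch below encodes it. The function name `ckd_stage` is hypothetical. The function checks only the GFR bands, so on its own it cannot confirm the evidence of kidney damage that stages 1 and 2 also require.

```python
def ckd_stage(gfr, on_dialysis=False):
    """Return the CKD stage (1-5) for a GFR in mL/min/1.73 m^2, following Table 1.

    Only the GFR bands are checked; stages 1 and 2 additionally require
    evidence of kidney damage, which this helper does not assess.
    """
    if gfr < 0:
        raise ValueError("GFR must be non-negative")
    if on_dialysis or gfr < 15:
        return 5          # kidney failure
    if gfr < 30:
        return 4          # severe decrease in GFR
    if gfr < 60:
        return 3          # moderate decrease in GFR
    if gfr < 90:
        return 2          # mild decrease in GFR
    return 1              # normal or increased GFR
```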
Albuminuria or proteinuria is often an indicator of both diabetic and non-diabetic kidney disease [114, 115, 117, 131, 133, 134]. Screening for CKD
can therefore be done with a simple urine dipstick test, which detects
albuminuria of 300 mg/L [117]. It can also be done by measuring the urinary
albumin-to-creatinine ratio (ACR) [117]. If, over the course of three months,
the urine dipstick is positive or the urine ACR is increased, the person is
considered to have CKD ([135], cited in [117]). However, one common
limitation associated with these tests (and other screening tests in general) is
that patients do not usually undergo routine screening unless they are known
to be at risk, or symptoms have started appearing. In the case of CKD, the
appearance of symptoms usually means that the CKD has progressed to a
rather severe stage, which usually necessitates immediate treatment.
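The ACR is a ratio of two concentrations measured in the same spot urine sample, so it cancels out differences in urine dilution. The sketch below rests on three assumptions:

- Albumin is reported in mg/L.
- Creatinine is reported in mg/dL.
- An ACR of 30 mg/g or more counts as increased. This is a commonly used clinical cut-off supplied here for illustration, not a value taken from [117].

Both function names are hypothetical.

```python
def urine_acr_mg_per_g(albumin_mg_per_l, creatinine_mg_per_dl):
    """Urinary albumin-to-creatinine ratio in mg albumin per g creatinine."""
    if creatinine_mg_per_dl <= 0:
        raise ValueError("urinary creatinine must be positive")
    creatinine_g_per_l = creatinine_mg_per_dl * 10 / 1000   # mg/dL -> mg/L -> g/L
    return albumin_mg_per_l / creatinine_g_per_l

def acr_increased(acr_mg_per_g, cutoff_mg_per_g=30.0):
    """Flag an increased ACR; the 30 mg/g default is an assumed clinical cut-off."""
    return acr_mg_per_g >= cutoff_mg_per_g
```

For example, 45 mg/L albumin with 100 mg/dL (1 g/L) creatinine gives an ACR of 45 mg/g, which would be flagged under the assumed cut-off.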
As mentioned above, early diagnosis and determination of the rate of
progression of CKD promote better clinical outcomes. With this in
mind, many researchers have endeavoured to develop models which are
able to predict and diagnose CKD occurrence as well as determine the rate of
progression. The Cockcroft-Gault equation [136], Modification of Diet in Renal
Disease formula [50] and Brochner-Mortensen equation for children [130]
have all been used to help determine the GFR in CKD patients so that an
accurate diagnosis can be made, and subsequently the most effective
treatment plan can be implemented. Other predictive models of CKD
progression include those by Soares et al. for children [108] and those by
Chonchol et al. and Madero et al., which use uric acid levels [137, 138]. In
particular, Soares et al. found that two variables – a GFR lower than 30
mL/min and severe proteinuria – were indicators of poor clinical outcomes for
children with chronic renal insufficiency [108]. However, it must be noted that
all equations are derived from select populations, and need to be validated
prior to use in a new population; alternatively, new equations must be derived
if the current ones cannot be modified to suit the population under study [50].
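For illustration, the Cockcroft-Gault equation and the four-variable MDRD equation are sketched below in their commonly published forms, with serum creatinine in mg/dL. The coefficients come from general knowledge rather than from this thesis. The MDRD version shown is the IDMS-traceable one, with a leading constant of 175 and a factor of 1.212 for Black patients; the original publication used 186 and 1.210. Both equations should be checked against [136] and [50] before use, and the function names are hypothetical. Cockcroft-Gault estimates creatinine clearance in mL/min and is not normalised to 1.73 m² body surface area.

```python
def cockcroft_gault(age_years, weight_kg, scr_mg_dl, female):
    """Estimated creatinine clearance (mL/min), Cockcroft-Gault form."""
    crcl = (140 - age_years) * weight_kg / (72 * scr_mg_dl)
    return crcl * 0.85 if female else crcl

def mdrd_egfr(age_years, scr_mg_dl, female, black):
    """Four-variable MDRD eGFR (mL/min/1.73 m^2), IDMS-traceable form with constant 175."""
    egfr = 175 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr
```

For example, a 60-year-old, 72 kg man with a serum creatinine of 1.0 mg/dL has an estimated creatinine clearance of 80 mL/min. The corresponding value for a woman is 68 mL/min.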
Several biomarkers associated with CKD have been identified by various
research groups. These include oxidative stress, insulin resistance,
hyperlipidemia, hyperuricemia, proteinuria, anemia, nitric oxide synthase
(NOS)/asymmetric dimethylarginine (ADMA), aldosterone, transforming growth
factor β (TGF-β), and sympathetic nervous system activation (reviewed in
[50]). More recently, the level of C-reactive protein (CRP) in plasma has also
been found to be useful [139]. Other groups have studied combinations of
biomarkers; Peralta et al., for example, found that measuring creatinine,
cystatin C and the urinary ACR together identifies CKD more accurately than
any one of these markers alone [140].
Lederer and Ouseph note that CKD may go undiagnosed in its early stages,
especially in older patients or patients who are chronically ill [116].
Metabonomics with multivariate modelling therefore aims to screen
for potential patients based on the patterns in their urinary metabolite profiles,
thereby potentially avoiding biopsies, which are highly invasive and non-routine.
1.4.3 Metabolomics and Chemometrics for Chronic Kidney Disease
As noted by Weiss and Kim, “the real, most tangible and immediate future
goal of the use of metabolomics in kidney disease, as with other renal
biomarker research, rests with its ability to predict disease occurrence before
either phenotypic changes or evidence of disease detected using standard
laboratory assays” [2]. Indeed, the ultimate goal of using more sensitive
instruments and sophisticated statistical techniques is to improve the current
state of detecting and diagnosing diseases in the early stages. Although it has
not been common practice to use metabolomics and chemometrics in the
study of kidney diseases [2], there are existing studies using these methods.
Jia et al. have successfully utilised UPLC-QTOF-MS with multivariate pattern
recognition techniques in a non-targeted serum metabonomics study to
classify controls and chronic renal failure patients [120]. To our knowledge,
although there have been many studies analysing urine from CKD
patients, there has thus far been no comprehensive targeted
metabolomic analysis of urine using HPLC-MS/MS and multivariate modelling
to screen for CKD.
1.5 Cataract Disease
1.5.1 Overview of Cataract Disease
Cataract disease is a condition whereby visual acuity is reduced due to the
lens of the eye turning opaque [141] as a result of malfunctioning of lens
metabolism [142]. In the 2010 report by the World Health Organisation
Prevention of Blindness and Deafness Programme, Pascolini and Mariotti
estimate that cataract disease is the second major cause of visual impairment
(33%), and is the main cause of blindness worldwide (51%) [143]. It is also “a
major contributing cause of low vision, blindness, and low visual function
scores” [144], after glaucoma. Cataract is known to be highly prevalent in
Singapore [144], and its incidence is likely to continue increasing due to the
aging population [145]. In addition, the incidence of cataract among the
younger population is increasing as well [146]. There is therefore a need for
more rapid methods for the early detection of cataract disease.
Furthermore, cataracts tend to develop slowly and painlessly, and may not be
detectable until vision is noticeably affected [141, 147], partly because
their signs and symptoms are shared with other conditions [147]. The
causes of cataract disease are varied, including congenital conditions due to
infections and systemic malfunctions, adult cataracts due to aging, systemic
diseases such as diabetes mellitus or local eye disease such as uveitis,
trauma, or even unknown causes [148]. Cataracts also differ in type and
route of formation, and include nuclear, anterior or posterior
subcapsular, and cortical cataracts [148]. This underscores a need for better
understanding, early screening and treatment for cataract disease.
1.5.2 Diagnosis of Cataract Disease
Currently, diagnosis of cataract disease has to be done by an optometrist or
ophthalmologist through a visual acuity test and slit-lamp eye examination.
For the latter, the use of pupil-dilating eye drops is necessary [147].
In addition, tonometry, which measures the intra-ocular pressure of the
eyeball, may be done as part of a complete eye examination. The whole
procedure is painless, minimally invasive, and takes about 45 minutes to an
hour. There has been some debate on the best method for staging
cataract progression, though common staging methods include the Lens
Opacities Classification System III and the Oxford Clinical Cataract
Classification and Grading System, which are based on slit-lamp
examinations and comparison against reference photographs [149]. However,
among the many methods available, a degree of subjectivity remains, and it has
been argued that it would be most useful clinically to test the visual function
of patients along with these objective measurements in
deciding whether surgery is necessary [149]. It would therefore be useful to
have a more objective way of screening for and staging the progression of
cataract disease before visual function is greatly reduced.
1.5.3 Metabolomics and Chemometrics for Cataract Disease
Metabolomics and metabonomics have only very recently been applied to
the study of the eye and its diseases [8].
One advantage of using metabolomics on body fluids is that the need for
extracting eye tissue samples is reduced, which is significant as it is difficult to
obtain such samples from the eye [8]. Chen et al. have recently performed
global metabolite profiling of human tears through HPLC-MS/MS, and have
also identified 60 metabolites in normal human tear fluid [46]. Tear fluid is
known to be challenging to analyse, not least because of the small volume
[150], but also because of the wide dynamic range of molecules in this
complex mixture [151]. The unprecedented work by Chen et al. signifies that
metabolomics is indeed versatile for helping researchers and clinicians
understand the expressed phenotype of an organism through its various
organs and systems.
As far as eye diseases are concerned, it is possible to use body fluids other
than those directly associated with the eye itself to diagnose and screen for
patients suffering from a particular disease. The body does not consist of
isolated systems which do not interact; it is indeed an integrated whole where
all the different bodily systems work together for the proper functioning of the
organism. Hammond et al. have recently reported their findings on the non-targeted metabolomic analysis of age-related nuclear cataract through GC
and HPLC-MS/MS analysis of plasma [32]. They report that cortical cataract is
strongly correlated with 3-methoxytyrosine, while nuclear cataract is strongly
correlated with laurate, 4-ethylphenyl sulphate and malate, reflecting
changes in metabolic pathways in the pathophysiology of cataract [32]. Their
work shows that diseases of the eye can be studied using
body fluids other than those in direct contact with the eye.
1.6 Approach and Scope of Study
Based on the above, we have decided to use ESI with rapid polarity switching
coupled with a triple quadrupole for analysis of our forty targeted metabolites.
The speed, sensitivity, accuracy, range, ease of use, and medium throughput
make this combination suitable for targeted analysis of metabolites in urine
[14]. The multivariate HPLC-MS metabolomic data will then be analysed and
visualised by the chemometric tools PCA and OPLS-DA.
The approach of this study is as follows: urine samples from an existing local
cohort study are collected from healthy controls and patients suffering from
CKD. HPLC-MS/MS analysis is performed on each sample, and
chromatographic and mass spectrometric data are obtained. After pre-treatment
of the data, PCA and OPLS-DA are used to visualise the
differences between these two classes. Further statistical analysis is also employed
to determine changes in target metabolites, to understand disease
pathology, and to identify potential biomarker candidates for CKD.
In order to show the applicability of this method for discrimination between
control and diseased groups utilising urine as the biofluid for analysis, it will
also be employed for a separate local cohort of patients suffering from
cataract disease. The urine samples were similarly obtained from an existing
local cohort study.
Chapter 2 Materials and Methods
2.1 Materials
HPLC grade methanol and acetonitrile were obtained from APS (Blacktown,
NSW, Australia). Pure water was obtained from a Millipore Milli-Q water
system (Bedford, MA, USA). Formic acid was obtained from Merck
(Darmstadt, Germany).
2.2 Urine Sample Collection
Anonymised urine samples were obtained from the National University
Hospital, Singapore. Thirty samples each of healthy controls, patients with
chronic kidney disease (CKD), and patients with cataract disease were
acquired for this study. Metadata such as physiological and
demographic information on the healthy and diseased subjects were not
released. All urine samples were stored at -20 °C until analysis.
2.3 Equipment and Procedure for HPLC-MS/MS
The liquid chromatography tandem mass spectrometry (HPLC-MS/MS)
method used follows that published by Law et al. [34, 38]. For the
analysis, 30 µL of each urine sample was diluted to 70 µL with deionised
water. The LC system used was an Agilent 1200 RRLC system (Waldbronn,
Germany) with a binary gradient pump, autosampler, column oven and diode-array
detector. An Agilent 6410 triple quadrupole mass spectrometer was
coupled to this system. Gradient elution was performed using mobile phase (A)
0.1% formic acid in water and (B) 0.1% formic acid in acetonitrile. The
gradient profile was 5% (B) at 0 min, rising to 100% (B) at 10 min, and
returning to initial conditions in 5 min, using a flow rate of 200 µL/min and an oven
temperature of 50 °C. For all analyses, 5 µL of sample was injected. A
reversed-phase Zorbax SB-C18 column (50 × 2.0 mm, 1.8 µm; Agilent Technologies,
USA) was used for LC separation. Mass spectra were collected in both
positive and negative electrospray ionisation (ESI+ and ESI-) modes, with a
product ion m/z range of 100 to 800. Capillary temperature was set at 350 °C.
Drying gas flow rate was 10 L/min, and nebulizer (nitrogen) pressure was
50 psi. Targets were set at 2 × 10⁷ ions using automatic gain control. ESI
voltage was set at 4.5 kV, capillary voltage at 10 V, and lens tube offset at 0 V.
2.4 Extraction and Normalization of Chromatogram Peak Areas
To facilitate the chemometric analysis, retention time (RT), mass-to-charge ratio (m/z) and
peak area had to be extracted from the total ion chromatograms (TICs). This
was performed using the Agilent MassHunter Workstation Software
Qualitative Analysis programme (Version B01.02, Build 1.2.122.1, Agilent
Technologies, Inc., USA).
Extracted ion chromatograms (EICs) of m/z range 100-110, 200-210, 300-310,
and 400-410 were obtained from the TICs. The resultant EICs were subjected to
manual peak detection and baseline-to-baseline integration. Chromatograms
were data-reduced by separating true analytical peaks from noise peaks, and
only the true peaks were integrated. Retention times were taken at the
chromatographic peak tops. Retention time-m/z pairs (RT-m/z) of peak areas were tabulated in a Microsoft Excel® spreadsheet and
manually aligned with reference to the original chromatograms. The resultant
pre-processed data formed a three-dimensional (3D) data table of retention
time, m/z ratio and peak intensity. In addition, the relative concentrations
of 40 metabolites were determined for targeted analysis. The retention times
and m/z values of these metabolites were determined in previous work by Law et al.
using the same HPLC-MS method [34, 38].
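The RT-m/z tabulation amounts to pivoting each sample's peak list into one row of a samples × variables matrix. The sketch below is hypothetical and makes three simplifications:

- It replaces the manual alignment used in this work with simple rounding.
- It rounds RT to 0.1 min and m/z to 1 unit; these tolerances were chosen only for illustration.
- It fills any peak absent from a sample with zero.

```python
from collections import defaultdict

def build_feature_table(peaks, rt_decimals=1, mz_decimals=0):
    """Pivot (sample_id, rt_min, mz, area) peaks into {(rt, mz): {sample_id: area}}.

    Peaks are grouped by rounding RT and m/z, a crude automatic stand-in for
    the manual alignment described in the text.
    """
    table = defaultdict(dict)
    for sample_id, rt, mz, area in peaks:
        key = (round(rt, rt_decimals), round(mz, mz_decimals))
        # If one sample has two peaks in the same bin, their areas are summed.
        table[key][sample_id] = table[key].get(sample_id, 0.0) + area
    return dict(table)

def to_matrix(table, sample_ids):
    """Return (variable keys, rows) with one row per sample; missing peaks become 0.0."""
    keys = sorted(table)
    rows = [[table[k].get(s, 0.0) for k in keys] for s in sample_ids]
    return keys, rows
```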
These 3D data were then tabulated in Microsoft Excel for every sample
analysed. Prior to further analysis, peak areas were normalised within each
sample to remove spurious correlations arising from differences in urine
volume [99].
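The text does not state which normalisation was applied. One common choice, assumed here purely for illustration, is total-area normalisation: each peak area is divided by the sum of all peak areas in the same sample. Under this scheme, a uniformly more dilute or more concentrated urine sample gives the same normalised profile. The function name is hypothetical.

```python
import numpy as np

def normalise_total_area(areas):
    """Divide each sample's (row's) peak areas by that sample's total peak area."""
    areas = np.asarray(areas, dtype=float)
    totals = areas.sum(axis=1, keepdims=True)
    if np.any(totals <= 0):
        raise ValueError("each sample needs a positive total peak area")
    return areas / totals
```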
2.5 Chemometric Analysis
Normalised peak areas were then subjected to chemometric analysis using the
SIMCA P+ software (Version 12.0, Umetrics, Umeå, Sweden) as described in
the following paragraphs.
The normalised peak intensities were square-rooted, mean-centred, and
scaled to unit variance (UV scaling). All datasets were first screened using PCA to
ensure that there were no major outliers in the samples prior to further
chemometric analysis. Using the SIMCA P+ software, PCA models containing
two to four principal components were obtained using the ‘Autofit’ setting.
For multivariate analysis by OPLS-DA, the normalised data were likewise
square-rooted, mean-centred and UV-scaled. Data were reduced to one
predictive component, t, and two or three non-predictive orthogonal
components, to, using the software’s ‘Autofit’ setting. Scores plots were
obtained for to against t. The R2X(cum) value obtained in the modelling
process is a measure of the amount of X variation explained by the model. In
addition, measures of fit to the Y data (R2Y(cum)) and predictive power
(Q2Y(cum)) were obtained for each model. R2Y(cum) and
Q2Y(cum) values close to 1.0 indicate that the model fits the class data well and
has good predictive ability based on the X data [41]. The default seven-fold full
cross-validation setting of the SIMCA P+ software was used in constructing the
OPLS-DA models to guard against overfitting [2, 7].
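The pre-treatment and fit statistics described above can be written out directly. The NumPy sketch below uses hypothetical function names and does two things:

- It applies the square-root transform, mean-centring and unit-variance scaling column by column.
- It computes R2Y from fitted values and Q2Y from cross-validated predictions as 1 − PRESS/SS.

This is not SIMCA P+'s implementation, which also handles missing values and component selection. A negative Q2Y simply means the cross-validated predictions are worse than predicting the mean of y.

```python
import numpy as np

def preprocess(X):
    """Square-root transform, mean-centre and unit-variance (UV) scale each column of X."""
    X = np.sqrt(np.asarray(X, dtype=float))   # assumes non-negative peak intensities
    X = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                         # constant columns stay at zero instead of dividing by zero
    return X / sd

def r2y_q2y(y, y_fit, y_cv):
    """R2Y from fitted values and Q2Y = 1 - PRESS/SS from cross-validated predictions."""
    y = np.asarray(y, dtype=float)
    ss = np.sum((y - y.mean()) ** 2)                          # total sum of squares of y
    r2y = 1.0 - np.sum((y - np.asarray(y_fit, dtype=float)) ** 2) / ss
    press = np.sum((y - np.asarray(y_cv, dtype=float)) ** 2)  # prediction error sum of squares
    return r2y, 1.0 - press / ss
```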
To further test the validity of the models, one third of all samples were used as
the test set, while the remaining samples were used to build the OPLS-DA
models.
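A class-stratified hold-out of one third of the samples can be sketched as follows. The function name and the optional seed are assumptions for illustration, and the random assignment actually used in this work is not reproduced.

```python
import random

def stratified_holdout(labels, test_fraction=1/3, seed=None):
    """Return (train_idx, test_idx), holding out about test_fraction of each class at random."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])     # held-out test set for this class
        train.extend(indices[n_test:])    # working set used to build the model
    return sorted(train), sorted(test)
```

With 30 controls and 30 patients, each split holds out 10 samples per class and leaves 40 samples to build the model.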
2.6 Statistical Analysis
The normalised data were subjected to a two-tailed Mann-Whitney test using
the PASW-SPSS 18.0 statistical analysis software (SPSS, Chicago, IL, USA).
We took a P value of less than 0.05 as significant for consideration as a
potential biomarker, while a P value of less than 0.01 was considered very
significant for consideration as a potential biomarker. This non-parametric test
was chosen, as recommended by Gibbons and Chakraborti, since normality
of the data cannot be assumed and the sample size is rather small (30 for
each group) [152]. As this test uses ranks rather than raw values, extreme
values have little influence on the result [153].
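For reference, the test statistic and its large-sample p-value can be computed by hand. The sketch below makes the following choices:

- It uses mid-ranks for ties.
- It uses the tie-corrected normal approximation.
- It applies no continuity correction.

This mirrors the usual asymptotic two-tailed p-value, but SPSS may also offer exact p-values for small samples, so results can differ slightly from PASW-SPSS output. The function name is hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Two-tailed Mann-Whitney U test (normal approximation with tie correction).

    Returns (U for sample x, two-tailed p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1            # 1-based ranks; tied values share the mean rank
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:                            # all values tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))
    return u1, p
```

A metabolite would then be flagged as a potential biomarker if p < 0.05, and as very significant if p < 0.01.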
Chapter 3 Results and Discussion for Chronic Kidney Disease
3.1 Results for Chronic Kidney Disease
3.1.1 Results for Control vs. Chronic Kidney Disease ESI+ Dataset
A comparison of representative TICs obtained in the ESI+ mode for the
control and CKD datasets is shown in Figure 1. Visual inspection of the
chromatograms reveals differences in the urine profile of healthy controls
and patients diagnosed with CKD. Based on the TICs alone, the time regions
of 2.3-3.0 min, 5.0-7.0 min, and 10.0-12.0 min show visible perturbations in
the diseased state.
Figure 1 Representative TICs (ESI+) of (A) Control, and (B) Patient with CKD.
To obtain an overview of underlying trends and potential outliers prior to
further multivariate statistical analysis, PCA was carried out for the control
ESI+ dataset. As shown in the scores plot for the control dataset (Figure 2A),
observation C9 lies outside the Hotelling’s T² ellipse (significance level = 0.05)
and could be a potential outlier. To verify whether this observation
should be excluded from subsequent analysis, its distance to the model in X
space (DModX) was plotted (Figure 2B). Since the DModX value for observation C9 is below
the critical value (D-crit), it is not considered to be an outlier, and was kept for
subsequent analysis. A similar PCA plot was constructed for the CKD ESI+
dataset, and no significant outliers were found (Figure 2C). Hence, all the
observations for this dataset were retained for further analysis as well.
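Hotelling’s T² for an observation is the sum of its squared PCA scores, each divided by the variance of that score vector. The NumPy sketch below computes it from an already centred and scaled matrix. It does not reproduce the critical limit (which SIMCA P+ obtains from an F distribution) or the DModX statistic, and the function names are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components):
    """PCA scores (N x A) of an already centred and scaled matrix X, via SVD."""
    U, s, _ = np.linalg.svd(np.asarray(X, dtype=float), full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def hotelling_t2(T):
    """Hotelling's T^2 for each observation from PCA scores T (N x A)."""
    score_var = T.var(axis=0, ddof=1)          # variance of each score vector
    return np.sum(T ** 2 / score_var, axis=1)
```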
Figure 2 (A) PCA scores plot for Control ESI+ data; (B) DModX scores plot for Control ESI+ data; (C) PCA scores plot for CKD ESI+ data.
Multivariate statistical analysis using PCA resulted in a model which could not
satisfactorily separate the control and CKD classes based on the ESI+ data
alone (Figure 3). The constructed model had two components, designated as
t[1] and t[2]. R2X(cum) = 0.199, indicating that 19.9% of the variation in
X (i.e. the profile of peak intensities of each variable for each sample) is
explained by the model. A two-dimensional scatter plot of this
unsupervised model shows that the inter-class variation is concentrated in the
first principal component t[1]. However, the significant overlap between
classes signifies that there are other sources of variation in the samples, and
their contribution causes the inter-class variation to be insufficient for class
separation. The degree of overlap of the two classes shows that most of the
variation in metabolic profile is unrelated to the class differences [91]. This is
also reflected in the negative Q2(cum) value of −0.0952, showing that this model is
inadequate for class prediction.
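R2X(cum) for a PCA model is the fraction of the total sum of squares of the pre-treated X that the first A components capture, and it can be read directly from the singular values. The minimal sketch below assumes X is already centred and scaled; the function name is hypothetical.

```python
import numpy as np

def pca_r2x_cum(X, n_components):
    """Fraction of the total sum of squares of X explained by the first n_components PCs."""
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    # The squared singular values sum to the total sum of squares of X.
    return float(np.sum(s[:n_components] ** 2) / np.sum(s ** 2))
```

With R2X(cum) = 0.199, the two components capture about a fifth of the total variation, leaving most of it in the residuals.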
Figure 3 PCA scores plot for Control and CKD ESI+ dataset
Given the poor class separation and limited interpretability of the PCA model, an OPLS-DA model was constructed to better visualise the separation between the
control and CKD groups. The OPLS-DA scores plot based on these data shows that
there is a clear separation between the healthy controls and CKD patients
along the t[1] predictive component axis (Figure 4). The model explains about
17% of the total variation in X (R2X(cum) = 0.174), and 8.12% of the total
variation in X is directly correlated with class separation along the predictive
component (R2X = 0.0812). Furthermore, a cumulative R2Y value of 0.973 indicates
that the model accounts well for the variation in Y, the matrix describing the two
sample classes. In addition, with Q2Y(cum) = 0.830, the OPLS-DA model shows good
ability to predict class membership.
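The single-response OPLS algorithm described by Trygg and Wold makes the split between predictive and orthogonal variation concrete, and can be sketched in NumPy. This is an illustrative assumption, not SIMCA P+'s code. It takes X as already pre-treated, y as a mean-centred 0/1 class vector, and a fixed number of orthogonal components rather than the ‘Autofit’ choice. Each orthogonal score vector is uncorrelated with y by construction, so the class-related variation ends up in the single predictive score t. The function name is hypothetical.

```python
import numpy as np

def opls_single_y(X, y, n_orth):
    """Single-response OPLS with one predictive component (sketch after Trygg and Wold).

    X: pre-treated N x K matrix; y: mean-centred class vector of length N.
    Returns predictive scores t, orthogonal scores T_o (N x n_orth) and weights w.
    """
    X = np.array(X, dtype=float)            # work on a copy; X is deflated below
    y = np.asarray(y, dtype=float)
    w = X.T @ y / (y @ y)                   # covariance direction between X and y
    w /= np.linalg.norm(w)
    orth_scores = []
    for _ in range(n_orth):
        t = X @ w
        p = X.T @ t / (t @ t)               # loading of the current predictive score
        w_o = p - (w @ p) * w               # part of the loading orthogonal to w
        w_o /= np.linalg.norm(w_o)
        t_o = X @ w_o                       # orthogonal score, uncorrelated with y
        p_o = X.T @ t_o / (t_o @ t_o)
        X -= np.outer(t_o, p_o)             # remove the y-orthogonal variation from X
        orth_scores.append(t_o)
    t = X @ w                               # predictive score from the filtered X
    T_o = np.column_stack(orth_scores) if orth_scores else np.zeros((X.shape[0], 0))
    return t, T_o, w
```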
Figure 4 OPLS-DA scores plot for Control against CKD ESI+ dataset
The first validation test for the OPLS-DA model is to examine the cross-validated scores plot based on the internal seven-fold full cross-validation
employed in construction of the model. This is done by plotting both the score
of the predictive component, t, and its cross-validated counterpart, tcv, for each
observation used to construct the model. As shown in Figure 5, most of the
samples were predicted to their own class in the cross-validation. Only
samples C36, C38, and KY16 were predicted to belong to the opposite
class. However, the absolute tcv values of these samples were small, and
the model was therefore taken as valid overall.
Figure 5 Cross-validation scores plot for Control and CKD ESI+ dataset
The validity of the OPLS-DA model and possible biomarkers was further
verified using a random permutation test in SIMCA P+ [74]. A three-component
PLS-DA model was constructed and the test was performed using
100 permutations with the option to recalculate the permutations. The
R2 and Q2 intercepts obtained were 0.801 and −0.220, respectively (Figure
6). In addition, all of the permuted Q2 values were lower than the Q2 value of the
original PLS-DA model, which is another indication of the validity of the model.
Against the criteria Q2Y > 0.5, 0 < R2Y – Q2Y < 0.3, R2 intercept < 0.4 and
Q2 intercept < 0.05 [74], this model was found to be satisfactory in goodness
of fit and validity, even though its R2 intercept (0.801) did not meet the
requirement of < 0.4.
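The acceptance criteria quoted above can be checked mechanically. In the sketch below, the R2Y and Q2Y inputs are the OPLS-DA values reported earlier (0.973 and 0.830), on the assumption that they also apply to the three-component PLS-DA model. The intercepts are those from the permutation test. The function name is hypothetical.

```python
def check_validation_criteria(r2y, q2y, r2_intercept, q2_intercept):
    """Evaluate the goodness-of-fit and permutation-test criteria quoted from [74]."""
    return {
        "Q2Y > 0.5": q2y > 0.5,
        "0 < R2Y - Q2Y < 0.3": 0 < r2y - q2y < 0.3,
        "R2 intercept < 0.4": r2_intercept < 0.4,
        "Q2 intercept < 0.05": q2_intercept < 0.05,
    }

results = check_validation_criteria(r2y=0.973, q2y=0.830, r2_intercept=0.801, q2_intercept=-0.220)
for criterion, passed in results.items():
    print(f"{criterion}: {'met' if passed else 'not met'}")
```

Only the R2-intercept criterion fails, which is consistent with the assessment above.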
Figure 6 Random permutation test scores plot for Control and CKD ESI+ dataset
A further test of the validity of the model was carried out by randomly holding
out one-third of the samples from each of the control and CKD classes to form
a test set. The remaining observations constituted a working set, which was
used to construct an OPLS-DA model. This working-set model was
then used to classify the samples in the test set. In this second test, a high
percentage of correctly classified test-set samples would further indicate
the robustness of the model built on the control-CKD
ESI+ dataset. This procedure was carried out a total of three times. It was
found that the model achieved an average of 96.7% accuracy with Fisher’s
probability p[...]