
Scientific research report: Status of CA125 marker in Vietnamese ovarian cancer patients and its related factors


DOCUMENT INFORMATION

Basic information

Title: Status Of CA125 Marker In Vietnamese Ovarian Cancer Patients And Its Related Factors
Authors: Thanh-Truong Pham, Lương Trí Đức, Trịnh Chí Dũng, Lê Ngọc Yến, Triệu Nguyễn Quế Anh
Supervisors: Chu Đình Tới, Bùi Nhật Lệ
Institution: Vietnam National University, Hanoi
Major: Informatics and Computer Engineering
Type: Student Research Report
Year of publication: 2024
City: Hanoi
Number of pages: 31
File size: 1.24 MB

Structure

  • 1. Introduction
  • 2. Research Objective
  • 3. Research Methodology
  • 4. Results and Discussion
    • 4.1. CA125 results and discussion
    • 4.2. Results of Random Forest Model discussion
    • 4.3. Machine learning model discussions and future works
  • 5. Research evaluation
  • 6. Conclusion
  • 7. Recommendation

Content

VIETNAM NATIONAL UNIVERSITY, HANOI INTERNATIONAL SCHOOL STUDENT RESEARCH REPORT Status Of CA125 Marker In Vietnamese Ovarian Cancer Patients And Its Related Factors Team Leader: Than

Introduction

Ovarian cancer poses a significant threat to women worldwide, including in Vietnam. Advanced imaging and diagnostic technologies are contributing to its increasing prevalence. Global estimates from Globocan indicate 314,000 cases and 207,000 deaths annually, with a projected 42% rise in cases by 2040. In Vietnam alone, 1,404 cases and 923 fatalities from ovarian cancer were recorded in 2020 [2].

The global incidence of ovarian cancer increases yearly, affecting not only the quality of life but also the financial situation of patients. Cost burdens such as surgery, chemotherapy and radiation all strain the patient's ability to pay for treatment, especially in low-income demographics [3]. Zhang, Cheng [3] (2022) calculated the global burden of ovarian cancer from 1990 to 2019 and found the number of disability-adjusted life years (DALYs) in 2019 to be 5.36 million, compared with an estimated 2.73 million in 1990. Additionally, the authors found increasingly high morbidity and mortality rates in low socio-demographic index regions (i.e., less developed regions).

Despite extensive research, ovarian cancer is projected to affect an additional 19,680 patients and cause 12,740 deaths in the USA in 2024 [4]. Only 38% of patients were diagnosed during stages I and II (2013-2019) according to US data [5], while a more recent publication in 2024 reported that 44% of diagnoses were made during stages I and II (2016-2020) [4]. Early diagnosis is critical for all forms of cancer; consequently, the low early detection rate of ovarian cancer has created barriers to proper treatment and management. Five-year relative survival rates for stages III and IV are merely 36% and 17%, respectively [6]. The cost burden of ovarian cancer treatment increases with cancer stage, up to 124,202 Canadian dollars at stage IV [7]. Thus, an effective method for early detection is imperative for patient survival chances and economic outcomes.

Early detection of ovarian cancer is crucial, and tumor biomarkers like CA125 and HE4 play significant roles. CA125, a classic marker, has a high specificity of 90% in early-stage postmenopausal women, while HE4 shows elevated expression in malignant ovarian tissues. However, CA125 can be absent in some ovarian cancers and elevated in benign conditions. The combined use of CA125 and HE4 improves specificity, enhancing the accuracy of early detection.

Ovarian cancer biomarker research is promising and critical to advancing testing and survival for patients. In its current state, CA125 requires further research to increase the effectiveness of the early screening tests that incorporate it. Although a large body of literature exists on this biomarker, studies of it in the Vietnamese patient population are limited. To fill this gap, this study presents the results and analysis of the CA125 biomarker in a sample of Vietnamese ovarian cancer patients during the 2018-2021 period, contributing to a better understanding, detection and management of ovarian cancer in Vietnam. Building on these results, we introduce a preliminary model that uses artificial intelligence to automate and improve the parameters of CA125 testing.

Research Objective

Objective 1: The status of CA125 in ovarian cancer patients and several related factors

Our first research objective focuses on identifying relevant literature on CA125 in ovarian cancer patients, as well as identifying significant factors affecting CA125 from our gathered data and statistical analysis, with the following two purposes:

  • Providing a perspective on the current state of CA125 levels and the significant factors specific to Vietnamese ovarian cancer patients, which lays the foundation for future research into ovarian cancer in Vietnam.

  • Providing data and a framework from which our machine learning model can be developed, refined and compared to existing methods.

Objective 2: Develop a model to enhance the accuracy and specificity of CA125

In the pursuit of enhancing early cancer screening, researchers have employed machine learning to improve the accuracy and specificity of CA125, a promising biomarker. This study presents a novel machine learning model designed to optimize the sensitivity and specificity of CA125 analysis, paving the way for more effective detection and treatment strategies in the early stages of cancer development.

In this study, we used the random forest algorithm to develop machine learning models. Given its accurate computation and its ability to classify by aggregating the predictions of many trees, we applied the algorithm to a dataset of medical records of patients evaluated for ovarian cancer, including serum CA125 concentrations from blood tests. The purpose of developing and testing the machine learning models is to improve the accuracy and sensitivity of the results.

Research Methodology

Study group for CA125 data

This is a retrospective study that accumulated data from the medical records of patients who were initially diagnosed with ovarian cancer at Vietnam National Cancer Hospital. The study focuses on data relevant to CA125 and blood test results. Medical records with incomplete information about CA125 test levels were excluded from our study data.

Based on the doctors' final diagnosis, which used the results of the histopathological test and other para-clinical tests, the patients were divided into ovarian cancer and non-ovarian cancer groups.

CA125 data collection and measurements

EpiData Entry 3.1 (EpiData Association, Odense, Denmark) was used to collect, record and extract clinical and non-clinical data from the targeted patient medical records, which were then stored for use. A timeframe for CA125 testing was also established to make sure that the CA125 levels of patients were assessed before any operation, which could greatly affect the accuracy of CA125 and blood test readings. The stage of ovarian cancer at diagnosis was defined based on the criteria of the International Federation of Gynecology and Obstetrics (FIGO). All participant information was secured accordingly. The ethical aspects of this study were approved by the Institute of Genome Research institutional review board in bio-medical research under certificate No: 02-2022/NCHG-HĐĐĐ on March 9, 2022. Data were then checked and pre-processed before statistical analysis was applied. We applied the commonly accepted CA125 cut-off value of 35 U/mL. Patients whose CA125 results were lower than 35 U/mL were classified into the normal CA125 group, while patients whose CA125 results were higher than 35 U/mL were classified into the abnormal CA125 group. Nutritional status was determined based on albumin with a cut-off threshold of 35 g/L.
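The two cut-off rules above can be written down as a pair of small functions. This is only an illustrative sketch: the function names and group labels are hypothetical, and the handling of a value exactly equal to the cut-off is an assumption, since the text specifies only "lower than" and "higher than" 35.

```python
CA125_CUTOFF_U_PER_ML = 35.0   # cut-off stated in the text
ALBUMIN_CUTOFF_G_PER_L = 35.0  # cut-off stated in the text


def ca125_group(ca125_u_per_ml: float) -> str:
    """Classify a CA125 reading as 'normal' or 'abnormal'."""
    # Assumption: a reading exactly at the cut-off counts as normal;
    # the report does not say how such ties were handled.
    if ca125_u_per_ml > CA125_CUTOFF_U_PER_ML:
        return "abnormal"
    return "normal"


def albumin_group(albumin_g_per_l: float) -> str:
    """Classify albumin relative to the 35 g/L threshold."""
    # Assumption: the labels 'below threshold' / 'at or above threshold'
    # are placeholders; the report does not name the two groups.
    if albumin_g_per_l < ALBUMIN_CUTOFF_G_PER_L:
        return "below threshold"
    return "at or above threshold"


if __name__ == "__main__":
    for value in (12.0, 35.0, 420.5):
        print(value, ca125_group(value))
```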

SPSS v.22 (IBM, USA) was used to perform statistical analysis. Quantitative data were presented as mean and standard deviation (SD), while qualitative data were presented as n and percentage (%). The chi-square test, the Mann-Whitney test and multivariate logistic regression were performed. A statistically significant difference was defined as p-value …

(e.g., Name, ID, Patients’ cancer history, Patients’ family diseases history, nutritional status, etc.)

Figure 1. Data feature importance analysis through trials

Random forest model workflow and characteristics:

To prepare the model for a binary classification task, the dataset is first loaded from a CSV file. The target variable Diagnosis is then binarized, grouping categories 0 and 2 into a single category and isolating category 1 as another. This simplification enables the application of binary classification metrics and methods to determine the probability of a patient having ovarian cancer.
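A minimal sketch of this binarization step with pandas follows. The inline CSV text is made-up toy data standing in for the real file, the columns other than Diagnosis are hypothetical, and treating category 1 as the positive class is an assumption based on the description above.

```python
import io

import pandas as pd

# Toy stand-in for the CSV file (synthetic values, hypothetical columns).
csv_text = """Age,CA125,Diagnosis
45,12.0,0
61,480.5,1
38,77.3,2
57,1021.0,1
"""

df = pd.read_csv(io.StringIO(csv_text))

# Categories 0 and 2 collapse to 0; category 1 becomes 1.
# Assumption: category 1 is the positive (ovarian cancer) class.
df["target"] = (df["Diagnosis"] == 1).astype(int)

print(df[["Diagnosis", "target"]])
```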

Identifying categorical and numerical features allows preprocessing to be tailored to each type. Numerical features undergo mean imputation and scaling so that model inputs are uniform, which is critical for scale-sensitive machine learning algorithms.

The preprocessing for categorical variables includes imputation of the most frequent category and one-hot encoding, which transforms categorical variables into a format that can be provided to machine learning algorithms. By converting categorical variables into a binary vector format, one-hot encoding allows the model to treat each category as a separate entity, preventing the algorithm from inferring a spurious ordinal relationship between categories. These preprocessing steps are encapsulated in a Pipeline within a ColumnTransformer, ensuring that each feature is processed appropriately before being used in model training, thereby automating the workflow and facilitating reproducibility and scalability.
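The preprocessing described above might look roughly like the following scikit-learn sketch. The feature names and values are invented for illustration, and StandardScaler is an assumed choice of scaler, since the text says only "scaling".

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic toy data with missing values; column names are hypothetical.
X = pd.DataFrame({
    "Age": [45.0, 61.0, np.nan, 57.0],
    "CA125": [12.0, 480.5, 77.3, np.nan],
    # Explicit object dtype keeps the missing entry as a plain NaN.
    "Insurance": pd.Series(["yes", "no", np.nan, "yes"], dtype=object),
})
numeric_features = ["Age", "CA125"]
categorical_features = ["Insurance"]

# Numerical columns: mean imputation, then scaling.
numeric_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),  # assumed scaler
])

# Categorical columns: most-frequent imputation, then one-hot encoding.
categorical_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric_pipeline, numeric_features),
    ("cat", categorical_pipeline, categorical_features),
])

X_prepared = preprocessor.fit_transform(X)
print(X_prepared.shape)  # two scaled numeric columns + two one-hot columns
```

Keeping both branches inside one ColumnTransformer means the imputation statistics and category lists are learned only from whatever data is passed to fit.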

The model chosen in the script is a RandomForestClassifier, an effective ensemble method that builds multiple decision trees and votes on the most popular output class. This classifier is particularly suited to this task because it can handle unbalanced data, as indicated by the use of the balanced_subsample class weight. This setting adjusts the weights to be inversely proportional to class frequencies in the input data, providing a countermeasure against the model’s bias towards the majority class. Moreover, ensemble methods like Random Forest are less prone to overfitting than individual decision trees.
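To make the "inversely proportional to class frequencies" idea concrete, the sketch below computes balanced weights for a toy label vector and then fits a forest with class_weight="balanced_subsample". The data are random and synthetic, and the number of trees is an arbitrary assumption. With "balanced_subsample", scikit-learn recomputes the same kind of weights on each tree's bootstrap sample rather than once on the full data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

# Synthetic imbalanced labels: 32 negatives, 8 positives.
y = np.array([0] * 32 + [1] * 8)
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))  # random features, for illustration only

# Balanced weight for class c = n_samples / (n_classes * count_c).
weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y
)
print(dict(zip([0, 1], weights)))  # minority class gets the larger weight

clf = RandomForestClassifier(
    n_estimators=50,                    # arbitrary choice for the sketch
    class_weight="balanced_subsample",  # setting named in the text
    random_state=0,
)
clf.fit(X, y)
predictions = clf.predict(X)
```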

The model building process employs a RandomForestClassifier integrated within an imbalanced-learn pipeline, which also incorporates SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance. Class imbalance is a common issue in medical datasets where one class significantly outnumbers the other, potentially leading to biased models. SMOTE works by creating synthetic samples from the minority class to balance the class distribution, which can improve the classifier's performance on imbalanced datasets. This approach is integrated into the modeling pipeline, ensuring that resampling occurs during the training phase only, thereby preventing information leakage between the train and test datasets.
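The core idea behind SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be illustrated with a few lines of NumPy. This is a simplified teaching sketch and not the imbalanced-learn implementation used in the report; the function name, parameters, and toy points are all hypothetical.

```python
import numpy as np


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (simplified sketch)."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)

    # Pairwise Euclidean distances; a point is never its own neighbour.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(minority))          # random minority sample
        neigh = neighbours[base, rng.integers(k)]   # one of its neighbours
        gap = rng.random()                          # position along the segment
        synthetic[i] = minority[base] + gap * (minority[neigh] - minority[base])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority_points = [[0, 0], [1, 0], [0, 1], [1, 1]]
new_points = smote_like_samples(minority_points, n_new=5)
print(new_points)
```

To avoid the leakage described above, such resampling would be applied to the training split only, never to the held-out test data.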

A grid search with cross-validation is used to find the best hyperparameters for the RandomForest model. Parameters such as the number of trees, the maximum depth of trees, the minimum number of samples required to split a node, and others are optimized based on the ROC-AUC score, a performance measure for classification problems across threshold settings. The ROC curve plots the true positive rate against the false positive rate at various threshold levels, providing insight into the trade-off between sensitivity and specificity.
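A hedged sketch of such a search with scikit-learn's GridSearchCV is shown below. The dataset is generated synthetically, and the grid values and the five-fold stratified split are assumptions made for illustration; the report does not list the actual grid or fold count.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, moderately imbalanced data standing in for the patient records.
X, y = make_classification(
    n_samples=150, n_features=8, weights=[0.75, 0.25], random_state=0
)

# Illustrative grid only; the values searched in the report are not given.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [None, 5],
    "min_samples_split": [2, 5],
}

search = GridSearchCV(
    estimator=RandomForestClassifier(
        class_weight="balanced_subsample", random_state=0
    ),
    param_grid=param_grid,
    scoring="roc_auc",  # select hyperparameters by cross-validated ROC-AUC
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```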

Finally, the evaluation of the model includes various metrics such as the classification report, the ROC-AUC score, and plots of the confusion matrix, ROC curve, and Precision-Recall curve. These metrics and visualizations offer detailed insight into the performance of the model across different aspects, such as its ability to correctly classify the positive class, its overall accuracy, and its balance between recall and precision. Such comprehensive evaluation is crucial in medical applications where the cost of a false negative or false positive can be high, emphasizing the need for a robust predictive model.
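The metrics listed above can be computed with scikit-learn as in the following sketch. The labels and scores are hand-made toy values, not results from the study, and the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    precision_recall_curve,
    roc_auc_score,
    roc_curve,
)

# Hand-made toy labels and predicted probabilities (not study results).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.30, 0.65, 0.70, 0.90])
y_pred = (y_score >= 0.5).astype(int)  # assumed 0.5 threshold

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)

# Points for the ROC and Precision-Recall curves (plotting omitted here).
fpr, tpr, roc_thresholds = roc_curve(y_true, y_score)
precision, recall, pr_thresholds = precision_recall_curve(y_true, y_score)

print(cm)
print(auc)
print(classification_report(y_true, y_pred))
```

In this toy case one negative is scored above the threshold and one positive below it, so the confusion matrix shows one false positive and one false negative.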

Results and Discussion

CA125 results and discussion

A total of 153 patients were involved in this study. The majority (68.63%) of the subjects fell within the 30-65 year range (Table 1). This is an age group of high interest due to the higher reported incidence of ovarian cancer compared with the younger (less than 30 years old) age group. Most of the patients belonged to the Kinh ethnic group, while only a small percentage belonged to minority groups. There was high variation in the length of hospitalization, and most of the patients received some insurance funding.

The majority of patients were diagnosed with ovarian cancer (75.82%), and a smaller percentage were diagnosed with benign (non-cancerous) ovarian conditions. Most patients did not have a family history or personal medical history related to cancer. Nutritional status among patients was equally divided. A small number of comorbidities was present, mostly hypertension and one case of diabetes. The cancer stage with the most diagnoses was stage III (23.53%), followed by stage I (19.61%), and nearly 90% of patients were recorded with at least some improvement in health status (cured or improved). Tumor size is relatively unclear, although the available data suggest an equal distribution between smaller (less than 10 cm) and bigger (more than 10 cm) tumor sizes.

Table 1. Characteristics of studied group

Body weight (kg; 27 missing entries): mean 51.57, SD 7.87
Height (meters; 55 missing entries): mean 1.56, SD 0.06
Tumor in non-reproductive organs: n = 2 (1.31%)
At least one relative having cancer: n = 12 (7.84%)

Length of hospitalization significantly differs, with patients in the affected group requiring longer stays (86.96 days vs 71.25 days) Diagnosis also varies, with the abnormal group experiencing a disproportionate number of ovarian cancer cases (75.9% vs 50% in the normal group) However, age does not exhibit a significant difference between the two groups.

According to the analysis of each FIGO stage, the difference is considered significant (p

Date posted: 08/10/2024, 02:12