
Probability and Statistics Major Assignment, Ho Chi Minh City University of Technology


DOCUMENT INFORMATION

Basic information

Pages: 33
File size: 1.19 MB

Content


HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
International Study Program

PROBABILITY AND STATISTICS
GROUP ASSIGNMENT REPORT

Lecturer:
Group: - Class: -
Semester: 222

No   Name   Student ID   Faculty   Task   Bonus point

Ho Chi Minh City,

Table of contents

Acknowledgment
Project
I Introduction
  1 Topic
  2 Problem
II Problem solving
  1 Data analysis and methods theory
    1.1 Data analysis
      1.1.1 Import the data
      1.1.2 Clean the data
      1.1.3 Plot graphs for each factor
        a Gender barplot and age histogram
        b Hypertension barplot and heart disease barplot
        c Marital status barplot and work type barplot
        d Residence type barplot and average glucose level histogram
        e BMI histogram and smoking status barplot
        f Stroke barplot
      1.1.4 Graphs of each factor in correlation to strokes
        a Gender and hypertension
        b Heart disease and marital status
        c Residence type
        d Age
        e Average glucose level
        f BMI
    1.2 Theoretical basis of logistic regression model
  2 Apply logistic model and obtain the prediction
    2.1 Logistic regression model
    2.2 Prediction
  3 Conclusion
III Full R-code
References

Acknowledgment

First of all, we would like to express our gratitude to Professor Nguyen Tien Dung for giving our group the chance to work with RStudio in this research. We are also grateful for the wealth of knowledge about Probability and Statistics that you have shared with us. This project was an opportunity for us to use RStudio, an important tool in mathematics today. The software has broadened not only our knowledge but also our ideas for future projects. We also gathered a large amount of information about the disease while completing our research, which will help us take care of our own health and our families' health in the future.

Project

I Introduction

1 Topic

A stroke happens when the blood flow to part of the brain is disrupted or reduced, so that the cells there stop receiving the nutrients and oxygen they need and die. A stroke is a medical emergency that has to be treated right away. Early detection and adequate therapy are necessary to stop further damage to the affected area of the brain and to prevent complications in other parts of the body.

According to the World Health Organization (WHO), fifteen million people have a stroke each year, with one victim dying every four to five minutes. According to the Centers for Disease Control and Prevention (CDC), stroke is the sixth most common cause of death in the United States. Noncommunicable diseases such as stroke account for about 11% of deaths each year. Approximately 795,000 Americans experience the incapacitating symptoms of a stroke each year. Stroke is the fourth most common cause of death in India.

There are two types of stroke: ischemic and hemorrhagic. In a hemorrhagic stroke, a weakened blood vessel bursts and bleeds into the brain; in an ischemic stroke, clots block the blood flow to the brain. The risk of stroke can be reduced by living a healthy, balanced lifestyle that excludes harmful habits like smoking and drinking and that maintains a healthy body mass index (BMI), average blood glucose level, and good heart and kidney function.

Predicting a stroke is crucial, because it needs to be treated right away to prevent irreparable harm or death. With the advancement of medical technology, it is now possible to use methods to predict the onset of a stroke. The algorithms used are useful because they enable precise prediction and appropriate analysis. Most earlier research in this area concentrated on, among other things, heart attack prognosis; there have not been many studies on brain stroke.

This paper's primary goal is to show how logistic regression models can be used to predict whether a brain stroke will occur. The most significant finding is that, of the four classification algorithms evaluated, logistic regression performed best, outperforming the others in terms of accuracy. One drawback of the model is that it is trained on textual data rather than real-time brain images. This paper shows how the logistic regression classification method is implemented. As mentioned earlier, the major contribution of this research is that we have used different machine learning models on a publicly available dataset. In previous work, most researchers used a significant model to predict stroke. All the results and comparisons are briefly discussed in the following section.

2 Problem

Dataset: The original dataset is provided at: kaggle.com

Information on the dataset attributes:
- id: unique identifier
- gender: "Male", "Female"
- age: age of the patient
- hypertension: 0 if the patient does not have hypertension, 1 if the patient has hypertension
- heart_disease: 0 if the patient does not have any heart disease, 1 if the patient has a heart disease
- ever_married: "No" or "Yes"
- work_type: "children", "Govt_job", "Never_worked", "Private" or "Self-employed"
- Residence_type: "Rural" or "Urban"
- avg_glucose_level: average glucose level in blood
- bmi: body mass index
- smoking_status: "formerly smoked", "never smoked", "smokes"
- stroke: 1 if the patient had a stroke or 0 if not

Our aim: using RStudio to predict whether a patient has a stroke or not based on the other attributes.

Steps:
- First, we analyze the dataset and from that choose suitable methods
- Second, we get the prediction from the methods

II Problem solving

1 Data analysis and methods theory

1.1 Data analysis

1.1.1 Import the data

Firstly, we import the dataset healthcare-dataset-stroke-data.csv into RStudio with read.csv() and then view it as a table with View().

    #Import the data from csv file
    data <- read.csv("healthcare-dataset-stroke-data.csv")
    View(data)

1.1.2 Clean the data

    sum(is.na(data))
    na.omit(data) -> data
    summary(data)

The command sum(is.na()) counts the number of N/A values in the data; the result shows 31 in total. Then "na.omit()->" is run to eliminate all rows containing them. Lastly, summary() is used to show some characteristics of each column (Min, 1st quartile, Median, Mean, 3rd quartile, Max).

1.1.3 Plot graphs for each factor

For continuous variables we use histograms, and for categorical variables we use barplots.

    #histogram, barplot of each factor
    y
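
The two steps listed under "Steps" (explore the data, then obtain a prediction) can be sketched in a few lines of base R. This is only an illustration, not the report's own code: the data frame below is synthetic, with column names borrowed from the attribute list and made-up values, and both the model formula and the 0.5 cutoff are assumptions chosen for the example.

```r
# Synthetic stand-in for the stroke data. Column names follow the attribute
# list above; all values are made up for illustration only.
set.seed(1)
n <- 500
age <- round(runif(n, 1, 82))
hypertension <- rbinom(n, 1, 0.1)
gender <- factor(sample(c("Male", "Female"), n, replace = TRUE))
# Assumed toy relationship: stroke risk rises with age and hypertension.
stroke <- rbinom(n, 1, plogis(-5 + 0.06 * age + 0.8 * hypertension))
data <- data.frame(gender, age, hypertension, stroke)

# Step 1: explore. Draw to a null device so this also runs without a display.
pdf(NULL)
# Continuous variable: histogram.
age_hist <- hist(data$age, main = "Age", xlab = "Age (years)")
# Categorical variable: barplot of the category counts.
gender_counts <- table(data$gender)
barplot(gender_counts, main = "Gender", ylab = "Count")
invisible(dev.off())

# Step 2: fit a logistic regression and predict.
# The formula is an assumption; the report's own model is not shown here.
model <- glm(stroke ~ age + hypertension + gender,
             data = data, family = binomial)
prob <- predict(model, type = "response")  # fitted probabilities
pred <- ifelse(prob > 0.5, 1, 0)           # 0.5 cutoff is an assumed choice
```

With the real dataset, the synthetic data frame would be replaced by the cleaned result of read.csv() and na.omit(), and summary(model) would list the estimated coefficients for each predictor.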

Date posted: 28/06/2023, 01:12
