Int J Med Sci 2017, Vol 14
Ivyspring International Publisher
International Journal of Medical Sciences 2017; 14(3): 201-212. doi: 10.7150/ijms.16974

Research Paper

Prevalence and Risk Factors of Comorbidities among Hypertensive Patients in China

Jiaojiao Wang1*, Jian James Ma2*, Jiaqi Liu1*, Daniel Dajun Zeng1,3*, Cynthia Song4, Zhidong Cao1

1. The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
2. College of Business, University of Colorado, Colorado Springs, CO, USA
3. University of Chinese Academy of Sciences, Beijing, China
4. Internal Medicine Physician, Abington Hospital – Jefferson Health, Abington, PA, USA

* These authors contributed equally to this work.
Corresponding author

© Ivyspring International Publisher. This is an open access article distributed under the terms of the Creative Commons Attribution (CC BY-NC) license (https://creativecommons.org/licenses/by-nc/4.0/). See http://ivyspring.com/terms for full terms and conditions.

Received: 2016.07.25; Accepted: 2017.01.23; Published: 2017.02.23

Abstract

Hypertension is a severe threat to human health because of its association with many comorbidities. Many studies have explored the prevalence and treatment of hypertension. However, few have considered the impact of patients’ socioeconomic status and geographical disparities. We intended to fill that research gap by analyzing the association of the prevalence of hypertension and three important comorbidities with various socioeconomic and geographical factors. We also investigated the prevalence of those comorbidities among patients who had been diagnosed with hypertension. We obtained a large collection of medical records from 29 hospitals across China. We used Bayes’ Theorem, Pearson’s chi-squared test, univariate and multivariate regression methods, and geographical detector methods to analyze the association between disease prevalence and risk factors. We first attempted to
quantify and analyze the spatial stratified heterogeneity of the prevalence of hypertension comorbidities by the q-statistic using geographical detector methods. We found that demographic and socioeconomic factors, together with hospital class and geographical factors, had an enhanced interactive influence on the prevalence of hypertension comorbidities. Our findings can be leveraged by public health policy makers to allocate medical resources more effectively. Healthcare practitioners can also benefit from our analysis to offer customized disease prevention for populations with different socioeconomic status.

Key words: Hypertension, Prevalence, Comorbidity, Bayes’ Theorem, Geographical Detector, Public Health, Risk Factor

Introduction and Background

Non-communicable diseases have become major threats to global health [1]. Over 36 million people die annually from non-communicable diseases, making up nearly two-thirds of deaths worldwide each year. Cardiovascular diseases, along with diabetes mellitus, cancer, and respiratory diseases, are among the main causes of non-communicable disease-related deaths [2]. More importantly, when a patient is diagnosed with multiple comorbidities, the patient’s health condition faces additional challenges [3]. Some researchers have suggested that comorbidities should receive sufficient attention in the differential diagnosis of patients, and that focusing treatment on comorbidities may be more beneficial for the treatment and control of diseases [4].

Hypertension has been shown to be a major co-existing disease with cardiovascular diseases in many countries since the 1970s [5-11]. In China, there has been a steady increase in the prevalence of hypertension during the past decade. Interestingly, the increasing trend of hypertension prevalence varies across different populations [12-18]. That trend has drawn researchers’ interest and effort toward investigating possible risk factors of hypertension, and in turn analyzing the risk factors of
cardiovascular diseases as hypertension’s comorbidities. In our study, we investigated the prevalence and risk factors of coronary heart disease [19-21] when the patient was diagnosed with hypertension as well. To make our investigation more comprehensive, we also analyzed diabetes mellitus [22-24] and hyperlipidemia [20, 25] as hypertension’s comorbidities. Those two diseases were chosen as major comorbidities of hypertension because, like hypertension, they are also considered strong indicators of coronary heart disease [26].

The hazards, importance, and risk factors of the comorbidities of hypertension have been extensively studied [9-11, 27-37]. However, most studies have been confined to a limited geographical region [38]. China is a large country with the biggest population in the world. Understanding the differences in the prevalence of hypertension and its comorbidities across different geographic areas in China is vital for making strategic public health policy and for allocating public health-related resources. In addition to the patient’s geographical region, the patient’s sex, age, and several other characteristics (e.g., income, education, occupation, control of tobacco consumption, and obesity)
may also play a role in disease prevalence [2, 37, 39-41]. The nature of hypertension-related health risk is similar in all populations, but the distribution of diseases with regard to those risk factors may vary [38, 42].

In this study, we collected a large sample of patients’ electronic medical data from 29 cities across different areas in China. We aimed to examine the relationship between the prevalence and distribution of hypertension and its three comorbidities and associated factors such as hospital reputation, demographic factors, patient socioeconomic status, and geographical disparities (zone type, topography, etc.). Our analysis revealed important risk factors that are associated with the prevalence of the diseases studied. Our work can be leveraged by health policy decision makers to better control hypertension and its comorbidities for specific populations and geographic areas in China.

Data Description

We describe the real medical data set that we used, the challenges that we encountered during data analysis, and the risk factors that we chose to investigate disease prevalence.

Ethical statement

This study was approved by the institutional review board of the Institute of Automation, Chinese Academy of Sciences. The data set was collected by the Chinese government for disease control. All patients gave their informed consent. Patients’ privacy was strictly preserved in our study. We only used the patient’s sex, age, and clinical diagnostic information to perform our analysis. Patients’ identity-related information was masked before we started our study.

Data collection

The Electronic Medical Records (EMRs) used in this study were obtained from a national-level health information organization in China. Specifically, EMRs from 29 hospitals in 29 different cities in China from January 1, 2011, to December 31, 2013 were collected. The entire data cleaning and analysis procedure was completed on a computer provided by the data
provider in a monitored room at the data provider’s facility. Throughout the process, we were not permitted to transfer or copy the data off the computer that we were using. In total, 2,122,703 hypertensive outpatients were identified out of all the EMRs in the data set. Each EMR included the patient’s medical records, such as patient ID, date of treatment, sex, age, and diagnosis, and information about the hospital, e.g., hospital class and location. EMRs with incomplete information were excluded from this study.

Original medical data were collected by the attending physicians, then proofed and summarized by trained staff before the data were submitted to the national health information organization. Patient information was collected for diagnostic purposes and was thus highly reliable and objective. In China, the disease diagnostic process is fairly standard with regard to common diseases such as hypertension and the three comorbidities in our study [43]. Note that throughout this study, saying that a hypertensive patient had a comorbidity, e.g., hyperlipidemia, meant that the patient was simultaneously diagnosed with both hypertension and hyperlipidemia on at least one medical record while the patient was in hospital.

Since some data fields were input via free text, physicians from different hospitals sometimes chose slightly different terms to describe patients’ conditions. Before we could start data analysis, we had to spend a great deal of time consolidating the data format. The data cleaning and consolidation process was the most time-consuming and labor-intensive step in this study.

Risk factors

To investigate disease prevalence and distribution, we chose the following six dimensions as the possible risk factors: patient’s gender, age, income level, hospital class, zone type, and topography. In each risk factor dimension, we categorized patients into several groups based on the patient’s or the hospital’s information.
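As a brief aside on the comorbidity definition given under Data collection, the following minimal Python sketch shows one way the rule "both diagnoses appear on at least one medical record" could be applied. The record layout (a patient ID paired with a set of diagnosis labels), the function name, and the sample records are illustrative assumptions, not the data format or code used in this study.

```python
def patients_with_comorbidity(records, comorbidity, index_disease="hypertension"):
    """Return the IDs of patients who have at least one record that lists
    both the index disease and the given comorbidity.

    `records` is an iterable of (patient_id, diagnoses) pairs, where
    `diagnoses` is a set of already-normalized diagnosis labels.
    This layout is a hypothetical simplification of an EMR.
    """
    flagged = set()
    for patient_id, diagnoses in records:
        # Both diagnoses must co-occur on the same record.
        if index_disease in diagnoses and comorbidity in diagnoses:
            flagged.add(patient_id)
    return flagged


# Hypothetical toy records, for illustration only.
records = [
    ("p1", {"hypertension", "hyperlipidemia"}),
    ("p1", {"hypertension"}),
    ("p2", {"hypertension"}),
    ("p2", {"hyperlipidemia"}),  # on a separate record, so p2 is not flagged
    ("p3", {"hypertension", "diabetes mellitus"}),
]

hyperlipidemia_ids = patients_with_comorbidity(records, "hyperlipidemia")
```

In this toy input only patient p1 is flagged for hyperlipidemia, because p2's two diagnoses never appear on the same record. A real pipeline would first have to map the free-text diagnosis terms to consistent labels, which the sketch takes as already done.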
Each patient belonged to exactly one group in each factor dimension.

To study the impact of age on the prevalence of hypertension and the three comorbidities, we used WHO criteria (http://www.who.int/topics/ageing/en/). Specifically, we defined people aged 0-44 as the young group, 45-59 as the middle-aged group, and ≥60 as the elderly group.

Instead of using the patient’s individual income level, we categorized patients based on the average income of the city where the patient resided. We then compared the city average income to the national average income of China (http://www.stats.gov.cn/). Each patient was placed either in the group whose city average income was higher than the national average or in the group whose city average income was lower.

The hospital class in China recognizes a hospital’s quality and capacity for providing medical service, delivering medical education, and conducting medical research. Based on current Chinese health care policy, Chinese hospitals are categorized into three tiers. Tier 1 hospitals are typically located in small towns and have fewer than 100 inpatient beds. Tier 2 hospitals tend to be in medium-sized cities or districts, with inpatient bed counts between 100 and 500. Tier 3 hospitals are usually comprehensive or general hospitals in large cities with a bed count exceeding 500. Furthermore, based on the quality of medical services, infrastructure, equipment, and management efficiency, each tier is further divided into subsidiary levels A, B, and C, with A being the highest and C the lowest. The 29 hospitals that we chose for our study covered five classes, namely 3A, 3B, 3C, 2A, and 2C.

Hospital locations were matched to the geocode of city-level divisions. Based on the socioeconomic status and geographical nature of the city where the hospital was located, we categorized the 29 hospitals into seven geographical zones, namely Northeast, North, East, South, Central, Northwest, and Southwest. Figure 1 shows the map of the geographical zones and the locations of the 29 cities. The topography
factor captured the altitude of the hospital. The altitude data came from the GIS data sets published by the National Geomatics Center of China. Four groups of hospitals were considered, namely plain (0-200 meters), hill (201-500 meters), mountain (501-1000 meters), and plateau (>1000 meters).

Figure 1. Hospital locations and geographical zones. Figure generated by ESRI ArcMap v10.1.

Analytical Methods

We focused on three important comorbidities among the hypertensive patients in our data set, namely diabetes mellitus, hyperlipidemia, and coronary heart disease. Our analysis had two parts. First, we aimed to analyze the prevalence of a comorbidity given that the patient was diagnosed with hypertension. We adopted Bayes’ Theorem to calculate the disease prevalence. Our second goal was to investigate the correlation between the associated risk factors and disease prevalence by using the non-conditional logistic regression method. ArcGIS v10.1 (Environmental Systems Research Institute Inc., Redlands, CA, USA) was used to visualize the prevalence of comorbidities of hypertension.

Bayes’ theorem

Bayes’ Theorem is a famous probability theorem named after Thomas Bayes [44]. It calculates the probability of a random event A given that another random event B occurs. Bayes’ Theorem has been widely used in analyzing disease prevalence, spread forecasting, and other public health problems [45-47]. The following formula is one of the frequently cited representations of Bayes’ Theorem:

\[
P(A \cap B) = P(A \mid B) \cdot P(B) = P(B \mid A) \cdot P(A)
\]

where \(P(A)\) is the probability that random event A occurs, \(P(A \mid B)\) is the probability of A given that B occurs, and \(A \cap B\) denotes the intersection of random events A and B. Thus \(P(A \cap B)\) is the probability that both events A and B occur. From the above formula, we have

\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
\]

We adopted Bayes’ Theorem to analyze the probability that a patient
would be diagnosed with a comorbidity given that the same patient had been diagnosed with hypertension. Note that the probability that a patient is diagnosed with a disease is usually estimated by the prevalence of that disease. We then derived

\[
P(C_{ij} \mid H, R_i) = \frac{P(C_{ij} \cap H \cap R_i)}{P(H \cap R_i)} = \frac{P(C_{ij} \cap H \cap R_i) \cdot V}{P(H \cap R_i) \cdot V} = \frac{V(C_{ij} \cap H \cap R_i)}{V(H \cap R_i)}
\]

where
H: the event that a patient is diagnosed with hypertension
R_i: the event that a patient has risk factor i
C_ij: the event that a patient with risk factor i is diagnosed with comorbidity j
V: the overall population
P(E): the probability that event E occurs
V(E): the overall number of instances of event E

For example, the prevalence of coronary heart disease given that the patient is both male and diagnosed with hypertension equals the total number of male hypertensive patients with coronary heart disease divided by the total number of male hypertensive patients. The results are reported in Table 2.

Statistical analyses

The correlation analysis between the associated risk factors and disease prevalence was performed using the non-conditional logistic regression method. To fully understand the correlation, we conducted both univariate and multivariate regression analyses. The associated risk factors were treated as independent variables in the regression analysis. The dependent variable was a binary variable, where 1 indicated a hypertensive patient with at least one of the three comorbidities and 0 indicated a hypertensive patient without any of them. Differences in the prevalence of hypertension comorbidities by the associated factors were compared among subgroups using Pearson’s chi-squared test. Data were entered and reviewed by two different researchers. For descriptive analysis, the prevalence of hypertension comorbidities and other categorical variables were expressed as percentages. Results are presented in the accompanying tables. All the categorical variables used in the study were coded as dummy variables, namely hypertension
comorbidities (positive = 1, negative = 0); gender (male = 1, female = 0); age (45-59 = 1, 60+ = 2, 0-44 = 0); the city average of per capita disposable income of urban households during 2011-2013 (higher than national average = 1, lower than national average = 0); hospital class (3A = 1, 3B = 2, 3C = 3, 2A = 4, 2C = 0); zone type (North = 1, East = 2, South = 3, Central = 4, Northwest = 5, Southwest = 6, Northeast = 0); topography (hill = 1, mountain = 2, plateau = 3, plain = 0). Odds ratios (OR) and 95% confidence intervals (CI) were calculated using univariate and multivariable logistic regression analyses. A P value < 0.05 was considered statistically significant.

Geographical detector methods

The geographical detector method [48, 49] is a spatial variance analysis method developed in the context of medical geography to assess the associations between a health outcome and feasible risk factors. Spatial stratified heterogeneity is a universal driver of biological diversity and evolution, environmental patterns and tyranny, and inter-regional conflicts and cooperation. The geographical detector method computes the power of determinant (q), which quantitatively measures the affinity between the risk factors and disease prevalence.

The geographical detector method is based on analysis of the variance of disease prevalence by the categories of each risk factor under consideration. The key underlying assumption is the following: if the factor F is associated with disease prevalence P, then P would exhibit a spatial distribution similar to that of F. In the perfect case in which factor F completely explains the pattern of P, the value of P would be uniform within each category of F and the spatial variance of P within all categories would be 0. In a realistic case, the degree of spatial correspondence between layers F and P is measured by the power of determinant (q) for a factor F, which is defined as

\[
q_F = 1 - \frac{1}{N \sigma^2} \sum_{c=1}^{L} N_c \sigma_c^2
\]

where \(\sigma_c^2\) is the variance of P within category c of the risk factor F, \(N_c\) is the number of sample units in category c, \(\sigma^2\) is the global variance of P in the entire study area, N is the total number of samples in the entire study area, and L is the number of categories of the factor F. The standard definitions of \(\sigma_c^2\) and \(\sigma^2\) apply here:

\[
\sigma_c^2 = \frac{1}{N_c - 1} \sum_{i=1}^{N_c} \left( P_{c,i} - \bar{P}_c \right)^2
\]

where \(P_{c,i}\) is the value of the ith sample unit of P in category c and \(\bar{P}_c\) is the mean of P in category c, and

\[
\sigma^2 = \frac{1}{N - 1} \sum_{j=1}^{N} \left( P_j - \bar{P} \right)^2
\]

where \(P_j\) is the value of the jth sample unit from the entire study area and \(\bar{P}\) is the global mean of P over the entire study area.

Note that the term \(\frac{1}{N \sigma^2} \sum_{c=1}^{L} N_c \sigma_c^2\) is the ratio of the weighted sum of local variances (weighted by the number of samples in each category) to the global variance. If factor F completely controls the spatial distribution of P, the local variance is 0 and \(q_F = 1\) (assuming \(\sigma^2 \neq 0\)). If factor F is completely unrelated to the spatial distribution of P, the weighted sum of local variances is the same as the global variance and \(q_F = 0\). In general, \(q_F \in [0, 1]\) reflects the proportion of the spatial variation of P explained by the factor F. Higher values of \(q_F\) indicate higher affinity between F and P. Note that this method assesses the degree of affinity or spatial association, not specifically a degree of causal relation between F and P.

The power of determinant (\(q_F\)) is termed the “factor detector” and addresses the question “which risk factor is more strongly associated with the spatial distribution of P and thus could be a controlling factor?” The free software for conducting geographical detector analysis can be downloaded from http://www.sssampling.org/Excel-Geodetector/.

Results

Here we summarize the results that we obtained from the above analysis.

Prevalence of hypertension and its comorbidities

Table 1 shows the occurrence distribution of hypertension and the three important comorbidities: diabetes mellitus, hyperlipidemia, and coronary heart disease. The
numbers of patients in each group of each risk factor category are presented. The numbers in parentheses are the percentage values. In terms of income, higher-income patients accounted for the majority of hypertension cases (68.24%) and of all three comorbidities (69.36% for diabetes mellitus, 77.39% for hyperlipidemia, and 57.32% for coronary heart disease). Note that this observation might be due to the fact that the higher-income population tended to go to hospital more often than the lower-income population did. For hospital class, while it was reasonable to assume that more patients tended to go to better and bigger hospitals, we did observe that for hyperlipidemia, 17.02% of patients went to 2A hospitals, which was notably higher than the shares for 3B (3.47%) and 3C (0.29%) hospitals. The hospital’s zone type showed some interesting patterns too. The East zone accounted for the largest share of hypertension (35.36%) and diabetes mellitus (35.29%) cases, but the North zone was by far the biggest contributor to hyperlipidemia (69.82%) and coronary heart disease (40.16%).

Table 1. Disease distribution of hypertension and three comorbidities

Characteristic, number (%) | Hypertension | Hypertension and Diabetes mellitus | Hypertension and Hyperlipidemia | Hypertension and Coronary heart disease
Gender | | | |
Male | 1146218 (54.00) | 209121 (54.44) | 108929 (55.63) | 163156 (54.44)
Female | 976485 (46.00) | 174984 (45.56) | 86876 (44.37) | 136538 (45.56)
Age | | | |
0-44 | 265554 (12.51) | 15222 (3.96) | 20557 (10.50) | 7444 (2.48)
45-59 | 653872 (30.80) | 100263 (26.10) | 62154 (31.74) | 56712 (18.92)
60+ | 1203277 (56.69) | 268620 (69.93) | 113094 (57.76) | 235538 (78.59)
Income | | | |
Higher than national average | 1448506 (68.24) | 266401 (69.36) | 151541 (77.39) | 171784 (57.32)
Lower than national average | 674197 (31.76) | 117704 (30.64) | 44264 (22.61) | 127910 (42.68)
Hospital class | | | |
3A | 1771120 (83.44) | 340968 (88.77) | 154595 (78.95) | 261746 (87.34)
3B | 131428 (6.19) | 23749 (6.18) | 6790 (3.47) | 17863 (5.96)
3C | 12221 (0.58) | 2063 (0.54) | 570 (0.29) | 1502 (0.50)
2A | 197455 (9.30) | 16119 (4.20) | 33320 (17.02) | 18180 (6.07)
2C | 10479 (0.49) | 1206 (0.31) | 530 (0.27) | 403 (0.13)
Zone type | | | |
Northeast | 54983 (2.59) | 10521 (2.74) | 681 (0.35) | 4115 (1.37)
North | 621662 (29.29) | 129063 (33.60) | 136712 (69.82) | 120361 (40.16)
East | 750531 (35.36) | 135533 (35.29) | 17864 (9.12) | 72028 (24.03)
South | 29300 (1.38) | 3848 (1.00) | 2761 (1.41) | 4648 (1.55)
Central | 179844 (8.47) | 19477 (5.07) | 7998 (4.08) | 20445 (6.82)
Northwest | 189209 (8.91) | 20471 (5.33) | 9433 (4.82) | 17724 (5.91)
Southwest | 297174 (14.00) | 65192 (16.97) | 20356 (10.40) | 60373 (20.14)
Topography | | | |
Plain (0-200 meters) | 1549047 (72.98) | 291380 (75.86) | 159679 (81.55) | 203607 (67.94)
Hill (201-500 meters) | 426324 (20.08) | 77709 (20.23) | 26261 (13.41) | 73954 (24.68)
Mountain (501-1000 meters) | 51807 (2.44) | 3874 (1.01) | 5217 (2.66) | 14298 (4.77)
Plateau (>1000 meters) | 95525 (4.50) | 11142 (2.90) | 4648 (2.37) | 7835 (2.61)
Total | 2122703 (100) | 384105 (100) | 195805 (100) | 299694 (100)

Values in parentheses refer to the percentage of patients in the corresponding group.

Table 2. Prevalence of comorbidities among hypertensive patients

Characteristics | Hypertension and Diabetes mellitus: Prevalence, % | P value | Hypertension and Hyperlipidemia: Prevalence, % | P value
Gender
Male 18.24