Measures of the Center of the Data

DOCUMENT INFORMATION

Basic information

Format
Pages	15
File size	211.33 KB

Content


ESSENTIAL BUSINESS KNOWLEDGE (BEK)
THE SIX MEASURES OF INCOME OF THE U.S. DEPARTMENT OF COMMERCE
Nguyen Tran Bich Ngoc

1. Gross domestic product (GDP) is the market value of all final goods and services produced within a country in a given period of time.
2. Gross national product (GNP) is the total income earned by a nation’s permanent residents (called nationals). It differs from GDP by including income that our citizens earn abroad and excluding income that foreigners earn here. For example, when a Canadian citizen works temporarily in the United States, his production is part of U.S. GDP, but it is not part of U.S. GNP. (It is part of Canada’s GNP.) For most countries, including the United States, domestic residents are responsible for most domestic production, so GDP and GNP are quite close.
3. Net national product (NNP) is the total income of a nation’s residents (GNP) minus losses from depreciation. Depreciation is the wear and tear on the economy’s stock of equipment and structures, such as trucks rusting and lightbulbs burning out. In the national income accounts prepared by the Department of Commerce, depreciation is called the “consumption of fixed capital”.
4. National income is the total income earned by a nation’s residents in the production of goods and services. It differs from net national product by excluding indirect business taxes (such as sales taxes) and including business subsidies. NNP and national income also differ because of a “statistical discrepancy” that arises from problems in data collection.
5. Personal income is the income that households and non-corporate businesses receive. Unlike national income, it excludes retained earnings, which is income that corporations have earned but have not paid out to their owners. It also subtracts corporate income taxes and contributions for social insurance (mostly Social Security taxes).
In addition, personal income includes the interest income that households receive from their holdings of government debt and the income that households receive from government transfer programs, such as welfare and Social Security.
6. Disposable personal income is the income that households and ...

Measures of the Center of the Data
By: OpenStaxCollege

The "center" of a data set is also a way of describing location. The two most widely used measures of the "center" of the data are the mean (average) and the median. To calculate the mean weight of 50 people, add the 50 weights together and divide by 50. To find the median weight of the 50 people, order the data and find the number that splits the data into two equal parts. The median is generally a better measure of the center when there are extreme values or outliers because it is not affected by the precise numerical values of the outliers. The mean is the most common measure of the center.

NOTE
The words “mean” and “average” are often used interchangeably. The substitution of one word for the other is common practice. The technical term is “arithmetic mean” and “average” is technically a center location. However, in practice among non-statisticians, “average” is commonly accepted for “arithmetic mean.”

When each value in the data set is not unique, the mean can be calculated by multiplying each distinct value by its frequency and then dividing the sum by the total number of data values. The letter used to represent the sample mean is an x with a bar over it (pronounced “x bar”): $\bar{x}$. The Greek letter μ (pronounced "mew") represents the population mean. One of the requirements for the sample mean to be a good estimate of the population mean is for the sample taken to be truly random.

To see that both ways of calculating the mean are the same, consider the sample: 1; 1; 1; 2; 2; 3; 4; 4; 4; 4; 4

$\bar{x} = \frac{1 + 1 + 1 + 2 + 2 + 3 + 4 + 4 + 4 + 4 + 4}{11} = 2.7$

$\bar{x} = \frac{3(1) + 2(2) + 1(3) + 5(4)}{11} = 2.7$
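The equivalence of the two calculations can be checked with a short Python sketch. It is illustrative only: the names `sample`, `direct_mean`, and `frequency_mean` are assumptions introduced here, and the sample is the eleven values summed in the worked equation.

```python
from collections import Counter

# Sample from the worked example (the eleven values summed in the equation).
sample = [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4]


def direct_mean(values):
    """Add every value and divide by the number of values."""
    return sum(values) / len(values)


def frequency_mean(values):
    """Multiply each distinct value by its frequency, then divide by n."""
    frequencies = Counter(values)  # maps each distinct value to its count
    weighted_sum = sum(value * count for value, count in frequencies.items())
    return weighted_sum / len(values)


if __name__ == "__main__":
    print(round(direct_mean(sample), 1))     # 2.7
    print(round(frequency_mean(sample), 1))  # 2.7
```

Both functions divide the same total, 30, by 11, so they agree; the frequency form is simply shorter to write when values repeat often.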
In the second calculation, the frequencies are the multipliers 3, 2, 1, and 5 in 3(1) + 2(2) + 1(3) + 5(4).

You can quickly find the location of the median by using the expression $\frac{n+1}{2}$. The letter n is the total number of data values in the sample. If n is an odd number, the median is the middle value of the ordered data (ordered smallest to largest). If n is an even number, the median is equal to the two middle values added together and divided by two after the data has been ordered. For example, if the total number of data values is 97, then $\frac{n+1}{2} = \frac{97+1}{2} = 49$. The median is the 49th value in the ordered data. If the total number of data values is 100, then $\frac{n+1}{2} = \frac{100+1}{2} = 50.5$. The median occurs midway between the 50th and 51st values. The location of the median and the value of the median are not the same. The upper case letter M is often used to represent the median. The next example illustrates the location of the median and the value of the median.

AIDS data indicating the number of months a patient with AIDS lives after taking a new antibody drug are as follows (smallest to largest):
3; 4; 8; 8; 10; 11; 12; 13; 14; 15; 15; 16; 16; 17; 17; 18; 21; 22; 22; 24; 24; 25; 26; 26; 27; 27; 29; 29; 31; 32; 33; 33; 34; 34; 35; 37; 40; 44; 44; 47
Calculate the mean and the median.

The calculation for the mean is:

$\bar{x} = \frac{3 + 4 + (8)(2) + 10 + 11 + 12 + 13 + 14 + (15)(2) + (16)(2) + \dots + 35 + 37 + 40 + (44)(2) + 47}{40} = 23.6$

To find the median, M, first use the formula for the location. The location is:

$\frac{n+1}{2} = \frac{40+1}{2} = 20.5$

Starting at the smallest value, the median is located between the 20th and 21st values (the two 24s):
3; 4; 8; 8; 10; 11; 12; 13; 14; 15; 15; 16; 16; 17; 17; 18; 21; 22; 22; 24; 24; 25; 26; 26; 27; 27; 29; 29; 31; 32; 33; 33; 34; 34; 35; 37; 40; 44; 44; 47

$M = \frac{24 + 24}{2} = 24$

To find the mean and the median on a calculator: Clear list L1. Press STAT 4:ClrList. Enter 2nd 1 for list L1. Press ENTER. Enter data into the list editor. Press STAT 1:EDIT. Put the data values into list L1. Press STAT
and arrow to CALC. Press 1:1-VarStats. Press 2nd 1 for L1 and then ENTER. Press the down and up arrow keys to scroll. $\bar{x} = 23.6$, M = 24

Try It
The following data show the number of months patients typically wait on a transplant list before getting surgery. The data are ordered from smallest to largest. Calculate the mean and median.
3; 4; 5; 7; 7; 7; 7; 8; 8; 9; 9; 10; 10; 10; 10; 10; 11; 12; 12; 13; 14; 14; 15; 15; 17; 17; 18; 19; 19; 19; 21; 21; 22; 22; 23; 24; 24; 24; 24

Mean: 3 + 4 + 5 + 7 + 7 + 7 + 7 + 8 + 8 + 9 + 9 + 10 + 10 + 10 + 10 + 10 + 11 + 12 + 12 + 13 + 14 + 14 + 15 + 15 + 17 + 17 + 18 + 19 + 19 + 19 + 21 + 21 + 22 + 22 + 23 + 24 + 24 + 24 + 24 = 544

$\frac{544}{39} = 13.95$

Median: Starting at the smallest value, the median is the 20th term, which is 13.

Suppose that in a small town of 50 people, one person earns $5,000,000 per year and the other 49 each earn $30,000. Which is the better measure of the "center": the mean or the median?

$\bar{x} = \frac{5{,}000{,}000 + 49(30{,}000)}{50} = 129{,}400$

M = 30,000

(There are 49 people who earn $30,000 and one person who earns $5,000,000.)

The median is a better measure of the "center" than the mean because 49 of the values are 30,000 and one is 5,000,000. The 5,000,000 is an outlier. The 30,000 gives us a better sense ...

INTERNATIONAL JOURNAL OF ENERGY AND ENVIRONMENT
Volume 4, Issue 1, 2013 pp.59-72
Journal homepage: www.IJEE.IEEFoundation.org
ISSN 2076-2895 (Print), ISSN 2076-2909 (Online) ©2013 International Energy & Environment Foundation. All rights reserved.

Changes of temperature data for energy studies over time and their impact on energy consumption and CO2 emissions. The case of Athens and Thessaloniki – Greece
K. T. Papakostas 1, A. Michopoulos 1, T. Mavromatis 2, N. Kyriakis 1
1 Process Equipment Design Laboratory, Mechanical Engineering Department, Energy Division, Aristotle University of Thessaloniki - 54124 Thessaloniki - Greece.
2 Department of Meteorology-Climatology, School of Geology, Faculty of Sciences, Aristotle University of Thessaloniki - 54124 Thessaloniki - Greece.

Abstract
In steady-state methods for estimating energy consumption of buildings, the commonly used data include the monthly average dry bulb temperatures, the heating and cooling degree-days and the dry bulb temperature bin data. This work presents average values of these data for the 1983-1992 and 1993-2002 decades, calculated for Athens and Thessaloniki, determined from hourly dry bulb temperature records of meteorological stations (National Observatory of Athens and Aristotle University of Thessaloniki). The results show that the monthly average dry bulb temperatures and the annual average cooling degree-days of the 1993-2002 decade are increased, compared to those of the 1983-1992 decade, while the corresponding annual average heating degree-days are reduced. Also, the frequency of the low temperature bins decreased in the 1993-2002 decade while that of the high temperature bins increased, compared to the 1983-1992 decade. The effect of temperature data variations on the energy consumption and on CO2 emissions of buildings was examined by calculating the energy demands for heating and cooling and the CO2 emissions from diesel-oil and electricity use of a typical residential building-model. From the study it is concluded that the heating energy requirements during the decade 1993-2002 were decreased, as compared to the energy demands of the decade 1983-1992, while the cooling energy requirements were increased. The variations of CO2 emissions from diesel oil and electricity use were analogous to the energy requirements alterations. The results indicate a warming trend, at least for the two regions examined, which affects the estimation of heating and cooling demands of buildings. It, therefore, seems obvious that periodic adaptation of the temperature data used for building energy studies is required.
Copyright © 2013 International Energy and Environment Foundation - All rights reserved.
Keywords: Climate change; Cooling; CO2 emissions; Degree-days; Energy consumption in buildings; Heating; Steady-state methods; Temperature data.

1. Introduction
A climate change seems to be in progress and there is strong evidence that it will continue in the forthcoming decades. Obviously, this change affects the temperature ...

RESEARCH Open Access
Instrument development, data collection, and characteristics of practices, staff, and measures in the Improving Quality of Care in Diabetes (iQuaD) Study
Martin P Eccles 1*, Susan Hrisos 1, Jill J Francis 2, Elaine Stamp 1, Marie Johnston 3, Gillian Hawthorne 4, Nick Steen 1, Jeremy M Grimshaw 5, Marko Elovainio 6, Justin Presseau 1 and Margaret Hunter 1

Abstract
Background: Type 2 diabetes is an increasingly prevalent chronic illness and an important cause of avoidable mortality. Patients are managed by the integrated activities of clinical and non-clinical members of primary care teams. This study aimed to: investigate theoretically-based organisational, team, and individual factors determining the multiple behaviours needed to manage diabetes; and identify multilevel determinants of different diabetes management behaviours and potential interventions to improve them. This paper describes the instrument development, study recruitment, characteristics of the study participating practices and their constituent healthcare professionals and administrative staff and reports descriptive analyses of the data collected.
Methods: The study was a predictive study over a 12-month period. Practices (N = 99) were recruited from within the UK Medical Research Council General Practice Research Framework.
We identified six behaviours chosen to cover a range of clinical activities (prescribing, non-prescribing), reflect decisions that were not necessarily straightforward (controlling blood pressure that was above target despite other drug treatment), and reflect recommended best practice as described by national guidelines. Practice attributes and a wide range of individually reported measures were assessed at baseline; measures of clinical outcome were collected over the ensuing 12 months, and a number of proxy measures of behaviour were collected at baseline and at 12 months. Data were collected by telephone interview, postal questionnaire (organisational and clinical) to practice staff, postal questionnaire to patients, and by computer data extraction query. Results: All 99 practices completed a telephone interview and responded to baseline questionnaires. The organisational questionnaire was completed by 931/1236 (75.3%) administrative staff, 423/529 (80.0%) primary care doctors, and 255/314 (81.2%) nurses. Clinical questionnaires were completed by 326/361 (90.3%) primary care doctors and 163/186 (87.6%) nurses. At a practice level, we achieved response rates of 100% from clinicians in 40 practices and > 80% from clinicians in 67 practices. All measures had satisfactory internal consistency (alpha coefficient range from 0.61 to 0.97; Pearson correlation coefficient (two item measures) 0.32 to 0.81); scores were generally consistent with good practice. Measures of behaviour showed relatively high rates of performance of the six behaviours, but with considerable variability within and across the behaviours and measures. 
Discussion: We have assembled an unparalleled data set from clinicians reporting on their cognitions in relation to the performance of six clinical behaviours involved in the management of people with one chronic disease (diabetes mellitus), using a range of organisational and individual level measures as well as information on the ...

* Correspondence: martin.eccles@ncl.ac.uk
1 Institute of Health and Society, Newcastle University, Baddiley-Clark Building, Richardson Road, Newcastle upon Tyne, NE2 4AX, UK
Full list of author information is available at the end of the article
Eccles et al. Implementation Science 2011, 6:61 http://www.implementationscience.com/content/6/1/61
© 2011 Eccles et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original ...

Measures of the Location of the Data
By: OpenStaxCollege

The common measures of location are quartiles and percentiles. Quartiles are special percentiles. The first quartile, Q1, is the same as the 25th percentile, and the third quartile, Q3, is the same as the 75th percentile. The median, M, is called both the second quartile and the 50th percentile.

To calculate quartiles and percentiles, the data must be ordered from smallest to largest. Quartiles divide ordered data into quarters. Percentiles divide ordered data into hundredths. To score in the 90th percentile of an exam does not mean, necessarily, that you received 90% on a test. It means that 90% of test scores are the same as or less than your score and 10% of the test scores are the same as or greater than your test score.

Percentiles are useful for comparing values. For this reason, universities and colleges use percentiles extensively. One instance in which colleges and universities use percentiles
is when SAT results are used to determine a minimum testing score that will be used as an acceptance factor. For example, suppose Duke accepts SAT scores at or above the 75th percentile. That translates into a score of at least 1220.

Percentiles are mostly used with very large populations. Therefore, if you were to say that 90% of the test scores are less (and not the same or less) than your score, it would be acceptable because removing one particular data value is not significant.

The median is a number that measures the "center" of the data. You can think of the median as the "middle value," but it does not actually have to be one of the observed values. It is a number that separates ordered data into halves. Half the values are the same number or smaller than the median, and half the values are the same number or larger. For example, consider the following data.
1; 11.5; 6; 7.2; 4; 8; 9; 10; 6.8; 8.3; 2; 2; 10; 1
Ordered from smallest to largest:
1; 1; 2; 2; 4; 6; 6.8; 7.2; 8; 8.3; 9; 10; 10; 11.5

Since there are 14 observations, the median is between the seventh value, 6.8, and the eighth value, 7.2. To find the median, add the two values together and divide by two.

$\frac{6.8 + 7.2}{2} = 7$

The median is seven. Half of the values are smaller than seven and half of the values are larger than seven.

Quartiles are numbers that separate the data into quarters. Quartiles may or may not be part of the data. To find the quartiles, first find the median or second quartile. The first quartile, Q1, is the middle value of the lower half of the data, and the third quartile, Q3, is the middle value, or median, of the upper half of the data. To get the idea, consider the same data set:
1; 1; 2; 2; 4; 6; 6.8; 7.2; 8; 8.3; 9; 10; 10; 11.5

The median or second quartile is seven. The lower half of the data are 1, 1, 2, 2, 4, 6, 6.8. The middle value of the lower half is two.
1; 1; 2; 2; 4; 6; 6.8
The number two, which is part of the data, is the first quartile. One-fourth
of the entire set of values are the same as or less than two and three-fourths of the values are more than two.

The upper half of the data is 7.2, 8, 8.3, 9, 10, 10, 11.5. The middle value of the upper half is nine. The third quartile, Q3, is nine. Three-fourths (75%) of the ordered data set are less than nine. One-fourth (25%) of the ordered data set are greater than nine. The third quartile is part of the data set in this example.

The interquartile range is a number that indicates the spread of the middle half or the middle 50% of the data. It is the difference between the third quartile (Q3) and the first quartile (Q1).

IQR = Q3 – Q1

The IQR can help to determine potential outliers. A value is suspected to be a potential outlier if it lies more than (1.5)(IQR) below the first quartile or more than (1.5)(IQR) above the third quartile. Potential outliers always require further investigation.

NOTE
A potential outlier is a data point that is significantly different ...
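The quartile and outlier rules above can be sketched in Python using the 14-value data set from the text. This is a hedged illustration, not a definitive implementation: the function names are hypothetical, and the handling of an odd number of values (leaving the middle value out of both halves) is an assumption, since the text only demonstrates the even case.

```python
from statistics import median

# Ordered data set from the worked example in the text.
data = [1, 1, 2, 2, 4, 6, 6.8, 7.2, 8, 8.3, 9, 10, 10, 11.5]


def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves rule.

    Assumption: when n is odd, the middle value is excluded from both halves.
    Needs at least two values so that each half is non-empty.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)


def potential_outliers(values):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]


if __name__ == "__main__":
    q1, m, q3 = quartiles(data)
    print(q1, m, q3)                 # Q1 = 2, M = 7.0, Q3 = 9
    print(q3 - q1)                   # IQR = 7
    print(potential_outliers(data))  # [] (no value lies outside the fences)
```

For this data set the fences fall at 2 − 10.5 = −8.5 and 9 + 10.5 = 19.5, so none of the observed values is flagged. Other quartile conventions exist and can give slightly different results on the same data.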
Measures of the Spread of the Data
By: OpenStaxCollege

An important characteristic of any set of data is the variation in the data. In some data sets, the data values are concentrated closely near the mean; in other data sets, the data values are more widely spread out from the mean. The most common measure of variation, or spread, is the standard deviation. The standard deviation is a number that measures how far data values are from their mean.

The standard deviation
• provides a numerical measure of the overall amount of variation in a data set, and
• can be used to determine whether a particular data value is close to or far from the mean.

The standard deviation provides a measure of the overall variation in a data set. The standard deviation is always positive or zero. The standard deviation is small when the data are all concentrated close to the mean, exhibiting little variation or spread. The standard deviation is larger when the data values are more spread out from the mean, exhibiting more variation.

Suppose that we are studying the amount of time customers wait in line at the checkout at supermarket A and supermarket B. The average wait time at both supermarkets is five minutes. At supermarket A, the standard deviation for the wait time is two minutes; at supermarket B the standard deviation for the wait time is four minutes. Because supermarket B has a higher standard deviation, we know that there is more variation in the wait times at supermarket B. Overall, wait times at supermarket B are more spread out from the average; wait times at supermarket A are more concentrated near the ...

... sense of the middle of the data. Another measure of the center is the mode. The mode is the most frequent value. There can be more than one mode in a data set as long as those values have the same...
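The supermarket comparison can be made concrete with a small Python sketch. Two things here are assumptions rather than content from the text: the wait times are hypothetical values chosen so that both lists average five minutes, and the formula used is the sample standard deviation with an n − 1 denominator, which this excerpt does not spell out.

```python
from math import sqrt

# Hypothetical wait times in minutes (illustrative only, not from the text).
wait_times_a = [3, 5, 7]  # values close to the mean
wait_times_b = [1, 5, 9]  # same mean, values more spread out


def sample_standard_deviation(values):
    """Square root of the summed squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return sqrt(squared_deviations / (n - 1))


if __name__ == "__main__":
    print(sum(wait_times_a) / len(wait_times_a))    # 5.0
    print(sum(wait_times_b) / len(wait_times_b))    # 5.0
    print(sample_standard_deviation(wait_times_a))  # 2.0
    print(sample_standard_deviation(wait_times_b))  # 4.0
```

Both lists share the same mean, yet the second has twice the standard deviation, which is the situation the supermarket example describes.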
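The median rules described under Measures of the Center of the Data (location (n + 1)/2, the middle value when n is odd, the average of the two middle values when n is even) can be sketched in Python. The function and variable names are hypothetical; the data are the AIDS survival months and the small-town incomes from the worked examples.

```python
def median_location(n):
    """Position of the median in ordered data, counting from 1."""
    return (n + 1) / 2


def median_value(values):
    """Middle value for odd n; average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Months survived, from the AIDS data example (40 values).
months = [3, 4, 8, 8, 10, 11, 12, 13, 14, 15, 15, 16, 16, 17, 17, 18, 21,
          22, 22, 24, 24, 25, 26, 26, 27, 27, 29, 29, 31, 32, 33, 33, 34,
          34, 35, 37, 40, 44, 44, 47]

# Small-town incomes: one person earns 5,000,000 and 49 earn 30,000.
incomes = [5_000_000] + [30_000] * 49

if __name__ == "__main__":
    print(median_location(len(months)))          # 20.5
    print(median_value(months))                  # 24.0
    print(round(sum(months) / len(months), 1))   # 23.6
    print(sum(incomes) / len(incomes))           # 129400.0
    print(median_value(incomes))                 # 30000.0
```

The income example shows why the median resists outliers: a single very large value pulls the mean to 129,400 while the median stays at 30,000.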

Date posted: 31/10/2017, 16:48
