Measures of the Spread of the Data

ESSENTIAL BUSINESS KNOWLEDGE (BEK)
THE SIX MEASURES OF INCOME OF THE U.S. DEPARTMENT OF COMMERCE
Nguyen Tran Bich Ngoc

1. Gross domestic product (GDP) is the market value of all final goods and services produced within a country in a given period of time.
2. Gross national product (GNP) is the total income earned by a nation's permanent residents (called nationals). It differs from GDP by including income that our citizens earn abroad and excluding income that foreigners earn here. For example, when a Canadian citizen works temporarily in the United States, his production is part of U.S. GDP, but it is not part of U.S. GNP. (It is part of Canada's GNP.) For most countries, including the United States, domestic residents are responsible for most domestic production, so GDP and GNP are quite close.
3. Net national product (NNP) is the total income of a nation's residents (GNP) minus losses from depreciation. Depreciation is the wear and tear on the economy's stock of equipment and structures, such as trucks rusting and lightbulbs burning out. In the national income accounts prepared by the Department of Commerce, depreciation is called the "consumption of fixed capital".
4. National income is the total income earned by a nation's residents in the production of goods and services. It differs from net national product by excluding indirect business taxes (such as sales taxes) and including business subsidies. NNP and national income also differ because of a "statistical discrepancy" that arises from problems in data collection.
5. Personal income is the income that households and non-corporate businesses receive. Unlike national income, it excludes retained earnings, which is income that corporations have earned but have not paid out to their owners. It also subtracts corporate income taxes and contributions for social insurance (mostly Social Security taxes).
In addition, personal income includes the interest income that households receive from their holdings of government debt and the income that households receive from government transfer programs, such as welfare and Social Security.
6. Disposable personal income is the income that households and ...

Measures of the Spread of the Data
By: OpenStaxCollege

An important characteristic of any set of data is the variation in the data. In some data sets, the data values are concentrated closely near the mean; in other data sets, the data values are more widely spread out from the mean. The most common measure of variation, or spread, is the standard deviation. The standard deviation is a number that measures how far data values are from their mean.

The standard deviation
• provides a numerical measure of the overall amount of variation in a data set, and
• can be used to determine whether a particular data value is close to or far from the mean.

The standard deviation provides a measure of the overall variation in a data set.
The standard deviation is always positive or zero. The standard deviation is small when the data are all concentrated close to the mean, exhibiting little variation or spread. The standard deviation is larger when the data values are more spread out from the mean, exhibiting more variation.

Suppose that we are studying the amount of time customers wait in line at the checkout at supermarket A and supermarket B. The average wait time at both supermarkets is five minutes. At supermarket A, the standard deviation for the wait time is two minutes; at supermarket B the standard deviation for the wait time is four minutes. Because supermarket B has a higher standard deviation, we know that there is more variation in the wait times at supermarket B. Overall, wait times at supermarket B are more spread out from the average; wait times at supermarket A are more concentrated near the average.

The standard deviation can be used to determine whether a data value is close to or far from the mean.
Suppose that Rosa and Binh both shop at supermarket A. Rosa waits at the checkout counter for seven minutes and Binh waits for one minute. At supermarket A, the mean waiting time is five minutes and the standard deviation is two minutes.

Rosa waits for seven minutes:
• Seven is two minutes longer than the average of five; two minutes is equal to one standard deviation.
• Rosa's wait time of seven minutes is two minutes longer than the average of five minutes.
• Rosa's wait time of seven minutes is one standard deviation above the average of five minutes.

Binh waits for one minute:
• One is four minutes less than the average of five; four minutes is equal to two standard deviations.
• Binh's wait time of one minute is four minutes less than the average of five minutes.
• Binh's wait time of one minute is two standard deviations below the average of five minutes.
• A data value that is two standard deviations from the average is just on the borderline for what many statisticians would consider to be far from the average. Considering data to be far from the mean if it is more than two standard deviations away is more of an approximate "rule of thumb" than a rigid rule. In general, the shape of the distribution of the data affects how much of the data is further away than two standard deviations. (You will learn more about this in later chapters.)
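The Rosa and Binh comparison can be checked with a few lines of Python. This is only an illustrative sketch: the function names `num_stdevs` and `is_far` and the default threshold of two standard deviations are assumptions made for this example, following the rule of thumb quoted above rather than any fixed definition.

```python
def num_stdevs(value, mean, stdev):
    """Return how many standard deviations `value` lies above (+) or below (-) `mean`."""
    if stdev <= 0:
        raise ValueError("stdev must be positive")
    return (value - mean) / stdev


def is_far(value, mean, stdev, threshold=2.0):
    """Rule of thumb (an assumption, not a rigid rule): 'far' means MORE than
    `threshold` standard deviations from the mean, so exactly 2 is borderline."""
    return abs(num_stdevs(value, mean, stdev)) > threshold


# Supermarket A: mean wait of five minutes, standard deviation of two minutes.
mean_wait = 5.0
stdev_wait = 2.0

rosa = num_stdevs(7.0, mean_wait, stdev_wait)  # one standard deviation above
binh = num_stdevs(1.0, mean_wait, stdev_wait)  # two standard deviations below

print(f"Rosa: {rosa:+.1f} standard deviations")
print(f"Binh: {binh:+.1f} standard deviations")
print("Binh far from mean?", is_far(1.0, mean_wait, stdev_wait))
```

Because the comparison is strict, Binh's wait of exactly two standard deviations below the mean is treated as borderline rather than far, which matches the wording of the text.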
The number line may help you understand standard deviation. If we were to put five and seven on a number line, seven is to the right of five. We say, then, that seven is one standard deviation to the right of five because 5 + (1)(2) = 7.
If one were also part of the data set, then one is two standard deviations to the left of five because 5 + (–2)(2) = 1.
• In general, a value = mean + (#ofSTDEVs)(standard deviation)
• where #ofSTDEVs = the number of standard deviations
• #ofSTDEVs does not need to be an integer
• One is two standard deviations less than the mean of five because: 1 = 5 + (–2)(2)

The equation value = mean + (#ofSTDEVs)(standard deviation) can be expressed for a sample and for a population.
• Sample: x = x̄ + (#ofSTDEVs)(s)
• Population: x = μ + (#ofSTDEVs)(σ)
The lower case letter s represents the sample standard deviation and the Greek letter σ (sigma, lower case) represents the population standard deviation.
The symbol x̄ is the sample mean and the Greek symbol μ is the population mean.

Calculating the Standard Deviation
If x is a number, then the difference "x – mean" is called its deviation. In a data set, there are as many deviations as there are items in the data set. The deviations are used to calculate the standard deviation. If the numbers belong to a population, in symbols a deviation is x – μ. For sample data, in symbols a deviation is x – x̄.
The procedure to calculate the standard deviation depends on whether the numbers are the entire population or are data from a sample. The calculations are similar, but not identical. Therefore the symbol used to represent the standard deviation depends on whether it is calculated from a population or a sample. The lower case letter s ...

INTERNATIONAL JOURNAL OF ENERGY AND ENVIRONMENT
Volume 4, Issue 1, 2013 pp.59-72
Journal homepage: www.IJEE.IEEFoundation.org
ISSN 2076-2895 (Print), ISSN 2076-2909 (Online) ©2013 International Energy & Environment Foundation.
All rights reserved.

Changes of temperature data for energy studies over time and their impact on energy consumption and CO2 emissions. The case of Athens and Thessaloniki – Greece

K. T. Papakostas¹, A. Michopoulos¹, T. Mavromatis², N. Kyriakis¹
¹ Process Equipment Design Laboratory, Mechanical Engineering Department, Energy Division, Aristotle University of Thessaloniki - 54124 Thessaloniki - Greece.
² Department of Meteorology-Climatology, School of Geology, Faculty of Sciences, Aristotle University of Thessaloniki - 54124 Thessaloniki - Greece.

Abstract
In steady-state methods for estimating energy consumption of buildings, the commonly used data include the monthly average dry bulb temperatures, the heating and cooling degree-days and the dry bulb temperature bin data. This work presents average values of these data for the 1983-1992 and 1993-2002 decades, calculated for Athens and Thessaloniki, determined from hourly dry bulb temperature records of meteorological stations (National Observatory of Athens and Aristotle University of Thessaloniki). The results show that the monthly average dry bulb temperatures and the annual average cooling degree-days of the 1993-2002 decade increased compared to those of the 1983-1992 decade, while the corresponding annual average heating degree-days decreased. Also, the frequency of the low temperature bins decreased in the 1993-2002 decade while that of the high temperature bins increased, compared to the 1983-1992 decade. The effect of temperature data variations on the energy consumption and on CO2 emissions of buildings was examined by calculating the energy demands for heating and cooling and the CO2 emissions from diesel-oil and electricity use of a typical residential building-model. From the study it is concluded that the heating energy requirements during the decade 1993-2002 decreased compared to the energy demands of the decade 1983-1992, while the cooling energy requirements increased.
The variations of CO2 emissions from diesel oil and electricity use were analogous to the changes in energy requirements. The results indicate a warming trend, at least for the two regions examined, which affects the estimation of heating and cooling demands of buildings. It therefore seems clear that periodic adaptation of the temperature data used for building energy studies is required.
Copyright © 2013 International Energy and Environment Foundation - All rights reserved.
Keywords: Climate change; Cooling; CO2 emissions; Degree-days; Energy consumption in buildings; Heating; Steady-state methods; Temperature data.

1. Introduction
Climate change seems to be in progress and there is strong evidence that it will continue in the forthcoming decades. Obviously, this change affects the temperature ...

Harness the Power of Big Data: The IBM Big Data Platform

About the Authors
Paul C. Zikopoulos, B.A., M.B.A., is the Director of Technical Professionals for IBM Software Group's Information Management division and additionally leads the World-Wide Competitive Database and Big Data Technical Sales Acceleration teams. Paul is an award-winning writer and speaker with over 19 years of experience in Information Management. In 2012, Paul was chosen by SAP as one of its Top 50 Big Data Twitter influencers (@BigData_paulz). Paul has written more than 350 magazine articles and 16 books, including Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data; Warp Speed, Time Travel, Big Data, and More: DB2 10 New Features; DB2 pureScale: Risk Free Agile Scaling; DB2 Certification for Dummies; and DB2 for Dummies.
In his spare time, Paul enjoys all sorts of sporting activities, including running with his dog Chachi, avoiding punches in his MMA training, and trying to figure out the world according to Chloë, his daughter. You can reach him at: paulz_ibm@msn.com.

Dirk deRoos, B.Sc., B.A., is IBM's World-Wide Technical Sales Leader for IBM InfoSphere BigInsights. Dirk spent the past two years helping customers with BigInsights and Apache Hadoop, identifying architecture fit, and advising early stage projects in dozens of customer engagements. Dirk recently coauthored a book on this subject area, Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data (McGraw-Hill Professional, 2012). Prior to this, Dirk worked in the IBM Toronto Software Development Lab on the DB2 database development team where he was the Information Architect for all of the DB2 product documentation. Dirk has earned two degrees from the University of New Brunswick in Canada: a Bachelor of Computer Science, and a Bachelor of Arts (Honors English). You can reach him at: dirk.ibm@gmail.com or on Twitter at @Dirk_deRoos.

Krishnan Parasuraman, B.Sc., M.Sc., is part of IBM's Big Data industry solutions team and serves as the CTO for Digital Media. In his role, Krishnan works very closely with customers in an advisory capacity, driving Big Data solution architectures and best practices for the management of Internet-scale analytics. He is an authority on the use of Big Data technologies, such as Hadoop and MPP data warehousing platforms, for solving analytical problems in the online digital advertising, customer intelligence, and real-time marketing space. He speaks regularly at industry events and writes for trade publications and blogs.
Prior to his current role, Krishnan worked in research, product development, consulting, and technology marketing across multiple disciplines within information management. Krishnan has enabled data warehousing and customer analytics solutions for large media and consumer electronics organizations, such as Apple, Microsoft, and Kodak. He holds an M.Sc. degree in computer science from the University of Georgia. You can keep up with his musings on Twitter @kparasuraman.

Thomas Deutsch, ...

RESEARCH Open Access
Instrument development, data collection, and characteristics of practices, staff, and measures in the Improving Quality of Care in Diabetes (iQuaD) Study
Martin P Eccles¹*, Susan Hrisos¹, Jill J Francis², Elaine Stamp¹, Marie Johnston³, Gillian Hawthorne⁴, Nick Steen¹, Jeremy M Grimshaw⁵, Marko Elovainio⁶, Justin Presseau¹ and Margaret Hunter¹

Abstract
Background: Type 2 diabetes is an increasingly prevalent chronic illness and an important cause of avoidable mortality. Patients are managed by the integrated activities of clinical and non-clinical members of primary care teams. This study aimed to: investigate theoretically-based organisational, team, and individual factors determining the multiple behaviours needed to manage diabetes; and identify multilevel determinants of different diabetes management behaviours and potential interventions to improve them. This paper describes the instrument development, study recruitment, characteristics of the study participating practices and their constituent healthcare professionals and administrative staff and reports descriptive analyses of the data collected.
Methods: The study was a predictive study over a 12-month period. Practices (N = 99) were recruited from within the UK Medical Research Council General Practice Research Framework.
We identified six behaviours chosen to cover a range of clinical activities (prescribing, non-prescribing), reflect decisions that were not necessarily straightforward (controlling blood pressure that was above target despite other drug treatment), and reflect recommended best practice as described by national guidelines. Practice attributes and a wide range of individually reported measures were assessed at baseline; measures of clinical outcome were collected over the ensuing 12 months, and a number of proxy measures of behaviour were collected at baseline and at 12 months. Data were collected by telephone interview, postal questionnaire (organisational and clinical) to practice staff, postal questionnaire to patients, and by computer data extraction query.
Results: All 99 practices completed a telephone interview and responded to baseline questionnaires. The organisational questionnaire was completed by 931/1236 (75.3%) administrative staff, 423/529 (80.0%) primary care doctors, and 255/314 (81.2%) nurses. Clinical questionnaires were completed by 326/361 (90.3%) primary care doctors and 163/186 (87.6%) nurses. At a practice level, we achieved response rates of 100% from clinicians in 40 practices and > 80% from clinicians in 67 practices. All measures had satisfactory internal consistency (alpha coefficient range from 0.61 to 0.97; Pearson correlation coefficient (two item measures) 0.32 to 0.81); scores were generally consistent with good practice. Measures of behaviour showed relatively high rates of performance of the six behaviours, but with considerable variability within and across the behaviours and measures.
Discussion: We have assembled an unparalleled data set from clinicians reporting on their cognitions in relation to the performance of six clinical behaviours involved in the management of people with one chronic disease (diabetes mellitus), using a range of organisational and individual level measures as well as information on the ...

* Correspondence: martin.eccles@ncl.ac.uk
¹ Institute of Health and Society, Newcastle University, Baddiley-Clark Building, Richardson Road, Newcastle upon Tyne, NE2 4AX, UK
Full list of author information is available at the end of the article
Eccles et al. Implementation Science 2011, 6:61
http://www.implementationscience.com/content/6/1/61
© 2011 Eccles et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original ...

Measures of the Location of the Data
By: OpenStaxCollege

The common measures of location are quartiles and percentiles. Quartiles are special percentiles. The first quartile, Q1, is the same as the 25th percentile, and the third quartile, Q3, is the same as the 75th percentile. The median, M, is called both the second quartile and the 50th percentile.

To calculate quartiles and percentiles, the data must be ordered from smallest to largest. Quartiles divide ordered data into quarters. Percentiles divide ordered data into hundredths. To score in the 90th percentile of an exam does not mean, necessarily, that you received 90% on a test. It means that 90% of test scores are the same or less than your score and 10% of the test scores are the same or greater than your test score.

Percentiles are useful for comparing values. For this reason, universities and colleges use percentiles extensively. One instance in which colleges and universities use percentiles
is when SAT results are used to determine a minimum testing score that will be used as an acceptance factor. For example, suppose Duke accepts SAT scores at or above the 75th percentile. That translates into a score of at least 1220.

Percentiles are mostly used with very large populations. Therefore, if you were to say that 90% of the test scores are less (and not the same or less) than your score, it would be acceptable because removing one particular data value is not significant.

The median is a number that measures the "center" of the data. You can think of the median as the "middle value," but it does not actually have to be one of the observed values. It is a number that separates ordered data into halves. Half the values are the same number or smaller than the median, and half the values are the same number or larger. For example, consider the following data:
1; 11.5; 6; 7.2; 4; 8; 9; 10; 6.8; 8.3; 2; 2; 10; 1
Ordered from smallest to largest:
1; 1; 2; 2; 4; 6; 6.8; 7.2; 8; 8.3; 9; 10; 10; 11.5

Since there are 14 observations, the median is between the seventh value, 6.8, and the eighth value, 7.2. To find the median, add the two values together and divide by two:
(6.8 + 7.2) / 2 = 7
The median is seven. Half of the values are smaller than seven and half of the values are larger than seven.

Quartiles are numbers that separate the data into quarters. Quartiles may or may not be part of the data. To find the quartiles, first find the median or second quartile. The first quartile, Q1, is the middle value of the lower half of the data, and the third quartile, Q3, is the middle value, or median, of the upper half of the data. To get the idea, consider the same data set:
1; 1; 2; 2; 4; 6; 6.8; 7.2; 8; 8.3; 9; 10; 10; 11.5

The median or second quartile is seven. The lower half of the data is 1, 1, 2, 2, 4, 6, 6.8. The middle value of the lower half is two.
1; 1; 2; 2; 4; 6; 6.8
The number two, which is part of the data, is the first quartile. One-fourth of the entire set of values are the same as or less than two and three-fourths of the values are more than two.

The upper half of the data is 7.2, 8, 8.3, 9, 10, 10, 11.5. The middle value of the upper half is nine.
The third quartile, Q3, is nine. Three-fourths (75%) of the ordered data set are less than nine. One-fourth (25%) of the ordered data set are greater than nine. The third quartile is part of the data set in this example.

The interquartile range is a number that indicates the spread of the middle half or the middle 50% of the data. It is the difference between the third quartile (Q3) and the first quartile (Q1).
IQR = Q3 – Q1
The IQR can help to determine potential outliers. A value is suspected to be a potential outlier if it lies more than (1.5)(IQR) below the first quartile or more than (1.5)(IQR) above the third quartile. Potential outliers always require further investigation.

NOTE
A potential outlier is a data point that is significantly different ...

... = the ending value; No data values fall on an interval boundary.
The long left whisker in the box plot is reflected in the left side of the histogram. The ...
... the distribution of the data is:
At least 75% of the data is within two standard deviations of the mean.
At least 89% of the data is within three standard deviations of the mean.
At least 95% of ...
... were asked the number of pairs of sneakers they owned. Let X = the number of pairs of sneakers owned. The results are as follows:
X Frequency
2 12 12 ...
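The quartile and IQR procedure described earlier for the 14-value data set can be sketched in Python. This is a minimal illustration, not a standard library routine: the helper names `quartiles` and `potential_outliers` are made up for this example, and dropping the median from both halves when the number of values is odd is an assumption, since the text only works through an even-sized data set.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the 'median of each half' method from the text."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = xs[:half]
    # Assumption: for an odd count, the middle value belongs to neither half.
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    return median(lower), median(xs), median(upper)


def potential_outliers(data):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


# The data set from the text, in its original (unordered) form.
data = [1, 11.5, 6, 7.2, 4, 8, 9, 10, 6.8, 8.3, 2, 2, 10, 1]

q1, q2, q3 = quartiles(data)
iqr = q3 - q1
outliers = potential_outliers(data)

print("Q1 =", q1, "median =", q2, "Q3 =", q3)
print("IQR =", iqr)
print("potential outliers:", outliers)
```

Run on this data, the sketch reproduces the values worked out in the text (Q1 = 2, median = 7, Q3 = 9), so the IQR is 7 and the outlier fences sit at -8.5 and 19.5; none of the 14 values falls outside them. Other quartile conventions, including those used by some software packages, can give slightly different results on the same data.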
