Chapter 7 Analyzing quantitative data
Chapter 7: Quantitative Research Methods 10/6/2023
• Raw quantitative data, which have not been processed or analyzed, convey very little meaning to most people
• To turn these data into useful information, they need to be processed
• Quantitative data refer to all numerical primary and secondary data and can help the researcher to answer research questions and meet objectives
V.T.P.Mai- FIE- maivp@ftu.edu.vn

7.1 Preparing, checking and inputting data
7.2 Exploring and presenting the data
7.3 Describing data with use of statistics
7.4 Explore relationships, differences and trends using statistics

7.1 Preparing, checking and inputting data
Types of data:
• Quantitative data can be divided into two groups: categorical data and numerical data
• Categorical data are those whose values cannot be measured numerically but can be classified into sets/categories according to the characteristics that describe or identify the variable, or placed in rank order
• There are two types of categorical data:
  • Descriptive/nominal data – these data simply count the number of occurrences in each category of a variable. When a variable is divided into two categories (female/male, for example), the data are known as dichotomous data
  • Ranked/ordinal data – these are a more precise form of categorical data. An example of ranked data may be answers to rating or scale questions
• Alternatively, numerical data are those whose values are numerically measured or counted as quantities (Berman 2008)
• Numerical data are therefore more precise than categorical ones because one can assign each data value a position on a numerical scale
• Numerical data can be subdivided in two ways: into interval and ratio data, or into continuous and discrete data
• Interval data can state the difference (interval) between
any two data values of a certain variable, whereas ratio data also allow the relative difference (ratio) between any two data values of a certain variable to be calculated
• Continuous data are those whose values can take any value within a range (provided they are measured accurately enough), while discrete data can be measured precisely because each value is one of a limited set of distinct values (often whole numbers/integers)
• After determining the types of data that are to be collected, the researcher can start to enter the data into computer data processing software (SPSS/Excel)
• To do this, the data need to be coded using numerical codes. This enables the researcher to enter the data quickly and with fewer errors
• When this is done, the data should be checked for errors

7.2 Exploring and presenting the data
• Tukey’s (1977) exploratory data analysis (EDA) is a useful approach to start the analysis of quantitative data. This approach focuses on the use of diagrams to explore and understand the data. It may also reveal other relationships in the data which your research was not designed to test
• When looking at the collected data it is best to explore specific values, highest and lowest values, trends over time, proportions and distributions
• Once these have been explored, one can start to compare them and look for (causal) relationships between variables

• Exploring variables
• Shapes of diagrams
• Comparing variables

Exploring variables:
• The easiest way of summarizing the data is by using tables. However, tables do not give visual emphasis to the highest or lowest values, so diagrams may be a better option for summarizing the data
• Another way to present data is by using a bar chart, where the height or
length of each bar represents the frequency of occurrence
• Bar charts are similar to histograms, another way of presenting data, where the area of each bar represents the frequency of occurrence and where the continuous nature of the data is emphasized by the absence of gaps between bars
• Finally, a pictogram, which is also like a bar chart, shows a series of pictures chosen to represent the data. Other kinds of data presentation are:
  • Line graph – this is a suitable approach when trying to explore a trend
  • Pie chart – this is a diagram that is divided into proportional segments according to the share each has of the total value

Shapes of diagrams:
• If a diagram shows a bunching to the left and a long tail to the right, then the data are ‘positively skewed’
• If this is the other way around, then the data are ‘negatively skewed’. When the data are equally distributed on each side of the highest frequency, they are ‘symmetrically distributed’
• A bell-shaped curve is called a normal distribution. With the indicator ‘kurtosis’ one can compare a diagram’s pointedness or flatness with that of the normal distribution. When a distribution is flatter, it is called platykurtic and the kurtosis value is negative. When the distribution is more peaked, it is called leptokurtic and the kurtosis value is positive

Comparing variables:
• Contingency tables or cross-tabulations are approaches one could use to examine the interdependence between variables. Other approaches are:
  • Multiple bar charts – to compare highest and lowest values
  • Percentage component bar chart – this is used to compare proportions between variables
  • Multiple line graph – this is used to compare trends and conjunctions
  • Stacked bar chart – used to compare
totals between variables
  • Comparative proportional pie chart – this is used to compare proportions of each category or value as well as the totals between variables
  • Scatter graphs or scatter plots – this diagram is often used to explore the possible relationships between ranked and numerical data variables by plotting one variable against another

7.3 Describing data with use of statistics
• Tukey’s exploratory data analysis approach is a good way to understand the data using diagrams
• Descriptive statistics, on the other hand, enable one to describe the variables numerically. They describe a variable by focusing on its central tendency and its dispersion
• Measures of central tendency give a general impression of values that could be seen as common, middling or average. These measures are:
  • The mode – the value that occurs most often
  • The median – the middle value or mid-point after the data have been ranked
  • The mean – also known as the average
• The dispersion (how data are distributed around the central tendency) can be described by:
  • Inter-quartile range – the difference within the middle 50 per cent of values, that is, between the lower and upper quartiles
  • Standard deviation – the extent to which values differ from the mean
  • Range – the difference between the lowest and the highest values
  • Coefficient of variation – this is used to compare the relative spread of data between distributions of different magnitudes, for example hundreds of tons with billions of tons (calculated by dividing the standard deviation by the mean and multiplying the answer by 100)

7.4 Explore relationships, differences and trends using statistics
• In research one often wishes to find out whether there is a relationship between variables
• Testing this is called hypothesis testing, where one is actually comparing the collected data with what one expected to
happen
• There are two general groups of statistical significance tests: non-parametric tests (used when the data are not normally distributed) and parametric tests (used with numerical data)

• Testing for normal distribution
• Testing for significance
• Type I and Type II errors
• Exploring the strength of a relationship

Testing for normal distribution:
• A way to test for normality is to use statistics to determine whether the distribution for a variable differs significantly from a comparable normal distribution
• This can be done using statistical software that offers the Kolmogorov-Smirnov test and the Shapiro-Wilk test. A probability of 0.05 means that there is only a 5 per cent chance that the observed difference from a comparable normal distribution occurred by chance alone
• Thus, if the probability is lower than 0.05, the data are not normally distributed

Testing for significance:
• If the test shows a statistically significant relationship between variables, the researcher will reject the null hypothesis and accept the alternative hypothesis
• It is difficult to obtain a significant test statistic with a small sample; by increasing the sample size, more of the relationships found will be significant
• This is because a larger sample more closely resembles the population from which it was selected

Type I and Type II errors:
• A Type I error occurs when the null hypothesis has been wrongly rejected and the alternative hypothesis should not have been accepted. In other words, the researcher states that two variables are related when they are actually not. Statistical significance is the
same as determining the probability of making a Type I error
• A Type II error occurs when a researcher does not reject the null hypothesis when it should have been rejected. Thus the researcher states that two variables are not related when they actually are
• When descriptive or numerical data are summarized as a two-way contingency table, it is helpful to use a chi square test
• A chi square test makes it possible to determine how likely it is that two variables are associated
• In order to use this test, two assumptions should be met:
  • The categories of the contingency table are mutually exclusive: each observation falls into one category only
  • Not more than 25 per cent of the cells can have expected values of less than 5. When the table consists of two rows and two columns, no expected values can be less than 10

Exploring the strength of a relationship:
• There are two kinds of relationships:
  • Correlation: a change in one variable is accompanied by a change in another variable, but it is not clear which variable has caused the other to change
  • Cause-and-effect relationship: a change in one or more variables causes a change in another variable
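
Worked example for 7.1 (coding and checking data). The coding and error-checking steps described in section 7.1 can be sketched in a few lines of Python. This is only an illustration: the codebook `RATING_CODES`, the missing-data code `99` and the helper names are assumptions made for the example, not part of the slides or of any statistics package.

```python
# Hypothetical codebook for an ordinal (rating) question.
RATING_CODES = {"poor": 1, "fair": 2, "good": 3, "excellent": 4}
MISSING = 99  # assumed code for a missing or unusable answer


def code_response(answer, codebook):
    """Return the numerical code for a raw answer, or MISSING if unknown."""
    key = answer.strip().lower() if isinstance(answer, str) else None
    return codebook.get(key, MISSING)


def find_illegitimate(codes, codebook):
    """Return the positions of codes that are neither valid nor MISSING."""
    legitimate = set(codebook.values()) | {MISSING}
    return [i for i, c in enumerate(codes) if c not in legitimate]


raw_ratings = ["Good", "poor", " excellent", "", "fair"]
coded_ratings = [code_response(r, RATING_CODES) for r in raw_ratings]
print(coded_ratings)  # [3, 1, 4, 99, 2]

# Simulate a typing slip during data entry: 7 is not a defined code.
entered = coded_ratings + [7]
print(find_illegitimate(entered, RATING_CODES))  # [5]
```

Checking for illegitimate codes is only one kind of error check; it will not catch a valid code entered for the wrong respondent.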
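
Worked example for 7.2 (shapes of diagrams). Skewness and kurtosis can also be expressed as numbers rather than read from a diagram. The sketch below uses the simple moment-based (population) formulas; statistical software usually applies sample-size corrections, so its values may differ slightly. The two data sets are invented for illustration.

```python
def moment(values, k):
    """k-th central moment: the mean of (value - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Positive for a long right tail, negative for a long left tail."""
    return moment(values, 3) / moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Kurtosis relative to the normal distribution (which scores 0).

    Negative means flatter (platykurtic), positive means more peaked
    (leptokurtic).
    """
    return moment(values, 4) / moment(values, 2) ** 2 - 3


right_tailed = [1, 1, 2, 2, 3, 10]  # bunched to the left, long right tail
flat = [1, 2, 3, 4, 5]              # evenly spread, no peak

print(skewness(right_tailed) > 0)       # True: positively skewed
print(skewness(flat))                   # 0.0: symmetrical
print(round(excess_kurtosis(flat), 2))  # -1.3: platykurtic
```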
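
Worked example for 7.3 (descriptive statistics). The measures of central tendency and dispersion listed in section 7.3 can be computed with Python's standard `statistics` module. The data values are made up for the example, and the quartiles use the module's default method, so other software may report slightly different quartile values.

```python
import statistics

values = [2, 4, 4, 5, 7, 9, 11]  # illustrative numerical data

# Central tendency
mode = statistics.mode(values)      # value that occurs most often
median = statistics.median(values)  # middle value after ranking
mean = statistics.mean(values)      # the average

# Dispersion
value_range = max(values) - min(values)
q1, _, q3 = statistics.quantiles(values, n=4)  # lower, middle, upper quartile
iqr = q3 - q1                                  # spread of the middle 50 per cent
std_dev = statistics.stdev(values)             # sample standard deviation
cv = std_dev / mean * 100                      # coefficient of variation (%)

print(mode, median, mean)       # 4 5 6
print(value_range, iqr)         # 9 5.0
print(round(std_dev, 2))        # 3.16
print(round(cv, 1))             # 52.7
```

Because the coefficient of variation is a percentage of the mean, it can be compared across variables measured on very different scales.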
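
Worked example for 7.4 (chi square test). The chi square statistic and the expected-value assumptions from section 7.4 can be sketched by hand for a small contingency table. The observed counts are invented, the function names are hypothetical, and no continuity correction is applied; the critical value 3.841 is the standard chi square table value for 1 degree of freedom at the 0.05 level. In practice a statistics package would report the probability directly.

```python
def expected_counts(table):
    """Expected cell counts if the two variables were unrelated."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def assumptions_met(table):
    """Check the expected-value rule stated in section 7.4."""
    cells = [e for row in expected_counts(table) for e in row]
    if len(table) == 2 and len(table[0]) == 2:
        return min(cells) >= 10  # 2 x 2 table: no expected value below 10
    small = sum(1 for e in cells if e < 5)
    return small / len(cells) <= 0.25  # at most 25 per cent of cells below 5


observed = [[30, 10],  # rows: two groups (illustrative)
            [20, 40]]  # columns: two answer categories
stat = chi_square(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_0_05_DF1 = 3.841  # table value for df = 1, significance level 0.05

print(assumptions_met(observed))   # True
print(round(stat, 2))              # 16.67
print(stat > CRITICAL_0_05_DF1)    # True: reject the null hypothesis
```

Here the statistic is well above the critical value, so the null hypothesis of no association would be rejected at the 0.05 level.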