Development Pioneers Company for Consultations
Gaza City, Al-Shohada Street, 6th floor Palestine building,
Tel/Fax: 2888781
info@pioneer.ps
Doing Data Analysis
With SPSS
Prepared by:
Dr: Nafez M Barakat
2012 - 2013
Table of Contents
Purpose of Training Program
Training Methodology
How to run SPSS program
Measurement of the variables
Displaying Distributions with Graphs
One variable Descriptive Statistics
Bivariate Correlations
Selecting cases
Inference for Distributions
Inference for Two Population Mean
Goodness of Fit Test (A Multinomial Population)
Inferential methods in regression and correlation
Purpose of Training Program
Introduction:
Statistical Package for the Social Sciences (SPSS) is a computer programme used for survey authoring and deployment, data mining, text analytics, statistical analysis, and collaboration. SPSS is among the most widely used programmes for statistical analysis in social science. It is used by market researchers, health researchers, survey companies, government, education researchers, marketing organizations, and others.
Aim of training program
To perform statistical analysis effectively using SPSS
Course Objectives
At the end of the training, participants will be able to:
Sciences (SPSS)
sources
analysis and Analysis of Variance for analysis of data to secure outputs
Training Methodology
SPSS training programs will be conducted in a concise, easy-to-understand way that is exciting, rewarding, interactive, and job-application oriented. The training is aimed at improving and developing trainees’ skills through the use of a variety of training methods such as:
How to run SPSS program
Click the Start button, select Programs, and choose SPSS 14.0 for Windows from the list. Wait until the program is ready for use; this may take a few minutes.
To Show or Hide a Toolbar:
From the menu choose: View > Toolbars
[Figure: the SPSS Data Editor window, with labels for the menu bar, toolbar, variables, cases, cell editor, row number, variable name, and active cell]
Opening a Data File:
From the menu choose: File > Open
In the Open File dialog box, select the file you want to open, and click Open.
Saving Data Files
From the menus choose:
File – Save As
Select a file type from the drop-down list (SPSS (*.sav))
Enter a file name for the new data file
Basic Steps for Data Analysis:
Analyzing data with SPSS is easy. All you need to do is:
1. Get data into SPSS
2. Select a procedure
3. Select the variables for the analysis
4. Run the procedure and look at the results
Entering Data into the Data Editor:
Many of the features of the Data Editor are similar to those found in spreadsheet applications. There are, however, several important distinctions:
Rows are cases: each row represents a case or observation. For example, each individual respondent to a questionnaire is a case.
Columns are variables: each column represents a variable or characteristic being measured. For example, each item on a questionnaire is a variable.
Cells contain values: each cell contains a single value of a variable for a case. The cell is the intersection of the case and the variable.
The data file is rectangular: the dimensions of the data file are determined by the number of cases and variables.
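The layout described above can be sketched outside SPSS as well. The following Python fragment is only an illustration: the variable names gender, jobcat, and salary and all the values are assumptions made up for the sketch, not data from any SPSS file.

```python
# Hypothetical cases; every value here is invented for illustration.
cases = [
    {"gender": "m", "jobcat": 3, "salary": 57000},
    {"gender": "f", "jobcat": 1, "salary": 21450},
    {"gender": "f", "jobcat": 2, "salary": 30750},
]

# Columns are variables, rows are cases.
variables = list(cases[0].keys())
n_cases, n_variables = len(cases), len(variables)

# Rectangular: every case holds exactly one value for every variable.
is_rectangular = all(list(case.keys()) == variables for case in cases)

# A cell is the intersection of a case (row) and a variable (column).
cell = cases[1]["salary"]

print(n_cases, n_variables, is_rectangular, cell)
```

The dimensions of the file are simply the number of cases by the number of variables, here 3 by 3.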
Example: suppose a questionnaire contains the following questions:
Gender: male, female
Job category: clerical, custodial, manager
Salary: $ ………
Enter the data for the questions above in the SPSS Data Editor:
At the bottom of the Data Editor, click the Variable View tab. A different grid appears, with these column headings:
Under Name, enter the variable name gender for the first question, jobcat for the second question, and salary for the third question.
The following rules apply to variable names:
The name must begin with a letter. The remaining characters can be any letter, any digit, a period, or the symbols @, #, _, and $.
Variable names cannot end with a period.
Blanks and special characters (for example, !, ?, ', and *) cannot be used.
Each variable name must be unique; duplication is not allowed. Variable names are not case sensitive. The names gender, GENDER, and Gender are all identical in SPSS.
Some reserved words in SPSS are not allowed as names, such as not, and, or…
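The naming rules above can be checked mechanically. The Python sketch below encodes only the rules stated in this section; the function name is_valid_name is hypothetical, the reserved-word list holds just the three words named in the text, and SPSS itself may enforce further limits that are not modeled here.

```python
import re

# Only the reserved words named in the text; the full SPSS list is longer (assumption).
RESERVED = {"not", "and", "or"}

# First character is a letter; the rest may be letters, digits, period, @, #, _ or $.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9.@#_$]*")

def is_valid_name(name, existing=()):
    """Return True if name satisfies the rules listed in this section."""
    if not NAME_PATTERN.fullmatch(name):
        return False                      # bad first character, blank, or special character
    if name.endswith("."):
        return False                      # names cannot end with a period
    if name.lower() in RESERVED:
        return False                      # reserved word
    taken = {other.lower() for other in existing}
    return name.lower() not in taken      # uniqueness ignores case

for candidate in ["gender", "job cat", "salary.", "and", "1st"]:
    print(candidate, is_valid_name(candidate))
```

Passing the names already in use, as in is_valid_name("GENDER", existing=["gender"]), shows the case-insensitive duplicate check.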
Define variable type:
Click the small gray button marked with three dots in the Type column to open the Variable Type dialog box.
The available data types are: numeric, comma, dot, scientific notation, date, dollar, custom currency, and string.
The custom currency formats CCA, CCB, CCC, CCD, and CCE are defined on the Currency tab of the Options dialog box, accessed from the Edit menu.
We select Numeric with a width of 8 digits and 0 decimal places for the variables jobcat and salary, and String for the variable gender.
You can also change the width and decimal places in the columns named Width and Decimals.
Define Labels:
Variable labels describe the variables and can be up to 250 characters long; these descriptive labels are displayed in the output. We write Gender, Employment Category, and Current Salary for our three variables.
To define value labels for the variable jobcat, click the small gray button in the Values column. Type 1 in the Value box and clerical in the Value Label box, then click Add. Type 2 in the Value box and custodial in the Value Label box, then click Add. Finally, type 3 in the Value box and manager in the Value Label box, then click Add.
The salary variable is a quantitative variable, so it needs no value labels.
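Value labels amount to a lookup from stored codes to descriptive text. A minimal Python sketch of that idea follows; the helper label_for and the list of stored codes are hypothetical, while the three code-label pairs are the ones entered above.

```python
# Value labels for jobcat, as entered in the steps above.
JOBCAT_LABELS = {1: "clerical", 2: "custodial", 3: "manager"}

def label_for(code, labels):
    """Return the label for a code, or the code itself as text when it has no label."""
    return labels.get(code, str(code))

codes = [1, 3, 2, 1]    # hypothetical stored values for four cases
shown = [label_for(code, JOBCAT_LABELS) for code in codes]
print(shown)
```

The data file keeps the numeric codes; only the display changes, which is why a quantitative variable such as salary needs no labels.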
Define missing values:
Click the small gray button marked with three dots in the Missing column to open the Missing Values dialog box.
Defining missing values flags the specified data values as user-missing, and user-missing values are excluded from calculations.
You can enter up to three discrete (individual) missing values, a range of missing values, or a range plus one discrete value.
Ranges can be specified only for numeric variables.
You cannot define missing values for long string variables.
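To see what "excluded from calculations" means in practice, here is a small Python sketch. The function is_user_missing, the missing codes 0 and 9999, and the salary values are all assumptions chosen for illustration.

```python
from statistics import mean

def is_user_missing(value, discrete=(), low=None, high=None):
    """True if value is one of the discrete codes or lies inside the range low..high."""
    if value in discrete:
        return True
    return low is not None and high is not None and low <= value <= high

# Hypothetical salaries in which 0 and 9999 were declared user-missing.
salaries = [21450, 0, 30750, 9999, 57000]
valid = [s for s in salaries if not is_user_missing(s, discrete=(0, 9999))]

# The mean is computed from the valid cases only.
print(len(valid), mean(valid))
```

A range plus one discrete value can be expressed as is_user_missing(x, discrete=(0,), low=90, high=99).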
Define Column Format:
You can define the width of each column in the column named Columns; we choose a column width of 8 for the three variables. Then click the Align column and choose Center.
Measurement of the variables
Click the column named Measure, and choose Nominal for the variable gender, Ordinal for the variable jobcat, and Scale for the variable salary.
You can now click the Data View tab and enter the data.
Inserting new cases:
To insert new cases between existing cases:
Select any cell in the case (row) below the position where you want to insert the new case.
From the menu choose: Data > Insert Case
A new row is inserted for the case, and all variables receive the system-missing value.
Inserting new variable:
To insert a new variable between existing variables:
Select any cell in the variable (column) to the right of the position where you want to insert the new variable
From the menu choose: Data > Insert Variable
A new variable is inserted with the system-missing value for all cases.
Moving or Removing Variables:
Click the name of the variable you want to remove at the top of its column.
From the menu choose: Edit > Cut
If you want to move the variable instead, select the column where it should go and choose: Edit > Paste
Go To Case:
To go to a case in the Data Editor:
Make the Data Editor the active window.
From the menu choose: Data > Go to Case
Enter the Data Editor row number for the case and click OK.
Search for Data:
To find a data value in the Data Editor:
Select any cell in the column of the variable you want to search.
From the menu choose: Edit > Find
Enter the data value you want to find.
Click Find Next.
Opening a Data File:
Choose File > Open > Data.
The Open File dialog box appears.
Select the appropriate directory for your system and you will see a list of available data files. Select the one named Employee data, and then click Open.
Displaying Distributions with Graphs
Frequency tables, bar charts, and pie charts are used only for qualitative data or for small data sets.
Stem plots, histograms, and time plots are used for quantitative variables.
Frequency Tables:
To create a frequency table for a categorical variable, follow these steps:
Click Analyze > Descriptive Statistics > Frequencies; the Frequencies dialog box appears.
Select gender in the list on the left and move it to the list on the right, labeled Variable(s).
Click the Charts button. In the Charts dialog box, select Bar charts, and under Chart Values select Frequencies. Finally, click Continue.
Back in the Frequencies dialog box, click OK. The resulting SPSS for Windows output appears.
Frequencies
[DataSet1] D:\Program Files\SPSSEval\Employee data.sav
This table summarizes how many observations are in the data set. Here there are 474 observations, all with valid values, and no missing data.
Statistics
Gender
N    Valid      474
     Missing    0
In this table, 216 employees (45.6%) are female and 258 employees (54.4%) are male. The cumulative percent is the percentage of the current category plus the percentages of the categories above it.
Q What is the difference between percent and the valid percent?
Gender
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Female   216         45.6      45.6            45.6
        Male     258         54.4      54.4            100.0
        Total    474         100.0     100.0
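The columns of a frequency table can be reproduced with a few lines of Python. This is a sketch under simple assumptions, not the SPSS algorithm: the function name frequency_table is hypothetical, missing values are represented by None, and categories are listed in sorted order. The counts 216 and 258 are the ones reported in the text.

```python
from collections import Counter

def frequency_table(values, missing=(None,)):
    """Rows of (category, frequency, percent, valid percent, cumulative percent)."""
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for category in sorted(counts):
        freq = counts[category]
        percent = 100 * freq / len(values)        # base: all cases
        valid_percent = 100 * freq / len(valid)   # base: valid cases only
        cumulative += valid_percent
        rows.append((category, freq, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows

# 216 female and 258 male employees, none missing, as in the output discussed here.
gender = ["Female"] * 216 + ["Male"] * 258
for row in frequency_table(gender):
    print(row)
```

Percent and valid percent differ only when some cases are missing, because the two columns divide by different totals.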
The bar chart for gender is shown below; the height of each bar represents the number of employees in that category.
[Figure: bar chart of Gender (Male, Female)]
Double-click the bar chart in the SPSS output to open the Chart Editor.
Choose: Elements > Show Data Labels
A dialog box appears. Click Text Style, type 14 in the Preferred Size box, and then click Apply.
Click File > Close to close the Chart Editor; the chart then appears with data labels.
[Figure: bar chart of Gender (Male, Female) with data labels]
Comparing the salaries of females and males using bar charts:
Choose Graphs > Interactive > Bar; a dialog box appears.
Complete the dialog box and click OK.
[Figure: interactive bar chart; bars show means]
Q. From the bar chart above, do males or females have the higher salary? Why?
Another method for comparing the mean salary of males and females:
Choose Graphs > Bar; a dialog box appears.
Choose Simple and Summaries for groups of cases, then click the button marked Define.
Complete the dialog box. If you want to include a title or a footnote on the chart, click Titles, type the desired information, and click Continue to return to the original dialog box. Then click OK.
A new window appears, containing the bar chart.
[Figure: bar chart by gender (Male, Female)]
One variable Descriptive Statistics
The Frequencies procedure provides statistics and a histogram for quantitative variables, as follows:
Choose Analyze > Descriptive Statistics > Frequencies; the Frequencies dialog box appears.
Move Current Salary to the list labeled Variable(s).
Click the button marked Statistics, complete the dialog box that appears, and click Continue to return to the original dialog box.
Percentile values: values of salary (a quantitative variable) that divide the ordered data into groups so that a certain percentage is above and another percentage is below. Quartiles (the 25th, 50th, and 75th percentiles) divide the observations into four groups of equal size.
If you want an equal number of groups other than four, select Cut points for n equal groups. You can also specify individual percentiles (for example, the 77th percentile, the value below which 77% of the observations fall).
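Quartiles and individual percentiles can be illustrated with Python's standard statistics module. The salary values below are hypothetical, not taken from Employee data.sav. The default "exclusive" method of statistics.quantiles places the p-th percentile at position (n + 1)p in the ordered data; SPSS may use a different interpolation rule depending on the procedure, so small differences from SPSS output are possible.

```python
from statistics import quantiles

# Hypothetical salaries, invented for illustration.
salary = [21000, 22500, 24000, 26250, 27300, 28875,
          30750, 35100, 40200, 57000, 103750]

q1, q2, q3 = quantiles(salary, n=4)     # 25th, 50th, and 75th percentiles
iqr = q3 - q1                           # interquartile range
p77 = quantiles(salary, n=100)[76]      # 77th percentile (99 cut points, index 76)

print(q1, q2, q3, iqr, p77)
```

Changing n gives the "cut points for n equal groups" option: quantiles(salary, n=5) returns the four cut points for five equal groups.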
Central Tendency: statistics that describe the location of the distribution. You can select the mean, median, and mode, or the sum of all the values.
Dispersion: statistics that measure the amount of variation or spread in the data. You can select the Std. deviation (standard deviation), variance, range, minimum, maximum, or S.E. mean (standard error of the mean).
Distribution: statistics that describe the shape and symmetry of the distribution. You can select skewness or kurtosis. These statistics are displayed with their standard errors.
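The statistics listed above can be computed by hand for a small data set, which helps when reading the SPSS output. The Python sketch below uses hypothetical data and the sample (n - 1) formulas; the skewness function uses the adjusted Fisher-Pearson formula, which is assumed here to correspond to the skewness SPSS reports. Kurtosis and the standard errors of skewness and kurtosis are left out to keep the sketch short.

```python
from math import sqrt
from statistics import mean, median, mode, stdev, variance

def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (assumed to match the reported statistic)."""
    n, m, s = len(values), mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

def describe(values):
    """Central tendency, dispersion, and shape statistics for one variable."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "sum": sum(values),
        "std_dev": stdev(values),            # sample standard deviation
        "variance": variance(values),        # sample variance
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
        "se_mean": stdev(values) / sqrt(len(values)),
        "skewness": skewness(values),
    }

data = [2, 4, 4, 5, 7, 8, 12]    # hypothetical values
summary = describe(data)
for name, value in summary.items():
    print(f"{name:>9}: {value:.3f}")
```

A positive skewness, as in this example, indicates a longer tail to the right.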
Click the button marked Charts, select Histograms, and select the With normal curve box. Click Continue to return to the original dialog box.
A histogram also has bars, but they are plotted along an equal-interval scale. The height of each bar is the count of values of a quantitative variable falling within the interval. A histogram shows the shape, center, and spread of the distribution. A normal curve superimposed on a histogram helps you judge whether the data are normally distributed.
Click OK to get the following results:
1 Frequencies
This table shows the following results:
Q1. Is the distribution symmetric, skewed to the right, or skewed to the left? Why?
Q2. Find the IQR (interquartile range = Q3 – Q1 = P75 – P25).
Q3. Would you prefer the range or the IQR to describe the dispersion of the data in this example, and why?
Statistics
Current Salary
N                          Valid      474
                           Missing    0
Skewness                              2.125
Std. Error of Skewness                .112
Kurtosis                              5.378
Std. Error of Kurtosis                .224
When z scores are saved, they are added to the data in the Data Editor and are available for SPSS charts, data listings, and analyses. When variables are recorded in different units (for example, salary, education, and experience), a z-score transformation places the variables on a common scale for easier visual comparison.
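The z-score transformation is z = (x - mean) / standard deviation. The Python sketch below uses hypothetical salary and education values and the sample standard deviation; it is meant only to show how two variables in different units land on the same scale.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

salary = [30000, 40000, 50000]    # hypothetical, in dollars
education = [12, 15, 18]          # hypothetical, in years

print(z_scores(salary))
print(z_scores(education))
```

Although one variable is measured in dollars and the other in years, both lists of z scores are identical, because each value sits at the same relative position within its own distribution.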
Example: Open Employee data.sav, obtain the descriptive statistics for current salary, and discuss the results.
To obtain descriptive statistics:
From the menus choose: Analyze > Descriptive Statistics > Descriptives
The Descriptives dialog box appears. Move Current Salary to the list labeled Variable(s), and select the box "Save standardized values as variables".
Optionally, click Options to choose additional statistics and the display order in the Descriptives Options dialog box.
Click Continue to return to the Descriptives dialog box, then click OK to get these results:
Descriptives
[DataSet1] D:\Program Files\SPSSEval\Employee data.sav
[Table: Descriptive Statistics for Current Salary and Valid N (listwise). Statistic columns: N, Range, Minimum, Maximum, Sum, Mean, Std. Deviation, Skewness, Kurtosis. Std. Error columns: Mean ($784.311), Skewness (.112), Kurtosis (.224)]
Note that we dragged an icon from the column tray into the row tray, and an icon from the row tray into the column tray, to obtain the layout shown above.
Frequency tables, percentiles, and other descriptive statistics
Test for normality, including probability plots and Shapiro-Wilk and Lilliefors tests
Levene's test for assessing equality of variances
Robust estimates of location (M-estimators)
[Figure: pivoting trays, showing the column icon (salary) and the statistics icons]
Reasons for using the Explore procedure:
There are many reasons for using the Explore procedure: data screening, outlier identification, description, assumption checking, and characterizing differences among subpopulations (groups of cases). Data screening may show that you have unusual values, extreme values, gaps in the data, or other peculiarities. Exploring the data may also indicate whether or not the distribution of the data is normal.
Example: Open the file named Employee data.sav.
From the menus choose: Analyze > Descriptive Statistics > Explore
The Explore dialog box appears. Move the salary variable (a quantitative variable) to the list labeled Dependent List.
Click Statistics to request robust estimators, outliers, percentiles, descriptives, and the 95% confidence interval for the mean, then click Continue.
Click Plots to request histograms, stem-and-leaf plots, and normal probability plots with tests, then click Continue.
Click OK to obtain the following results:
Explore
[DataSet1] D:\Program Files\SPSSEval\Employee data.sav
This table shows that we have 474 valid observations and no missing values.
The Descriptives table shows several statistics:
95% confidence interval for mean (lower and upper bound): a confidence interval is a range used to estimate a population mean.
5% trimmed mean: the sample mean computed after omitting the highest 5% and the lowest 5% of the sample data.
We discussed the other statistics in the previous sections.
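Both statistics can be approximated with the Python standard library. The sketch below rests on two simplifying assumptions that should be kept in mind: the confidence interval uses the normal critical value (reasonable for large samples such as 474 cases), whereas statistical packages normally use the t distribution; and the trimmed mean drops a whole number of cases from each end, whereas SPSS may weight partial observations. The data are hypothetical.

```python
from math import floor, sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, level=0.95):
    """Large-sample interval: mean ± z * s / sqrt(n), with z from the normal distribution."""
    n, m = len(values), mean(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(values) / sqrt(n)
    return m - half_width, m + half_width

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping floor(proportion * n) values from each end of the sorted data."""
    k = floor(proportion * len(values))
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])

data = list(range(1, 20)) + [1000]     # hypothetical: 19 ordinary values and one extreme value
low, high = mean_confidence_interval(data)
print(mean(data), trimmed_mean(data), (low, high))
```

The ordinary mean is pulled upward by the single extreme value, while the trimmed mean stays close to the bulk of the data.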
Case Processing Summary
                                   Cases
                   Valid            Missing          Total
                   N     Percent    N    Percent     N     Percent
Current Salary     474   100.0%     0    0.0%        474   100.0%
[Table: Descriptives for Current Salary, with Statistic and Std. Error columns. Rows: 95% Confidence Interval for Mean (Lower Bound, Upper Bound), 5% Trimmed Mean, Median, Variance, Std. Deviation, Minimum, Maximum, Range, Interquartile Range, Skewness, Kurtosis]
The M-Estimators table shows alternatives to the sample mean for estimating the center of the distribution. The estimators differ in the weights they apply to cases. Huber's M-estimator, Tukey's biweight, Hampel's M-estimator, and Andrews' wave estimator are displayed.
M-Estimators
                   Huber's M-Estimator(a)   Tukey's Biweight(b)   Hampel's M-Estimator(c)   Andrews' Wave(d)
Current Salary     $29,434.84               $27,613.71            $28,739.16                $27,599.33
a. The weighting constant is 1.339.
The Extreme Values table displays the five smallest values and the five largest values, with case labels.
[Table: Extreme Values for Current Salary, listing Case Number and Value for the highest and lowest cases]
The Tests of Normality table accompanies the normal probability and detrended normal probability plots. It displays the Kolmogorov-Smirnov statistic, with a Lilliefors significance level for testing normality, together with the Shapiro-Wilk statistic. The significance level is 0.000 < 0.05, which means that the distribution of the data is not normal.
Tests of Normality
                   Kolmogorov-Smirnov(a)            Shapiro-Wilk
                   Statistic   df    Sig.           Statistic   df    Sig.
Current Salary     .208        474   .000           .771        474   .000
a. Lilliefors Significance Correction
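The Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical distribution of the data and a normal distribution with the same mean and standard deviation. The Python sketch below computes only that statistic, on hypothetical data; the Lilliefors significance level requires special tables or simulation and is not computed here, and the function name ks_statistic is an assumption of the sketch.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the data."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides of the jump.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

evenly_spread = list(range(1, 11))                   # hypothetical, roughly symmetric
right_skewed = [1, 1, 1, 2, 2, 3, 4, 8, 20, 60]      # hypothetical, long right tail

print(round(ks_statistic(evenly_spread), 3), round(ks_statistic(right_skewed), 3))
```

The strongly skewed sample gives a much larger statistic than the evenly spread one, which is the pattern behind a small significance level in the table.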
We have two plots, a histogram and a stem-and-leaf plot. We discussed the histogram previously; now we discuss the stem-and-leaf plot.
Q. How would you describe the shape of this distribution? Compare the histogram and the stem-and-leaf plot. What important differences, if any, do you see?
[Figure: Current Salary Stem-and-Leaf Plot (columns: Frequency, Stem & Leaf)]
This plot is called a normal quantile plot. Any data that follow a normal distribution produce a straight line on the normal quantile plot. Systematic deviations from a straight line indicate a non-normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.
We note that most points lie far from the straight line, indicating that the distribution is not normal.
[Figure: normal quantile plot of Current Salary]
This plot is called a box plot. It illustrates the minimum value, first quartile (Q1 = P25), second quartile (Q2 = P50), third quartile (Q3 = P75), maximum value, and extreme values (outliers). We must distinguish between minor outliers and major outliers. Minor outliers, denoted by o in the plot, are observations more than 1.5 × IQR outside the central box. Major outliers, denoted by * in the plot, are observations more than 3 × IQR outside the central box.
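The 1.5 × IQR and 3 × IQR rules can be applied directly to a list of values. The Python sketch below uses hypothetical data and the quartiles from statistics.quantiles; SPSS may compute the edges of the box with a slightly different rule, so borderline cases can be classified differently. The function name classify_outliers is an assumption of the sketch.

```python
from statistics import quantiles

def classify_outliers(values):
    """Split values into minor (beyond 1.5 IQR) and major (beyond 3 IQR) outliers."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    minor, major = [], []
    for x in values:
        distance = max(q1 - x, x - q3)    # how far x lies outside the box; negative inside
        if distance > 3 * iqr:
            major.append(x)
        elif distance > 1.5 * iqr:
            minor.append(x)
    return minor, major

data = [10, 12, 13, 14, 15, 16, 17, 18, 20, 32, 45]    # hypothetical values
minor, major = classify_outliers(data)
print("minor:", minor, "major:", major)
```

In this example the box runs from 13 to 20, so the IQR is 7; the value 32 lies more than 10.5 above the box and is a minor outlier, and 45 lies more than 21 above it and is a major outlier.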
We can compare two groups of data using the Explore procedure as follows:
Choose Analyze > Descriptive Statistics > Explore
Move the salary variable to the Dependent List and move gender to the Factor List in the Explore dialog box.
[Figure: annotated box plot labeling the minimum value, Q1 = P25, Q2 = P50 = median, Q3 = P75, maximum value, minor outliers, and major outliers]
Click the Statistics and Plots buttons and choose any statistics you want, to get the following results:
[Table: Case Processing Summary for Current Salary by Gender (N and Percent)]
[Table: Descriptives for Current Salary by Gender (Female, Male), with Statistic and Std. Error columns. Rows for each group: Mean, 95% Confidence Interval for Mean (Lower Bound, Upper Bound), 5% Trimmed Mean, Median, Variance, Std. Deviation, Minimum, Maximum, Range, Interquartile Range, Skewness, Kurtosis]
[Table: Percentiles (5, 10, 25, 50, 75, 90, 95) of Current Salary by Gender]
[Table: M-Estimators of Current Salary by Gender: Huber's M-Estimator (a), Tukey's Biweight (b), Hampel's M-Estimator (c), Andrews' Wave (d). a. The weighting constant is 1.339.]
[Table: Extreme Values of Current Salary by Gender, listing Case Number and Value]
[Table: Tests of Normality for Current Salary by Gender (Female, Male): Kolmogorov-Smirnov (a) and Shapiro-Wilk, each with Statistic, df, and Sig. a. Lilliefors Significance Correction]
[Figures: Current Salary Stem-and-Leaf Plots for each gender. Each leaf represents 1 case in one plot and 2 cases in the other; & denotes fractional leaves]
Normal Q-Q Plots
[Figure: Normal Q-Q Plot of Current Salary for gender = Female]
[Figure: Normal Q-Q Plot of Current Salary for gender = Male]