
SPSS training material




Document information: 153 pages, 5.38 MB.



Development Pioneers Company for Consultations

Gaza City, Al-Shohada Street, 6th floor Palestine building,

Tel/Fax: 2888781

info@pioneer.ps


Doing Data Analysis with SPSS

Prepared by:

Dr. Nafez M. Barakat

2012 - 2013


Table of Contents

Purpose of Training Program

Training Methodology

How to run SPSS program

Measurement of the variables

Displaying Distributions with Graphs

One variable Descriptive Statistics

Bivariate Correlations

Selecting cases

Inference for Distributions

Inference for Two Population Means

Goodness of Fit Test (A Multinomial Population)

Inferential methods in regression and correlation


Purpose of Training Program

Introduction:

Statistical Package for the Social Sciences (SPSS) is a computer program used for survey authoring and deployment, data mining, text analytics, statistical analysis, and collaboration. SPSS is among the most widely used programs for statistical analysis in social science. It is used by market researchers, health researchers, survey companies, government, education researchers, marketing organizations, and others.

Aim of the training program:

To perform statistical analysis effectively using SPSS.

Course objectives:

At the end of the training, participants will be able to:

Sciences (SPSS)

sources

analysis and Analysis of Variance for analysis of data to secure outputs


Training Methodology

SPSS training programs will be conducted in a concise, easy-to-understand way that is exciting, rewarding, interactive, and job-application oriented. They aim to improve and develop trainees' skills through a variety of training methods, such as:


How to run SPSS program

Click the Start button with the left mouse button, select Programs, and from the list choose SPSS 14.0 for Windows. Wait a few minutes until the program is ready for use.

To Show or Hide a Toolbar:

From the menu choose View > Toolbars.

[Figure: the Data Editor window, with callouts for the menu bar, toolbar, variables, cases, cell editor, row number, variable name, and active cell.]


Opening a Data File:

From the menu choose File > Open.

In the Open File dialog box, select the file you want to open, and click Open.

Saving Data Files:

From the menus choose:

File > Save As

Select a file type from the drop-down list (SPSS (*.sav)).

Enter a file name for the new data file.

Basic Steps for Data Analysis:

Analyzing data with SPSS is easy. All you need to do is:

1. Get data into SPSS.

2. Select a procedure.

3. Select the variables for the analysis.

4. Run the procedure and look at the results.

Entering Data into the Data Editor:

Many of the features of the Data Editor are similar to those found in spreadsheet applications. There are, however, several important distinctions:

• Rows are cases: each row represents a case or observation. For example, each individual respondent to a questionnaire is a case.

• Columns are variables: each column represents a variable or characteristic being measured. For example, each item on a questionnaire is a variable.

• Cells contain values: each cell contains a single value of a variable for a case. The cell is the intersection of the case and the variable.

• The data file is rectangular: the dimensions of the data file are determined by the number of cases and variables.

Example: suppose a questionnaire contains these questions:

Gender: male / female

Job category: clerical / custodial / manager

Salary: $ ………

To enter the data for these questions in the SPSS Data Editor:

At the bottom of the Data Editor, click the Variable View tab. A different grid appears, with the column headings Name, Type, Width, Decimals, Label, Values, Missing, Columns, Align, and Measure.


Under Name, enter the variable name gender for the first question, jobcat for the second question, and salary for the third question.

The following rules apply to variable names:

• The name must begin with a letter; the remaining characters can be any letter, any digit, a period, or the symbols @, #, _, and $.

• Variable names cannot end with a period.

• Blanks and special characters (for example, !, ?, ', and *) cannot be used.

• Each variable name must be unique; duplication is not allowed. Variable names are not case sensitive: the names gender, GENDER, and Gender are all identical in SPSS.

• Some reserved words in SPSS, such as NOT, AND, and OR, cannot be used as variable names.
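As a rough illustration, the naming rules above can be checked mechanically. This is a Python sketch, not part of SPSS: the function name `is_valid_name` is hypothetical, it accepts ASCII letters only, and the reserved-word set is assumed to be the SPSS keyword list (ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, WITH). SPSS applies further limits (for example, a maximum name length) that are not modeled here.

```python
import re

# Assumed SPSS reserved keywords; none of them may be used as a variable name.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT", "NE", "NOT", "OR", "TO", "WITH"}

# First character: a letter. Remaining characters: letters, digits, '.', '@', '#', '_', '$'.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9.@#_$]*")

def is_valid_name(name, existing=()):
    """Check a proposed variable name against the rules listed above (sketch)."""
    if not NAME_PATTERN.fullmatch(name):
        return False                  # bad first character, a blank, or a special character
    if name.endswith("."):
        return False                  # names cannot end with a period
    if name.upper() in RESERVED:
        return False                  # reserved word
    # Names are not case sensitive, so compare in upper case to detect duplicates.
    return name.upper() not in {n.upper() for n in existing}

print(is_valid_name("jobcat"), is_valid_name("GENDER", existing=["gender"]))
```

Comparing names in upper case is what makes gender, GENDER, and Gender collide, mirroring the case-insensitivity rule.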

Define variable type:

Click the small gray button marked with three dots in the Type column to open the Variable Type dialog box.

The available data types are: numeric, comma, dot, scientific notation, date, dollar, custom currency, and string.

The custom currency formats CCA, CCB, CCC, CCD, and CCE are defined on the Currency tab of the Options dialog box, accessed from the Edit menu.

We select Numeric with a width of 8 digits and 0 decimal places for the variables jobcat and salary, and String for the variable gender.

You can also change the width and decimal places in the Width and Decimals columns.

Define Labels:

The Label column provides a descriptive variable label that can be up to 250 characters long; these descriptive labels are displayed in the output. We enter Gender, Employment Category, and Current Salary as the labels for our three variables.


For the variable jobcat, click the button with three dots in the Values column to open the Value Labels dialog box. Type 1 in the Value box and clerical in the Value Label box, and click Add. Then type 2 in the Value box and custodial in the Value Label box, and click Add. Finally, type 3 in the Value box and manager in the Value Label box, and click Add.

The salary variable is a quantitative variable, so it needs no value labels.

Define missing values:

Click the small gray button marked with three dots in the Missing column to open the Missing Values dialog box.

Defining missing values marks the specified data values as user-missing; user-missing values are excluded from most calculations.

• You can enter up to three discrete (individual) missing values, a range of missing values, or a range plus one discrete value.

• Ranges can only be specified for numeric variables.

• You cannot define missing values for long string variables.
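The value labels and missing-value rules above can be sketched in Python to show how user-missing codes drop out of a calculation. Everything here is a hypothetical illustration, not SPSS itself: `None` stands in for the system-missing value, the codes 0 and 97 to 99 are made-up missing codes, and `is_missing` and `valid_mean` are invented names.

```python
from statistics import mean

# Value labels for jobcat, as entered in the Value Labels dialog.
JOBCAT_LABELS = {1: "clerical", 2: "custodial", 3: "manager"}

def is_missing(value, discrete=(), low=None, high=None):
    """Return True if value is system-missing (None) or user-missing.

    discrete  -- individual missing codes
    low, high -- optional range of missing codes (numeric variables only)
    Allowed: up to three discrete codes, or one range plus one discrete code.
    """
    has_range = low is not None and high is not None
    if len(discrete) > (1 if has_range else 3):
        raise ValueError("too many discrete missing values")
    if value is None:                  # stands in for the system-missing value
        return True
    if value in discrete:
        return True
    return has_range and low <= value <= high

def valid_mean(values, **missing_spec):
    """Mean over the cases that are not missing."""
    return mean(v for v in values if not is_missing(v, **missing_spec))

# Usage with made-up data
print([JOBCAT_LABELS[code] for code in [1, 3, 2]])
print(valid_mean([21000, 0, 45000, None, 30000], discrete=(0,)))  # 0 is user-missing
print(valid_mean([5, 98, 99, 7], low=97, high=99))                 # 97-99 are user-missing
```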

Define Column Format:

You can define the display width of a column in the Columns column; we set the column width for the three variables to 8. Then click the Align column and choose Center.

Measurement of the variables

Click the Measure column and choose Nominal for the variable gender, Ordinal for the variable jobcat, and Scale for the variable salary.

You can now click the Data View tab and enter the data.


Inserting new cases:

To insert new cases between existing cases:

• Select any cell in the case (row) below the position where you want to insert the new case.

• From the menu choose Data > Insert Cases.

A new row is inserted for the case, and all variables receive the system-missing value.

Inserting new variables:

To insert a new variable between existing variables:

• Select any cell in the variable (column) to the right of the position where you want to insert the new variable.

• From the menu choose Data > Insert Variable.

A new variable is inserted with the system-missing value for all cases.
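A toy model of what these two commands do to the rectangular data file: each row is a case, an ordered list of names holds the columns, and `None` stands in for the system-missing value. This is a hypothetical Python sketch (`insert_case`, `insert_variable`, and the variable `educ` are made up), not how SPSS stores data internally.

```python
def insert_case(rows, variables, index):
    """Insert an empty case before row `index`; every variable is system-missing."""
    rows.insert(index, {name: None for name in variables})

def insert_variable(rows, variables, index, name):
    """Insert a new variable before column `index`; every case is system-missing."""
    if name.upper() in (v.upper() for v in variables):
        raise ValueError(f"duplicate variable name: {name}")
    variables.insert(index, name)
    for row in rows:
        row[name] = None

# Usage with made-up data
variables = ["gender", "jobcat", "salary"]
rows = [{"gender": "f", "jobcat": 1, "salary": 21000},
        {"gender": "m", "jobcat": 3, "salary": 45000}]
insert_case(rows, variables, 1)               # new empty row between the two cases
insert_variable(rows, variables, 1, "educ")   # new empty column after gender
for row in rows:
    print([row[name] for name in variables])
```

Because every new cell is filled with the missing marker, the file stays rectangular after either insertion.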

Moving or Removing Variables:

• Click the variable name at the top of the column to select the variable you want to remove.

• From the menu choose Edit > Cut.

• If you want to move the variable, select the new position and choose Edit > Paste.

Go to Case:

To go to a case in the Data Editor:

• Make the Data Editor the active window.

• From the menu choose Data > Go to Case.

• Enter the row number for the case and click OK.

Search for Data:

To find a data value in the Data Editor:

• Select any cell in the column of the variable you want to search.

• From the menu choose Edit > Find.

• Enter the value you want to find.

• Click Find Next.

Opening a Data File:

Choose File > Open > Data. An Open dialog box appears.

Select the appropriate directory for your system, and you will see a list of available data files. Select the one named Employee data, and then click Open.

Displaying Distributions with Graphs

Frequency tables, bar charts, and pie charts are used for qualitative data or for small data sets.

Stemplots, histograms, and time plots are used for quantitative variables.

Frequency Tables:

To create a frequency table for a categorical variable, follow these steps:

• Click Analyze > Descriptive Statistics > Frequencies; the Frequencies dialog box appears.

• Click gender in the list on the left and move it to the Variable(s) box on the right.

• Click the Charts button; the Charts dialog box appears. Click Bar charts, and under Chart Values click Frequencies; finally, click Continue.

• Back in the Frequencies dialog box, click OK; the SPSS for Windows output appears.

Frequencies

[DataSet1] D:\Program Files\SPSSEval\Employee data.sav

This table summarizes how many observations are in the dataset: here there are 474 observations, all with valid data values, and there is no missing data.

Statistics: Gender
N Valid: 474
N Missing: 0


In this table, 216 employees (45.6%) are female and 258 (54.4%) are male. The cumulative percent is the percentage for the current category plus the percentages of all the categories above it.

Q: What is the difference between Percent and Valid Percent?

[Table: Gender, valid cases, with columns Frequency, Percent, Valid Percent, and Cumulative Percent.]
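The percentage columns of a frequency table can be reproduced by hand. The Python sketch below (the function name `frequency_table` is hypothetical, not an SPSS function) computes Percent on the base of all cases, Valid Percent on the base of non-missing cases only, and Cumulative Percent as the running total of Valid Percent. The gender counts come from the text above; the second data set is made up to show that the two percentages differ only when some cases are missing.

```python
from collections import Counter

def frequency_table(values, missing=(None,)):
    """Rows of (category, frequency, percent, valid percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)                                    # all cases
    valid = sorted(c for c in counts if c not in missing)  # categories listed as Valid
    valid_total = sum(counts[c] for c in valid)
    rows, cumulative = [], 0.0
    for category in valid:
        n = counts[category]
        percent = 100 * n / total               # base: all cases, including missing
        valid_percent = 100 * n / valid_total   # base: non-missing cases only
        cumulative += valid_percent             # running total of Valid Percent
        rows.append((category, n, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows

# Counts from the gender table: 216 female, 258 male, none missing.
gender = ["Female"] * 216 + ["Male"] * 258
for row in frequency_table(gender):
    print(row)

# Hypothetical answers with one missing case: Percent and Valid Percent now differ.
print(frequency_table(["yes", "no", None, "yes"]))
```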

The graph below shows the bar chart for gender; the height of each bar represents the number of employees.

[Figure: bar chart of gender (Male, Female).]


• Double-click the bar chart in the SPSS output to open the Chart Editor.

• Choose Elements > Show Data Labels.

• The Properties dialog box appears. Click the Text Style tab, type 14 in the Preferred Size box, and then click Apply.


• Click File > Close to close the Chart Editor; the updated graph appears in the output.

[Figure: bar chart of gender (Male, Female) with data labels.]


Comparing the salary of females and males using bar charts:

• Choose Graphs > Interactive > Bar; the interactive bar chart dialog box appears.

• Complete the dialog box and click OK.

[Figure: interactive bar chart of current salary by gender; bars show means.]

Q: From the bar chart above, do males or females have higher salaries? Why?

Another method for comparing the mean salary of males and females:

Choose Graphs > Bar; the Bar Charts dialog box appears.

Choose Simple and Summaries for groups of cases, then click the button marked Define.

• Complete the dialog box. If you want to include a title or a footnote on the chart, click Titles and type the desired information, then click Continue to return to the original dialog box, and click OK.

• A new window appears, containing the bar chart.

[Figure: bar chart of mean current salary for Male and Female.]
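The height of each bar in these charts is simply the mean salary within each gender group. A minimal Python sketch, using hypothetical salaries (not the Employee data values) and a made-up helper name `mean_by_group`:

```python
from collections import defaultdict
from statistics import mean

def mean_by_group(groups, values):
    """Mean of values within each group: the bar heights when bars show means."""
    buckets = defaultdict(list)
    for group, value in zip(groups, values):
        buckets[group].append(value)
    return {group: mean(vals) for group, vals in buckets.items()}

# Hypothetical data, not taken from Employee data.sav
gender = ["f", "m", "f", "m", "m"]
salary = [21000, 30000, 24000, 40000, 35000]
print(mean_by_group(gender, salary))
```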


One variable Descriptive Statistics

The Frequencies procedure provides statistics and a histogram for quantitative variables, as follows:

• Choose Analyze > Descriptive Statistics > Frequencies; the Frequencies dialog box appears.

• Move Current Salary to the Variable(s) box.

• Click the Statistics button; the Frequencies: Statistics dialog box appears. Select the statistics you need, and click Continue to return to the original dialog box.

• Percentile Values: values of salary (a quantitative variable) that divide the ordered data into groups so that a certain percentage of the observations is above and another percentage is below. Quartiles (the 25th, 50th, and 75th percentiles) divide the observations into four groups of equal size.

If you want a number of equal groups other than four, select Cut points for n equal groups. You can also specify individual percentiles (for example, the 77th percentile, the value below which 77% of the observations fall).

• Central Tendency: statistics that describe the location of the distribution. You can select the mean, median, and mode, or the sum of all the values.

• Dispersion: statistics that measure the amount of variation or spread in the data. You can select the Std. deviation (standard deviation), variance, range, minimum, maximum, or S.E. mean (standard error of the mean).

• Distribution: statistics that describe the shape and symmetry of the distribution. You can select skewness and kurtosis; these statistics are displayed with their standard errors.

• Click the Charts button, select Histograms, and check With normal curve; then click Continue to return to the original dialog box.

A histogram also has bars, but they are plotted along an equal-interval scale. The height of each bar is the count of values of the quantitative variable falling within the interval. The histogram shows the shape, center, and spread of the distribution. A normal curve superimposed on a histogram helps you judge whether the data are normally distributed.

• Click OK to get the following results:


Frequencies

This table shows the following results:

Statistics: Current Salary
N Valid: 474
N Missing: 0
Skewness: 2.125 (Std. Error of Skewness: .112)
Kurtosis: 5.378 (Std. Error of Kurtosis: .224)

Q1. Is the distribution symmetric, skewed to the right, or skewed to the left? Why?

Q2. Find the IQR (interquartile range = Q3 - Q1 = P75 - P25).

Q3. In this example, would you prefer the range or the IQR to describe the dispersion of the data, and why?
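The statistics requested in the Frequencies dialog can be sketched with Python's standard `statistics` module (Python 3.8 or later). Two assumptions are worth stating plainly: the quartiles use `statistics.quantiles` with its default "exclusive" method, which interpolates at position (n + 1)p and is assumed here to mirror SPSS's default percentile definition; and the skewness, kurtosis, and standard-error formulas are the usual bias-adjusted ones, taken to be the ones SPSS reports. As a check, the standard errors for n = 474 come out to .112 and .224, the values in the salary table. The data in the usage line are made up, and `describe`, `se_skewness`, and `se_kurtosis` are hypothetical names.

```python
import math
import statistics

def describe(x):
    """Frequencies-style summary of a list of at least 4 numbers (a sketch)."""
    n = len(x)
    m = statistics.mean(x)
    s = statistics.stdev(x)                        # sample SD, divisor n - 1
    q1, _, q3 = statistics.quantiles(x, n=4)       # 'exclusive': position (n + 1) * p
    z = [(v - m) / s for v in x]
    skewness = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": m, "median": statistics.median(x), "mode": statistics.mode(x),
        "std_dev": s, "variance": statistics.variance(x),
        "range": max(x) - min(x), "se_mean": s / math.sqrt(n),
        "q1": q1, "q3": q3, "iqr": q3 - q1,
        "skewness": skewness, "kurtosis": kurtosis,
    }

def se_skewness(n):
    """Standard error of skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of kurtosis for a sample of size n."""
    return math.sqrt(4 * (n ** 2 - 1) * se_skewness(n) ** 2 / ((n - 3) * (n + 5)))

print(describe([2, 4, 4, 4, 5, 5, 7, 9]))          # made-up data
print(round(se_skewness(474), 3), round(se_kurtosis(474), 3))
```

The IQR asked for in Q2 is `q3 - q1`; unlike the range, it ignores the extreme quarter of the data at each end.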


When z-scores are saved, they are added to the data in the Data Editor and are available for SPSS charts, data listings, and analyses. When variables are recorded in different units (for example, salary, education, and experience), a z-score transformation places them on a common scale for easier visual comparison.
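A z-score is the distance of a value from the mean in units of the standard deviation, z = (x - mean) / s. A minimal Python sketch, assuming s is the sample standard deviation (divisor n - 1), which is taken here to be what the saved SPSS variable uses; the salaries and the name `z_scores` are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize: z = (x - mean) / s, where s is the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

salaries = [21000, 45000, 30000, 27000, 57000]   # hypothetical salaries
print([round(z, 2) for z in z_scores(salaries)])
```

Whatever the original units, the standardized values always have mean 0 and standard deviation 1, which is why they can be compared on one scale.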

Example: Open Employee data.sav, obtain descriptive statistics for current salary, and discuss the results.

To obtain descriptive statistics:

• From the menus choose Analyze > Descriptive Statistics > Descriptives.

• The Descriptives dialog box appears. Move Current Salary into the Variable(s) box, and check the box Save standardized values as variables.

• Optionally, click Options to choose additional statistics and the display order in the Descriptives: Options dialog box.

• Click Continue to return to the Descriptives dialog box, then click OK to get these results:

Descriptives

[DataSet1] D:\Program Files\SPSSEval\Employee data.sav

Descriptive Statistics (columns: N, Range, Minimum, Maximum, Sum, Mean, Std. Deviation, Skewness, Kurtosis, each as Statistic, with Std. Error for Mean, Skewness, and Kurtosis)
Current Salary: Std. Error of Mean $784.311; Std. Error of Skewness .112; Std. Error of Kurtosis .224
Valid N (listwise)

Note that we dragged an icon from the column tray into the row tray, and an icon from the row tray into the column tray (pivoting the table), to obtain the layout shown above.

• Frequency tables, percentiles, and other descriptive statistics

• Tests for normality, including probability plots and the Shapiro-Wilk and Lilliefors tests

• Levene's test for assessing equality of variances

• Robust estimates of location (M-estimators)



Reasons for using the Explore procedure:

There are many reasons for using the Explore procedure: data screening, outlier identification, description, assumption checking, and characterizing differences among subpopulations (groups of cases). Data screening may show that you have unusual values, extreme values, gaps in the data, or other peculiarities. Exploring the data may also indicate whether or not the distribution of the data is normal.

Example: Open the file named Employee data.sav.

From the menus choose Analyze > Descriptive Statistics > Explore.

The Explore dialog box appears. Move the salary variable (a quantitative variable) to the Dependent List box.

Click Statistics and select M-estimators (robust estimators), Outliers, Percentiles, and Descriptives with a 95% confidence interval for the mean; click Continue.

• Click Plots and select Histogram, Stem-and-leaf, and Normality plots with tests; click Continue.

• Click OK to obtain the following results:

Explore

[DataSet1] D:\Program Files\SPSSEval\Employee data.sav

Case Processing Summary: Current Salary
Valid: N = 474 (100.0%)
Missing: N = 0 (0.0%)
Total: N = 474 (100.0%)

This table shows that we have 474 valid observations and no missing values.

The Descriptives table shows several statistics:

95% confidence interval for mean (lower and upper bound): a range of values used to estimate the population mean.

5% trimmed mean: the sample mean computed after omitting the highest and lowest 5% of the sample data.

We discussed the other statistics in the previous sections.
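Both statistics can be sketched in a few lines of Python. Assumptions to flag: the confidence interval is the usual mean ± t × s / √n, and because the standard library has no t quantile function, the critical value is passed in (about 1.965 for 95% with df = 473, the salary case). The trimmed mean drops whole cases only; SPSS may also give partial weight to a case that is only fractionally trimmed, so its 5% trimmed mean can differ slightly. The names `mean_ci` and `trimmed_mean` and the data are hypothetical.

```python
import math
import statistics

def mean_ci(values, t_crit):
    """Confidence interval mean ± t_crit * s / sqrt(n).

    t_crit is the t critical value for n - 1 degrees of freedom.
    """
    n = len(values)
    m = statistics.mean(values)
    half_width = t_crit * statistics.stdev(values) / math.sqrt(n)
    return m - half_width, m + half_width

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping floor(proportion * n) cases from each end (simplified)."""
    data = sorted(values)
    k = int(proportion * len(data))   # whole cases removed from each end
    return statistics.mean(data[k:len(data) - k])

print(mean_ci([2, 4, 4, 4, 5, 5, 7, 9], t_crit=2.365))   # 2.365: t for df = 7 at 95%
print(trimmed_mean(list(range(1, 21)) + [1000]))          # the outlier 1000 is trimmed away
```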


[Table: Descriptives for Current Salary: Mean, 95% Confidence Interval for Mean (Lower Bound, Upper Bound), 5% Trimmed Mean, Median, Variance, Std. Deviation, Minimum, Maximum, Range, Interquartile Range, Skewness, and Kurtosis, with columns Statistic and Std. Error.]

The M-Estimators table shows alternatives to the sample mean for estimating the center of location. The estimators differ in the weights they apply to cases. Huber's M-estimator, Tukey's biweight, Hampel's M-estimator, and Andrews' wave estimator are displayed.

M-Estimators: Current Salary
Huber's M-Estimator (a): $29,434.84
Tukey's Biweight (b): $27,613.71
Hampel's M-Estimator (c): $28,739.16
Andrews' Wave (d): $27,599.33
a. The weighting constant is 1.339.
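To see how an M-estimator down-weights extreme salaries, here is a minimal Python sketch of Huber's estimator using the weighting constant 1.339 from the footnote. The details (scale fixed at the median absolute deviation divided by 0.6745, iteratively reweighted means, the stopping rule) are a common textbook formulation and an assumption; SPSS's own algorithm may differ in these details, so this sketch is not expected to reproduce $29,434.84 exactly. The function name and data are hypothetical.

```python
import statistics

def huber_m_estimate(values, k=1.339, tol=1e-6, max_iter=100):
    """Huber M-estimate of location by iteratively reweighted means (sketch).

    The scale is held fixed at MAD / 0.6745. Cases whose standardized
    distance u from the current estimate is at most k get weight 1;
    cases farther away get weight k / u.
    """
    med = statistics.median(values)
    scale = statistics.median([abs(v - med) for v in values]) / 0.6745
    if scale == 0:
        return med                       # MAD is zero: most cases share one value
    mu = med
    for _ in range(max_iter):
        weights = []
        for v in values:
            u = abs(v - mu) / scale
            weights.append(1.0 if u <= k else k / u)
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol * scale:
            return new_mu
        mu = new_mu
    return mu

data = [10, 11, 12, 13, 14, 100]         # hypothetical, with one large outlier
print(statistics.mean(data), huber_m_estimate(data))
```

The outlier 100 pulls the ordinary mean up to about 26.7, while its Huber weight is tiny, so the M-estimate stays near the bulk of the data.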

The Extreme Values table displays the five largest and five smallest values with their case labels.

[Table: Extreme Values for Current Salary (Highest and Lowest), with Case Number and Value.]
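The Extreme Values table is easy to reproduce: number the cases in file order, sort by value, and take a fixed number from each end. A hypothetical Python sketch (the function name is made up, and case labels are represented only by 1-based case numbers):

```python
def extreme_values(values, count=5):
    """Return the `count` highest and lowest values as (case number, value) pairs."""
    cases = list(enumerate(values, start=1))           # case numbers start at 1
    by_value = sorted(cases, key=lambda cv: cv[1])     # ascending by value
    highest = by_value[::-1][:count]
    lowest = by_value[:count]
    return highest, lowest

highest, lowest = extreme_values([5, 1, 9, 3, 7, 2, 8], count=2)   # made-up data
print("Highest:", highest)
print("Lowest:", lowest)
```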

The Normality plots with tests option produces normal probability and detrended normal probability plots, and the Tests of Normality table displays the Kolmogorov-Smirnov statistic with a Lilliefors significance level for testing normality, together with the Shapiro-Wilk statistic. Both significance levels are .000 (that is, p < 0.05), which means that we reject the hypothesis that the salary data are normally distributed.

Tests of Normality: Current Salary
Kolmogorov-Smirnov (a): Statistic .208, df 474, Sig. .000
Shapiro-Wilk: Statistic .771, df 474, Sig. .000
a. Lilliefors Significance Correction
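The Kolmogorov-Smirnov statistic in this test is the largest vertical gap between the empirical distribution of the data and a normal distribution with the same mean and standard deviation. A minimal Python sketch of that statistic only (Python 3.8 or later for `statistics.NormalDist`): it does not compute the Lilliefors-corrected significance, which comes from special tables or simulation, and it is not guaranteed to match SPSS to the last digit. The function name and test data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def lilliefors_statistic(values):
    """Kolmogorov-Smirnov D against a normal with the sample mean and SD."""
    data = sorted(values)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    d = 0.0
    for i, x in enumerate(data, start=1):
        f = fitted.cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

normal_like = [NormalDist().inv_cdf((i - 0.5) / 100) for i in range(1, 101)]
skewed = [1] * 90 + [100] * 10           # made-up, strongly right-skewed
print(lilliefors_statistic(normal_like), lilliefors_statistic(skewed))
```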

We have two plots, a histogram and a stem-and-leaf plot. We discussed the histogram previously; now we discuss the stem-and-leaf plot.

Q: How would you describe the shape of this distribution? Compare the histogram with the stem-and-leaf plot: what important differences, if any, do you see?

[Figure: Current Salary Stem-and-Leaf Plot, with columns Frequency, Stem, and Leaf.]
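A stem-and-leaf plot keeps every value visible: each value is cut into a stem (the leading digits) and a leaf (the next digit). Below is a simplified Python sketch for non-negative values, laid out like the Frequency, Stem, and Leaf columns. The leaf unit is a parameter, the function name and salaries are hypothetical, and SPSS may additionally split stems over several lines or mark extremes separately, which this sketch does not do.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Text stem-and-leaf plot for non-negative values.

    Each value is divided by leaf_unit and truncated; the last digit is the
    leaf and the remaining digits are the stem. Empty stems are kept so that
    gaps in the data stay visible.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = int(v // leaf_unit)
        stems[scaled // 10].append(scaled % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = stems.get(stem, [])
        lines.append(f"{len(leaves):>9} {stem:>5} . {''.join(map(str, leaves))}")
    return lines

# Hypothetical salaries; with leaf_unit=1000 the value 21,750 has stem 2 and leaf 1.
for line in stem_and_leaf([15750, 21750, 23000, 24300, 28350, 47000], leaf_unit=1000):
    print(line)
```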


This plot is called a normal quantile (Q-Q) plot. Data that follow a normal distribution produce an approximately straight line on the normal quantile plot; systematic deviations from a straight line indicate a non-normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.

We note that most points lie far from the straight line, indicating that the distribution is not normal.

[Figure: Normal Q-Q Plot of Current Salary.]
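The points of a normal Q-Q plot pair each sorted observation with the standard normal value expected at its rank. The Python sketch below uses Blom's plotting positions (i - 3/8) / (n + 1/4); that this is the rule SPSS uses for this plot is an assumption. As a swapped-in numerical summary of "how straight" the plot is, it also computes the Pearson correlation of the plotted pairs (the probability-plot correlation), which is close to 1 when the points fall on a line. Function names and data are hypothetical.

```python
import math
from statistics import NormalDist

def normal_qq_points(values):
    """(observed value, expected standard-normal value) pairs, Blom positions."""
    data = sorted(values)
    n = len(data)
    z = NormalDist()                                  # standard normal
    return [(x, z.inv_cdf((i - 0.375) / (n + 0.25)))  # Blom: (i - 3/8) / (n + 1/4)
            for i, x in enumerate(data, start=1)]

def straightness(points):
    """Pearson correlation of the plotted pairs; values near 1 mean a straight line."""
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(straightness(normal_qq_points([1, 2, 3, 4, 5])))    # evenly spread data
print(straightness(normal_qq_points([1, 1, 1, 1, 10])))   # one extreme value
```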


This plot is called a box plot. It shows the minimum value, first quartile (Q1 = P25), second quartile (Q2 = P50, the median), third quartile (Q3 = P75), maximum value, and extreme values (outliers). We must distinguish between minor and major outliers: minor outliers, denoted by o in the plot, are observations more than 1.5 IQR outside the central box; major outliers, denoted by * in the plot, are observations more than 3 IQR outside the central box.

[Figure: box plot of Current Salary, labeled with the minimum value, Q1 = P25, Q2 = P50 = median, Q3 = P75, maximum value, minor outliers, and major outliers.]
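The outlier rules above can be sketched in Python: measure how far each value lies outside the box and compare that distance with 1.5 IQR and 3 IQR. Quartiles come from `statistics.quantiles` (Python 3.8 or later); SPSS box plots are based on Tukey's hinges, which can differ slightly from these quartiles, so borderline cases may be classified differently. The function name and data are hypothetical.

```python
import statistics

def classify_outliers(values):
    """Split values into minor (o) and major (*) outliers using IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    minor, major = [], []
    for v in values:
        distance = max(q1 - v, v - q3)     # how far outside the box (negative if inside)
        if distance > 3 * iqr:
            major.append(v)
        elif distance > 1.5 * iqr:
            minor.append(v)
    return minor, major

data = [10, 11, 12, 13, 14, 15, 16, 17, 18, 30, 60]   # made-up data
print(classify_outliers(data))
```

Here Q1 = 12, Q3 = 18, and IQR = 6, so 30 (12 above the box) is a minor outlier and 60 (42 above the box) is a major one.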


We can compare two groups of data using Explore as follows:

• Choose Analyze > Descriptive Statistics > Explore.

• Move the salary variable to the Dependent List box, and move gender to the Factor List box.



• Click the Statistics and Plots buttons, choose the statistics and plots you want, and click OK to get the following results:

[Table: Case Processing Summary of Current Salary by gender, with N and Percent for Valid, Missing, and Total cases.]

[Table: Descriptives of Current Salary for Gender = Female and Gender = Male: Mean, 95% Confidence Interval for Mean (Lower Bound, Upper Bound), 5% Trimmed Mean, Median, Variance, Std. Deviation, Minimum, Maximum, Range, Interquartile Range, Skewness, and Kurtosis, with columns Statistic and Std. Error.]

[Table: Percentiles (5, 10, 25, 50, 75, 90, 95) of Current Salary by gender.]

[Table: M-Estimators of Current Salary by gender: Huber's M-Estimator (a), Tukey's Biweight (b), Hampel's M-Estimator (c), and Andrews' Wave (d). a. The weighting constant is 1.339.]

[Table: Extreme Values of Current Salary by gender, with Case Number and Value.]

[Table: Tests of Normality of Current Salary for Female and Male: Kolmogorov-Smirnov (a) and Shapiro-Wilk, each with Statistic, df, and Sig. a. Lilliefors Significance Correction.]

[Figure: Current Salary stem-and-leaf plots by gender (in one plot each leaf is 1 case, in the other each leaf is 2 cases; & denotes fractional leaves).]

Normal Q-Q Plots

[Figure: Normal Q-Q Plot of Current Salary for gender = Female.]

[Figure: Normal Q-Q Plot of Current Salary for gender = Male.]

Posted: 05/04/2014, 01:43
