
(Essay) Mid-term report: Research and Perform Basic Statistics and Data Analysis in SPSS Software




DOCUMENT INFORMATION

Basic information

Title: Research and Perform Basic Statistics and Data Analysis in SPSS Software
Authors: Phan Thị Như Anh, Nguyễn Thu Giang, Đào Hương Giang, Vũ Phương Anh, Hoàng Ngọc Linh
Supervisor: Dr. Nguyễn Đình Đạt
University: Foreign Trade University
Major: Quantitative Methods
Document type: Mid-term report
Year: 2023
City: Hanoi
Pages: 47
File size: 8.14 MB

Structure

  • 1. Overview of SPSS Software for Data Analysis
  • 2. Descriptive Statistics
  • 3. Testing the Reliability of the Research Scale Using Cronbach's Alpha
    • 3.1. Cronbach's Alpha reliability testing
    • 3.2. Using an example dataset to demonstrate how to perform reliability testing of the research scale in SPSS
    • 3.3. Results
  • 4. Exploratory factor analysis
    • 4.1. Overview of exploratory factor analysis (EFA)
    • 4.2. Use an example dataset to demonstrate how to perform exploratory factor analysis (EFA)
    • 4.3. Results
  • 5. Correlational analysis
    • 5.1. Overview of correlational analysis
    • 5.2. Use an example dataset to demonstrate how to perform correlation analysis
    • 5.3. Results
  • 6. Regression analysis & Analysis of Variance (ANOVA)
    • 6.1. Overview of regression analysis
    • 6.2. Overview of Analysis of Variance (ANOVA)
    • 6.3. Use an example dataset to demonstrate how to perform Regression analysis & Analysis of Variance in SPSS
    • 6.4. Results
  • 7. Mean comparison of two groups using a t-test
    • 7.1. The purpose of a t-test
    • 7.2. Use an example dataset to demonstrate how to perform a t-test in SPSS
    • 7.3. Results
  • 8. Logistic regression
    • 8.1. The purpose of logistic regression
    • 8.2. Use an example dataset to demonstrate how to perform logistic regression in SPSS
    • 8.3. Results

Content


Overview of SPSS Software for Data Analysis

SPSS (Statistical Package for the Social Sciences) is a widely used software suite for statistical analysis and data management, offering a diverse array of tools and techniques that cater to the needs of researchers, data analysts, and social scientists.

SPSS provides an intuitive graphical interface that enables users to conduct a variety of statistical analyses and create reports without needing extensive programming skills. It encompasses both fundamental and advanced statistical methods, such as descriptive statistics, hypothesis testing, regression analysis, factor analysis, and cluster analysis. This versatility makes SPSS suitable for diverse research areas, including the social sciences, market research, healthcare, and business analytics.

Six key features of SPSS include:

- Data Management: SPSS allows users to import, clean, and manipulate data easily. It provides tools for data screening, variable transformation, recoding, merging datasets, and handling missing values.

- Statistical Analysis: SPSS offers a comprehensive suite of statistical procedures, enabling users to conduct descriptive statistics, t-tests, ANOVA, chi-square tests, correlation analysis, regression analysis, and various other statistical tests efficiently.

- Data Visualization: SPSS provides a range of visualization tools for creating bar charts, scatter plots, histograms, and other graphics. These features support the effective exploration and presentation of data, enhancing insight through visual analysis.

- Customizability: SPSS enables users to tailor analyses and outputs to specific research requirements, with options to modify variable and value labels and to define custom functions and transformations.

- Output and Reporting: SPSS produces detailed output files containing statistical results, tables, and charts, which can be exported to formats such as Microsoft Excel or PDF for further analysis and reporting.

- Integration and Automation: SPSS integrates with other software packages, supporting data import and export across multiple formats. It also supports syntax programming, which lets advanced users automate analyses and create reproducible workflows.
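As a small illustration of this syntax support, the sketch below (with hypothetical variable names, assuming a dataset is already open) computes basic descriptives and exports the output to PDF; saving such commands in a syntax file makes an analysis reproducible:

    * Hypothetical variable names; assumes the dataset is already open.
    DESCRIPTIVES VARIABLES=Age Income
      /STATISTICS=MEAN STDDEV MIN MAX.

    * Export the resulting output document to PDF.
    OUTPUT EXPORT /PDF DOCUMENTFILE='report.pdf'.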

Descriptive Statistics

Descriptive statistics is a key branch of statistics that aims to summarize and describe the essential features of a dataset. It enables the organization, analysis, and presentation of data in a clear and concise way. Common techniques in descriptive statistics focus on the central tendency, variability, and distribution of data. The primary measures of central tendency include the mean (average), median (middle value), and mode (most frequently occurring value), which offer insights into the typical values around which data points cluster.

a. Distribution and central tendencies

● The purpose of distribution and central tendencies

Analyzing the distribution and central tendencies of a dataset is essential for understanding its overall characteristics and patterns. These statistical measures offer insights into typical values, data spread, and shape, enabling researchers and analysts to summarize and interpret the dataset effectively.

Understanding data distribution is crucial for analyzing how values are spread across various intervals. It reveals the frequency of occurrences and the overall shape of the data. Key methods for analyzing distribution include histograms, frequency tables, and probability distributions.

Understanding data distribution is also crucial for identifying outliers, evaluating skewness or symmetry, and informing subsequent analysis and decision-making. Central tendency measures reveal the typical value around which data points cluster, with the mean, median, and mode being the three primary metrics used for this purpose.

The mean, often referred to as the average, is determined by adding all data points together and dividing by the total number of observations. This value serves as the arithmetic center of the dataset; however, the mean can be significantly influenced by extreme values or outliers.

The median is the central value in a dataset when arranged in ascending order, effectively dividing the data into two equal halves. This measure of central tendency is robust, as it remains unaffected by outliers or extreme values.

The mode is the most frequently occurring value in a dataset, representing its peak or most common value. Datasets can be classified as unimodal, bimodal, or multimodal based on the number of modes present. Analyzing the mode enhances the understanding of central tendencies, indicating where the majority of data points are located. It serves as a reference point for comparing individual observations and summarizes the data in a single representative value, facilitating comparisons across groups or datasets.


● Using an example dataset to demonstrate how to perform distribution and central tendencies in SPSS

Step 1: Open the "Analyze" menu and select Descriptive Statistics > Frequencies.

Step 2: When the Frequencies dialog appears, move Age, Annual income, Work hours a week, and Overall job satisfaction into the Variable(s) box, then click Statistics.


Step 3: Under Percentile Values, choose Quartiles; under Central Tendency, choose Mean and Median; under Dispersion, choose Std. deviation and Variance; and under Distribution, choose Skewness and Kurtosis.

Step 4: Charts can also be added to the output document by opening Charts and selecting Histogram with a normal curve.
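Equivalently, steps 1-4 can be run as SPSS syntax. A minimal sketch, assuming the four variables are named Age, Income, WorkHours, and JobSat (the actual names depend on the dataset):

    * Quartiles, mean and median, dispersion, skewness and kurtosis,
      plus a histogram with a normal curve for each variable.
    FREQUENCIES VARIABLES=Age Income WorkHours JobSat
      /NTILES=4
      /STATISTICS=MEAN MEDIAN STDDEV VARIANCE SKEWNESS KURTOSIS
      /HISTOGRAM NORMAL.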

a) Central tendencies and dispersion

Age:

- The mean (average) age is approximately 39.77 years.

- The median (middle value when the ages are ordered) age is 38 years.

- The standard deviation is about 14.43, indicating some variation or spread in ages.

Annual Income:

- The mean annual income is approximately $100,279.67.

- The median annual income is $88,800.00.

- The standard deviation is approximately $48,294.62, indicating a relatively wide range of income values.

Work Hours per Week:

- The mean work hours per week are approximately 46.30 hours.

- The median work hours per week are 46.00 hours.

- The standard deviation is about 9.98, indicating a relatively small amount of variation in work hours.

Overall Job Satisfaction:

- The mean overall job satisfaction rating is 6.37 on a scale of 1 to 7.

- The median overall job satisfaction rating is 6.00.

- The standard deviation is approximately 1.53, indicating some variation in job satisfaction ratings.

b) The skewness and the kurtosis

Age: the skewness is positive (0.488), suggesting a slightly right-skewed distribution, meaning that there may be a few older individuals in the dataset. The kurtosis is negative (-0.661), indicating that the distribution is slightly less peaked (platykurtic) than a normal distribution.

Annual Income: the skewness is positive (1.295), suggesting a right-skewed distribution, indicating that there may be some high-income outliers. The kurtosis is positive (1.364), indicating a distribution with slightly heavier tails (leptokurtic) than a normal distribution.

Work Hours per Week: the skewness is negative (-0.189), suggesting a slightly left-skewed distribution, meaning that there may be a few individuals working fewer hours. The kurtosis is positive (0.519), indicating slightly heavier tails (leptokurtic) than a normal distribution.

Overall Job Satisfaction: the skewness is negative (-0.458), suggesting a slightly left-skewed distribution, meaning that there may be a few individuals with lower satisfaction ratings. The kurtosis is positive (0.822), indicating slightly heavier tails (leptokurtic) than a normal distribution.

b. Association table

● The purpose of association tables

Association tables, also referred to as contingency tables or cross-tabulations, are used in SPSS to analyze the relationship between two categorical variables in a dataset. These tables present a cross-tabulation of the variables, showing the frequencies or counts for each combination of categories. The primary goals of using association tables in SPSS include examining associations and understanding the interactions between categorical variables.

- Understanding Relationships: By examining the frequencies of different combinations of categories, you can assess whether there is a relationship, dependency, or association between the variables.

- Identifying Patterns: Analyzing the frequencies of variables across categories reveals whether specific categories frequently co-occur or deviate significantly from anticipated patterns.

- Testing Independence: Association tables are essential for evaluating the independence of two categorical variables. Using statistical tests such as the chi-square test (see the formula after this list), researchers can assess whether the observed frequencies differ significantly from the frequencies expected under independence.

- Hypothesis Testing: Association tables are useful for hypothesis testing related to categorical variables. They can help evaluate research hypotheses or investigate the significance of an association between variables.
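For reference, the chi-square statistic mentioned above compares the observed count O_ij in each cell with the count E_ij expected under independence:

    \chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
    \qquad
    E_{ij} = \frac{(\text{row}_i \text{ total}) \times (\text{column}_j \text{ total})}{n}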

● Using an example dataset to demonstrate how to perform an association table in SPSS

Step 1: Open the "Analyze" menu, select Descriptive Statistics, and choose the Crosstabs option.

Step 2: Place Work Environment Satisfaction, Level of Responsibility Satisfaction, Work-Life Balance Satisfaction, and Level of Autonomy Satisfaction in Rows and Overall Job Satisfaction in Columns, then click the Statistics option.

Step 3: Choose Chi-square and Correlation in the Statistics dialog.
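The same crosstabs request can be expressed in syntax; a sketch assuming the variables are named WorkEnvSat, RespSat, WLBSat, AutonomySat, and JobSat:

    * Each row variable is crossed with Overall Job Satisfaction;
      chi-square tests and correlations are requested as in Step 3.
    CROSSTABS
      /TABLES=WorkEnvSat RespSat WLBSat AutonomySat BY JobSat
      /STATISTICS=CHISQ CORR
      /CELLS=COUNT.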

a) Crosstabulation

The crosstabulation illustrates the relationship between "Overall Job Satisfaction" and "Work-Life Balance Satisfaction," offering a detailed count of individuals across the categories of these two variables and thereby enabling an analysis of their association.

The table shows counts of individuals' job satisfaction ratings against their reported work-life balance satisfaction ratings (1 to 9). A clear correlation emerges between work-life balance and overall job satisfaction: for instance, eight individuals who rated their work-life balance satisfaction as 1 also gave a job satisfaction rating of 1, and similar patterns hold across other rating combinations. This relationship can be analyzed further with a chi-square test to assess its statistical significance.

b) Chi-Square Tests

The chi-square tests, including Pearson Chi-Square, Likelihood Ratio, and Linear-by-Linear Association, evaluate the relationship between Overall Job Satisfaction and Work-Life Balance Satisfaction.

The Pearson Chi-Square value is 4177.962 with 72 degrees of freedom, indicating a highly significant association (p < 0.001). This suggests that there is a strong relationship between these two variables in the dataset.

The Likelihood Ratio and Linear-by-Linear Association tests also yield highly significant results, further supporting the association.

c) Symmetric Measures

The symmetric measures (Pearson's R and Spearman Correlation) provide insight into the strength of the association.

Pearson's R of 0.921 demonstrates a very strong positive relationship between work-life balance satisfaction and overall job satisfaction, suggesting that as individuals experience greater satisfaction with their work-life balance, their overall job satisfaction also rises. Spearman's correlation of 0.903 supports this finding, indicating a robust positive association between the two variables.

Testing the Reliability of the Research Scale Using Cronbach's Alpha

Cronbach’s Alpha reliability testing

Cronbach's Alpha measures the internal consistency, or reliability, of a set of survey items. Use this statistic to help determine whether a collection of items consistently measures the same characteristic:

High Cronbach's Alpha values signify consistent response patterns among participants across a series of questions, suggesting that the measurements are reliable and likely assess the same characteristic. In contrast, low values reveal that the items do not consistently measure the same construct, indicating potential reliability issues.

- Cronbach's Alpha ranges from 0 to 1.

- A value of zero (0) signifies that there is no relationship between the items: they operate independently, so knowing the response to one question tells you nothing about the responses to the other questions. Conversely, a value of one (1) indicates perfect correlation, where knowing the answer to one question gives complete insight into the other items.
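For reference, with k items, item variances sigma_i^2, and total-score variance sigma_t^2, the coefficient is computed as:

    \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_t^2}\right)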

Using an example dataset to demonstrate how to perform reliability testing of the research scale in SPSS

Step 1: Open "Scale", then choose the option called "Reliability Analysis…".

Step 2: Cronbach's Alpha is computed for a set of items related to various aspects of job satisfaction. Hold the "Shift" key, select all five items as below, and click "OK".
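In syntax form, the same analysis might look like the sketch below, assuming the five items are named Sat1 through Sat5:

    * Cronbach's Alpha for the five job-satisfaction items.
    RELIABILITY
      /VARIABLES=Sat1 Sat2 Sat3 Sat4 Sat5
      /SCALE('JobSatisfaction') ALL
      /MODEL=ALPHA
      /SUMMARY=TOTAL.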

Results

Cronbach's Alpha measures the internal consistency, or reliability, of a set of items that are intended to measure a particular construct or concept.

The Cronbach's Alpha value obtained is 0.948.

This value indicates strong consistency among the five survey items, which together measure the underlying construct of job satisfaction.

The strong internal consistency of the five items demonstrates their reliability as measures of job satisfaction. Respondents' answers align consistently across the items, indicating that they assess the same underlying concept.

With a commonly accepted threshold of 0.7, a Cronbach's Alpha of 0.948 far exceeds the standard, showing excellent internal consistency among the survey items and a highly reliable instrument for measuring job satisfaction among respondents.

Exploratory factor analysis

Overview of exploratory factor analysis (EFA)

Exploratory factor analysis is a statistical method for condensing data into a smaller set of summary variables while uncovering the underlying theoretical structure of the phenomena. The technique helps identify the relationships between variables and respondents, providing insights into data patterns. Exploratory factor analysis can be conducted using two primary methods:

- R-type factor analysis: when factors are calculated from the correlation matrix, the analysis is called R-type factor analysis.

- Q-type factor analysis: when factors are calculated from the individual respondents, the analysis is called Q-type factor analysis.

To perform a successful exploratory factor analysis, it is essential to check key assumptions, such as the presence of correlations among the variables in the dataset and the existence of common factors, the underlying variables that give rise to these correlations.

Use an example dataset to demonstrate how to perform exploratory factor analysis (EFA)

Step 1: Open "Dimension Reduction", then scroll down to the option called "Factor".

Step 2: Select "Age", then choose "Extraction…".

Step 3: Choose the "Principal components" method for the analysis.
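A syntax sketch of the same extraction, with assumed variable names taken from the results discussed below:

    * Principal components extraction, keeping components with
      eigenvalues greater than 1.
    FACTOR
      /VARIABLES JobSat WLBSat RespSat AutonomySat SupervisorRating CommuteTime
      /PRINT INITIAL EXTRACTION
      /CRITERIA MINEIGEN(1)
      /EXTRACTION PC
      /ROTATION NOROTATE.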

Results

Communalities represent the proportion of variance in each variable that is accounted for by the extracted components (factors).

The "Extraction" column presents the communalities obtained from Principal Component Analysis (PCA). The variable "Overall Job Satisfaction" has a high communality of 0.937, meaning that around 93.7% of the variance in overall job satisfaction is accounted for by the extracted components.

The total variance explained table shows how much of the total variance in the data is explained by each extracted component.

The first component accounts for approximately 57.58% of the total variance, while the second explains about 10.10%. Cumulatively, the first three components together explain 77.29% of the total variance.

The component matrix displays the loadings of each variable on each of the extracted components. Each row represents a variable, and each column represents a component.

Positive loadings reflect a positive relationship between a variable and the component, whereas negative loadings indicate a negative relationship. In the first component, variables such as "Overall Job Satisfaction," "Work-Life Balance Satisfaction," "Level of Responsibility Satisfaction," "Level of Autonomy Satisfaction," and "Supervisor Rating" exhibit high positive loadings, highlighting their strong association with this component.

● The PCA extracted two components based on the eigenvalues and the variance explained.

● The first component is closely linked to job satisfaction and its associated factors: "Overall Job Satisfaction," "Work-Life Balance Satisfaction," "Level of Responsibility Satisfaction," and "Level of Autonomy Satisfaction" all exhibit high positive loadings. This component accounts for a substantial portion of the variance in these variables, underscoring their importance to overall job satisfaction.

● The second component is primarily related to the variable "Commute Time (Minutes)," which has a very high loading on this component.

In summary, the PCA revealed the fundamental structure of the dataset: the first component reflects job satisfaction and its associated factors, while the second is largely driven by commute time.

Correlational analysis

Overview of correlational analysis

Correlation in statistics indicates a relationship between different events, and correlation analysis is a key tool for determining whether such links exist. One of its primary advantages is its practical simplicity, making it an accessible method for data analysis.

- Awareness of the behavior between two variables: a correlation identifies the absence or presence of a relationship between two variables, and it tends to be directly relevant to everyday questions.

- A good starting point for research: correlation analysis is a useful first step when a researcher begins investigating relationships.

- Simple metrics: research findings are simple to classify. The coefficient ranges from -1.00 to 1.00, so there are only three potential broad outcomes: a positive correlation, a negative correlation, or no correlation.

Use an example dataset to demonstrate how to perform correlation analysis

Step 2: Tick the criteria shown below to evaluate.
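In syntax form, assuming the two variables are named Engagement and JobSat:

    * Pearson correlation with two-tailed significance tests.
    CORRELATIONS
      /VARIABLES=Engagement JobSat
      /PRINT=TWOTAIL NOSIG
      /MISSING=PAIRWISE.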

Results

The Pearson correlation coefficient (r) between "Engagement level" and "Overall job satisfaction" is approximately 0.614.

The correlation is statistically significant, with a p-value below 0.001 (p < 0.001), based on a sample of 1,220 cases for both variables.

A coefficient of approximately 0.614 indicates a strong positive linear relationship between engagement level and overall job satisfaction: higher engagement levels are associated with higher job satisfaction, and the relationship is unlikely to have occurred by random chance.

In short, increased employee engagement correlates with greater job satisfaction, which highlights the value of fostering engagement to enhance employee well-being and organizational performance.

Regression analysis & Analysis of Variance (ANOVA)

Overview of regression analysis

Regression analysis is a statistical technique used to predict a dependent variable based on one or more independent variables. Its main goal is to model the relationship between these variables, allowing for accurate predictions and insights into the factors affecting the dependent variable.

Here's a breakdown of regression analysis:

In a regression equation, each predictor variable has a regression coefficient that measures the strength and direction of its relationship with the dependent variable. These coefficients give the expected change in the dependent variable for a one-unit increase in the predictor, holding all other variables constant.

- Interpreting R-squared: it indicates the proportion of the variance in the dependent variable that is explained by the predictor variables in the model (see the formula after these points).

● R-squared values range from 0 to 1 (0% to 100%). A higher R-squared value indicates that a larger proportion of the variation in the dependent variable is explained by the model.

● A low R-squared suggests that the model may not fit the data well, while a high R-squared suggests a strong fit.
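For reference, R-squared is defined from the residual and total sums of squares of the fitted model:

    R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}
        = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

In the ANOVA table discussed later, these are the Residual and Total sums of squares.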

Significance tests (p-values) evaluate the statistical significance of the regression coefficients. Each coefficient has a p-value that reflects the likelihood that its apparent effect on the dependent variable is due to random variation.

● A small p-value (typically < 0.05) suggests that the coefficient is statistically significant, meaning there is strong evidence that the predictor variable has a non-zero effect on the dependent variable.

● A large p-value indicates that the coefficient lacks statistical significance, implying insufficient evidence that the predictor influences the outcome. Variables with non-significant coefficients may be candidates for removal if they do not meaningfully improve the model's predictive ability.

Overview of Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) is a statistical method used to assess and compare the means of different groups defined by a categorical independent variable. The technique identifies statistically significant differences among group means and highlights which groups differ. Accurate interpretation and reporting of ANOVA results, including effect-size measures, are crucial for drawing meaningful conclusions from the analysis.

Use an example dataset to demonstrate how to perform Regression analysis & Analysis of Variance in SPSS

Step 1: Open "Regression", then choose "Linear".

Step 2: Choose the "Dependent" and "Independent" variables to evaluate.
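A syntax sketch of the same model, assuming the variable names used earlier:

    * Overall job satisfaction regressed on the four satisfaction facets;
      the ANOVA table is part of the default regression output.
    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA
      /DEPENDENT JobSat
      /METHOD=ENTER WorkEnvSat RespSat WLBSat AutonomySat.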

Results

a) Model Summary

The model summary provides an overview of the regression model's performance.

R: the multiple correlation coefficient (R) is 0.983. It measures the strength and direction of the linear relationship between the predictors and the dependent variable; in this case it indicates a very strong positive relationship.

R Square (R²): the value is 0.966, indicating that approximately 96.6% of the variance in overall job satisfaction is explained by the predictor variables. This high value reflects a strong relationship between the predictors and job satisfaction.

Adjusted R Square: this adjusts R² for the number of predictors in the model. It is 0.965, indicating that the model remains a good fit after accounting for the number of predictors.

The Standard Error of the Estimate, approximately 0.285, indicates the average deviation of observed values from predicted values, reflecting the accuracy of the model's predictions.

b) ANOVA

This table reports the overall significance of the regression model.

Regression: the regression sum of squares is 2763.600 with 4 degrees of freedom, giving a mean square of 690.900. The F-statistic of 8506.917, with a p-value below 0.001, shows that the model significantly predicts "Overall job satisfaction".

Residual: the sum of squares for the residuals (unexplained variance) is 98.678, with 1215 degrees of freedom.

Total: the total sum of squares is 2862.278, with 1219 degrees of freedom.

c) Coefficients

This table provides information about the coefficients of the predictor variables.

Constant: the constant represents the intercept of the regression equation. In this case, it is approximately -0.383.

B: these are the unstandardized regression coefficients for each predictor. They represent the change in the dependent variable (Overall job satisfaction) for a one-unit change in the predictor variable, holding all other predictors constant.

Std. Error: the standard error of each coefficient.

Beta: the standardized coefficients indicate the relative importance of each predictor, and all show a positive relationship with job satisfaction. The t-values are all far from zero, indicating that each predictor is significantly related to job satisfaction, and the p-values for all predictors are below 0.001, confirming high statistical significance.

The following are some conclusions drawn from the analysis of the data:

- The multiple linear regression model is highly significant (p < 0.001), indicating that the combination of predictor variables significantly predicts "Overall job satisfaction."

- The model explains approximately 96.6% of the variance in job satisfaction.

- Work environment satisfaction, level of responsibility satisfaction, work-life balance satisfaction, and level of autonomy satisfaction are all positively related to overall job satisfaction. As satisfaction in these areas increases, overall job satisfaction is likely to rise as well.

- The standardized coefficients indicate the relative influence of the predictors: "Work-life balance satisfaction" has the greatest effect, followed by "Level of responsibility satisfaction," "Work environment satisfaction," and "Level of autonomy satisfaction."

Mean comparison of two groups using a t-test

The purpose of a t-test

A t-test is a statistical hypothesis test used to assess whether there is a significant difference between the means of two groups or populations. It is widely used in research and data analysis to compare means and evaluate whether observed differences are due to chance or reflect a genuine effect. The main objectives of a t-test are to help researchers:

- Test Hypotheses: researchers use t-tests to test hypotheses about the differences between two groups or populations.

- Evaluate Treatment Effects: in experimental research, t-tests are used to determine whether an intervention has a statistically significant impact, by comparing the mean outcomes of a treatment group against a control group.

- Make Inferences: t-tests enable researchers to draw conclusions about population parameters, such as population means, from sample data. By computing a t-statistic and comparing it to a critical value (see the formula after this list), they can assess the probability that the observed difference occurred by chance.
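For reference, the independent-samples t-statistic (equal variances assumed) compares the two sample means against the pooled variability:

    t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
    \qquad
    s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}

SPSS reports this statistic together with its degrees of freedom, n1 + n2 - 2.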

Use an example dataset to demonstrate how to perform a t-test in SPSS

Step 1: Open the "Analyze" menu and choose "Compare Means".

Step 2: Choose "Independent-Samples T-test".

Step 3: Define the groups of the grouping variable.
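In syntax form, assuming job satisfaction is stored in JobSat and gender is coded 0/1 in Gender:

    * Independent-samples t-test comparing the two gender groups.
    T-TEST GROUPS=Gender(0 1)
      /VARIABLES=JobSat
      /CRITERIA=CI(.95).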

Results

a) Group Statistics

These results report the average (mean) job satisfaction scores and the variability (standard deviation) of job satisfaction scores for the two gender groups (Gender 1 and Gender 0).

Gender 1 reported a mean overall job satisfaction score of 6.45 with a standard deviation of 1.506, while Gender 0 had a mean of 6.29 and a standard deviation of 1.555.

b) Levene's Test for Equality of Variances

Levene's test is a statistical method used to evaluate the homogeneity of variances across groups or samples in a dataset. Its primary objective is to identify significant differences in variance among the groups being analyzed.

The test yields a p-value of 0.500, which exceeds the standard significance level of 0.05. This suggests that the variances of the two groups are approximately equal, supporting the assumption of equal variances in the Independent Samples Test for Equality of Means.

c) t-test for Equality of Means

This section provides the results of the t-test, which assesses whether there is a significant difference in mean "Overall job satisfaction" between the two gender groups.

Equal variances assumed:

- The t-statistic is 1.795, with 1218 degrees of freedom.

- The two-sided p-value is 0.073 (one-sided 0.036).

- The mean difference is approximately 0.157, indicating that Gender 1 (the first group) has a slightly higher mean job satisfaction score than Gender 0 (the second group).

- The 95% confidence interval of the difference spans from approximately -0.015 to 0.329; because it includes zero, the difference is not significant at the two-sided 0.05 level.

Equal variances not assumed:

- The results are similar to the equal-variances case, with a t-statistic of 1.795 but slightly different degrees of freedom; the p-value, mean difference, and confidence interval are essentially unchanged.

d) Independent Samples Effect Sizes

This section provides different effect-size estimates to quantify the magnitude of the difference between the two groups. Cohen's d, Hedges' correction, and Glass's delta are all effect-size measures.

In this case, the standardizer (pooled standard deviation) is approximately 1.531, giving a Cohen's d of about 0.103 (d = 0.157 / 1.531). This indicates a small effect: the difference in "Overall job satisfaction" between the two gender groups is modest.

Logistic regression

The purpose of logistic regression

Logistic regression is a statistical modeling technique designed to predict the probability of a binary outcome from one or more predictor variables. It analyzes the relationship between a dependent binary variable and independent predictors, estimating the likelihood that the binary outcome occurs.

- Binary Classification: logistic regression is widely used for binary classification problems, where the outcome of interest falls into one of two categories. Examples include:

● Predicting whether a customer will buy a product (yes/no)

● Predicting whether a patient has a disease (positive/negative)

● Predicting whether an email is spam or not (spam/ham)

Logistic regression differs from linear regression by predicting probabilities rather than continuous numeric values. It estimates the likelihood of an event occurring, such as success or the presence of an outcome, based on predictor variables. The logistic function, characterized by an S-shaped curve, converts linear combinations of these predictors into probabilities between 0 and 1.
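For reference, the logistic function maps the linear combination of predictors x_1, ..., x_k to a probability:

    P(Y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}

A predicted probability above the cut value (0.5 by default in SPSS) is classified as the positive outcome.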

- Measuring the Strength of Relationships: logistic regression provides coefficients (log-odds ratios) for each predictor variable, which can be interpreted as the impact of a one-unit change in that predictor on the log-odds of the outcome.

Use an example dataset to demonstrate how to perform logistic regression in SPSS

Step 1: Choose "Binary Logistic" in the "Regression" tool.

Step 2: Define the variables to model and analyze the relationships.

Step 3: Choose suitable options for evaluating the data.
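A syntax sketch of such a model; the binary outcome and the predictors here (Satisfied, Engagement, WorkHours, Income) are assumed names for illustration:

    * Binary logistic regression with Hosmer-Lemeshow goodness-of-fit
      output and 95% confidence intervals for Exp(B).
    LOGISTIC REGRESSION VARIABLES Satisfied
      /METHOD=ENTER Engagement WorkHours Income
      /PRINT=GOODFIT CI(95)
      /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).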

Results

The Dependent Variable Encoding table codes "Unsatisfied" as 0 and "Satisfied" as 1.

The classification table presents the observed and predicted values for the dependent variable, showing an overall accuracy of 54.4% for this initial model. This accuracy is relatively low, indicating room for improvement.

The model's constant has a negative coefficient, suggesting that when all predictors are zero, the likelihood of being "Satisfied" is reduced. The Exp(B) value below 1 likewise indicates reduced odds of satisfaction.

In the initial model, several variables that are highly significant (p < 0.001) were not yet included, highlighting their potential contribution. In Block 1, the Enter method is used.

The omnibus tests of model coefficients indicate that the model is statistically significant (p < 0.001), meaning it performs better than an empty (intercept-only) model.

The model-fit summary table indicates a high goodness of fit, with the Cox & Snell R Square and Nagelkerke R Square values approaching 1. However, estimation was halted after the maximum number of iterations was reached, so a final solution could not be found.

Hosmer and Lemeshow Test: this test checks whether the model's predictions match the observed outcomes. A non-significant p-value (close to 1.000) indicates a good fit.

Contingency Table for Hosmer and Lemeshow Test:

The table further supports the good fit, with observed and expected values closely aligned.

This classification table shows the observed and predicted values after the predictor variables are added. The overall percentage correct is now 100%, suggesting that the model perfectly predicts the outcome.

Variables in the Equation (Step 1):

This section shows the coefficients, standard errors, Wald statistics, significance values, and odds ratios for the variables included in the final model. The table shows that:

● All predictor variables are highly significant (p < 0.001).

● The constant term in the model is also highly significant (p < 0.001).

Overall, the logistic regression model appears to fit well, as indicated by the high goodness-of-fit statistics and classification accuracy.


