Introduction
Background and reasons for choosing the topic
In the era of the Fourth Industrial Revolution (Industry 4.0), e-commerce technology platforms are growing strongly. Businesses no longer necessarily need a large staff; instead, they rely on the presence and support of modern technology platforms. The e-banking model is gradually expanding, and mobile banking is a service that cannot be ignored. In practice, however, the process of developing and improving the quality of mobile banking services at the MB-Bank branch still faces some difficulties and limitations. I wish to propose specific and practical solutions to Military Commercial Joint Stock Bank to improve service quality and satisfy customers who use services at MB-Bank, Da Nang branch. Therefore, I have chosen the topic: "Customer satisfaction about the quality of banking services MB-Bank Da Nang branch".
Objectives, scope of research, methodology and report structure
Statistical inference is carried out within the framework of probability theory, which is concerned with the analysis of random phenomena. In business, analysis and research form the basis of planning. As a research analyst for MB-Bank Da Nang branch, I was tasked with understanding the factors affecting customer satisfaction when using services at the branch and with creating a plan to improve customer satisfaction for the bank. To do this, I conducted a survey in Da Nang city on the level of satisfaction with the quality of banking services at MB-Bank Da Nang branch. Although the scope of the study is Da Nang city, I surveyed only 50 people, all of whom are customers who have used the services of this bank. From this survey, I take the raw data and apply statistical techniques to analyze the factors that make customers satisfied when using services at MB-Bank Da Nang branch. The main purpose of this report is to analyze and evaluate customer satisfaction when using services at MB-Bank, Da Nang branch, with causal statistics. The research method used in the study is quantitative. This report is divided into 3 main parts. The first part analyzes the theoretical basis of the survey topic and the statistical methods: research theories, basic probability distributions, the normal, Poisson and binomial distributions, inferential statistics and regression. The next part describes the study design. The final part reviews and analyzes the results of the study. From there, I can assess customer satisfaction when using services at MB-Bank Da Nang branch.
Theoretical basis of satisfaction and related variables
The definition of basic probability distribution, normal distribution, Poisson distribution, binomial distribution, inference statistics and regression
A random variable can take any of a number of alternative values, each with its own likelihood, within a particular range, and a probability distribution is a statistical function that captures all of these possibilities. This range is bounded by the minimum and maximum possible values, but where a possible value is likely to fall on the probability distribution depends on a number of factors. These factors include the mean (average), standard deviation, skewness, and kurtosis of the distribution.
Several widely used probability distributions exist, but the normal distribution, also known as the "bell curve," is arguably the most common. Usually, the probability distribution of a phenomenon is determined by the process that generates the data. The function that describes this distribution is called the probability density function.
Probability distributions can also be used to build cumulative distribution functions (CDFs), which add up the probability of occurrences cumulatively and always start at zero and end at 100%. Academics, financial analysts, and fund managers may determine the probability distribution of a particular stock to assess the potential expected returns that the stock may yield in the future. Because a stock's history of returns, which can be measured over any time interval, is likely to contain only a fraction of the stock's returns, the analysis is prone to sampling error. Increasing the sample size can reduce this error considerably.
Example: Probability distributions are idealized frequency distributions
Imagine that an egg farmer wants to know the probability of an egg from her farm being a certain size. The farmer weighs 100 random eggs and describes their frequency distribution using a histogram.
Figure 1: Frequency distribution histogram of 100 random eggs (scribbr)
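The idea of a frequency distribution can be sketched in a few lines of Python. This is only an illustration: the egg weights, the 5-gram bin width and the function name relative_frequencies are assumptions made for the example, not data from the farmer's histogram.

```python
from collections import Counter

def relative_frequencies(values, bin_width):
    """Group values into equal-width bins and return {bin_start: share of observations}."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    n = len(values)
    return {start: counts[start] / n for start in sorted(counts)}

# Hypothetical egg weights in grams (illustrative only).
egg_weights = [52, 55, 58, 61, 61, 63, 64, 66, 68, 73]
freq = relative_frequencies(egg_weights, bin_width=5)

for start, share in freq.items():
    print(f"{start}-{start + 4} g: {share:.0%}")
```

As more eggs are weighed and the bins are made narrower, the outline of such a histogram tends toward the smooth curve of a probability distribution.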
The normal distribution, commonly known as the Gaussian distribution, is the most typical distribution function for independent, randomly generated data. This well-known bell-shaped curve appears throughout statistical reports, from survey analysis and quality control to resource allocation. The graph of the normal distribution is defined by two parameters: the mean, or average, which is the maximum of the graph and about which the graph is always symmetric, and the standard deviation, which indicates the degree of dispersion from the mean. A graph with a small standard deviation (relative to the mean) will be steep, whereas one with a large standard deviation (again relative to the mean) will be flat.
The normal distribution is produced by the normal density function, $p(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^2 / (2\sigma^2)}$. In this exponential function, e is the constant 2.71828…, μ is the mean, and σ is the standard deviation. The probability that a random variable falls within any particular range of values is the fraction of the area enclosed by the graph of the function, between the given values and above the x-axis. Because the denominator $\sigma\sqrt{2\pi}$ (sometimes referred to as the "normalizing coefficient") makes the total area enclosed by the graph exactly equal to unity, probabilities can be read directly from the corresponding area, i.e., an area of 0.5 corresponds to a probability of 0.5. Although these areas can be calculated using calculus, tables for the special case of μ = 0 and σ = 1 were created in the 19th century and can be used for any normal distribution once the variables are suitably rescaled by subtracting their mean and dividing by their standard deviation, (x − μ)/σ. The use of such tables has now almost entirely been replaced by calculators.
Figure 2: Percent of Population Between 0 and 0.45 (mathsisfun)
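The area shown in Figure 2 can be reproduced without a table. The sketch below uses Python's standard statistics.NormalDist; the helper names normal_pdf and prob_between are chosen here for illustration and are not part of the original report.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density p(x) with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def prob_between(a, b, mu, sigma):
    """P(a < X < b): rescale to z = (x - mu) / sigma, then subtract standard normal CDF values."""
    standard = NormalDist()  # mean 0, standard deviation 1
    return standard.cdf((b - mu) / sigma) - standard.cdf((a - mu) / sigma)

area = prob_between(0.0, 0.45, mu=0.0, sigma=1.0)
print(f"P(0 < Z < 0.45) = {area:.4f}")  # roughly 0.1736, i.e. about 17.4% of the population
```

The rescaling step inside prob_between is the same (x − μ)/σ transformation that makes a single standard normal table usable for any normal distribution.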
The Poisson distribution is a discrete probability distribution: it gives the probability of a discrete (i.e., countable) outcome. Specifically, it gives the probability that an event will occur a specific number of times (k) in a predetermined period of time or region of space. The mean number of occurrences, denoted by the letter λ ("lambda"), is the only parameter of the Poisson distribution. Examples of Poisson distributions with different values of λ are shown in the graph below.
A Poisson distribution can be used to forecast or explain how many events will take place over a specific period of time or region of space. The term "events" can refer to anything from the occurrence of a sickness to client purchases to meteor strikes. Any defined period of time or area, such as 10 days or 5 square inches, can be used as the interval. The distribution applies when two conditions hold. First, individual events occur at random and independently; in other words, the occurrence of one event has no bearing on the likelihood of another. Second, the average number of events that take place in a given period of time or space is known, and this quantity, denoted λ (lambda), is assumed to be constant.
Figure 3: Chart examples of Poisson distributions with different values of λ
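As a minimal sketch, the Poisson probability of observing exactly k events is P(X = k) = λ^k e^(−λ) / k!. The value λ = 3 below (for example, an average of 3 customer transactions per hour) is a hypothetical figure chosen for illustration, not a result from the survey.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean number of events is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # hypothetical mean number of events per interval
for k in range(7):
    print(f"P(X = {k}) = {poisson_pmf(k, lam):.4f}")
```

Changing lam and re-running the loop reproduces the qualitative pattern of Figure 3: small λ gives a distribution piled up near zero, while larger λ moves the peak to the right and spreads it out.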
In probability theory and statistics, the binomial distribution is the discrete probability distribution that allows only Success or Failure as the possible outcomes of each trial of an experiment. For instance, if we flip a coin, there are only two conceivable results: heads or tails, and if we take a test, there are only two possible outcomes: pass or fail. This distribution is also called a binomial probability distribution. A binomial distribution has two parameters, n and p. The variable ‘n’ states the number of times the experiment runs and the variable ‘p’ gives the probability of success in any one run. Suppose a die is thrown 10 times; the probability of getting a 2 on any one throw is ⅙, so the number of 2s in the 10 throws follows a binomial distribution with n = 10 and p = ⅙. In a binomial probability distribution, the random variable is the number of ‘Successes’ in a sequence of n experiments, where each experiment asks a yes-no question and the boolean-valued outcome is either success/yes/true/one (probability p) or failure/no/false/zero (probability q = 1 − p). A single success/failure test is also called a Bernoulli trial or Bernoulli experiment, and a series of outcomes is called a Bernoulli process. For n = 1, i.e., a single experiment, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the well-known binomial test of statistical significance.
The binomial distribution formula for any random variable X is
$P(X = x) = {}^{n}C_{x} \, p^{x} q^{n-x}$
where n = the number of experiments; x = 0, 1, 2, 3, 4, … (the number of successes); p = probability of success in a single experiment; q = probability of failure in a single experiment = 1 − p.
The binomial distribution formula can also be written in the form of n Bernoulli trials, where ${}^{n}C_{x} = \frac{n!}{x!\,(n-x)!}$. Hence,
$P(X = x) = \frac{n!}{x!\,(n-x)!} \, p^{x} (1-p)^{n-x}$
Figure 4: The graph shows that the mean is 10 (expected value) and the chance of getting six heads is on the red left tail (Investopedia)
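The die example above (n = 10 throws, p = ⅙ for rolling a 2) can be worked through numerically. This is a small sketch using math.comb from the standard library; the function name binomial_pmf is an illustrative choice.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 1 / 6  # ten throws of a die, "success" = rolling a 2
for x in range(5):
    print(f"P(X = {x}) = {binomial_pmf(x, n, p):.4f}")

expected_successes = n * p  # mean of a binomial distribution
print(f"Expected number of 2s: {expected_successes:.2f}")
```

Setting n = 1 in binomial_pmf gives the Bernoulli distribution mentioned above, with P(X = 1) = p and P(X = 0) = 1 − p.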
Statistical inference, also known as inferential statistics, is the technique of analyzing results and drawing conclusions from data subject to random variation. Applications of statistical inference include hypothesis testing and confidence intervals. Statistical inference is a technique for determining a population's characteristics based on a random sample. It is helpful for analyzing the relationship between dependent and independent variables. The goal of statistical inference is to estimate uncertainty, or sample-to-sample variation. It enables us to give a likely range of values for the true value of a quantity in the population. Statistical inference depends on the following elements: sample size, variability in the sample, and the size of the observed differences.
There are different types of statistical inferences that are extensively used for making conclusions They are:
• Chi-square statistics and contingency table
Statistical inference solutions make effective use of statistical information about populations or trials. They cover all aspects of the work, including the gathering, checking, analysis and organization of the data. By using statistical inference solutions, people working in a variety of sectors can gain knowledge from their data. The following are common assumptions of statistical inference:
• The observed sample is made up of independent observations from a population type like Poisson or normal
• The parameters of the anticipated model, such as the normal mean or binomial proportion, are evaluated using a statistical inference solution
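To illustrate how inference gives "a likely range of values" for a population quantity, the sketch below computes a normal-approximation 95% confidence interval for a mean. The scores are hypothetical 1-5 satisfaction ratings, not the survey data, and for very small samples a t-based interval would usually be preferred over this approximation.

```python
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(sample)
    centre = mean(sample)
    standard_error = stdev(sample) / n ** 0.5       # sample-to-sample variation of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95% confidence
    return centre - z * standard_error, centre + z * standard_error

# Hypothetical satisfaction scores (illustrative only).
scores = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4]
low, high = mean_confidence_interval(scores)
print(f"Sample mean {mean(scores):.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

The interval narrows as the sample size grows or as the variability in the sample falls, which matches the elements of statistical inference listed above.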
Regression analysis is a statistical technique used to estimate the equation that best fits a set of observations of the dependent and independent variables. It yields the best estimate of the true relationship between the variables. From this estimated equation, one can predict the (unknown) dependent variable from a given (known) value of the independent variable.
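For a single independent variable, the best-fitting line can be estimated by ordinary least squares. The sketch below is only illustrative: the x and y values are made-up numbers (say, a convenience rating and a satisfaction rating), and the study itself uses a multiple regression fitted in SPSS rather than this hand-written routine.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical observations (illustrative only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = fit_line(x, y)

predicted = intercept + slope * 6  # predict the dependent variable for a new x
print(f"y = {intercept:.2f} + {slope:.2f} x; prediction at x = 6: {predicted:.2f}")
```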
• Evaluation of model fit: the R² and adjusted R² coefficients of determination are used to evaluate the fit of the model. R² increases whenever an independent variable is added to the model, so it is safer to use the adjusted R². When assessing the fit of the model, the larger the adjusted R², the better the fit of the model. To test whether the model can be generalized to the real population, we must test the model's goodness of fit.
• Testing the fit of the model: to test the fit of the multiple linear regression model, we use the F value in the ANOVA table. If the Sig. of the F value is less than the significance level, we reject the hypothesis that the population R² is 0 and conclude that the model fits the data set and can be generalized to the population. The Sig. value in the Coefficients table shows whether each regression coefficient is significant under the t test (with 95% confidence, i.e., a 0.05 significance level): if Sig. > 0.05, the regression coefficient of the variable Xi is not statistically different from zero, the variable Xi has no effect on the dependent variable, and the hypothesis H0 is not rejected.
The data for the t-test are taken from the Coefficients table in SPSS. Note also that if an independent variable is not statistically significant in the regression results, we conclude that it has no effect on the dependent variable, without needing to remove the variable and rerun the regression (IBM).
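The adjusted coefficient of determination mentioned in the model-fit discussion above is commonly computed as 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k the number of independent variables. The sketch below uses hypothetical inputs (R² = 0.60, n = 50) rather than values from the SPSS output.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R-squared: penalises R-squared for the number of predictors in the model."""
    n, k = n_observations, n_predictors
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical values (illustrative only): same R-squared, different numbers of predictors.
five_predictors = adjusted_r_squared(0.60, n_observations=50, n_predictors=5)
six_predictors = adjusted_r_squared(0.60, n_observations=50, n_predictors=6)
print(f"k = 5: {five_predictors:.4f}, k = 6: {six_predictors:.4f}")
```

With R² held fixed, adding a sixth predictor lowers the adjusted value, which is why the adjusted figure is the safer measure when variables are added to a model.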
The variance inflation factor (VIF) is an indicator of the degree of multicollinearity in regression analysis. In a multiple regression model, multicollinearity occurs when several independent variables are correlated with one another, which can degrade the regression results. The variance inflation factor measures the degree to which multicollinearity has inflated the variance of a regression coefficient.
When the VIF and the tolerance are both equal to 1, that is, when $R_i^2$ is equal to 0, the ith independent variable is not correlated with the other independent variables, indicating that multicollinearity does not exist.
Variables are not connected when the VIF value is 1
VIF between 1 and 5 = moderate correlation between the factors
VIF greater than 5 = heavily connected variables
The higher the VIF, the greater the likelihood of multicollinearity and the more investigation is needed. When the VIF is higher than 10, there is severe multicollinearity that needs to be corrected.
The CFI's "Variance Inflation Factor." (Investopedia)
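The relationship between $R_i^2$, tolerance and VIF can be made concrete: tolerance is 1 − $R_i^2$ and VIF is its reciprocal. In the sketch below, the $R_i^2$ inputs are hypothetical, and the function classify_vif simply encodes the rule-of-thumb thresholds listed above (1, 5 and 10); its name and labels are illustrative.

```python
def tolerance(r_squared_i):
    """Tolerance of predictor i, where r_squared_i comes from regressing it on the other predictors."""
    return 1 - r_squared_i

def vif(r_squared_i):
    """Variance inflation factor: the reciprocal of the tolerance."""
    return 1 / tolerance(r_squared_i)

def classify_vif(value):
    """Rule-of-thumb reading of a VIF value, following the thresholds given in the text."""
    if value > 10:
        return "severe multicollinearity"
    if value > 5:
        return "heavily correlated"
    if value > 1:
        return "moderately correlated"
    return "not correlated"

# Hypothetical R_i^2 values (illustrative only).
for r2 in (0.0, 0.5, 0.85, 0.95):
    print(f"R_i^2 = {r2:.2f}: VIF = {vif(r2):.2f} ({classify_vif(vif(r2))})")
```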
A t test is a statistical test used to compare the means of two groups. It is frequently used in hypothesis testing to determine whether a procedure or treatment actually affects the population of interest, or whether two groups differ from one another.
A t test can only be used when comparing the means of two groups (also known as a pairwise comparison). Use an ANOVA test or a post-hoc test if you want to compare more than two groups or perform multiple pairwise comparisons.
The t test is a parametric test of difference, which means it makes the same assumptions about your data as other parametric tests. The t test assumes that your data are independent, are roughly normally distributed, and have a similar amount of variance within each group being compared (also known as homogeneity of variance). If your data do not meet these assumptions, try a nonparametric alternative to the t test, such as the Wilcoxon Signed-Rank test for data with unequal variances.
Three types of t test are available: one-sample, two-sample, and paired.
Use a paired t test if the groups are drawn from the same population (for instance, when measuring before and after an experimental treatment). This design is within-subjects.
Use a two-sample t test (also known as an independent t test) if the groups are from two different populations (for example, two different species or humans from two different towns). This design is between-subjects.
Use a one-sample t test for comparing one group to a standard value (such as when comparing a liquid's acidity to a neutral pH of 7).
One-tailed or two-tailed t test?
Use a two-tailed t test if the only thing that matters is whether the two populations differ from one another. Use a one-tailed t test to determine whether one population mean is higher or lower than the other (Scribbr).
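The test statistics behind the one-sample and two-sample cases can be sketched as follows. The two-sample version uses the pooled variance, which relies on the homogeneity-of-variance assumption described above. The samples are hypothetical, and turning a t value into a p-value would additionally require a t table or a statistics package, which is not shown here.

```python
from statistics import mean, variance

def one_sample_t(sample, reference_value):
    """t statistic for comparing a sample mean with a fixed reference value."""
    n = len(sample)
    return (mean(sample) - reference_value) / (variance(sample) / n) ** 0.5

def two_sample_t(group_a, group_b):
    """Pooled-variance t statistic for two independent groups (assumes equal variances)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5

# Hypothetical measurements (illustrative only).
group_a = [5, 6, 7, 8, 9]
group_b = [3, 4, 5, 6, 7]
t_one = one_sample_t(group_a, reference_value=5)
t_two = two_sample_t(group_a, group_b)
print(f"one-sample t = {t_one:.3f}, two-sample t = {t_two:.3f}")
```

A paired t test can be obtained from the same one_sample_t function by passing the list of within-subject differences and a reference value of 0.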
Result
Descriptive statistics
The study was conducted with 50 subjects, all of whom are customers who use services at MB-Bank and use social networking sites. I use the data collected to find the relationship between the variables and customer satisfaction with service quality.
All of the 50 survey participants are customers of MB-Bank Da Nang branch: 28.8% have used the bank for less than 1 year, 38.5% for more than 1 year to 3 years, 19.2% for more than 3 years to 5 years, and 9.6% for more than 5 years to 7 years.
The statistics on how long a transaction relationship is maintained are distributed as follows: 67.3% under 6 months, 19.2% over 6 months to 1 year, 7.7% over 1 year to 5 years, and 1.9% over 5 to 10 years. We can see that transactions at MB-Bank are mainly maintained for under 6 months. However, terms of 5 to 10 years still account for 1.9% of the total, which shows that the bank maintains customer transactions over flexible periods.
The statistics on how often customers transact with the bank in a month show that 19.2% do so fewer than 2 times, 28.8% from 2 to 4 times, and 48.1% from 5 to 7 times. Thus, nearly 50% of customers transact with MB-Bank 5 to 7 times a month, which shows that the services of this bank are used quite frequently and suggests that the bank's service is fairly good. Next, the statistics on the reasons for choosing MB-Bank are as follows: 40.4% chose the bank for its reputation, 32.7% for its good service quality, 15.4% for its many customer care policies, and 7.7% for other reasons. This shows that the main reasons for choosing this bank are its good service quality and its high reputation. The detailed results are presented below.
Figure 7: Pie chart of question 1
Figure 8: Pie chart of question 2
Figure 9: Pie chart of question 3
Figure 10: Pie chart of question 4
Correlation analysis
Correlation analysis was performed between the dependent variable, satisfaction when using the service, and the independent variables: convenience, tangible, serve style, price competition, interaction and service. The results of the correlation analysis are presented in Table 6 below.
Table 6: The results of the correlation analysis
The results of the correlation analysis show that the independent variables (convenience, tangible, serve style, price competition and interaction) are all correlated with the dependent variable (satisfaction) at the 1% significance level. The independent variable “convenience” has the strongest correlation with the dependent variable “satisfaction” (Pearson correlation = 0.572), followed by “interaction” (Pearson correlation = 0.558), “price competition” (Pearson correlation = 0.517), “serve style” (Pearson correlation = 0.480) and “tangible” (Pearson correlation = 0.448); “service” shows almost no linear correlation with satisfaction (Pearson correlation = -0.061). The close correlation of the first five variables is expected in this research, because it is the close linear relationship between the variables that explains the influence of the factors in the research model. Therefore, all these independent variables can be included in the regression analysis.
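For readers unfamiliar with the Pearson coefficient reported above, the sketch below shows how it is computed from paired observations. The two lists are hypothetical ratings invented for the example; the coefficients in Table 6 come from SPSS, not from this code.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / (s_xx * s_yy) ** 0.5

# Hypothetical paired ratings (illustrative only).
convenience = [1, 2, 3, 4, 5]
satisfaction = [1, 3, 2, 5, 4]
r = pearson_r(convenience, satisfaction)
print(f"Pearson correlation = {r:.3f}")
```

Values close to +1 or −1 indicate a strong linear relationship, while a value near 0, such as the −0.061 reported for “service”, indicates almost none.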
Table 7: Model summary of the regression
The model summary reports a value of 0.762, which this analysis reads as the coefficient of determination: that is, 76.2% of the variation of the dependent variable "satisfaction when using the service" is explained by the independent variables in the model. (If 0.762 is instead the multiple correlation coefficient R, the explained share would be R² ≈ 0.58.) In addition, the F test has a very small Sig. value of 0.000, showing that the research model is suitable for the data set under investigation.
Based on the p(F) values in the above table, we can see that all 5 variables (serve style, price competition, interaction, intangible and service) have an impact on satisfaction when using the service. Specifically, the serve style variable has p(F) = 0.002 < 0.05, the price competition variable has p(F) = 0.033 < 0.05, the interaction variable has p(F) = 0.034 < 0.05, the intangible variable has p(F) = 0.053 and