Introduction
1 Background and the reasons why you choose the topic
Title: Analysis of factors affecting telecommunications enterprises in Vietnam
2 Objectives, scope, and meaning of the study
Objective: The objective of this study is to find out the factors affecting the business performance of enterprises in the telecommunications sector in Vietnam and to expand the scale of business activities with efficient use of capital.
Scope of the study: All telecommunications companies operating in Vietnam listed on HNX, HOSE, and Upcom in the period 2014-2021.
Meaning: The meaning of this research paper is to provide recommendations for telecommunications businesses, so that they can adopt methods that improve business performance or capital efficiency in the following business periods and help the business operate better.
Using quantitative analysis, financial ratio comparison, and descriptive research, we looked into the relationship between financial ratios and the financial performance of Vietnamese telecom firms as determined by the regression formula. The application and interpretation of statistics is known as descriptive analysis. Unlike inferential or inductive statistics, which aim to make conclusions about the population the sample is supposed to represent, descriptive statistics focus on summarising a sample. This usually means that descriptive statistics are non-parametric statistics, in contrast to inferential statistics, which are based on probability theory. Even in cases where inferential statistics are utilized to derive meaningful inferences from data analysis, descriptive statistics are typically presented.
I will first employ a variety of statistical techniques used in business planning for capacity, inventory, and quality management in my ASME article. After that, I'll measure the variability in inferential statistics, probability distributions and their application to business operations and procedures, and quality management. Subsequently, I use inferential statistics to show how the population and sample differ based on various sampling strategies and tactics. I'll be working on the one sample T-test and two sample T-tests in it, using the regression technique to measure the relationship between two variables (from the dataset), provide an explanation of regression and its use in the dataset (Stata), and construct or draw various visual representations for the variables in the dataset, such as frequency tables, basic tables, pie charts, and histograms. I'll be studying those sections in my ASM.
II The statistical methods used in business planning for quality, inventory, and capacity management
2.1. Measuring the variability in business processes or quality management
According to Bhandari, A (2023), variability describes how far apart data points lie from each other and from the center of a distribution. Along with measures of central tendency, measures of variability give you descriptive statistics that summarize your data. Variability is also referred to as spread, scatter or dispersion. It is most commonly measured with the following:
Range: the difference between the highest and lowest values
Standard deviation: average distance from the mean
Variance: average of squared distances from the mean
Coefficient of Variation: a statistical measure of the dispersion of data points around the mean
While the central tendency, or average, tells where most of the points lie, variability summarizes how far apart they are. This is important because the amount of variability determines how well you can generalize results from the sample to the population.
Low variability is ideal because it means that you can better predict information about the population based on sample data. High variability means that the values are less consistent, so it's harder to make predictions. Data sets can have the same central tendency but different levels of variability, or vice versa. If you know only the central tendency or the variability, you can't say anything about the other aspect. Both of them together give a complete picture of your dataset.
According to Frost, A (2017), the range is the most straightforward measure of variability to calculate and the simplest to understand. The range of a dataset is the difference between the largest and smallest values in that dataset. While the range is easy to understand, it is based on only the two most extreme values in the dataset, which makes it very susceptible to outliers. If one of those numbers is unusually high or low, it affects the entire range even if it is atypical.
Formula: Range = Highest value − Lowest value
Figure 1: Mean, mode, median, range and standard deviation of ROE
For example: In the dataset of Figure 1 above, the ROE variable ranges from -8.687789 to 0.5592108, which means ROE has Min = -8.687789 and Max = 0.5592108. In other words, the summary statistics of the 368 observations of ROE show that the enterprise with the lowest ROE is at approximately -868.78% and the enterprise with the highest ROE is at approximately 55.92%. So the range is 55.92% − (−868.78%) = 924.70%.
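As a quick check of the arithmetic above, the range can be recomputed from the reported minimum and maximum of ROE. This is only a minimal sketch: `value_range` is a hypothetical helper name, and the two numbers are the Min and Max quoted from Figure 1, not the full dataset.

```python
def value_range(values):
    """Return the range: highest value minus lowest value."""
    return max(values) - min(values)


# Minimum and maximum ROE quoted from Figure 1 (368 observations).
roe_min, roe_max = -8.687789, 0.5592108

roe_range = value_range([roe_min, roe_max])
print(f"Range of ROE: {roe_range:.4f} ({roe_range:.2%})")
```

Because only the two extreme values enter the calculation, a single unusual firm is enough to change the result, which is the sensitivity to outliers described above.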
Although the range and interquartile range measure the spread of data, both measures take into account only two of the data values. We need a measure that would average the total (Σ) distance between each of the data values and the mean. But for all data sets, this sum will always equal zero because the mean is the center of the data. If the data value is less than the mean, the difference between the data value and the mean would be negative (and distance is not negative). If each of these differences is squared, then each observation (both above and below the mean) contributes to the sum of the squared terms. The average of the sum of squared terms is called the variance, a measure of variability that uses all the data. It is based on the discrepancy between the mean (x̄ for a sample, μ for a population) and each observation's value (x_i). When comparing the variability of two or more variables, the variance is helpful. The variance is the average of the squared differences between each data value and the mean (Frost, 2023).
Formula: With respect to variance, the population variance, σ², is the sum of the squared differences between each observation and the population mean divided by the population size, N:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
x_i: ith observation in the population
N: Number of observations in the population
The sample variance, s², is the sum of the squared differences between each observation and the sample mean divided by the sample size, n, minus 1:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
x_i: ith observation in the sample
n: Number of observations in the sample
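The two definitions differ only in the divisor (N versus n − 1), which a short sketch can make concrete. The data values below are hypothetical and chosen only so the arithmetic is easy to follow; the function names are illustrative, and the results are compared with Python's standard `statistics` module.

```python
import statistics

# Hypothetical observations (not taken from the ROE dataset).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


def population_variance(xs):
    """Sum of squared differences from the mean, divided by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """Sum of squared differences from the mean, divided by n - 1."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)


pop_var = population_variance(data)
samp_var = sample_variance(data)

# The standard library implements the same two definitions.
print(pop_var, statistics.pvariance(data))
print(samp_var, statistics.variance(data))
```

The sample variance is always the larger of the two for the same data, since it divides the same sum of squares by a smaller number.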
I'll work through an example in Stata for a sample on my dataset with 368 observations in Figure 1. The summary statistics of the 368 observations of ROE show that the variance is 0.2217484, or 22.17%.
The standard deviation is the standard or typical difference between each data point and the mean. When the values in a dataset are grouped closer together, you have a smaller standard deviation. On the other hand, when the values are spread out more, the standard deviation is larger because the standard distance is greater. Conveniently, the standard deviation uses the original units of the data, which makes interpretation easier. Consequently, the standard deviation is the most widely used measure of variability. The standard deviation is just the square root of the variance. Recall that the variance is in squared units. Hence, the square root returns the value to the natural units. The symbol for the standard deviation as a population parameter is σ, while s represents it as a sample estimate. To calculate the standard deviation, calculate the variance as shown above, and then take the square root of it (Frost, 2023).
The sample standard deviation:
\[ s = \sqrt{s^2} \]
The population standard deviation:
\[ \sigma = \sqrt{\sigma^2} \]
σ: Population standard deviation
Example: In the variance section, the summary statistics of the 368 observations of ROE show that the variance is 0.2217484, or 22.17%. The standard deviation is the square root of the variance, which equals 0.4709017, or 47.09%. This is clearly shown in Figure 1 above. A standard deviation of 47.09% shows an unequal distribution in the observed samples. That is, businesses in the telecommunications sector show a disparity in operating ability and profitability.
2.1.4. The Coefficient of Variation
The coefficient of variation (relative standard deviation) is a statistical measure of the dispersion of data points around the mean. The metric is commonly used to compare the data dispersion between distinct series of data. Unlike the standard deviation, which must always be considered in the context of the mean of the data, the coefficient of variation provides a relatively simple and quick tool to compare different data series.
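A small sketch can show why the coefficient of variation helps when comparing series measured on different scales. It assumes the usual definition, standard deviation divided by the mean, and uses the sample standard deviation; both series are hypothetical, and the second is simply the first multiplied by 100.

```python
import statistics


def coefficient_of_variation(xs):
    """Sample standard deviation divided by the mean (mean must be non-zero)."""
    return statistics.stdev(xs) / statistics.mean(xs)


# Hypothetical series: series_b is series_a on a scale 100 times larger.
series_a = [10, 12, 11, 13, 9]
series_b = [1000, 1200, 1100, 1300, 900]

cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)

print(f"stdev: {statistics.stdev(series_a):.3f} vs {statistics.stdev(series_b):.3f}")
print(f"CV:    {cv_a:.4f} vs {cv_b:.4f}")
```

The standard deviations differ by a factor of 100, while the two coefficients of variation coincide, so the relative dispersion of the two series is the same. The measure is not meaningful when the mean is zero or close to zero.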
Probability distributions and application to business operations and processes
Definition: Probability is a quantitative representation of the likelihood of an event happening, expressed on a scale from 0 to 1. A probability close to 0 suggests a low likelihood, signifying that the event is improbable, while a probability near 1 indicates a high likelihood, implying that the event is highly probable or almost certain to occur (Turney, 2023).
Example (tossing a coin):
Possible Outcomes: Head (H) or Tail (T)
Probability Assignment: Each outcome has a probability of 1/2. So, P(H) = 1/2 and P(T) = 1/2.
There are two types of probability distribution, which are used for different purposes and various types of data generation processes:
Discrete Probability Distribution
Continuous Probability Distribution
Discrete probability distributions are applied for discrete random variables. Discrete distributions represent data with a countable number of outcomes, meaning that the potential outcomes can be put into a list and then graphed. The list may be finite or infinite (Young, 2023).
The required conditions for a discrete probability function are:
\[ f(x) \ge 0 \]
\[ \sum f(x) = 1 \]
For example, when determining the probability distribution of a die with six numbered sides, the list is 1, 2, 3, 4, 5, 6. If you're rolling two dice, the chances of rolling two sixes (12) or two ones (two) are much less than other combinations; on a graph, you'd see the probabilities of the two represented by the smallest bars on the chart. Or, in my dataset, it is:
Several discrete probability distributions specified by formulas are the discrete-uniform, binomial, Poisson, and hypergeometric distributions. We only study the Poisson distribution.
Definition: A Poisson-distributed random variable is often useful in estimating the number of occurrences over a specified interval of time or space. It is a discrete random variable that may assume an infinite sequence of values (x = 0, 1, 2, ...).
We can use the Poisson distribution to determine the probability of each value of such a random variable, which is characterized as the number of occurrences or successes of a certain event in a given continuous interval (such as time, surface area, or length). A Poisson distribution is modeled according to two assumptions. First, the probability of an occurrence is the same for any two intervals of equal length. Second, the occurrence or non-occurrence of the event in any interval is entirely independent of the occurrence or non-occurrence in any other interval (Poisson_dist, 2008). The core trait of the Poisson distribution revolves around the frequency of occurrences.
Poisson probability function:
\[ f(x) = \frac{\mu^x e^{-\mu}}{x!} \]
x = the number of occurrences in an interval
f(x) = the probability of x occurrences in an interval
μ = mean number of occurrences in an interval
e = 2.71828
x! = x(x − 1)(x − 2) ... (2)(1)
Applications of Poisson in practice:
For example: HYN Bag Store
Customers visit the HYN Bag store at an average rate of six per hour on weekend evenings. What is the probability that 5 people will arrive in 30 minutes on a weekend evening? Use the probability function with μ = 6/hour = 3/half hour and x = 5:
\[ f(5) = \frac{3^5 e^{-3}}{5!} \approx 0.1008 \]
The cumulative probability of at most 5 arrivals in 30 minutes is F(5) = P(X ≤ 5) = 0.916082.
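The HYN Bag Store numbers can be reproduced with a few lines of code. This is a minimal sketch assuming a mean of 3 arrivals per half hour; `poisson_pmf` and `poisson_cdf` are illustrative names. It separates the probability of exactly five arrivals from the cumulative probability of at most five, and the figure 0.916082 corresponds to the cumulative value.

```python
import math


def poisson_pmf(x, mu):
    """Probability of exactly x occurrences when the mean is mu."""
    return mu ** x * math.exp(-mu) / math.factorial(x)


def poisson_cdf(x, mu):
    """Probability of at most x occurrences (sum of the pmf from 0 to x)."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))


mu = 3  # 6 customers per hour = 3 per half hour
p_exactly_5 = poisson_pmf(5, mu)
p_at_most_5 = poisson_cdf(5, mu)

print(f"P(X = 5)  = {p_exactly_5:.6f}")
print(f"P(X <= 5) = {p_at_most_5:.6f}")
```

Which of the two values answers the question depends on its wording: "5 people will arrive" reads as exactly five, while "no more than 5" would call for the cumulative value.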
2.2.2. Continuous probability distributions
Definition: A continuous probability distribution is a probability distribution for continuous random variables. When it comes to continuous random variables, we cannot determine the probability for a particular value. Therefore, to determine probability, we determine the probability of occurrence of a certain range of values (amsi.org.au, n.d.).
For example: Height of Individuals:
Random Variable: The height of a randomly selected individual from a population
Interval: The height can take any real value within a continuous range, such as from 4 feet 7 inches to 7 feet 2 inches
Infinite Values: There are infinitely many possible heights within the given interval, as height is a continuous measurement
Probability distributions can be continuous or discrete. A person's height or blood pressure level can take any value in a continuum of outcomes, so in this case, data are said to follow a continuous probability distribution.
A normal distribution, also known as a Gaussian distribution or bell curve, is a statistical distribution that is symmetric and bell-shaped. It is characterized by the probability density function, which describes the likelihood of observing a particular value within the distribution. The shape of the curve is determined by two parameters: the mean (μ), which represents the center of the distribution, and the standard deviation (σ), which measures the spread or dispersion of the values.
For instance, the heights of people within a population tend to follow a normal distribution. Most people fall near the average height, with fewer individuals at significantly shorter or taller heights. Other examples include IQ scores, measurement errors in scientific experiments, and test scores of a large population.
The normal distribution is a hypothetical symmetrical distribution used to make comparisons among scores or to make other kinds of statistical decisions. The shape of this distribution is often referred to as "bell-shaped" or colloquially called the "bell curve".
Figure 4: Bell-shaped distribution
The bell is tall and narrow for small standard deviations, and short and wide for large standard deviations. The function F(x) is commonly used to determine the area under this curve.
The normal distribution closely approximates the probability distributions of a wide range of random variables. For example, the dimensions of parts and the weights of food packages often follow a normal distribution.
The normal probability distribution represents a large family of distributions, each with a unique specification for the parameters μ and σ. These parameters have a very convenient interpretation.
The density function of a normal probability distribution is bell-shaped and symmetrical about the mean. The normal probability distribution was introduced by the French mathematician Abraham de Moivre in 1733. He used it to approximate probabilities associated with binomial random variables when n is large.
The probability density function for a normally distributed random variable is:
Normal Probability Density Function:
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \]
For those who buy products from HYN Bag Store, the average purchase amount per customer is $15,015. Let's say the standard deviation is $3,540.
• a. What is the probability that a customer has a purchase amount greater than $18,000?
• b. What is the probability that a customer's purchase amount is less than $10,000?
μ = 15,015, σ = 3,540
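The two HYN Bag Store questions can be answered by standardizing each amount, z = (x − μ)/σ, and reading the area under the normal curve. The sketch below assumes purchase amounts are normally distributed with the stated mean and standard deviation, and uses `NormalDist` from Python's standard library in place of a printed z-table.

```python
from statistics import NormalDist

# Assumed model: purchase amounts ~ Normal(mean 15,015, standard deviation 3,540).
purchases = NormalDist(mu=15015, sigma=3540)

# a. P(X > 18,000): area in the upper tail.
z_a = (18000 - 15015) / 3540
p_above_18000 = 1 - purchases.cdf(18000)

# b. P(X < 10,000): area in the lower tail.
z_b = (10000 - 15015) / 3540
p_below_10000 = purchases.cdf(10000)

print(f"a. z = {z_a:.2f}, P(X > 18000) = {p_above_18000:.4f}")
print(f"b. z = {z_b:.2f}, P(X < 10000) = {p_below_10000:.4f}")
```

Under these assumptions roughly one customer in five spends more than $18,000 and fewer than one in ten spends less than $10,000.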
Figure 5: Example of the empirical rule
The empirical rule, also sometimes called the three-sigma or 68-95-99.7 rule, is a statistical rule which states that for normally distributed data, almost all observed data will fall within three standard deviations (denoted by the Greek letter sigma, or σ) of the mean or average (represented by the Greek letter mu, or μ) of the data.
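The three percentages in the rule's name can be recovered directly from the standard normal distribution. The following sketch uses `NormalDist` from Python's standard library to compute the share of values within one, two, and three standard deviations of the mean; the dictionary name `within` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Share of a normal population lying within k standard deviations of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} sigma: {share:.2%}")
```

The same shares hold for any normal distribution, because standardizing removes the particular values of μ and σ.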
Sampling Distribution
The sampling distribution of the mean is approximately normal for sufficiently large samples, regardless of the underlying distribution of the population observations, and the standard deviation of the sampling distribution decreases as the size of the samples used to calculate the means increases.
Let x̄ denote the sample mean computed from a random sample of n measurements from a population having a mean μ and finite standard deviation σ. Let μ_x̄ and σ_x̄ denote the mean and standard deviation of the sampling distribution of x̄, respectively. Based on repeated random samples of size n from the population, we can conclude the following:
1 μ_x̄ = μ
2 σ_x̄ = σ/√n
3 When n is large, the sampling distribution of x̄ will be approximately normal (with the approximation becoming more precise as n increases)
4 When the population distribution is normal, the sampling distribution of x̄ is exactly normal for any sample size n
Foot Locker Store Productivity: Foot Locker uses sales per square foot as a measure of store productivity. Sales are currently running at an annual rate of $406 per square foot. You have been asked by management to conduct a study of a sample of 16 Foot Locker stores. Assume the standard deviation in annual sales per square foot for the population of all 3400 Foot Locker stores is $80.
Show the sampling distribution of x̄, the sample mean annual sales per square foot for a sample of 16 Foot Locker stores.
The sampling distribution of the sample mean (x̄) is approximately normal if the sample size is large enough, due to the Central Limit Theorem. The mean of the sampling distribution (μ_x̄) is equal to the population mean (μ), and the standard deviation of the sampling distribution (σ_x̄) is calculated using the formula:
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
Where: σ is the population standard deviation, n is the sample size
In this case: σ = $80 (population standard deviation), n = 16 (sample size), so
\[ \sigma_{\bar{x}} = \frac{80}{\sqrt{16}} = 20 \]
So, the mean of the sampling distribution is $406 and its standard deviation (σ_x̄) is $20.
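The standard deviation of the sampling distribution depends only on σ and n, so it is easy to see how it shrinks as the sample grows. This is a minimal sketch; `standard_error` is an illustrative name, and the sample of 64 stores is included only for comparison with the sample of 16.

```python
import math


def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 80  # population standard deviation of annual sales per square foot ($)

se_16 = standard_error(sigma, 16)
se_64 = standard_error(sigma, 64)

print(f"n = 16: standard error = ${se_16:.2f}")
print(f"n = 64: standard error = ${se_64:.2f}")
```

Quadrupling the sample size halves the standard error, because n enters the formula through its square root.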
Inferential Statistics
Sampling is the methodical process of selecting a subset, or sample, from a larger population to collect data that can be used to answer a research question about the entire population. The sample is chosen because it is often impractical or impossible to collect data from the entire population. The results obtained from the sample are considered estimates, or approximations, of the actual values of population characteristics. Proper sampling methods are crucial in obtaining reliable estimates, ensuring that the sample is representative and can provide accurate insights into the broader population.
The sample data helps us to make an estimate of a population parameter. There are 2 types of estimation for population parameters:
Point Estimate: we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter
Interval Estimate: a range of values, computed from the sample, within which the population parameter is expected to lie
Point estimation is a statistical inference technique where data from a sample is utilized to calculate a specific value for a sample statistic. This calculated value then serves as an estimate or approximation of a corresponding population parameter. The objective of point estimation is to provide a single, best guess for the true value of the parameter based on the information obtained from the sample. It is a fundamental aspect of statistical analysis that enables researchers and analysts to make predictions and draw conclusions about a population using information gathered from a representative subset.
We refer to x̄ as the point estimator of the population mean μ
s is the point estimator of the population standard deviation σ
p̄ is the point estimator of the population proportion p
Example: Point Estimate of Proportion
A survey was conducted using a sample of 300 teacher trainees in a training school to determine what proportion of them view the services provided to them favorably. Out of 150 trainees, 103 of them responded that they viewed the services provided to them by the school as favorable. Find the point estimate for this data.
The point estimate here will be of the population proportion. The characteristic of interest is the teacher trainees having a favorable view of the services provided to them. So, all trainees with a favorable view are counted as successes, x = 103, with n = 150, which means
\[ \bar{p} = \frac{x}{n} = \frac{103}{150} \approx 0.687 \]
The researchers of this survey can establish the point estimate, which is the sample proportion, to be 0.687 or 68.7%.
The normal distribution plays a crucial role in constructing confidence intervals for population parameters, such as the population mean (μ). The steps involved are as follows:
Assumption of Normality: When the sample size is sufficiently large (usually n ≥ 30), the distribution of sample means tends to follow a normal distribution due to the Central Limit Theorem.
Calculation of the Confidence Interval: Given a sample mean (x̄), population standard deviation (σ), sample size (n), and desired confidence level (often denoted as 1 − α, where α is the significance level), you can use the z-score for the standard normal distribution to calculate the margin of error.
Interval Estimate of a Population Mean: σ Known:
\[ \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
x̄: the sample mean
1 − α: the confidence coefficient
z_{α/2}: the z-value providing an area of α/2 in the upper tail of the standard normal distribution
σ: the population standard deviation
n: the sample size
Discount Sounds has 260 retail outlets throughout the United States. The firm is evaluating a potential location for a new outlet, based in part on the mean annual income of the individuals in the marketing area of the new location.
A sample of size n = 36 was taken; the sample mean income is $41,100. The population is not believed to be highly skewed. The population standard deviation is estimated to be $4,500, and the confidence coefficient to be used in the interval estimate is 0.95.
Question: estimate the population mean.
With z_{0.025} = 1.96, the margin of error is 1.96 × 4,500/√36 = 1,470, so the 95% interval estimate of the population mean is $41,100 ± $1,470, that is, from $39,630 to $42,570.
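The Discount Sounds interval can be computed by following the σ-known interval estimate step by step. This is a sketch under the stated assumptions (σ treated as known, sampling distribution approximately normal); `z_interval` is an illustrative name, and the z-value comes from `NormalDist.inv_cdf` in the standard library rather than a printed table.

```python
import math
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence):
    """Interval estimate of a population mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z-value with area alpha/2 in the upper tail
    margin = z * sigma / math.sqrt(n)        # margin of error
    return x_bar - margin, x_bar + margin


low, high = z_interval(x_bar=41100, sigma=4500, n=36, confidence=0.95)
print(f"95% interval estimate: ${low:,.0f} to ${high:,.0f}")
```

A higher confidence coefficient gives a larger z-value and a wider interval, while a larger sample narrows it.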
Hypothesis testing in statistics is a way for you to test the results of a survey or experiment to see if you have meaningful results. You're basically testing whether your results are valid by figuring out the odds that your results have happened by chance. If your results may have happened by chance, the experiment won't be repeatable and so has little use.
Hypothesis testing can be used to determine whether a statement about the value of a population parameter should or should not be rejected.
The null hypothesis, denoted by Ho, is a tentative assumption about a population parameter. The alternative hypothesis, denoted by Ha, is the opposite of what is stated in the null hypothesis. The hypothesis testing procedure uses data from a sample to test the two competing statements indicated by Ho and Ha.
p-value ≤ α: Reject Ho, accept Ha
p-value > α: Do not reject Ho
Steps of hypothesis testing:
Step 1: Develop the null and alternative hypotheses
Step 2: Specify the level of significance α
Step 3: Collect the sample data and compute the value of the test statistic
Step 4: Use the value of the test statistic to compute the p-value
Step 5: Reject Ho if p-value ≤ α
For example: Hypothesis testing, σ Known
A major west coast city provides one of the most comprehensive emergency medical services in the world. Operating in a multiple hospital system with approximately 20 mobile medical units, the service goal is to respond to medical emergencies with a mean time of 12 minutes or less.
The response times for a random sample of 40 medical emergencies were tabulated. The sample mean is 13.25 minutes. The population standard deviation is believed to be 3.2 minutes.
The EMS director wants to perform a hypothesis test, with a 0.05 level of significance, to determine whether the service goal of 12 minutes or less is being achieved.
X = time to respond to medical emergencies
Ho: μ ≤ 12 (reached the target)
Ha: μ > 12 (did not reach the target)
\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{13.25 - 12}{3.2/\sqrt{40}} \approx 2.47 \]
For z = 2.47, the cumulative probability is 0.9932, so the p-value is 1 − 0.9932 = 0.0068. Because 0.0068 ≤ α = 0.05, Ho is rejected: the sample indicates that the 12-minute service goal is not being achieved.
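The five steps of hypothesis testing can be traced in code for the EMS example. The sketch assumes an upper-tail test (Ha: μ > 12) with σ known, and obtains the p-value from the standard normal distribution using `NormalDist`; variable names are illustrative.

```python
import math
from statistics import NormalDist

# Steps 1-2: Ha: mu > 12 (upper-tail test), significance level alpha = 0.05.
mu0, alpha = 12, 0.05

# Step 3: sample data and test statistic.
x_bar, sigma, n = 13.25, 3.2, 40
z = (x_bar - mu0) / (sigma / math.sqrt(n))

# Step 4: p-value = area to the right of z under the standard normal curve.
p_value = 1 - NormalDist().cdf(z)

# Step 5: reject Ho if p-value <= alpha.
reject_h0 = p_value <= alpha

print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject Ho: {reject_h0}")
```

For a lower-tail or two-tailed alternative only Step 4 changes: the p-value becomes the area to the left of z, or twice the smaller tail area.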
Inferential statistics illustrating the differences between population and sample based on different
A t test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another (Bevans, 2023).
Suppose you want to know whether the mean petal length of iris flowers differs according to their species. You find two different species of irises growing in a garden and measure 25 petals of each species. You can test the difference between these two groups using a t test and null and alternative hypotheses.
The null hypothesis (H0) is that the true difference between these group means is zero.
The alternate hypothesis (Ha) is that the true difference is different from zero.
For more than 100 degrees of freedom, the standard normal z value provides a good approximation to the t value. The standard normal z values can be found in the infinite degrees (∞) row of the t distribution table.
Interval estimate of a population mean, σ unknown:
\[ \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} \]
1 − α = the confidence coefficient
t_{α/2} = the t-value providing an area of α/2 in the upper tail of the t distribution with n − 1 degrees of freedom
s = the sample standard deviation
n = the sample size
2.5.1 One sample T-test: Estimation and Hypotheses testing
A one-sample t-test is used to test whether or not the mean of a population is equal to some value.
One Sample t-test: Formula
A one-sample t-test always uses the following null hypothesis:
H0: μ = μ0 (population mean is equal to some hypothesized value μ0)
The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed:
H1 (two-tailed): μ ≠ μ0 (population mean is not equal to some hypothesized value μ0)
H1 (left-tailed): μ < μ0 (population mean is less than some hypothesized value μ0)
H1 (right-tailed): μ > μ0 (population mean is greater than some hypothesized value μ0)
The formula to calculate the test statistic t:
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]
Where: x̄ = sample mean, μ0 = hypothesized population mean, s = sample standard deviation, n = sample size
If the p-value that corresponds to the test statistic t with (n − 1) degrees of freedom is less than your chosen significance level (common choices are 0.10, 0.05, and 0.01), then you can reject the null hypothesis.
One Sample t-test: Assumptions
For the results of a one sample t-test to be valid, the following assumptions should be met:
The variable under study should be either an interval or ratio variable.
The observations in the sample should be independent.
The variable under study should be approximately normally distributed. You can check this assumption by creating a histogram and visually checking if the distribution has roughly a "bell shape."
The variable under study should have no outliers. You can check this assumption by creating a boxplot and visually checking for outliers (Zach, 2023).
Example: Suppose we want to know whether or not the mean weight of a certain species of turtle in Florida is equal to 310 pounds. Since there are thousands of turtles in Florida, it would be extremely time-consuming and costly to go around and weigh each individual turtle.
Instead, we might take a simple random sample of 40 turtles and use the mean weight of the turtles in this sample to estimate the true population mean.
However, it's virtually guaranteed that the mean weight of turtles in this sample will differ from 310 pounds. The question is whether or not this difference is statistically significant. A one sample t-test allows us to answer this question.
Step 1: Gather the sample data
Suppose we collect a random sample of turtles with the following information:
Sample size n = 40
Sample mean weight x̄ = 300
Sample standard deviation s = 18.5
Step 2: Define the hypotheses
Perform the one sample t-test with the following hypotheses:
H0: μ = 310 (population mean is equal to 310 pounds)
H1: μ ≠ 310 (population mean is not equal to 310 pounds)
Step 3: Calculate the test statistic t
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{300 - 310}{18.5/\sqrt{40}} = -3.4187 \]
Step 4: Calculate the p-value of the test statistic t
According to the T Score to P Value Calculator, the p-value associated with t = -3.4187 and degrees of freedom = n − 1 = 40 − 1 = 39 is 0.00149.
Since this p-value is less than our significance level α = 0.05, we reject the null hypothesis. We have sufficient evidence to say that the mean weight of this species of turtle is not equal to 310 pounds.
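The turtle example can be recomputed from the sample summary. The sketch below calculates the test statistic from n = 40, x̄ = 300, s = 18.5 and μ0 = 310, then approximates the two-tailed p-value by numerically integrating the t density with Simpson's rule. That integration is a stand-in chosen to keep the example within the standard library, not the method used by the online calculator, and the helper names are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * central_area


def one_sample_t(x_bar, mu0, s, n):
    """Test statistic t = (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / math.sqrt(n))


t_stat = one_sample_t(x_bar=300, mu0=310, s=18.5, n=40)
p_value = two_tailed_p(t_stat, df=40 - 1)

print(f"t = {t_stat:.4f}, p-value = {p_value:.5f}")
```

Computed this way, t is about −3.42 and the p-value is about 0.0015, in line with the reported 0.00149 and well below α = 0.05.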
Frequently, the population standard deviation (σ) is not known. We can estimate the population standard deviation (σ) with the sample standard deviation (s). However, the test statistic will no longer follow the standard normal distribution. We must rely on the Student's t-distribution with n − 1 degrees of freedom. Because we use the sample standard deviation (s), the test statistic will change from a z-score to a t-score.
Cadmium, a heavy metal, is toxic to animals. Mushrooms, however, are able to absorb and accumulate cadmium at high concentrations. The government has set safety limits for cadmium in dry vegetables at 0.5 ppm. Biologists believe that the mean level of cadmium in mushrooms growing near strip mines is greater than the recommended limit of 0.5 ppm, negatively impacting the animals that live in this ecosystem. A random sample of 51 mushrooms gave a sample mean of 0.59 ppm with a sample standard deviation of 0.29 ppm. Use a 5% level of significance to test the claim that the mean cadmium level is greater than the acceptable limit of 0.5 ppm.
The sample size is greater than 30, so we are assured of a normal distribution of the means.
Step 1) State the null and alternative hypotheses
H0: μ = 0.5 ppm
H1: μ > 0.5 ppm
Step 2) State the level of significance and the critical value
This is a right-sided question, so alpha is all in the right tail. t_α is found by going down the 0.05 column with 50 degrees of freedom: t_α = 1.676.
Step 3) Compute the test statistic
The test statistic is a t-score:
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
For this problem, the test statistic is
\[ t = \frac{0.59 - 0.50}{0.29/\sqrt{51}} \approx 2.216 \]
Compare the test statistic to the critical value. The test statistic (2.216) is greater than the critical value (1.676), so it falls in the rejection zone. I will reject the null hypothesis. I have enough evidence to support the claim that the mean cadmium level is greater than the acceptable safe limit.
2.5.2. Two sample T-test: Estimation and Hypotheses testing
Inferences about the Difference Between Two Population Means: σ1 and σ2 Unknown
When σ1 and σ2 are unknown:
a. Use the sample standard deviations, s1 and s2, as estimates of σ1 and σ2
b. Replace z_{α/2} with t_{α/2}
\[ \bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
Where the degrees of freedom for t_{α/2} are:
\[ df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2} \]
A two sample t-test is used to determine whether or not two population means are equal.
A two-sample t-test always uses the following null hypothesis:
H0: μ1 = μ2 (the two population means are equal)
The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed:
H1 (two-tailed): μ1 ≠ μ2 (the two population means are not equal)
H1 (left-tailed): μ1 < μ2 (population 1 mean is less than population 2 mean)
H1 (right-tailed): μ1 > μ2 (population 1 mean is greater than population 2 mean)
The formula to calculate the test statistic t:
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]
Where the pooled standard deviation s_p is:
\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]
Where s1² and s2² are the sample variances.
If the p-value that corresponds to the test statistic t with (n1 + n2 − 2) degrees of freedom is less than your chosen significance level (common choices are 0.10, 0.05, and 0.01), then you can reject the null hypothesis.
Two Sample t-test: Assumptions
For the results of a two sample t-test to be valid, the following assumptions should be met:
The observations in one sample should be independent of the observations in the other sample.
The data should be approximately normally distributed.
The two samples should have approximately the same variance.
The data in both samples was obtained using a random sampling method (Zach, 2022).
Suppose we want to know whether or not the mean weight between two di erent species of turtles is equal Since there are thousands of turtles in each popula on, it would be too me-consuming and costly to go around and weigh each individual turtle
Instead, we might take a simple random sample of 15 turtles from each popula on and use the mean weight in each sample determine if the mean weight is equal between the two popula ons: to
However, it’s virtually guaranteed that the mean weight between the two samples will be at least a li le di erent The ques on is whether or not this di erence is sta s cally signi cant To test this, will performa two sample t-test signi cance level = 0.05 using the following steps: at α
Step 1: Gather the sample data
Suppose I collect a random sample of turtles from each population with the following information:
Step 2: Define the hypotheses
Perform the two sample t-test with the following hypotheses:
H0: μ1 = μ2 (the two population means are equal)
H1: μ1 ≠ μ2 (the two population means are not equal)
Step 3: Calculate the test statistic t
First, I will calculate the pooled standard deviation sp:
Next, calculate the test statistic t:
Step 4: Calculate the p-value of the test statistic t
According to the T Score to P Value Calculator, the p-value associated with t = -1.2508 and degrees of freedom = n1 + n2 − 2 = 40 + 38 − 2 = 76 is 0.21484.
Since this p-value is not less than our significance level α = 0.05, I fail to reject the null hypothesis. I do not have sufficient evidence to say that the mean weight of turtles between these two populations is different.
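The steps above can be checked with a short Python sketch. The sample sizes (40 and 38) come from the text, but the sample means and standard deviations are not shown in this section, so the values used below (means 300 and 305, standard deviations 18.5 and 16.7) are an assumption taken from the turtle example in the cited source (Zach, 2022). The function names are my own, and the p-value is approximated by numerically integrating the t density rather than by the online calculator mentioned above.

```python
import math

def pooled_two_sample_t(n1, mean1, sd1, n2, mean2, sd2):
    """Return (pooled SD, t statistic, degrees of freedom) for an equal-variance test."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    t = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return sp, t, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_value(t, df, steps=2000):
    """Two-sided p-value using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # P(-|t| < T < |t|)
    return 1 - central_area

# Assumed summary statistics (not shown in the text above).
sp, t, df = pooled_two_sample_t(40, 300, 18.5, 38, 305, 16.7)
p = two_sided_p_value(t, df)
print(f"sp = {sp:.3f}, t = {t:.4f}, df = {df}, p = {p:.5f}")
```

Under these assumed inputs the sketch gives a t statistic and p-value that agree with the figures quoted in Step 4, which supports the decision not to reject H0 at α = 0.05.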
Create/draw different types of visual representations for variables in the dataset. Explain the advantages
Simple tables
A simple table, also called a univariate table, provides summary information on a single characteristic.
Simple tables present data in a clear and concise manner, making it easy for readers to understand the information presented.
Simple tables are relatively easy and quick to create, especially when dealing with small datasets or straightforward data structures.
Simple tables are ideal for presenting basic summary statistics, such as counts, percentages, or averages, for a few variables or categories.
When the data is inherently structured in a tabular form, simple tables offer a convenient and familiar way to organize the information.
Simple tables have limited capacity to handle more complex data relationships or multivariate analysis. They excel at presenting basic information but may not adequately convey more intricate patterns or insights.
Simple tables often display numeric values without providing additional context or interpretation. This can make it challenging for readers to understand the significance or implications of the data.
When dealing with large datasets with numerous variables or categories, simple tables can become overwhelming and difficult to read. They may not be the best choice for presenting extensive or detailed information.
Here is a simple table extracted from part of my dataset:
Figure 7: Simple table of my dataset
Frequency tables
According to Reid (2018), frequency tables can be useful for describing the number of occurrences of a particular type of datum within a dataset. Frequency tables, also called frequency distributions, are one of the most basic tools for displaying descriptive statistics.
They can present big data sets in a somewhat clear manner and are simple to understand.
Frequency tables can be used to compare data between data sets of the same type and to spot clear trends within a data collection.
Frequency tables aren't appropriate for every application.
They can obscure extreme values (more than X or less than Y), and they do not lend themselves to analyses of the skew and kurtosis of the data.
From Figure 8, the data intervals are divided as follows:
Statistics show that there are 12 observed samples with a negative ROE index, accounting for the lowest proportion, 3.26% of the total number of observed samples, and 356 samples with positive performance, accounting for 96.74%. Of these, observed samples with high ROE performance of 12.46% or more account for 50% of the total observed samples, followed by samples with ROE from 5.87% to 12.46%, accounting for 25% of the total of 368 observed samples. The rest have ROE from 0 to 5.87%, accounting for 21.74%.
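The way such a frequency table is built can be sketched in Python. The interval edges (0, 5.87 and 12.46) and the total of 368 come from the text. The counts of 80, 92 and 184 are not stated directly and are derived here from the reported percentages, and the five raw ROE values are hypothetical, used only to show the binning step.

```python
from bisect import bisect_right
from collections import Counter

EDGES = [0.0, 5.87, 12.46]  # ROE cut points in percent, from the text
LABELS = ["ROE < 0", "0 to 5.87", "5.87 to 12.46", "12.46 or more"]

def roe_bin(roe_percent):
    """Map one ROE value (in percent) to its interval label."""
    return LABELS[bisect_right(EDGES, roe_percent)]

def frequency_table(counts):
    """Return {label: (count, percent of total)} in interval order."""
    total = sum(counts.values())
    return {label: (counts[label], round(100 * counts[label] / total, 2))
            for label in LABELS if label in counts}

# Hypothetical raw values, only to illustrate the binning.
sample_counts = Counter(roe_bin(v) for v in [-2.1, 3.0, 7.5, 15.2, 20.0])

# Counts consistent with the reported percentages (80, 92, 184 are derived).
reported = dict(zip(LABELS, [12, 80, 92, 184]))
table = frequency_table(reported)
for label, (n, pct) in table.items():
    print(f"{label:<16}{n:>5}{pct:>8.2f}%")
```

Recomputing the percentages from the counts in this way is also a quick check that the four intervals add up to the full sample of 368.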
Histogram
From the frequency tables of Figure 9, I can plot ROE's histogram as follows:
In the research data, it can be seen that businesses with a high ROE index account for 50%, showing that businesses in the telecommunications industry are using their equity effectively to generate high profits. This means that these businesses are attracting the attention of investors and have the ability to grow sustainably in the future. The other half are enterprises with low to medium ROE, of which low ROE enterprises account for only 12 observed samples, equivalent to 3.26% of the total number of observed samples, showing that the telecommunications industry has been on the rise in recent years. As an economic-technical industry associated with science and technology, and one of the industries most affected by the Industrial Revolution 4.0, Vietnam's telecommunications industry cannot stay out of the transformation trend, especially when traditional telecommunications services are already at saturation level.
Bar Charts
Bar charts are pretty easy to interpret, and there's a very clear relationship between size and value that allows easy comparison.
They're simple to make, and most people have experience creating and understanding them from school. They can help in presenting very large or very small values more emphatically.
Bar charts that attempt to represent wide ranges of numbers will struggle to efficiently communicate their message. For example, a bar chart for the numbers 5, 6, 10 and 378 will lend extreme visual weight to the highest value and make the relative values of the other measurements appear irrelevant. An alternative to this would be creating an adjusting scale for the bars, but this complicates the visual aspect of the presentation and breaks the intuitive sense that size correlates directly to value.
Figure 10: Bar chart of Return On Equity
Bar graphs tend to be locked into a particular data set, making it hard to show multiple values or changes over time unless the chart is modified, such as by making the bars layered and three-dimensional.
Pie Charts
A pie chart presents data as a simple and easy-to-understand picture. It can be an effective communication tool for even an uninformed audience, because it represents data visually as a fractional part of a whole. Readers or audiences see a data comparison at a glance, enabling them to make an immediate analysis or to understand information quickly. This type of data visualization chart removes the need for readers to examine or measure underlying numbers themselves, so it's a good way of presenting data that might otherwise appear in a table. You can also manipulate pieces of data in the pie circle to emphasize points you want to make.
A pie chart becomes less effective if it uses too many pieces of data. For example, a chart with four slices is easy to read; one with more than 10 becomes less so, especially if it contains many similarly sized slices. Adding data labels and numbers may not help here, as they themselves may become crowded and hard to read. This kind of chart only represents one data set; you'd need a series of pie charts to compare multiple sets. This may make it more difficult for readers to analyze and assimilate information quickly. Comparing data slices in a circle also has its problems, because the reader has to factor in angles and compare non-adjacent slices. Data manipulation within the chart's design may lead readers to draw inaccurate conclusions or to make decisions based on visual impact rather than data analysis (Finch, 2021).
Scatter plot
From Figure 11, I can draw a scatter plot showing the correlation between SIZE and DAR as follows:
Figure 11: Scatter plot showing the correlation between SIZE and DAR
DAR represents the average number of days the company takes to collect its accounts receivable in a fiscal year. Accounts receivable are listed on the balance sheet as a current asset. They refer to the outstanding invoices that a company has, or the money that clients owe the company. Accounts receivable, or receivables, represent a line of credit extended by a company and normally have terms that require payments due within a relatively short period. This period typically ranges from a few days to a fiscal or calendar year and is called the average collection period.
The average collection period is an accounting metric that represents the average number of days between a credit sale date and the date when the purchaser remits payment, that is, the amount of time it takes for a business to receive payments owed by its clients. Companies use the average collection period to make sure they have enough cash on hand to meet their financial obligations. It is an indicator of the effectiveness of a firm's AR management practices and is an important metric for companies that rely heavily on receivables for their cash flows. Businesses must be able to manage their average collection period to operate smoothly.
The average collection period is calculated by dividing a company's average accounts receivable balance by its net credit sales for a specific period, then multiplying the quotient by 365 days:
Average Collection Period = 365 Days * (Average Accounts Receivable / Net Credit Sales)
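The formula above can be illustrated with a small Python sketch. The function names and the figures (average receivables of 50,000 and net credit sales of 365,000) are hypothetical and do not come from my dataset. The turnover helper reflects the standard relationship that the collection period equals the number of days divided by the receivables turnover.

```python
def average_collection_period(avg_accounts_receivable, net_credit_sales, days=365):
    """Average number of days needed to collect receivables."""
    if net_credit_sales <= 0:
        raise ValueError("net credit sales must be positive")
    return days * avg_accounts_receivable / net_credit_sales

def receivables_turnover(avg_accounts_receivable, net_credit_sales):
    """How many times receivables are collected during the period."""
    if avg_accounts_receivable <= 0:
        raise ValueError("average accounts receivable must be positive")
    return net_credit_sales / avg_accounts_receivable

# Hypothetical figures, for illustration only.
acp = average_collection_period(50_000, 365_000)
turnover = receivables_turnover(50_000, 365_000)
print(f"Average collection period: {acp:.1f} days")
print(f"Receivables turnover: {turnover:.1f} times per year")
```

With these figures, a longer collection period goes together with a lower turnover, which is the relationship used in the interpretation of DAR.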
=> To be specific, the fitted values line tends to go up. This indicates a positive correlation between these two variables: the larger the business, the longer the average collection period. The higher the DAR, the greater the company's accounts receivable, and vice versa. The accounts receivable turnover ratio is an efficiency ratio and an indicator of a company's financial and operational performance. The higher the DAR, the lower the accounts receivable turnover and the worse the debt management ability. This suggests that large enterprises are facing gaps in accounts receivable management. The more they expand the scale of their production and business activities, the more they neglect to manage receivables.
=> Knowledge: The company should keep its size stable and manage debt properly in order to raise Return On Equity (ROE) and keep it stable. At the same time, enterprises need to strengthen debt management, monitor debts according to specific objects and contracts, and periodically evaluate the debt collection ability of each object to determine policies and trade credit to suit each customer.
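An upward fitted line of this kind can be sketched in Python with ordinary least squares. The five SIZE and DAR pairs below are hypothetical and are not taken from my dataset. They only show how a positive slope and a correlation coefficient close to 1 would be obtained, and the function name is my own.

```python
import math

def fit_line(x, y):
    """Return (slope, intercept, Pearson r) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical observations: firm size (e.g. log of total assets) and DAR in days.
size = [10, 11, 12, 13, 14]
dar = [30, 42, 41, 55, 60]

slope, intercept, r = fit_line(size, dar)
print(f"DAR = {intercept:.1f} + {slope:.1f} * SIZE, r = {r:.3f}")
```

A positive slope means the fitted line rises with SIZE, and r measures how closely the points follow that line. A correlation alone does not show that size causes longer collection periods.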
Conclusion
In light of my research's conclusions, I advise companies to take the following into account when making decisions and developing their business plans. The first step is to precisely outline the organization's corporate objectives and strategic course. Next, in order to comprehend client wants and trends, market analysis and competitor research are required. The next stage is to create a thorough business plan that outlines the goals, tasks to be completed, projected costs, and anticipated outcomes. The plan should also make sure that there are enough resources (material, financial, and human) to carry it out. Lastly, set up a system for tracking and assessing company performance so that modifications may be made on time.