(Essay) Midterm report: sampling distribution and estimation

NATIONAL ECONOMICS UNIVERSITY
ADVANCED EDUCATION PROGRAM
-***-
MIDTERM REPORT
SAMPLING DISTRIBUTION & ESTIMATION

Instructor: Assoc. Prof. Tran Thi Bich
Subject: Business Statistics
Class: Advanced Finance 63D
Group:
Group Members:
Đinh Xuân Anh (Leader) – 11210327
Vũ Lê Minh – 11213981
Trần Thu Phương – 11216961
Lê Kim Ngân – 11214201
Nguyễn Tấn Dũng – 11211502
Hoàng Bảo Ngọc Diệp – 11211304

Hanoi, May 2023

I Table of Contents
II Part 1: Main Article
   Purpose of the study
   Dataset
   Methodology
   Analysis
      a The meaning of the results
      b Reason why the method was recommended to be used
      c Additional Sampling distribution and Estimation possible for this article
III Part 2: Additional Article
   Article 1: Estimation of incubation period distribution of COVID-19 using disease onset forward time: A novel cross-sectional and forward follow-up study
      a Sampling distribution
      b Estimation
   Article #2: Posttraumatic Stress Disorder and Functioning and Quality of Life Outcomes in a Nationally Representative Sample of Male Vietnam Veterans (1997)
      a Sampling distribution
      b Estimation
IV Part 3: Application
   General information
   Sampling method + sampling distribution
   Estimation
   Implication
V REFERENCES

II Part 1: Main Article

Article analysis: “Vendor rating in purchasing scenario: a confidence interval approach”
https://sci-hub.ru/https://doi.org/10.1108/01443570110404736

Purpose of the study
The selected article describes how suppliers are evaluated and assessed against various criteria so that an organization can make informed decisions about selecting and managing the suppliers in its supply chain. The aim of vendor rating is to identify and rate vendors based on their performance, capabilities, and suitability for the organization's procurement needs. The specific aim of the research paper titled "Vendor Rating in Purchasing Scenario: A Confidence Interval Approach" is to propose a methodology that
addresses the drawbacks associated with the vendor rating process. One of the challenges mentioned is the presence of bias in the estimation process. To overcome this issue, the paper suggests a confidence interval approach for estimating the rating. This approach considers the opinions and inputs of a group of decision-makers rather than relying solely on the judgment of a single decision-maker. By incorporating confidence intervals, the proposed methodology aims to provide a more robust and reliable vendor rating process that reduces the impact of individual biases and gives a more accurate assessment of vendors. It allows a comprehensive evaluation of suppliers and supports well-informed decisions on supplier selection and management within the purchasing scenario.

Dataset
- Various individuals have been drawn from different functions within an organization, such as materials, production, maintenance, purchasing, and quality control, together with a few representatives from the shop floor.
- The members have been selected based on their experience, knowledge of the company, etc.
- The functional heterogeneity in such multifunctional teams is potentially an asset because new knowledge from the overall range of departments is brought into the process in its early phases, when much of the cost and quality of the final product is determined.

Methodology
a Sampling method and sampling distribution: the stratified random sampling method is recommended for selecting vendors for rating.
b Estimation method: the estimation method used in the study was a confidence interval estimation approach. In the formula for calculating the upper and lower confidence limits, the following assumptions are made:
- The data (rating) is from the normal distribution, and individuals are drawn from different functions, such as the materials, purchasing, production, and quality control departments. This procedure of selecting the individuals from various functions is
similar to stratified random sampling. It involves stratification, or segregation, of the organization into different functions, followed by a random selection of executives from each function. With larger groups, the central limit theorem is invoked to justify using the normal distribution.
- The individual values of the composite rating are independent of each other.
- The sample size (number of individuals) is adequate to use statistical confidence limits.

Analysis
a The meaning of the results
The results showed that the confidence interval approach provides a more comprehensive and accurate assessment of vendor performance. The authors found that statistical methods such as confidence intervals give a measure of the precision and reliability of vendor ratings, which is useful for decision-making in the purchasing process. The confidence intervals also provide a range of values within which the true rating is likely to lie, which helps in making informed decisions about supplier selection. They found that the confidence interval approach is less subjective and provides a more accurate and reliable assessment of vendor performance.

b Reason why the method was recommended to be used
Sampling method
Stratified random sampling is used because it ensures that the sample of vendors selected for rating is representative of the population of vendors from which it was drawn. This can improve the accuracy of the vendor ratings by reducing the variability of the estimates and increasing the precision of the results.

Sampling distribution
In the vendor rating study, the researchers assumed that the performance scores for each vendor were normally distributed. This assumption allowed them to use the properties of the normal distribution together with the mean and standard deviation of the performance scores, which are necessary for calculating confidence intervals. The use of a normal sampling distribution was appropriate in this case because the performance scores
were continuous variables and were assumed to be normally distributed. However, the assumption of normality should be tested and validated using statistical methods such as normal probability plots or tests for normality. If the data do not follow a normal distribution, alternative sampling distributions and methods may need to be used.

Estimation
In this study, interval estimation was used to estimate the performance of each vendor and the level of confidence associated with these estimates. Interval estimation uses a range of values, called a confidence interval, to estimate an unknown parameter with a certain level of confidence. Specifically, the researchers calculated a 95% confidence interval for each vendor's performance score, which provided a range of values likely to include the true performance score at the 95% confidence level. The width of the confidence interval reflects the amount of uncertainty associated with the estimate, with wider intervals indicating more uncertainty. The use of point and interval estimation in this study allowed the researchers to estimate each vendor's performance with a stated level of confidence and to quantify the uncertainty associated with these estimates.

c Additional Sampling distribution and Estimation possible for this article
Sampling distribution
Beta distribution: The beta distribution is a continuous probability distribution that is commonly used in quality control and reliability engineering. It could be used to model the probability distribution of the performance score for each vendor. This would allow for the calculation of confidence intervals for each vendor's performance score, which is the approach proposed by Muralidharan and Anantharaman.

Estimation:
- Bayesian estimation: This method involves using prior knowledge
or beliefs about the population parameter to make estimates. In the vendor rating study, the researchers could have used Bayesian estimation to incorporate prior knowledge or beliefs about the performance of the vendors. For example, if they had worked with the vendors before and had some idea of their performance, they could have used that information to adjust their estimates.
- Maximum likelihood estimation: This method involves finding the value of the population parameter that maximizes the likelihood of observing the sample data. In the vendor rating study, the researchers could have used maximum likelihood estimation to estimate the parameters of a statistical model, such as a linear regression model or a logistic regression model, that relates the vendor's performance to other variables, such as the size of the purchase order or the complexity of the product.
- Nonparametric estimation: This method involves estimating the population parameter using nonparametric methods that do not rely on assumptions about the underlying distribution of the data. In the vendor rating study, the researchers could have used nonparametric estimation to estimate the performance of the vendors based on their rankings or ratings, without assuming a specific distribution.

III Part 2: Additional Article

Article 1: Estimation of incubation period distribution of COVID-19 using disease onset forward time: A novel cross-sectional and forward follow-up study

a Sampling distribution
Weibull
Meaning
The Weibull distribution is used in the research to estimate the parameters of the incubation period distribution and describe its shape. The estimated shape parameter is close to 1, suggesting an approximately exponential distribution. However, the Weibull distribution allows for deviations from exponential behavior, providing a flexible model. Overall, the Weibull distribution offers a widely used and flexible model for estimating the distribution parameters and shape of the COVID-19
incubation period.

How they use it
The Weibull distribution is a probability model with shape (k) and scale (λ) parameters. It is commonly used to model the time until an event occurs. The study applies the Weibull distribution to COVID-19 incubation period data, estimating k and λ from the data to model the distribution.

Lognormal & Gamma
Meaning
Lognormal
The authors compare the Weibull model to a lognormal distribution and find that it also fits well. The lognormal distribution is used as an alternative model to estimate the incubation period distribution, showing good performance and viability as an alternative to the Weibull distribution.

Gamma
The gamma distribution is used in this research as a further alternative model to estimate the incubation period distribution of COVID-19. The authors show that the gamma distribution also provides a good fit to the data.

How they use it
Estimates based on the gamma and lognormal distributions provide similar results, with slightly smaller log likelihoods than the Weibull distribution. The average time from leaving Wuhan to symptom onset is 5.30 days, with a median of days and a maximum of 22 days. The fitted density function calculated in this research fits the observed forward times well, indicating that the model is reasonable and its results trustworthy. The figure shows the fitted density function in a solid line overlaid on the observed forward times histogram, with the Weibull probability density function for
incubation period distribution shown in a dashed line.

b Estimation
In the cohort of COVID-19 cases, the authors assume that the incubation period is a Weibull random variable; the estimates in the Weibull model can be obtained by maximizing the corresponding likelihood function.

Definition of maximum likelihood: maximum likelihood works by finding the set of parameter values that maximizes the probability of observing the data. This is done by constructing a likelihood function, which measures how well the model fits the data for a given set of parameter values. The maximum likelihood estimate of the parameters is then the set of values that maximizes the likelihood function.

The authors estimate α (the shape parameter, written k above) and λ by maximizing the product of the individual likelihood contributions over i = 1, ..., I with respect to α and λ, where v_i is the observed forward time of the i-th individual and I is the sample size of the study cohort.

Article #2: Posttraumatic Stress Disorder and Functioning and Quality of Life Outcomes in a Nationally Representative Sample of Male Vietnam Veterans (1997)

a Sampling distribution
The study population was stratified by geographical region, and then within each region, veterans were sampled based on their branch of service and military rank. Furthermore, the study also used stratified sampling to assess the prevalence of PTSD among the veterans. This approach helped ensure that the prevalence of PTSD was accurately estimated and representative of the larger population of male Vietnam veterans. In general, if the stratified sampling method is appropriately implemented, the sampling distribution of the estimator should be approximately normal according to the Central Limit Theorem, provided the sample sizes within each stratum are sufficiently large (typically, n ≥ 30) or the strata themselves are approximately normally distributed.

b Estimation
Weighted point estimation: Weighted point estimation is a statistical method that takes into account the complex survey design, including stratification and sampling
weights, to produce nationally representative estimates. In the study, the weighted prevalence estimates were calculated for PTSD and other outcomes, such as depression, substance abuse, and quality of life measures. These estimates were based on data collected from a representative sample of male Vietnam veterans, with oversampling of certain subgroups, such as minorities and those with high combat exposure. Weighted point estimation allowed the researchers to produce accurate prevalence estimates for the study population and to draw meaningful conclusions about the relationship between PTSD and functioning and quality of life outcomes.

Logistic regression: Logistic regression is a statistical method used to model the probability of a binary outcome (such as the presence or absence of a disorder) as a function of one or more predictor variables (such as other psychiatric disorders). The authors use logistic regression models to estimate the adjusted odds ratios and 95% confidence intervals for the association between PTSD and each of the six functional outcomes. These models adjust for potential confounding variables, including demographic characteristics and comorbid disorders. More specifically, the prevalence estimates, odds ratios, and confidence intervals presented in the study's table are weighted to be nationally representative estimates for all male Vietnam veterans. The odds ratios and their corresponding confidence intervals were estimated using logistic regression models that included demographic characteristics, comorbid disorders, and PTSD as independent variables. Odds ratios with confidence intervals that do not include 1.0 are considered statistically significant at the alpha level of 0.05.

IV Part 3: Application

General information
In 2016, the city of Manchester, New Hampshire released salary information for its state and city employees. This data shed light on the compensation of public employees across various sectors, including aviation, police department, fire
department and others. The release of this information revealed disparities in compensation between different types of public employees, with some making significantly more than others. This information provided important transparency and accountability for public employee compensation, giving citizens a better understanding of how their tax dollars were being allocated. It also initiated a larger conversation about the role of government in setting and managing employee compensation. Overall, the 2016 salary information for state and city employees in Manchester highlighted the importance of transparency in public sector compensation and the need for ongoing discussions about how best to allocate taxpayer dollars.

Sampling method + sampling distribution
Sampling Method:
Our goal is to estimate the mean salary of all employees in the state. We use a simple random sampling method to select a subset of employees from the state. In this method, each employee in the state has an equal chance of being selected for the sample, and the selection is made randomly. We choose 100 individuals from a population dataset that records salary, name, department, and title; a sample of this size allows us to rely on the central limit theorem. This method is preferred when we want to ensure that our sample is representative of the population and when we do not have any prior knowledge about the population’s characteristics.
The process of simple random sampling involves randomly selecting 100 individuals from the population and collecting their salary, name, department, and title data. We use Calculatorsoup.com to randomly generate 100 unique numbers. We then match each generated number with the corresponding row number in the population Excel file.
https://www.calculatorsoup.com/calculators/statistics/random-number-generator.php

Sampling Distribution
A sampling distribution is the distribution of a sample statistic, such as the sample mean or sample standard deviation, obtained from multiple
random samples of the same size from the population. In an ideal world, we can use the central limit theorem to make inferences about the population mean salary. The central limit theorem states that if a sample is large enough (typically, n ≥ 30), the sampling distribution of the sample mean will be approximately normal, regardless of the distribution of the population from which the sample was drawn. However, in most real-life scenarios we do not know the population standard deviation. When the population standard deviation is unknown and is estimated using the sample standard deviation, we use the t-distribution instead of the standard normal distribution. The t-distribution accounts for the additional uncertainty introduced by using the sample standard deviation, which reduces the precision of our estimate of the population mean. The t-distribution is similar to the normal distribution but has heavier tails, meaning that there is a greater probability of extreme values.

Estimation
In this case we have a large sample size (100) but do not know the population standard deviation, so we use the sample standard deviation as an estimate of the population standard deviation. We would then use a t-distribution with n-1 degrees of freedom to calculate the confidence interval for the population mean. The formula for the confidence interval is:
CI = X ± t*(s/√n)
where X is the sample mean, s is the sample standard deviation, n is the sample size, t is the t-value from the t-distribution with n-1 degrees of freedom, and CI is the confidence interval. The t-value can be obtained from a t-table or using statistical software. Generally, if the sample size is greater than or equal to 30, the t-distribution approaches the normal distribution and the z-distribution can be used instead. Since n = 100 here, the calculations below use z-values.

We have calculated:
Sample mean = 63162.334
Sample standard deviation = 29118.993
Standard error = s/√n = 29118.993/√100 = 2911.8993

a With 95% confidence interval:
The margin of error: = 1.96 *
2911.8993 = 5707.323
Therefore, the confidence interval = Sample mean +/- Margin of error => (57455.011; 68869.657)
A confidence interval is a range of values that is likely to contain the true population parameter with a certain level of confidence. In this case, the 95% confidence interval for the mean salary of New Hampshire state employees is (57455.011; 68869.657), which means that if we were to repeat the sampling process many times, 95% of the resulting confidence intervals would contain the true population mean salary. The same interpretation applies to the 90% and 99% intervals.

b With 90% confidence interval:
The margin of error: = 1.64 * 2911.8993 = 4775.52
Therefore, the confidence interval = Sample mean +/- Margin of error => (58386.81; 67937.85)

c With 99% confidence interval:
The margin of error: = 2.57 * 2911.8993 = 7483.58
Therefore, the confidence interval = Sample mean +/- Margin of error => (55678.75; 70645.91)

As we can see, the higher the confidence level, the wider the interval.

Implication
- A labor union representing state employees can use this confidence interval to negotiate salaries for its members. By knowing the range of salaries that is likely to contain the population’s mean salary, it can negotiate for higher salaries within this range.
- An HR manager responsible for hiring new state employees can use this confidence interval to set salary ranges for open positions. By knowing the range of salaries that is likely to contain the population mean, they can set a range that is competitive with other employers in the area while still being within budget constraints.
- A researcher studying the effects of salary on employee satisfaction can use this confidence interval to estimate the mean salary of all state employees and determine the sample size needed for their study. By knowing the precision of the estimate, they can ensure that their study is powered appropriately to detect a meaningful effect.
- A non-profit organization advocating for fair wages can use
this confidence interval to make a case for increasing the minimum wage for state employees. By comparing the sample mean salary to the lower bound of the confidence interval, they can argue that many state employees are being paid less than the mean salary with 95% confidence and that a higher minimum wage is needed to ensure fair compensation.
- An individual state employee can use this confidence interval to assess their own salary in relation to their peers. By comparing their own salary to the sample mean and the confidence interval, they can determine whether they are being paid above, at, or below the mean with 95% confidence.

V REFERENCES
Indeed Career Guide. (n.d.). What is Sampling Distribution? Indeed Career Advice. Retrieved May 6, 2023, from https://www.indeed.com/career-advice/career-development/what-is-sampling-distribution
Indeed Career Guide. (n.d.). Types of Estimators: Point and Interval. Indeed Career Advice. Retrieved May 6, 2023, from https://www.indeed.com/career-advice/finding-a-job/types-of-estimators
Muralidharan, C., Anantharaman, N., & Deshmukh, S. G. (2001). Vendor rating in purchasing scenario: a confidence interval approach. International Journal of Operations & Production Management, 21(10), 1305–1326. doi:10.1108/01443570110404736
Qin, J., You, C., Lin, Q., Hu, T., Yu, S., & Zhou, X.-H. (2020). Estimation of incubation period distribution of COVID-19 using disease onset forward time: A novel cross-sectional and forward follow-up study. Science Advances, 6(33), eabc1202. doi:10.1126/sciadv.abc1202
Zatzick, D. F., Marmar, C. R., Weiss, D. S., Browner, W. S., Metzler, T. J., Golding, J. M., … Wells, K. B. (1997). Posttraumatic Stress Disorder and Functioning and Quality of Life Outcomes in a Nationally Representative Sample of Male Vietnam Veterans. American Journal of Psychiatry, 154(12), 1690–1695. doi:10.1176/ajp.154.12.1690
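
Supplementary sketch: simple random sampling of row numbers
The sampling step in Part 3 (drawing 100 unique row numbers and matching them to rows of the population file) can be reproduced in code instead of with an online generator. The sketch below is only an illustration under stated assumptions: the function name, the seed, and the population size of 5000 rows are hypothetical, since the report does not state how many rows the salary file contains.

```python
import random


def draw_simple_random_sample(population_size, sample_size, seed=None):
    """Return sorted, unique, 1-based row numbers drawn without replacement."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    # random.sample draws without replacement, so every row is equally likely
    # to be chosen and no row number can appear twice.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


# Hypothetical population size: replace 5000 with the real number of rows.
rows = draw_simple_random_sample(population_size=5000, sample_size=100, seed=2023)
print(rows[:10])
```

Each number in rows would then be matched to the row with the same number in the population file, as described in the sampling method section.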

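Supplementary sketch: confidence intervals for the mean salary
The interval calculation in Part 3 can be checked with a few lines of code. This is a minimal sketch, assuming the normal (z) approximation that the report's own figures use; the function and variable names are illustrative and not part of the original report. Because the code uses unrounded critical values (about 1.645 and 2.576 rather than the rounded 1.64 and 2.57), its 90% and 99% results differ slightly from the figures in the report, and a t-based interval with 99 degrees of freedom would be slightly wider still.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_sd, n, confidence):
    """Return (lower, upper, margin) for a normal-approximation interval."""
    if n <= 0 or not 0 < confidence < 1:
        raise ValueError("n must be positive and confidence must lie in (0, 1)")
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = sample_sd / sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin, margin


# Sample statistics reported in Part 3 (n = 100).
SAMPLE_MEAN = 63162.334
SAMPLE_SD = 29118.993
N = 100

intervals = {
    level: mean_confidence_interval(SAMPLE_MEAN, SAMPLE_SD, N, level)
    for level in (0.90, 0.95, 0.99)
}

for level, (lower, upper, margin) in intervals.items():
    print(f"{level:.0%} CI: ({lower:.2f}; {upper:.2f}), margin = {margin:.2f}")
```

The margins grow with the confidence level, which matches the report's observation that a higher confidence level gives a wider interval.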
Title	Midterm Report Sampling Distribution & Estimation
Authors	Đinh Xuân Anh, Vũ Lê Minh, Trần Thu Phương, Lê Kim Ngân, Nguyễn Tấn Dũng, Hoàng Bảo Ngọc Diệp
Supervisor	Assoc. Prof. Tran Thi Bich
Institution	National Economics University
Major	Business Statistics
Type	midterm report
Year of publication	2023
City	Hanoi
