Informatics in Food Technology (Tin học trong CNTP)


TIN HỌC TRONG CNTP (Informatics in Food Technology)
Nguyễn Hoàng Dũng, PhD.
Trường Đại học Bách khoa Tp. HCM (Ho Chi Minh City University of Technology)

NHDzung–Lesson 1, slide 2
DISTRIBUTIONS (PHÂN BỐ)
Normal distribution (Phân bố chuẩn)
[Bar chart: percentage by category. Jail 9%, Prison 19%, Probation 61%, Parole 11%.]

NHDzung–Lesson 1, slide 3
DISTRIBUTIONS
Normal distribution
[Histogram: frequency against behaviour problem score. N = 289, Mean = 50, Std = 10.]

NHDzung–Lesson 1, slide 4
DISTRIBUTIONS
Normal distribution
[Histogram: frequency against behaviour problem score.]

NHDzung–Lesson 1, slide 5
DISTRIBUTIONS
Normal distribution
[Plot: normal density f(x) against X.]
$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$$

NHDzung–Lesson 1, slide 6
DISTRIBUTIONS
Standard Normal Distribution
[Plot: normal density with three horizontal scales: X from 20 to 80, X - µ from -30 to 30, Z from -3 to 3.]
$$X \sim N(\mu, \sigma^2) \;\Rightarrow\; z = \frac{X-\mu}{\sigma} \sim N(0, 1)$$
$$X \sim N(50, 100) \;\Rightarrow\; z = \frac{X-50}{10} \sim N(0, 1)$$

NHDzung–Lesson 1, slide 7
Implications of the mean and SD
"In the Vietnamese population aged 30+ years, the average weight was 55.0 kg, with the SD being 8.2 kg."
What does this mean?
If the data are normally distributed, this means that the probability (density) that an individual randomly selected from the population has a weight of w kg is:
$$P(\text{Weight} = w) = \frac{1}{s\sqrt{2\pi}} \exp\left[-\frac{(w-\bar{x})^2}{2s^2}\right]$$

NHDzung–Lesson 1, slide 8
Implications of the mean and SD
In our example, $\bar{x} = 55$, $s = 8.2$.
The probability (density) that an individual randomly selected from the population has a weight of 40 kg, 50 kg or 80 kg is:
$$P(\text{Weight} = 40) = \frac{1}{8.2\sqrt{2 \times 3.1416}} \exp\left[-\frac{(40-55)^2}{2 \times 8.2 \times 8.2}\right] = 0.009$$
$$P(\text{Weight} = 50) = \frac{1}{8.2\sqrt{2 \times 3.1416}} \exp\left[-\frac{(50-55)^2}{2 \times 8.2 \times 8.2}\right] = 0.040$$
$$P(\text{Weight} = 80) = \frac{1}{8.2\sqrt{2 \times 3.1416}} \exp\left[-\frac{(80-55)^2}{2 \times 8.2 \times 8.2}\right] = 0.0005$$

NHDzung–Lesson 1, slide 9
Implications of the mean and SD
The distribution of weight of the entire population can be shown to be:
[Plot: percent (%) against weight (kg), weights from 22 to 92 kg.]

NHDzung–Lesson 1, slide 10
Z-scores
Actual measurements can be converted to z-scores.
A z-score is the number of SDs from the mean:
$$Z = \frac{x - \bar{x}}{s}$$
A weight = 55 kg → z = (55 - 55)/8.2 = 0 SDs
A weight = 40 kg → z = (40 - 55)/8.2 = -1.8 SDs
A weight = 80 kg → z = (80 - 55)/8.2 = 3.0 SDs

NHDzung–Lesson 1, slide 11
Z-scores = Standard Normal Distribution
A z-score is unitless, allowing comparison between variables with different units of measurement.
Z-scores have mean 0 and variance of 1.
Z-scores → Standard Normal Distribution

NHDzung–Lesson 1, slide 12
Z-scores and area under the curve
Z-scores and weight, another look:
[Plot: percent (%) against z-score, from -4.0 to 4.3.]
Area under the curve for z < -1.96 = 0.025
Area under the curve for -1.0 < z < 1.0 = 0.6827
Area under the curve for -2.0 < z < 2.0 = 0.9545
Area under the curve for -3.0 < z < 3.0 = 0.9973

NHDzung–Lesson 1, slide 13
95% confidence interval
A sample of n measurements ($x_1, x_2, \dots, x_n$), with mean $\bar{x}$ and standard deviation $s$.
95% of the individual values $x_i$ lie between $\bar{x} - 1.96s$ and $\bar{x} + 1.96s$.
Mean weight = 55 kg, SD = 8.2 kg
95% of individuals' weights lie between 39 kg and 71 kg.
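The weight example (mean 55 kg, SD 8.2 kg) can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names are illustrative, and the cumulative probability is built from the standard library's `math.erf` rather than read from a printed table.

```python
import math

MEAN_KG = 55.0  # sample mean quoted on the slides
SD_KG = 8.2     # sample standard deviation quoted on the slides


def normal_pdf(x, mean, sd):
    """Normal density f(x) for the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(z):
    """Cumulative probability P(Z < z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_score(x, mean, sd):
    """Number of standard deviations between x and the mean."""
    return (x - mean) / sd


def central_range(mean, sd, z=1.96):
    """Interval mean +/- z*sd; z = 1.96 covers about 95% of a normal population."""
    return mean - z * sd, mean + z * sd


if __name__ == "__main__":
    for weight in (40, 50, 80):
        density = normal_pdf(weight, MEAN_KG, SD_KG)
        print(weight, round(density, 4), round(z_score(weight, MEAN_KG, SD_KG), 2))
    print(central_range(MEAN_KG, SD_KG))
    # Area under the standard normal curve between -1 and +1.
    print(normal_cdf(1.0) - normal_cdf(-1.0))
```

Because `normal_cdf` works for any z, the same helper can reproduce the other areas under the curve (for example between -2 and 2) without interpolating in a table.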
NHDzung–Lesson 1, slide 14
Cumulative probability (area under the curve) for Z-scores
[Plot: cumulative probability against Z-score, from -4.0 to 4.0.]
Z < | Prob
-3.0 | 0.0013
-2.5 | 0.0062
-2.0 | 0.0228
-1.5 | 0.0668
-1.0 | 0.1587
-0.5 | 0.3085
0 | 0.5000
0.5 | 0.6915
1.0 | 0.8413
1.5 | 0.9332
2.0 | 0.9772
2.5 | 0.9938
3.0 | 0.9987

NHDzung–Lesson 1, slide 15
DISTRIBUTIONS
Standard Normal Distribution: Using Table
Z | Mean to Z | Larger Portion | Smaller Portion
0.00 | 0.0000 | 0.5000 | 0.5000
0.01 | 0.0040 | 0.5040 | 0.4960
0.02 | 0.0080 | 0.5080 | 0.4920
0.03 | 0.0120 | 0.5120 | 0.4880
… | … | … | …
0.45 | 0.1736 | 0.6736 | 0.3264
0.46 | 0.1772 | 0.6772 | 0.3228
0.47 | 0.1808 | 0.6808 | 0.3192
0.48 | 0.1844 | 0.6844 | 0.3156
… | … | … | …
0.99 | 0.3389 | 0.8389 | 0.1611
1.00 | 0.3413 | 0.8413 | 0.1587
1.01 | 0.3438 | 0.8438 | 0.1562
… | … | … | …
1.44 | 0.4251 | 0.9251 | 0.0749
1.45 | 0.4265 | 0.9265 | 0.0735
1.46 | 0.4279 | 0.9279 | 0.0721
… | … | … | …
1.96 | 0.4750 | 0.9750 | 0.0250
… | … | … | …
2.41 | 0.4920 | 0.9920 | 0.0080

NHDzung–Lesson 1, slide 16
DISTRIBUTIONS
Standard Normal Distribution: Using Table
[Plot: standard normal density with areas marked 0.1587, 0.3413, 0.5000 and 0.8413.]

NHDzung–Lesson 1, slide 17
DISTRIBUTIONS
Standard Normal Distribution: Using Table
[Plot: standard normal density with z = -2 and z = -1 marked.]

NHDzung–Lesson 1, slide 18
DISTRIBUTIONS
Standard Normal Distribution: Using Table
[Plot: standard normal density with the central 95% area marked.]

NHDzung–Lesson 1, slide 19
Sampling Distributions & Hypothesis Testing
The following logic is typical of most tests of hypotheses:
1. We want to test the hypothesis, often called the research hypothesis, that students under stress are more likely than normal students to exhibit threshold problems.
2. We obtained a random sample of students under stress.
3. We set up the hypothesis (called the null hypothesis, Ho) that the sample was in fact drawn from a population whose mean, denoted µo, equals 50. This is the hypothesis that stressed students do not differ from normal students in terms of threshold problems.
4. We then obtained the sampling distribution of the mean under the assumption that Ho (the null hypothesis) is true (i.e., we obtained the sampling distribution of the mean from a population with µo = 50).
5. Given the sampling distribution, we calculated the probability of a mean at least as large as our sample mean.
6. On the basis of that probability, we made a decision: to either reject or fail to reject Ho. Because Ho states that µ = 50, rejection of Ho represents a belief that µ > 50, although the actual value of µ remains unspecified.

NHDzung–Lesson 1, slide 20
Sampling Distributions & Hypothesis Testing
Null hypothesis
Example: to support the research hypothesis that "college students do not come from a population with a mean self-confidence score of 100", we immediately set up the null hypothesis: THEY DO!
Or, to support the research hypothesis that the means of the populations from which two samples were drawn differ (µ1 ≠ µ2), we set up the null hypothesis that the two means are equal, i.e. µ1 - µ2 = 0.
Why:
1. Philosophical argument: "We can never prove something to be true, but we can prove something to be false": 3000 two children
2. It provides the starting point for any statistical test (101, 102 vs 100)
Statistical conclusions
Sample statistics (mean, variance, std, …) and test statistics (t, F, χ²)

NHDzung–Lesson 1, slide 21
Sampling Distributions & Hypothesis Testing
Type I & II errors (Sai lầm loại I & II)
One- and two-tailed tests
[Figure: sampling distributions under Ho and H1 on an X scale from 40 to 160 (Z from -3 to 3), with the critical value and the area β marked.]
Decision | Ho True | Ho False
Reject Ho | Type I error, p = α | Correct decision, p = 1 - β = Power
Fail to reject Ho | Correct decision, p = 1 - α | Type II error, p = β

NHDzung–Lesson 1, slide 22
Binomial (Bernoulli) distribution

NHDzung–Lesson 1, slide 23
Binomial distribution: some facts
$(x + y)^2 = x^2 + 2xy + y^2$
$(x + y)^3 = x^3 + 3x^2y + 3xy^2 + y^3$
$(x + y)^4 = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4$
$(x + y)^5 = x^5 + 5x^4y + 10x^3y^2 + 10x^2y^3 + 5xy^4 + y^5$
…
$$(x + y)^n = \binom{n}{0}x^{n}y^{0} + \binom{n}{1}x^{n-1}y^{1} + \binom{n}{2}x^{n-2}y^{2} + \dots + \binom{n}{n}x^{0}y^{n}$$
$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^{k}, \quad \text{where} \quad \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

NHDzung–Lesson 1, slide 24
A typical experiment
Design: 10 consumers were asked to give flavour scores to products A and B.
Results: 8 preferred A, 2 preferred B.
Question: Is there evidence that more people preferred A than B?
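The binomial expansion and the 8-versus-2 preference question are directly connected: each coefficient of the expansion counts the ways a given split of preferences can occur. The sketch below is an illustration under stated assumptions, not the lecturer's own code: the function names are hypothetical, `math.comb` requires Python 3.8 or later, and the null value a = b = 0.5 is taken as the assumption of no preference.

```python
from math import comb


def binomial_terms(n):
    """Coefficients C(n, 0), ..., C(n, n) of the expansion of (x + y)^n."""
    return [comb(n, k) for k in range(n + 1)]


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def upper_tail(k_min, n, p):
    """Probability of k_min or more successes in n trials."""
    return sum(binomial_pmf(k, n, p) for k in range(k_min, n + 1))


if __name__ == "__main__":
    # Coefficients of (x + y)^5, matching the expansion listed on slide 23.
    print(binomial_terms(5))
    # Chance that 8 or more of 10 consumers prefer A if there is no real preference.
    print(upper_tail(8, 10, 0.5))
```

The tail is summed from exact terms, so it can be compared against any rounded figures obtained from a table or from a normal approximation.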
NHDzung–Lesson 1, slide 25
A typical experiment: consideration
Let a be the probability that consumers preferred A; then b = 1 - a is the probability that consumers preferred B.
Under the null hypothesis of no difference, a = b = 0.5.
The possibilities are:
$$(a + b)^{10} = \binom{10}{0}a^{10}b^{0} + \binom{10}{1}a^{9}b^{1} + \binom{10}{2}a^{8}b^{2} + \dots + \binom{10}{10}a^{0}b^{10}$$
The successive terms are P(10 pref A), P(9 pref A), P(8 pref A), …, P(0 pref A), with
$$P(k\ \text{pref A}) = \binom{10}{k} a^{k} b^{10-k}$$

NHDzung–Lesson 1, slide 26
A typical experiment: solution
Number preferred A | Number preferred B | Term | Probability under the null hypothesis (a = 0.5, b = 0.5)
10 | 0 | a^10 | 0.00098
9 | 1 | 10 a^9 b | 0.00977
8 | 2 | 45 a^8 b^2 | 0.04395
7 | 3 | 120 a^7 b^3 | 0.11719
6 | 4 | 210 a^6 b^4 | 0.20508
5 | 5 | 252 a^5 b^5 | 0.24609
4 | 6 | 210 a^4 b^6 | 0.20508
3 | 7 | 120 a^3 b^7 | 0.11719
2 | 8 | 45 a^2 b^8 | 0.04395
1 | 9 | 10 a b^9 | 0.00977
0 | 10 | b^10 | 0.00098
The result suggested that a = 0.80, a 19 times difference from the null hypothesis of no difference!

NHDzung–Lesson 1, slide 27
Binomial and Normal distributions
[Bar chart: probability against number preferred A, from 0 to 10.]
Prob(8 or more preferred A) = 0.0439 + 0.0098 + 0.0010 = 0.055

NHDzung–Lesson 1, slide 28
Mean and variance of a proportion
For an individual consumer i, the probability that he/she prefers A is $p_i$. Assuming that all consumers are independent, then $p_i = p$.
The variance of $p_i$ is $\mathrm{var}(p_i) = p(1-p)$.
For a sample of n consumers, the estimated probability of preference for A is:
$$\bar{p} = \frac{p_1 + p_2 + p_3 + \dots + p_n}{n}$$
and the variance of $\bar{p}$ is:
$$\mathrm{var}(\bar{p}) = \frac{p(1-p)}{n}$$

NHDzung–Lesson 1, slide 29
Normal approximation of a binomial distribution
For an individual consumer i, the probability that he/she prefers A is $p_i$. Assuming that all consumers are independent, then $p_i = p$.
The variance of $p_i$ is $\mathrm{var}(p_i) = p(1-p)$.
For a sample of n consumers, the estimated probability of preference for A is:
$$\bar{p} = \frac{p_1 + p_2 + p_3 + \dots + p_n}{n}$$
and the variance of $\bar{p}$ is:
$$\mathrm{var}(\bar{p}) = \frac{p(1-p)}{n}$$
and the standard deviation is:
$$s = \sqrt{\frac{p(1-p)}{n}}$$

NHDzung–Lesson 1, slide 30
Normal approximation of a binomial distribution: example
10 consumers, 8 preferred product A.
Proportion of preference for A: p = 0.8
Variance: var(p) = 0.8(0.2)/10 = 0.016
Standard deviation of p: s = 0.126
95% CI of p: 0.8 ± 1.96(0.126) = 0.55 to 1.00

NHDzung–Lesson 1, slide 31
Test of significance
[Plot: percent (%) against Z-score, from -4.0 to 4.0.]
Is the observed proportion of 0.8 significantly different from 0.5 (null hypothesis)?
Z = (0.8 - 0.5) / 0.126 = 2.38
P(Z > 2.38) = 0.0087

NHDzung–Lesson 1, slide 32
Summary
Data are described by mean, variance and standard deviation.
Normal and Binomial distributions are cornerstones of sensory evaluation (oh !)
A Binomial distribution can be approximated by the standardized Normal distribution.

NHDzung–Lesson 1, slide 33
Statistics for Pairs of Variables
Rows: level of the criterion variable. Columns: level of the predictor variable.
Criterion | Nominal Level | Ordinal Level | Interval Level | Ratio Level
Nominal Level | Chi-Square Test | | |
Ordinal Level | Kruskal-Wallis Test | Spearman Correlation | |
Interval Level | ANOVA | Spearman Correlation | Pearson Correlation or Spearman Corr. |
Ratio Level | ANOVA | Spearman Correlation | Pearson Correlation or Spearman Corr. | Pearson Correlation or Spearman Corr.

NHDzung–Lesson 1, slide 34
Distribution of χ²
$$f(\chi^2) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\,(\chi^2)^{(k/2)-1}\, e^{-\chi^2/2}$$
$$z^2 = \frac{(X-\mu)^2}{\sigma^2}, \qquad \chi^2_1 = z^2, \qquad \chi^2_N = \sum_{i=1}^{N} z_i^2 = \sum_{i=1}^{N} \frac{(X_i-\mu)^2}{\sigma^2}$$
$$\chi^2_{N-1} = \frac{(N-1)s^2}{\sigma^2}; \qquad s^2 = \frac{\chi^2_{N-1}\,\sigma^2}{N-1}$$

NHDzung–Lesson 1, slide 35
Categorical data
Nominal level
• Classification
• A set of objects can be classified into exhaustive, mutually exclusive categories, each with a unique symbol.
• Ex: religion, sex, location, etc.
Ordinal level
• Classification + Ordering
• A set of numbers can be assigned rank values and nothing more.
• Ex: socio-economic status, education, levels of satisfaction, etc.

NHDzung–Lesson 1, slide 36
Analysis of one-variable case
Example: Number of bottles (in 1000) produced by a factory per month
Month: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
# bottles: | 40 | 34 | 30 | 44 | 39 | 58 | 51 | 55 | 36 | 48 | 33 | 38
Question: was the variability random?
NHDzung–Lesson 1, slide 37
The logic of Chi square analysis
Assumption of distribution: expected values (E)
Comparison of expected and observed (O) values:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Decision: if the deviation is large, reject the null hypothesis; if the deviation is small, do not reject the null hypothesis.

NHDzung–Lesson 1, slide 38
The logic of Chi square analysis
How "large" or "small"?
• Degrees of freedom
• Magnitude of errors (type I = alpha level or significance level)
Degrees of freedom (difficult concept): the number of independent comparisons that can be made between the members of a sample.
• Example: Given three values and a mean, if we know two individual values, we can infer the third value. In this case, the degrees of freedom is 2.

NHDzung–Lesson 1, slide 39
Critical values of the Chi-sq distribution
D.F | Significance level 0.05 | Significance level 0.025 | Significance level 0.01
1 | 3.84 | 5.02 | 6.63
2 | 5.99 | 7.38 | 9.21
3 | 7.81 | 9.35 | 11.34
4 | 9.49 | 11.14 | 13.28
5 | 11.07 | 12.83 | 15.09
6 | 12.59 | 14.45 | 16.81
7 | 14.07 | 16.01 | 18.47
8 | 15.51 | 17.53 | 20.09
9 | 16.92 | 19.02 | 21.67
10 | 18.31 | 20.48 | 23.21
11 | 19.68 | 21.92 | 24.72
12 | 21.03 | 23.34 | 26.22
13 | 22.36 | 24.74 | 27.69
14 | 23.68 | 26.12 | 29.14
15 | 25.00 | 27.49 | 30.58
16 | 26.30 | 28.85 | 32.00
17 | 27.59 | 30.19 | 33.41
18 | 28.87 | 31.53 | 34.81
19 | 30.14 | 32.85 | 36.19
20 | 31.41 | 34.17 | 37.57

NHDzung–Lesson 1, slide 40
One-variable case: an example
Considerations
• There were 506 bottles.
• If production were uniform across months, we would expect 506/12 ≈ 42 bottles per month.
Month: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
Observed | 40 | 34 | 30 | 44 | 39 | 58 | 51 | 55 | 36 | 48 | 33 | 38
Expected | 42 | 42 | 42 | 42 | 42 | 42 | 42 | 42 | 42 | 42 | 42 | 42
D = O-E | -2 | -8 | -12 | 2 | -3 | 16 | 9 | 13 | -6 | 6 | -9 | -4

NHDzung–Lesson 1, slide 41
One-variable case: an example
Month: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
Observed | 40 | 34 | 30 | 44 | 39 | 58 | 51 | 55 | 36 | 48 | 33 | 38
Expected | 42 | 42 | 42 | 42 | 42 | 42 | 42 | 42 | 42 | 42 | 42 | 42
(O-E)²/E | 0.11 | 1.58 | 3.51 | 0.08 | 0.24 | 5.95 | 1.85 | 3.91 | 0.90 | 0.81 | 1.99 | 0.41
The (O-E)²/E values use the unrounded expected count E = 506/12 = 42.17.
χ² = 0.11 + 1.58 + 3.51 + … + 0.41 = 21.3
Degrees of freedom (DF) = 11 (since 1 has been used to estimate the expected frequencies)
Conclusion: reject the null hypothesis (21.3 exceeds the 0.05 critical value of 19.68 for 11 DF).

NHDzung–Lesson 1, slide 42
Distribution of χ²: One-way classification
Goodness-of-fit test: how well the data (observed frequencies) fit the theory (expected frequencies)
µ = Np, σ² = Npq
N = 32; observed counts for categories A and B are 9 and 23 (shown on the slide as p = 9, q = 23)
Ho: ?

NHDzung–Lesson 1, slide 43
Distribution of χ²: One-way classification
$$\chi^2_{tt} = \sum_i \frac{(O_i - E_i)^2}{E_i} = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} = \frac{(9 - 16)^2}{16} + \frac{(23 - 16)^2}{16} = 6.125$$
$O_i$: observed (X, N - X)
$E_i$: expected (Np, Nq)
If $\chi^2_{tt}$ (calculated) > $\chi^2_{tb}$ (tabulated): reject Ho

NHDzung–Lesson 1, slide 44
Distribution of χ²: Multicategory case
Beer chosen | A | B | C | D
Observed | 4 | 5 | 8 | 15
Expected | 8 | 8 | 8 | 8
$$\chi^2_3 = \sum \frac{(O - E)^2}{E} = \frac{(4-8)^2}{8} + \frac{(5-8)^2}{8} + \frac{(8-8)^2}{8} + \frac{(15-8)^2}{8} = 9.25$$

NHDzung–Lesson 1, slide 45
Two Classification Variables: Contingency Table Analysis
Rows: Fault. Columns: Decision. Expected frequencies in parentheses.
Fault | Re-trait | Expt | Sum
Little | 153 (127.559) | 24 (49.441) | 177
Many | 105 (130.441) | 76 (50.559) | 181
Sum | 258 | 100 | 358

NHDzung–Lesson 1, slide 46
Two Classification Variables: Contingency Table Analysis
Rows: Fault. Columns: Decision.
Fault | Out | Re-Trait | Expt | Sum
Little | 41 | 385 | 477 | 903
Many | 80 | 290 | 499 | 869
Sum | 121 | 675 | 976 | 1772

NHDzung–Lesson 1, slide 47
Chi-Square Test of Independence
Example (from Statistical Methods for Psychology, David C. Howell)
You are a university administrator preparing to purchase a large number of new personal computers for three of the schools that constitute your university: the School of Arts and Science, the School of Education, and the School of Business. For a given school, you may purchase either IBM compatible computers or Macintosh computers, and you need to know which type of computer the students within each school tend to prefer. In general terms, your research question is: "Is there a relationship between the following two variables: a) school of enrolment, and b) computer preference?"
Questionnaire:
In which school are you enrolled? (circle one)
a. School of Arts and Science
b. School of Business
c. School of Education
Which type of computer do you prefer that we purchase for your school?
a. IBM compatible
b. Macintosh

NHDzung–Lesson 1, slide 48
Chi-Square Test of Independence
Rows: Computer Preference. Columns: School of Enrollment.
Computer Preference | Arts & Science | Business | Education
IBM compatible | N = 30 | N = 100 | N = 20
Macintosh | N = 60 | N = 40 | N = 120
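The chi-square calculations in this part of the lesson all follow the same pattern: form expected counts, then sum (O - E)²/E. The sketch below is a hedged illustration in plain Python rather than the lecturer's own tooling. The function names are hypothetical, equal expected counts are assumed when none are supplied, and the expected counts for a two-way table are taken as row total × column total / grand total, which is how the parenthesised values in the contingency table on slide 45 are obtained.

```python
def chi_square_gof(observed, expected=None):
    """Goodness-of-fit statistic sum((O - E)^2 / E) and its degrees of freedom.

    If expected is omitted, equal expected counts (total / categories) are assumed.
    """
    if expected is None:
        expected = [sum(observed) / len(observed)] * len(observed)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the row and column variables are independent.
            exp = row_totals[i] * col_totals[j] / total
            stat += (obs - exp) ** 2 / exp
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


if __name__ == "__main__":
    bottles = [40, 34, 30, 44, 39, 58, 51, 55, 36, 48, 33, 38]
    print(chi_square_gof(bottles))            # monthly bottle production
    print(chi_square_gof([4, 5, 8, 15]))      # beer chosen: A, B, C, D
    # Rows: IBM compatible, Macintosh. Columns: Arts & Science, Business, Education.
    computers = [[30, 100, 20], [60, 40, 120]]
    print(chi_square_independence(computers))
```

The returned statistic is meant to be compared with the tabulated critical value for the returned degrees of freedom. Libraries such as SciPy offer equivalent routines, but the hand-rolled version keeps every step of the slide arithmetic visible.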

Date posted: 24/01/2013, 16:27
