Descriptive Statistics

DOCUMENT INFORMATION

Basic information

Format
Pages	2
File size	62.55 KB

Contents

CHAPTER 2
Statistics, Probability and Noise

Statistics and probability are used in Digital Signal Processing to characterize signals and the processes that generate them. For example, a primary use of DSP is to reduce interference, noise, and other undesirable components in acquired data. These may be an inherent part of the signal being measured, arise from imperfections in the data acquisition system, or be introduced as an unavoidable byproduct of some DSP operation. Statistics and probability allow these disruptive features to be measured and classified, the first step in developing strategies to remove the offending components. This chapter introduces the most important concepts in statistics and probability, with emphasis on how they apply to acquired signals.

Signal and Graph Terminology

A signal is a description of how one parameter is related to another parameter. For example, the most common type of signal in analog electronics is a voltage that varies with time. Since both parameters can assume a continuous range of values, we will call this a continuous signal. In comparison, passing this signal through an analog-to-digital converter forces each of the two parameters to be quantized. For instance, imagine the conversion being done with 12 bits at a sampling rate of 1000 samples per second. The voltage is curtailed to 4096 (2^12) possible binary levels, and the time is only defined at one millisecond increments. Signals formed from parameters that are quantized in this manner are said to be discrete signals or digitized signals. For the most part, continuous signals exist in nature, while discrete signals exist inside computers (although you can find exceptions to both cases). It is also possible to have signals where one parameter is continuous and the other is discrete.
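The 12-bit, 1000 samples-per-second conversion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input range of 0 to 4.095 V, the uniform step size, and the names `sample_and_quantize` and `example_voltage` are hypothetical and do not come from the text.

```python
import math

def sample_and_quantize(signal, duration_s, sample_rate_hz=1000, bits=12,
                        v_min=0.0, v_max=4.095):
    """Sample a continuous-time function and quantize each sample.

    Returns a list of integer codes in the range 0 .. 2**bits - 1.
    The input range (v_min, v_max) is an assumption made for this sketch.
    """
    levels = 2 ** bits                      # 4096 levels for 12 bits
    step = (v_max - v_min) / (levels - 1)   # volts per level, assumed uniform
    n_samples = int(round(duration_s * sample_rate_hz))
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz              # time is defined only every 1 ms
        v = min(max(signal(t), v_min), v_max)   # clip to the assumed input range
        codes.append(int(round((v - v_min) / step)))
    return codes

def example_voltage(t):
    # Hypothetical input: a 5 Hz sine wave centred in the assumed range.
    return 2.0475 + 2.0 * math.sin(2 * math.pi * 5 * t)

codes = sample_and_quantize(example_voltage, duration_s=0.01)
print(len(codes), min(codes), max(codes))
```

Both parameters end up discrete: the list index is the sample number (one per millisecond), and each stored value is one of 4096 integer levels.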
Since these mixed signals are quite uncommon, they do not have special names given to them, and the nature of the two parameters must be explicitly stated.

Figure 2-1 shows two discrete signals, such as might be acquired with a digital data acquisition system. The vertical axis may represent voltage, light intensity, sound pressure, or an infinite number of other parameters. Since we don't know what it represents in this particular case, we will give it the generic label: amplitude. This parameter is also called several other names: the y-axis, the dependent variable, the range, and the ordinate.

[The Scientist and Engineer's Guide to Digital Signal Processing]

The horizontal axis represents the other parameter of the signal, going by such names as: the x-axis, the independent variable, the domain, and the abscissa. Time is the most common parameter to appear on the horizontal axis of acquired signals; however, other parameters are used in specific applications. For example, a geophysicist might acquire measurements of rock density at equally spaced distances along the surface of the earth. To keep things general, we will simply label the horizontal axis: sample number. If this were a continuous signal, another label would have to be used, such as: time, distance, x, etc.

The two parameters that form a signal are generally not interchangeable. The parameter on the y-axis (the dependent variable) is said to be a function of the parameter on the x-axis (the independent variable). In other words, the independent variable describes how or when each sample is taken, while the dependent variable is the actual measurement. Given a specific value on the x-axis, we can always find the corresponding value on the y-axis, but usually not the other way around.

Pay particular attention to the word: domain, a very widely used term in DSP. For instance, a signal that uses time as the independent variable (i.e., the parameter on the horizontal axis), is said to be in the time domain.
Another common signal in DSP uses frequency as the independent variable, resulting in the ...

Descriptive Statistics
By: OpenStaxCollege

Class Time:
Names:

Student Learning Outcomes
• The student will construct a histogram and a box plot.
• The student will calculate univariate statistics.
• The student will examine the graphs to interpret what the data implies.

Collect the Data
Record the number of pairs of shoes you own. Randomly survey 30 classmates about the number of pairs of shoes they own. Record their values.

Survey Results
_ _ _ _ _ _ _ _ _ _
_ _ _ _ _ _ _ _ _ _
_ _ _ _ _ _ _ _ _ _

Construct a histogram. Make five to six intervals. Sketch the graph using a ruler and pencil, and scale the axes.

Calculate the following values:
x̄ = _
s = _

Are the data discrete or continuous? How do you know?
In complete sentences, describe the shape of the histogram.
Are there any potential outliers? List the value(s) that could be outliers. Use a formula to check the end values to determine if they are potential outliers.

Analyze the Data
Determine the following values:
Min = _
M = _
Max = _
Q1 = _
Q3 = _
IQR = _

Construct a box plot of data.
What does the shape of the box plot imply about the concentration of data? Use complete sentences.
Using the box plot, how can you determine if there are potential outliers?
How does the standard deviation help you to determine concentration of the data and whether or not there are potential outliers?
What does the IQR represent in this problem?
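The values this worksheet asks for (x̄, s, Min, Q1, M, Q3, Max, IQR, and an outlier check on the end values) can be sketched as follows. The survey data are invented for illustration, the quartiles use a median-of-halves rule, and the outlier check uses the common 1.5 × IQR rule; the worksheet does not name either rule, so both are assumptions, as is the helper name `five_number_summary`.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                 # values below the median position
    upper = data[half + (n % 2):]       # values above the median position
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

# Hypothetical survey results: pairs of shoes owned by 30 classmates.
shoes = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5,
         6, 6, 6, 6, 7, 7, 7, 8, 8, 8,
         9, 9, 10, 10, 11, 12, 12, 14, 15, 30]

x_bar = statistics.mean(shoes)          # sample mean
s = statistics.stdev(shoes)             # sample standard deviation (n - 1 divisor)
minimum, q1, m, q3, maximum = five_number_summary(shoes)
iqr = q3 - q1

# Assumed outlier check: the 1.5 * IQR rule applied to the end values.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in shoes if v < lower_fence or v > upper_fence]

print(f"mean = {x_bar:.2f}, s = {s:.2f}")
print(f"Min = {minimum}, Q1 = {q1}, M = {m}, Q3 = {q3}, Max = {maximum}, IQR = {iqr}")
print(f"fences = ({lower_fence}, {upper_fence}), potential outliers = {outliers}")
```

With these invented data the IQR covers the middle half of the responses, and only the largest value falls outside the fences. Other quartile conventions (for example those offered by `statistics.quantiles`) can give slightly different Q1 and Q3.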
Show your work to find the value that is 1.5 standard deviations:
above the mean
below the mean

DESCRIPTIVE TESTS
4/20/2009

Choosing a test
Problem to be solved; sensory analysis or preference testing; report results.
Are the products different? Paired comparison, triangle, 2-3, 2-5; intensity ranking; rank-order test.
How large is the difference between the products? Scoring; descriptive test.
Does the difference matter to consumers? Preference; acceptance.
Sensory analysis: sensory data.

Words or numbers on an appropriate scale are used to describe the flavor of food products. With this method it is possible to detect fairly small differences between two samples, such as blend concentration, degree of similarity, or the overall sensory attributes of the product. Knowledge of flavor must be thoroughly mastered in order to describe it, because statistical analysis is not applied in this case. The panelists must be carefully trained and highly skilled.

Attribute	A (385)	B (408)
Fried potato	7.5	4.8
Raw potato	1.1	3.7
Vegetable oil	3.6	1.1
Salty	6.2	13.5
Sweet	2.2	1.0

The two products share the same qualitative descriptors but differ greatly in the intensity of each attribute.

Panelists
8-12 panelists, selected for their enthusiasm, willingness, and ability to recognize and describe the perceptible sensory attributes of the product.
Panelists are trained to distinguish and quantitatively rate the intensity of the attributes of a sample, and to determine the degree to which each attribute is present in the product.
Advantages: The information obtained is very detailed, and it can be related to consumer opinion.
Limitation: The time needed to set up the panel and bring it into operation is relatively long.

Scope of application
• Changes in the processing formula
• Changes in raw materials
• Shelf life
• Changes in packaging
• Defining quality-control criteria
• Carried out before running a preference test
• Analyzing why consumers refuse to use a product
• Recording the perceived attributes of a product in order to relate them to instrumental, chemical, or physical measurements of the food
• Measuring short-term changes in the intensity of a given attribute over time (time-intensity analysis)

Steps in building a descriptive test
+ Select and train the panel members
+ Select the terms
+ Train the panel with the list of terms found
+ Choose the scale
+ Build the questionnaire
+ Become familiar with the product
+ Run the formal test

Examples of terms
• Appearance: + Bright + Glossy + Smooth, slick
• Texture: + Soft + Chewy + Crisp + Gritty
• Flavor: + Pepper + Butter + Vinegar
• Taste: + Sour + Sweet + Salty + Bitter
• Hedonic: + Dirty + Delicious + Disgusting

• Understand the technical and psychological principles of the flavor, texture, and appearance of the product
• Train thoroughly: the panelists understand and apply the terms in the same way
• Use reference samples for the terms to ensure that the terms are used consistently

• Use an existing list of terms
• Establish a list of terms (done by the panel)

Generate the longest possible list of terms
20-40 people. Each person is presented in turn with a series of about 15 products (of the same type), 3-5 products per session. The products are fairly different from one another so as to define the common space. After the stage of independent work, the panelists work together with the moderator to come up with further terms.

First sorting (qualitative)
Keep all terms mentioned by at least one person and more than once for the products.
Discard hedonic and imprecise terms.

Second sorting (quantitative)
Panelists receive the list again, taste the products again, and assign each term a score from 0 to 5 on a scale of perceived intensity. Discard the terms whose cumulative share does not exceed 10% of the total information.

Third sorting (statistical)
Term/product matrix, AFC, CAH to reduce the number of terms.

• Train the panel with the list of terms found

Example descriptive terms for coffee and tea
Coffee aroma, astringent taste, bitter taste, nutty aroma, cooked odor, papery odor, burnt odor, burnt-rubber odor, caramel aroma, buttery aroma, roasted aroma, pear aroma, raisin aroma, rancid odor, resinous odor, rubber odor, sour taste, fermented odor, cereal aroma, green odor, muddy odor, sweet taste, tobacco odor, alcohol odor

Term	Definition	Reference sample	Sample preparation	Example
Astringent taste	A chemical sensation ...

ON SOME MULTIVARIATE DESCRIPTIVE STATISTICS BASED ON MULTIVARIATE SIGNS AND RANKS

NELUKA DEVPURA
(B.Sc.(Statistics) University of Colombo)

A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF STATISTICS AND APPLIED PROBABILITY
NATIONAL UNIVERSITY OF SINGAPORE
2004

Acknowledgments

I was helped by many people professionally and personally to complete and perfect the thesis. Where words alone will not suffice to express my heartiest gratitude to those wonderful people, who assisted me and encouraged me to achieve the objectives of this thesis. First of all, I owe an immense depth of gratitude to my supervisor Dr Biman Chakraborty who had provided me much needed support and unending encouragement throughout the thesis. I truly appreciate all the time and effort he has spent in helping me and for the valuable comments and suggestions. I wish to thank the staffs of my department for providing me very much support during my study and special thanks goes to my colleagues and friends for their generous help given to me during preparation of the thesis. I would like to take this opportunity to thank
my father Dharmasena Devpura and mother Lakshmi for looking after my daughter for last two years. They have been supporting me all the way up to now by taking most of my burden onto them and thanks to them only, I have come so far in my life. Finally, I would like to thank my husband and loving sisters for their support given. I wish to contribute the completion of my thesis to my dearest family.

Contents

Acknowledgments
Summary
1 Introduction
  1.1 Outline of the thesis
2 Multivariate Medians
  2.1 Notions of Multivariate Symmetry
    2.1.1 Spherical Symmetry
    2.1.2 Elliptical Symmetry
    2.1.3 Central and Sign Symmetry
    2.1.4 Angular and Halfspace Symmetry
  2.2 Notions of Multivariate Medians
    2.2.1 Co-ordinatewise Median
    2.2.2 Spatial Median
    2.2.3 Convex Hull Peeling Median
    2.2.4 Oja's Simplex Volume Median
    2.2.5 Liu's Simplicial Median
    2.2.6 Tukey's Half-space Depth Median
  2.3 Transformation Retransformation Based Approaches
    2.3.1 Data Driven Co-ordinate System
    2.3.2 Tyler's Approach
  2.4 Computing the TR Median
3 Multivariate Quantiles, Signs and Ranks
  3.1 Multivariate lp-Quantiles
    3.1.1 Computing lp-Quantiles
    3.1.2 Affine Equivariant lp-Quantiles
  3.2 Multivariate Signs and Ranks
    3.2.1 Quantile Contour Plot
  3.3 Examples with Real Data Sets
4 Some Multivariate Descriptive Statistics
  4.1 Scale Curves
    4.1.1 Algorithm for Computation of Central Rank Regions
    4.1.2 Affine Equivariant Scale Curve
    4.1.3 Scale Curves for Real Data Sets
  4.2 Bivariate Boxplots
    4.2.1 Constructing Bivariate Boxplot
    4.2.2 Affine Equivariant Boxplot
    4.2.3 Examples with Real Data
  4.3 Multivariate Kurtosis Curve
    4.3.1 Applications
  4.4 Multivariate Skewness Curve
    4.4.1 Applications
5 Multivariate Skew-Symmetric Distributions
  5.1 Multivariate g-and-h Distribution
  5.2 Conclusion
Appendix
Bibliography

List of Figures

3.1 Co-ordinatewise quantile contour plot for bivariate normal
data

Statistics in Geophysics: Descriptive Statistics II
Steffen Unkel
Department of Statistics, Ludwig-Maximilians-University Munich, Germany
Winter Term 2013/14

Numerical Summary Measures; Boxplots; Exploratory techniques for paired data

Numerical Summary Measures: Location, Spread, Shape

Background
The numerical summaries presented in this section can be subdivided into measures of location, spread and shape.
Location refers to the central tendency of the data values.
Spread denotes the degree of variation or dispersion around the center.
Measures of shape tell you the amount and direction of departure from symmetry and how tall and sharp the central peak of the data is.
Let $X$ be the variable of interest. Suppose a sample of size $n$ is given with observed values $x_1, \dots, x_n$.

Mode
The mode, $x_{\mathrm{mod}}$, is the most frequently occurring value or category of $X$. The mode is the most important measure of location for categorical variables.
The mode of the sample {1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17} is 6.
Given the list of data {1, 1, 2, 4, 4}, the mode is not unique; the data set may be said to be bimodal, while a set with more than two modes may be described as multimodal.

Arithmetic mean
The arithmetic mean or average of a sample is

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i ,$$

for which it holds that

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0 .$$

For frequency data with different observed values $a_1, \dots, a_k$ and relative frequencies $f_1, \dots, f_k$ the mean is

$$\bar{x} = \sum_{j=1}^{k} a_j f_j .$$

The mean is a meaningful measure for metric data. It is not a robust statistic, meaning that it is strongly affected by outliers.

Median
The sorted, or ranked, data values
from a particular sample are called the order statistics of that sample. Given $x_1, x_2, \dots, x_n$, the order statistics $x_{(1)}, x_{(2)}, \dots, x_{(n)}$ for this sample are the same numbers, sorted in ascending order.
Equal proportions of the data fall above and below the median, $x_{\mathrm{med}}$. Formally,

$$x_{\mathrm{med}} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & \text{if } n \text{ is even} \end{cases}$$

The median is a resistant measure of location and is meaningful for variables that possess at least an ordinal scale of measurement.

Quantiles
A sample quantile, $x_p$, is a number having the same units as the data, which exceeds that proportion of the data given by the subscript $p$, with $0 < p < 1$.
Computation:

$$x_p = \begin{cases} x_{(\lfloor np \rfloor + 1)} & \text{if } np \text{ is not an integer} \\ \frac{1}{2}\left(x_{(np)} + x_{(np+1)}\right) & \text{if } np \text{ is an integer} \end{cases}$$

where $\lfloor np \rfloor$ is the largest integer not greater than $np$.
Commonly used quantiles: $x_{0.5} = x_{\mathrm{med}}$; $x_{0.25}$: first (or lower) quartile; $x_{0.75}$: third (or upper) quartile.

Variance
The empirical variance of $x_1, \dots, x_n$ is

$$\tilde{s}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 .$$

Since $E(\tilde{s}^2) = \sigma^2 (n-1)/n$, an unbiased estimator for the population variance, $\sigma^2$, is the sample variance

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 .$$

The standard deviation, $s$, is obtained as $s = +\sqrt{s^2}$.
Both $s^2$ and $s$ are not resistant measures of dispersion.

Variance decomposition
$k$ groups $(x_{11}, x_{21}, \dots, x_{n_1,1}), \dots, (x_{1k}, x_{2k}, \dots, x_{n_k,k})$ with

$$\bar{x}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} x_{ij} \quad (j = 1, \dots, k)$$

and

$$\tilde{s}^2_{n_j} = \frac{1}{n_j} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 \quad (j = 1, \dots, k).$$

Then

$$\tilde{s}^2_n = \frac{1}{n} \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2 + \frac{1}{n} \sum_{j=1}^{k} n_j \tilde{s}^2_{n_j}$$

with $n = \sum_{j=1}^{k} n_j$ and $\bar{x} = \frac{1}{n} \sum_{j=1}^{k} n_j \bar{x}_j$.
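The order-statistic rule for quantiles, the two variance formulas, and the variance decomposition above can be checked numerically. The following is a minimal sketch in plain Python; the function names (`sample_quantile`, `empirical_variance`, `sample_variance`) and the grouped data are hypothetical, while the 11-value sample is the one used in the mode example of these slides.

```python
import math

def sample_quantile(values, p):
    """x_p from the order statistics: x_(floor(np)+1) if np is not an
    integer, otherwise the midpoint of x_(np) and x_(np+1)."""
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    x = sorted(values)                 # order statistics x_(1) <= ... <= x_(n)
    n = len(x)
    np_ = n * p
    k = round(np_)
    # Treat np as an integer if it is within a small floating-point tolerance.
    if 1 <= k < n and abs(np_ - k) < 1e-9:
        return 0.5 * (x[k - 1] + x[k])     # (x_(np) + x_(np+1)) / 2
    return x[math.floor(np_)]              # x_(floor(np)+1), 0-based index

def empirical_variance(values):
    """Variance with divisor n."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def sample_variance(values):
    """Variance with divisor n - 1 (the unbiased estimator)."""
    n = len(values)
    return empirical_variance(values) * n / (n - 1)

sample = [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]
quartiles = [sample_quantile(sample, p) for p in (0.25, 0.5, 0.75)]
print("quartiles:", quartiles)

# Variance decomposition on made-up grouped data:
# total variance = between-group part + within-group part.
groups = [[2.0, 4.0, 6.0], [1.0, 3.0], [5.0, 7.0, 9.0, 11.0]]
pooled = [v for g in groups for v in g]
n = len(pooled)
grand_mean = sum(pooled) / n
between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups) / n
within = sum(len(g) * empirical_variance(g) for g in groups) / n
print(math.isclose(empirical_variance(pooled), between + within))
```

The tolerance test on `np_` is a design choice for floating-point `p`; exact arithmetic (for example with `fractions.Fraction`) would avoid it.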
Statistics in Geophysics: Descriptive Statistics
Steffen Unkel
Department of Statistics, Ludwig-Maximilians-University Munich, Germany
Winter Term 2013/14

Setting the scene (Population and sample; Variables; Types of measurement scales); Frequency distributions

Background
Observing systems and computer models in geophysical sciences produce torrents of numerical data.
One important application of statistical ideas is in making sense of a set of data. The goal is to extract insights about the processes underlying the generation of the numbers.
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data (sample).
More recently, a collection of summarisation techniques has been formulated under the heading of exploratory data analysis.

Elementary unit and population
Definition: Elementary unit. Objects for which a statistical analysis is desired. Symbol: $\omega$
Definition: Population. Aggregation of all elementary units defines a population. Symbol: $\Omega$
$\omega_i \in \Omega$, $i = 1, \dots, N$. $N$ is the size of the population.

Example: Households in Germany
$\omega_i$: a household in Germany
$\Omega$: all households in Germany
Population size $N$: about 40.1 million (as of 2008)

Example: Fish in a lake
$\omega_i$: a fish in a lake
$\Omega$: all fish in a lake
Population size: ?
Sample
Definition: Sample. A sample is a subset of the elementary units, drawn from the population by means of a sampling method (e.g., random sample).
Sampling theory is concerned with the selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population.
Sample size: $n$ ($n < N$)
Statistical analysis of the sample allows us to draw conclusions about the population of interest (inferential statistics).

Variable and values of a variable
Definition: Variable or statistical variable. Properties, characteristics or attributes of an elementary unit.
Definition: Variable values. The different values a variable can take.
The values can be
qualitative: variable values are not numbers, but may be coded by numerical values. Such variables are often called categorical.
quantitative: variable values are numbers (numerical values).
discrete: finite or countable set of different values.
continuous: uncountable set of different values.
quasi-continuous: data are continuous but measured in a discrete way.

Variable and values of a variable: Examples
Gender: qualitative. Coding: 1=male, 2=female
Hair colour: qualitative. Coding: 1=red, 2=brown, et cetera
Temperature: quantitative, (quasi-)continuous
Number of car accidents in 2012 in Germany: quantitative, discrete
School grades: qualitative. Values: 1, 2, 3, 4, 5, 6

Level of measurements
The level at which a variable is measured determines the choice of numerical summary measures to describe the main
features of the data, what kind of graphical representations are useful for exploratory data analysis, and which methods of statistical inference can be applied.

Measurement scales
Definition: Nominal scale. Lowest level, unordered set of values. Relation or operation: ...

Date posted: 31/10/2017, 17:25
