
Maximum entropy psychophysics


DOCUMENT INFORMATION

Basic information

Title: Bits of the ROC: Signal Detection as Information Transmission
Authors: Peter R. Killeen, Thomas J. Taylor
Institution: Arizona State University
Field: Psychology
Document type: thesis
City: Tempe
Pages: 57
File size: 457.1 KB

Content

Running head: MINIMAL SDT

Bits of the ROC: Signal Detection as Information Transmission

Peter R. Killeen & Thomas J. Taylor
Arizona State University

Correspond with: Peter Killeen, Department of Psychology, Box 1…, Arizona State University, Tempe, AZ 8…-1…; email: killeen@asu.edu; FAX: (480) …-4…; Voice: (480) …-5…

Abstract

The framework for detection and discrimination called Signal Detection Theory (SDT) is reanalyzed from an information-theoretic perspective. Receiver-operating characteristics (ROCs) for the isoinformative processor describe arcs in the unit square, and lie close to those described by constancy of A'. Necessary and sufficient conditions for performance to fall on these arcs are that the payoff matrix be honest, and that as bias shifts, changes in the information expected from yes responses are complemented by those from no responses. Asymmetric ROCs require further characterization. The simplest maximum-entropy distributions on the evidence axis are exponential, and yield power-law ROCs that describe many data. The success of ROCs constructed from confidence ratings shows that more information is available from the signal than the experimenter's binary classification lets pass. Such ratings comprise category scaling of signal strength. Power operating characteristics are consistent with Weber's law. Where Weber's law holds, channel capacity on a dimension equals the logarithm of its Weber fraction.

Bits of the ROC: Signal Detection as Information Transmission

Signal Detection Theory (SDT) and Information Theory (IT) were the jewels in the crown of 20th-century experimental psychology. As an avatar of statistical decision theory, SDT provided a technique for reducing a 2x2 table of relations between stimulus and response into measures of detectability and bias, of sensitivity and selectivity, thereby untangling a 100-year-old confound. It has been well popularized by Swets (e.g., Swets, Dawes & Monahan, 2000) and thoroughly analyzed by Macmillan and Creelman (1991). Information theory, the brilliant invention of Claude Shannon (Shannon, 1949), provided algorithms for quantifying the amount of information transmitted by a response. Although both theories were formulated at the same time (the late 1940s), and both concern similar phenomena (quantifying the accuracy of imperfect discriminations), there has been very little use of one theory to reinforce and complement the other. By the end of the 20th century, SDT remains an important theory while IT is rarely mentioned.

Classic SDT (here, CSDT) is an application of Thurstone scaling (see, e.g., Juslin & Olsson, 1997; Lee, 1969; Luce, 1977; Luce, 1994). Whereas many stimuli are susceptible to measurement in physical units, such as decibels of intensity, some are not. These latter are often of most central concern to society, involving measures of complex stimuli such as beauty, quality of life, or the impact of punishment. Thurstone suggested a metric for the distance between stimuli that would embrace both the simple and the complex: the unit of distance between stimuli would be the standard deviation of the percept associated with the stimulus. Thurstone called the distribution of perceptions issuing from a stimulus a "discriminal process", and its standard deviation σ the "discriminal dispersion". Two such processes are shown in Figure 1, issuing from two stimuli.

++ Figure 1 (discrim disp) and Table … (calc) about here ++
Tables 1 and 2 give the data from which such machinery is inferred. Table 1 shows the joint probabilities of signals and responses. In Table 2, these are divided by the row marginals to give the conditional probabilities of responding "A" or "B" given the stimulus value. Although here viewed as a symmetric discrimination, the origin of CSDT was in detection tasks, where Sa was the background, or noise, stimulus. This gave rise to the terminology of Correct Rejection (CR) for responses in the RaSa cell, and Misses (M) for responses in the RaSb cell.

++ Tables … about here ++

Table 2 describes the performance from the perspective of an experimenter, one who knows the stimulus condition and assays the probability of a response. It is useful for characterizing the behavior of the detector. In the applications of SDT, we are usually given the response (the verdicts of radiologists, juries, and children who cry "wolf") and wish to know the probability that a signal was in fact present. This requires dividing the cells in Table 1 by their column marginals, yielding Table 3. One may go between Tables 2 & 3 by using Bayes' theorem to convert the arguments of a conditional probability. Table 3 is more user-friendly, in that consumers of SDT analyses are seldom given the state of nature, and wish to evaluate that state, not characterize the detector. The impact of these different conditionalizations can be quite different: a child who always cries "wolf" when their prevalence is 1% will be assigned a hit rate of 100% by the conventional Table 2, but only 1% by Table 3. No matter how the table is conditionalized, it has two degrees of freedom, and no evaluation of a discrimination is complete without reporting both accuracy of affirmatives and accuracy of negatives. Table 1 is often more convenient for information-theoretic analyses.

CSDT invokes normal discriminal processes to translate the probabilities in Table 2 into the two measures of theoretical interest (d' and C). Other distributions are reviewed by Egan (1975). The data are often consistent with these assumptions. However, the discriminal processes are never observed, and carry more degrees of freedom than the data they represent. There are five degrees of freedom available in constructing Figure 1: the means and standard deviations of the signal and noise distributions, and the location of the criterion. These overendowed distributions are then slimmed down by stipulating an origin and unit for the perceptual abscissae. The origin is set either at the mean of the first percept, or halfway between the means of the two percepts (as inferred from the data). The unit, the standard deviation, is set to 1.0. This leaves the scale value of the second stimulus, d', and the location of the criterion, C, as the recovered parameters that re-present the information found in the hit and false alarm rates. For the data in Table 1, d' = 0.385 - (-0.675) = 1.06. If the origin is placed at the mean of the two distributions, then C = -(z(H) + z(F)); Macmillan and Creelman (1991, Equation 2.2) include a further factor of 1/2 so that the range of the bias statistic is the same as that of d'. For the data in Table 2, C = -(0.385 - 0.675) = 0.29 (see the sketch below). As shown in Figure 1, the criterion is slightly above the mean of the percepts, indicating a conservative criterion: the observer is more likely to reject a marginal perception as noise than to accept it as a signal.

CSDT was a trail-breaking innovation.
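To make the preceding arithmetic concrete, here is a minimal sketch, not from the paper, recovering d' and C from hit and false-alarm rates with Python's standard-normal quantile. H and F are back-derived from the z-scores quoted above (z(H) = 0.385, z(F) = -0.675), so the numbers are illustrative.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard-normal quantile (z-score) function

# Hit and false-alarm rates consistent with the worked example:
# z(H) = 0.385 and z(F) = -0.675 imply H ~ .65 and F ~ .25.
H, F = 0.65, 0.25

d_prime = z(H) - z(F)        # 0.385 - (-0.675) = 1.06
C = -(z(H) + z(F))           # 0.29, the convention used in the text
C_mc = -(z(H) + z(F)) / 2    # Macmillan & Creelman's factor-1/2 variant (0.145)

print(f"d' = {d_prime:.2f}, C = {C:.2f}, C (M&C) = {C_mc:.3f}")
```

Both bias conventions locate the criterion slightly on the conservative side of the midpoint between the percept means; they differ only by the factor of 1/2 noted above.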
Now standing near the summit, a glance back shows that CSDT did not pick out the most direct route to the goal of representing discrimination performance. Too much that is unverifiable is assumed, only to be later nullified. Alternative nonparametric measures of sensitivity and bias have been developed out of a condign sense of parsimony. Macmillan and Creelman (1996) reviewed these alternatives in an article whose title hedged "nonparametric", because the measures reviewed either made subtle assumptions about underlying distributions or mechanisms, or were at least consistent with such distributions.

Subsumption Psychophysics

In this paper we make assumptions about mechanisms and distributions in incremental fashion, in the style of Brooks (1991), who coined the term subsumption architecture to describe such a bottom-up approach: build until it breaks, then see what additional machinery is necessary. The conceptual tool that permits this approach to be applied to signal detection theory is the maximum entropy formalism (MEF; Jaynes, 1986; Skilling, 1989; Tribus, 1969). The MEF stipulates that inference should be based on statements of everything known, with everything else left maximally random (i.e., having maximum entropy). If the remaining quantities are not maximally random, then we are implicitly making additional inferences about their nature. It is the goal of the MEF to make all such knowledge explicit, leaving nothing hidden in implicit assignments of parameters or distributions. We instantiate the MEF for detection by (a) describing a detection theory that makes no assumptions concerning underlying distributions; then (b) describing one that invokes underlying one-parameter distributions of signal strength; and then (c) describing one that invokes underlying two-parameter distributions of signal strength.

Minimal SDT

Assume that observers attempt to maximize performance, given the constraint imposed by limits on the information available from the stimuli. This goal is equivocal until "maximal performance" is defined.

Value

Whenever an experimenter stipulates proper behavior for an observer (e.g., "Respond B only when you are sure you have observed the stimulus"), they are imposing an index of merit on their performance. Often this is vague, as in the example given. One of the many important contributions of CSDT was its emphasis on the explicit assignment of indices of merit to performance. An example is given in Table 4, where the entries indicate the values assigned to each of the outcomes. For instance, an experimenter may provide points, convertible into goods, for performance in the following manner: v(CR) = v(H) = +5; v(F) = -3; v(M) = -1. This would generate a slightly conservative bias in subjects attempting to maximize their expected payoff (as sketched below). The perceived utility of the goods is, however, often a nonlinear function of the points assigned (Kornbrot, Donnelly & Galanter, 1981). Some subjects, motivated by a sense of propriety that outweighs the payoff matrix, may attempt to maximize their % correct. Because of the potential ambiguity of what subjects may be maximizing, point predictions are seldom made. Instead, what is predicted is the nature of the curve that describes the locus of points p(H), p(F) in the unit square, and the changes in parameters of that curve, or of the observer's location on it, with changes in the payoff matrix or the discriminability of the stimulus (see Figure 2).

++ Insert Figure 2 (ROC) and Table 4 (payoff) around here ++
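Why the example payoff matrix above should make an ideal payoff maximizer conservative follows from a standard CSDT result (Green & Swets, 1966): the optimal criterion sits at likelihood ratio beta_opt = [p(Sa)/p(Sb)] x [v(CR) - v(F)] / [v(H) - v(M)]. A minimal sketch, assuming equal presentation probabilities (an assumption, not stated in the excerpt):

```python
# Optimal likelihood-ratio criterion for an expected-payoff maximizer,
# the standard CSDT result, applied to the example payoff matrix above.
v_CR, v_H, v_F, v_M = 5.0, 5.0, -3.0, -1.0
p_noise, p_signal = 0.5, 0.5  # equal presentation probabilities (assumed)

beta_opt = (p_noise / p_signal) * (v_CR - v_F) / (v_H - v_M)
print(f"beta_opt = {beta_opt:.2f}")  # 1.33 > 1: a slightly conservative criterion
```

Since beta_opt = 8/6 ≈ 1.33 exceeds 1, the observer demands better than even odds in favor of the signal before responding yes, matching the slightly conservative bias claimed above.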
Symmetry

Consider the case in which there is no reason to think that Sa is qualitatively different from Sb, so that it is arbitrary which is called A and which B, and thus arbitrary which conditional probability in Table 2 is called a hit and which a correct rejection. Switching those labels gives the open circle shown in Figure 2 as an equally valid locus for the data; it is where the data would be found if the only important distinction between the two stimuli were the labels the experimenter assigned to them, and those could be arbitrarily reassigned. What these two data points have in common is that they convey an equal amount of information from the stimulus through the observer to the experimenter. We now generalize this relation.

Information

The related concepts of randomness (entropy) and its reduction (information) have been given explicit formulation only in this century, by Brillouin, Cox, Jaynes, Wiener, and most importantly, Shannon. Brief histories of these ideas by major contributors are Tribus (1979) and Jaynes (1979). In particular, Jaynes reformulated both statistical mechanics and inferential statistics using the MEF and Bayes's theorem. Because information is the central concept in this regrounding of SDT, it requires explication.

Entropy is a thermodynamic measure of disorder, which changes as a function of the energy added to a system relative to its temperature. It is intimately related to information, which is a measure of the reduction of entropy by some operation. Shannon's (1949) key insight was the development of entropy theory for the measurement of information transmission. "Shannon's paper takes its rightful place alongside the works of Newton, Carnot, Gibbs, Einstein, Mendeleev and the other giants of science on whose shoulders we all stand" (Tribus, 1979, p. 10).

Information is a relative concept; it is relative to context, and to the state of the receiver. A coded message may look completely random until we are given a key (context), which permits the extraction of useful information. The amount of information is not the same as its value. Small amounts of information may be of greater value than that derivable from encyclopedias: lamps in a belfry may be inscrutable without the key "One if by land, two if by sea", in which context they provide approximately one bit of very important information. They would provide less than a bit if the route of invasion were already known, or known with some probability; they would provide more than a bit if the timing or color or brightness of the signal conveyed additional information, such as the distance or magnitude of the force.

Because information is defined as change (either as a difference in discrete systems or as a differential in continuous systems), it is a process/behavioral construct, rather than a content/cognitive construct. Information does not reside in the source, nor in the message, nor in the communication channel, nor in the receiver. It resides nowhere. Information is the reduction in the uncertainty (entropy) of a response by use of a stimulus. Books do not contain information. Books may reduce the uncertainty (entropy) of the reader's response to questions such as "who killed the white whale?" The book informs the response, but does not contain information. The book is a key that permits the student to decipher the correct answer to the question. The font of the text is uninformative, unless the question concerns typography. The number of chapters is uninformative, unless the game is Trivial Pursuit. No specification of the information value of a stimulus such as a book can be made without knowledge of the range of possible questions and answers (their entropy), and the degree to which the answers are less random than they would be without that stimulus. To say a book is informative means that it will permit the reader to respond to a wide range of relevant (to the reader) questions in a nonrandom fashion.
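To make "reduction in uncertainty" operational: the information transmitted is the Shannon mutual information between stimulus and response, computable directly from a joint table of the Table 1 kind. A minimal sketch, mine rather than the authors', with illustrative cell values back-derived from the earlier worked example (H ≈ .65, F ≈ .25, equal presentation probabilities):

```python
from math import log2

# Joint stimulus-response probabilities laid out as in Table 1
# (rows: stimuli Sa, Sb; columns: responses no, yes). Cell values are
# illustrative, derived from H ~ .65, F ~ .25 with p(signal) = .5.
joint = {("Sa", "no"): 0.375, ("Sa", "yes"): 0.125,
         ("Sb", "no"): 0.175, ("Sb", "yes"): 0.325}

# Marginal probabilities of each stimulus and each response.
p_s = {s: sum(p for (si, _), p in joint.items() if si == s) for s in ("Sa", "Sb")}
p_r = {r: sum(p for (_, ri), p in joint.items() if ri == r) for r in ("no", "yes")}

# Transmitted information: T = sum over cells of p(s,r) log2[p(s,r)/(p(s)p(r))].
T = sum(p * log2(p / (p_s[s] * p_r[r])) for (s, r), p in joint.items())
print(f"T = {T:.2f} bits")  # ~0.12 bits, the value labeling one curve in Figure 2
```

On these assumptions the computation reproduces the T = 0.12 isoinformation curve of Figure 2: all (F, H) points with this mutual information are equally informative loci for the observer.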
…

Table …. The rating scale ROC data matrix.

            Response
Stimulus    -n  …  -3  -2  -1  +1  +2  +3  …  +n
Sa
Sb

Figure 1. The machinery of CSDT (probability density versus percept value; percepts Pa and Pb arising from stimuli Sa and Sb; criterion C). The discriminal processes are Gaussian densities representing the probability that a stimulus will give rise to a perceptual event of a particular magnitude. The observer says "B" whenever a percept exceeds a criterion C, represented by the vertical line. Normally an Sb stimulus gives rise to a percept that falls above the criterion, and the affirmative response is called a hit (H). Sometimes an Sa stimulus gives rise to a percept that exceeds the criterion, and the affirmative response is then called a false alarm (F). The discriminability of two stimuli is given by the difference of their z-scores, as inferred from the accuracy of their performance. In particular, d' = z(H) - z(F).

Figure 2. Isoinformation ROCs (p(H) versus p(F); curves labeled T = 0.12 and T = 0.50). The data from Table 1 plotted as a filled circle in the unit square. The open circle is derived by assuming symmetry of signals. The curves are drawn through points that conserve the information transmitted by the observer.

Figure 3. The relation between transinformation (bits) and detectability measured as d' (annotation: T ≈ 0.1 d').

Figure 4. Complementarity of entropy expected from responses (average entropy given yes versus average entropy given no). Conservation of information accomplished by a shift between the expected equivocation (information loss) from a yes response, p[y]U[S|y], and that from a no response, p[n]U[S|n]. The small filled circles show the data from Figure 2; the unfilled circles show the data from Figure 5 in the condition where the signal probability was held constant.

Figure 5. Isoinformation ROC (T = 0.05). Data reported by Green and Swets (1966) for an observer biased by varying the signal presentation probability (squares; their Figure 4-…) and by varying the payoffs (circles; their Figure 4-2). The curve is the isoinformation ROC, which is not visually discriminable from that for constant A' (also drawn through the points).

Figure 6. Data reported by Green and Swets (their Figure 4-5) for another observer biased by varying the signal presentation probability (the same condition as shown by the squares in Figure 5). The continuous curve is the isoinformation ROC. The dashed curve is given by Equation ….

Figure 7. Maxent exponential distributions of two random variables on the positive line in the continuous case (probability versus stimulus value). The variable with the smaller mean is called Noise, and the other Signal.
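Figure 7's exponentials make the power-law ROCs of the later figures easy to see: if evidence above a criterion c is exponentially distributed with noise mean μN and signal mean μS, the tail probabilities give H = F^(1/β) with β = μS/μN, the "ratio of the exponential means" of Figure 11. A minimal sketch of that algebra, with illustrative means chosen by me:

```python
import math

# With exponential (maxent) discriminal processes of means mu_n < mu_s,
# a criterion c yields F = exp(-c/mu_n) and H = exp(-c/mu_s), so the ROC
# is the power law H = F**(1/beta) with beta = mu_s/mu_n.
def power_roc(F, beta):
    return F ** (1.0 / beta)

mu_n, mu_s = 1.0, 2.0      # illustrative means; beta = 2
beta = mu_s / mu_n

for c in (0.5, 1.0, 2.0):  # sweep the criterion along the evidence axis
    F = math.exp(-c / mu_n)  # p(evidence > c | noise)
    H = math.exp(-c / mu_s)  # p(evidence > c | signal)
    assert abs(H - power_roc(F, beta)) < 1e-12
    print(f"c = {c:.1f}: F = {F:.3f}, H = {H:.3f}")
```

Sweeping c traces the whole operating characteristic, so a single parameter β fixes the curve's shape, which is what the power-function fits in Figures 9 through 12 exploit.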
Figure 8. Data from observers detecting brief flashes of light, from Swets, Tanner & Birdsall (1964); four panels of p(H) versus p(F). Performance was manipulated by varying the payoff matrices.

Figure 9. Data from one observer detecting brief increments in the intensity of tones (Norman, 1963); panels A: ∆v/v = 0.017; B: ∆v/v = 0.019; C: ∆v/v = 0.022; D: ∆v/v = 0.023; E: ∆v/v = 0.029; F: ∆v/v = 0.033. Each panel shows the data for a different signal-to-noise ratio. Bias was varied with differential payoffs.

Figure 10. Two indices of merit compared: 2 ln[H/(1-H)] ≈ d'. The equivalence of the indices of merit for logistic SDT and power ROCs. The x-axis is the parameter β.

Figure 11. Parameter of IOCs as a function of relative signal amplitude (µS/µN versus (∆v+v)/v). Beta, the ratio of the exponential means inferred from power functions fit to the data of observers, versus the relative increment in signal voltage associated with them. Data from Norman (1963).

Figure 12. Rating scale operating characteristics (p(r|S) versus p(r|N); data from Emmerich, 1968). The inset shows the parameter of the power function (µS/µN) against the signal-to-noise ratio in dB (10 log(E/N₀)). Each datum along a curve is obtained by re-aggregating the data around successive ratings, as though different ratings corresponded to different criteria.

Figure 13. The entropy of stimuli such as an encyclopedia or a medical measurement may be indefinitely large. Information transmission is limited by the variable with the smallest entropy. This may be the stimulus, the experimenter, or the observer. If the experimenter imposes a binary classification, the maximum information that may be transmitted between observer and experimenter is 1 bit, even though the observer may be able to make finer discriminations.

Figure 14. Digitization loss increases with the relative entropy of the signal. C is channel capacity for a continuous Gaussian signal, and when encoding is restricted to binary signals; W is the bandwidth of the signal, and S/N is the signal-to-noise ratio. Reprinted from Harmon (…), with permission. (A sketch of the capacity formulas follows the reference fragments below.)

…

…ion. New York: McGraw-Hill.

Jaynes, E. T. (1979). Where do we stand on maximum entropy? In R. D. Levine & M. Tribus (Eds.), The maximum entropy formalism (pp. 15-118). Cambridge, MA: MIT Press.

…

…scaling literature. The entropy of a stimulus, and thus the maximum information available to the observer, may vary as a function of its mean. The measure of entropy depends on the …

… discussion is limited to net entropy, with another source of entropy, such as noise, canceling out the infinitesimal. For both exponential and normal discriminal processes, the maximum information available …
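For the quantities named in the caption of Figure 14: the capacity of a band-limited Gaussian channel is presumably Shannon's C = W log2(1 + S/N), and the binary-restricted ceiling sketched below is the standard Nyquist bound; both are offered as textbook background rather than as Harmon's exact construction, and the bandwidth and S/N values are illustrative.

```python
from math import log2

def capacity_gaussian(W, snr):
    """Shannon-Hartley capacity (bits/s) of a band-limited channel with
    bandwidth W (Hz), Gaussian noise, and signal-to-noise power ratio snr."""
    return W * log2(1.0 + snr)

def binary_ceiling(W):
    """Upper bound when encoding is restricted to binary symbols: at most
    2W symbols per second (Nyquist), each carrying at most 1 bit."""
    return 2.0 * W

W = 1000.0  # illustrative 1-kHz bandwidth
for snr in (1.0, 10.0, 100.0):
    print(f"S/N = {snr:6.1f}: C = {capacity_gaussian(W, snr):7.1f} bits/s; "
          f"binary ceiling = {binary_ceiling(W):.0f} bits/s")
```

The gap between the two curves as S/N grows is the digitization loss the figure depicts: a binary code discards the fine gradations a continuous channel could carry.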

Posted: 13/10/2022, 14:41
