Continuous Probability Functions
Hindawi Publishing Corporation
EURASIP Journal on Advances in Signal Processing
Volume 2011, Article ID 294010, 14 pages
doi:10.1155/2011/294010

Research Article
On the Soft Fusion of Probability Mass Functions for Multimodal Speech Processing

D. Kumar, P. Vimal, and Rajesh M. Hegde
Department of Electrical Engineering, Indian Institute of Technology, Kanpur 208016, India
Correspondence should be addressed to Rajesh M. Hegde, rhegde@iitk.ac.in
Received 25 July 2010; Revised 8 February 2011; Accepted 2 March 2011
Academic Editor: Jar Ferr Yang
Copyright © 2011 D. Kumar et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Multimodal speech processing has been a subject of investigation to increase robustness of unimodal speech processing systems. Hard fusion of acoustic and visual speech is generally used for improving the accuracy of such systems. In this paper, we discuss the significance of two soft belief functions developed for multimodal speech processing. These soft belief functions are formulated on the basis of a confusion matrix of probability mass functions obtained jointly from both acoustic and visual speech features. The first soft belief function (BHT-SB) is formulated for binary hypothesis testing like problems in speech processing. This approach is extended to multiple hypothesis testing (MHT) like problems to formulate the second belief function (MHT-SB). The two soft belief functions, namely, BHT-SB and MHT-SB, are applied to the speaker diarization and audio-visual speech recognition tasks, respectively. Experiments on speaker diarization are conducted on meeting speech data collected in a lab environment and also on the AMI meeting database. Audiovisual speech recognition experiments are conducted on the GRID audiovisual corpus.
Experimental results are obtained for both multimodal speech processing tasks using the BHT-SB and the MHT-SB functions. The results indicate reasonable improvements when compared to unimodal (acoustic speech or visual speech alone) speech processing.

1. Introduction

Multi-modal speech content is primarily composed of acoustic and visual speech [1]. Classifying and clustering multi-modal speech data generally requires extraction and combination of information from these two modalities [2]. The streams constituting multi-modal speech content are naturally different in terms of scale, dynamics, and temporal patterns. These differences make combining the information sources using classic combination techniques difficult. Information fusion [3] can be broadly classified as sensor-level fusion, feature-level fusion, score-level fusion, rank-level fusion, and decision-level fusion. A hierarchical block diagram indicating the same is illustrated in Figure 1. A number of techniques are available for audio-visual information fusion, which can be broadly grouped into feature fusion and decision fusion. The former class of methods is the simplest, as they are based on training a traditional HMM classifier on the concatenated vector of the acoustic and visual speech features, or an appropriate transformation of it.
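As a minimal sketch of the feature fusion idea described above, the following Python snippet concatenates per-frame acoustic and visual feature vectors into one vector per frame. The function name `fuse_features` and the toy data are hypothetical, and the sketch assumes both streams have already been brought to the same frame rate, which the excerpt does not discuss.

```python
def fuse_features(acoustic_frames, visual_frames):
    """Feature-level fusion: concatenate the two feature vectors of each frame.

    Assumption: both streams are already aligned frame by frame.
    """
    if len(acoustic_frames) != len(visual_frames):
        raise ValueError("both streams need the same number of frames")
    return [list(a) + list(v) for a, v in zip(acoustic_frames, visual_frames)]


# Hypothetical toy data: 2 frames, 3 acoustic and 2 visual features per frame.
acoustic = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
visual = [[1.0, 2.0], [3.0, 4.0]]
fused = fuse_features(acoustic, visual)
print(len(fused), len(fused[0]))
```

The fused vectors would then be passed to a single classifier, which is what distinguishes this approach from fusing separate per-modality decisions.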
Decision fusion methods combine

Continuous Probability Functions
By: OpenStaxCollege

We begin by defining a continuous probability density function. We use the function notation f(x). Intermediate algebra may have been your first formal introduction to functions. In the study of probability, the functions we study are special. We define the function f(x) so that the area between it and the x-axis is equal to a probability. Since the maximum probability is one, the maximum area is also one. For continuous probability distributions, PROBABILITY = AREA.

Consider the function f(x) = 1/20 for 0 ≤ x ≤ 20, where x is a real number. The graph of f(x) = 1/20 is a horizontal line. However, since 0 ≤ x ≤ 20, f(x) is restricted to the portion between x = 0 and x = 20, inclusive.

f(x) = 1/20 for 0 ≤ x ≤ 20. The graph of f(x) = 1/20 is a horizontal line segment when 0 ≤ x ≤ 20.

The area between f(x) = 1/20, where 0 ≤ x ≤ 20, and the x-axis is the area of a rectangle with base = 20 and height = 1/20.
AREA = 20(1/20) = 1

Suppose we want to find the area between f(x) = 1/20 and the x-axis where 0 < x < 2.
AREA = (2 – 0)(1/20) = 0.1
(2 – 0) = 2 = base of a rectangle
Reminder: area of a rectangle = (base)(height).
The area corresponds to a probability. The probability that x is between zero and two is 0.1, which can be written mathematically as P(0 < x < 2) = P(x < 2) = 0.1.

Suppose we want to find the area between f(x) = 1/20 and the x-axis where 4 < x < 15.
AREA = (15 – 4)(1/20) = 0.55
(15 – 4) = 11 = the base of a rectangle
The area corresponds to the probability P(4 < x < 15) = 0.55.

Suppose we want to find P(x = 15). On an x-y graph, x = 15 is a vertical line. A vertical line has no width (or zero width). Therefore, P(x = 15) = (base)(height) = (0)(1/20) = 0.

P(X ≤ x) (can be written as P(X < x) for continuous distributions) is called the cumulative distribution function or CDF. Notice the "less than or
equal to" symbol. We can use the CDF to calculate P(X > x). The CDF gives "area to the left" and P(X > x) gives "area to the right." We calculate P(X > x) for continuous distributions as follows: P(X > x) = 1 – P(X < x).

Label the graph with f(x) and x. Scale the x and y axes with the maximum x and y values. f(x) = 1/20, 0 ≤ x ≤ 20.

To calculate the probability that x is between two values, look at the following graph. Shade the region between x = 2.3 and x = 12.7. Then calculate the shaded area of a rectangle.
P(2.3 < x < 12.7) = (base)(height) = (12.7 − 2.3)(1/20) = 0.52

Try It
Consider the function f(x) = 1/8 for 0 ≤ x ≤ 8. Draw the graph of f(x) and find P(2.5 < x < 7.5).
P(2.5 < x < 7.5) = 0.625

Chapter Review
The probability density function (pdf) is used to describe probabilities for continuous random variables. The area under the density curve between two points corresponds to the probability that the variable falls between those two values. In other words, the area under the density curve between points a and b is equal to P(a < x < b). The cumulative distribution function (cdf) gives the probability as an area. If X is a continuous random variable, the probability density function (pdf), f(x), is used to draw the graph of the probability distribution. The total area under the graph of f(x) is one. The area under the graph of f(x) and between values a and b gives the probability P(a < x < b).

The cumulative distribution function (cdf) of X is defined by P(X ≤ x). It is a function of x that gives the probability that the random variable is less than or equal to x.

Formula Review
Probability density function (pdf) f(x):
• f(x) ≥ 0
• The total area under the curve f(x) is one
Cumulative distribution function (cdf): P(X ≤ x)

Which type of distribution does the graph illustrate?
Uniform Distribution
Which type of distribution does the graph illustrate?
Which type of distribution does the graph illustrate?
Normal Distribution
What does the shaded area represent? P( _< x < _)
What does the shaded area represent? P( _< x < _)
P(6 < x < 7)
For a continuous probability distribution, 0 ≤ x ≤ 15. What is P(x > 15)?
What is the area under f(x) if the function is a continuous probability density function?
one
For a continuous probability distribution, 0 ≤ x ≤ 10. What is P(x = 7)?
A continuous probability function is restricted to the portion between x = 0 and 7. What is P(x = 10)?
zero
f(x) for a continuous probability function is 1/5, and the function is restricted to 0 ≤ x ≤ 5. What is P(x < 0)?
f(x), a continuous probability function, is equal to 1/12, and the function is restricted to 0 ≤ x ≤ 12. What is P(0 < x < 12)?
one
Find the probability that x falls in the shaded area.
Find the probability that x falls in the shaded area.
0.625
Find the probability that x falls in the shaded area.
f(x), a continuous probability function, is equal to ( x ≤ ...

Hindawi Publishing Corporation
EURASIP Journal on Audio, Speech, and Music Processing
Volume 2010, Article ID 179303, 12 pages
doi:10.1155/2010/179303

Research Article
Audio Query by Example Using Similarity Measures between Probability Density Functions of Features

Marko Helén and Tuomas Virtanen (EURASIP Member)
Department of Signal Processing, Tampere University of Technology, Korkeakoulunkatu 1, 33720 Tampere, Finland
Correspondence should be addressed to Marko Helén, marko.helen@tut.fi
Received 22 May 2009; Revised 14 October 2009; Accepted 9 November 2009
Academic Editor: Bhiksha Raj
Copyright © 2010 M. Helén and T. Virtanen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper proposes a query by example system for generic audio. We estimate the similarity of the example signal and the samples in the queried database by calculating the distance between the probability density functions (pdfs) of their frame-wise acoustic features. Since the features are continuous valued, we propose to model them using Gaussian mixture models (GMMs) or hidden Markov models (HMMs). The models parametrize each sample efficiently and retain sufficient information for similarity measurement. To measure the distance between the models, we apply a novel Euclidean distance, approximations of Kullback-Leibler divergence, and a cross-likelihood ratio test. The performance of the measures was tested in simulations where audio samples are automatically retrieved from a general audio database, based on the estimated similarity to a user-provided example. The simulations show that the distance between probability density functions is an accurate measure for similarity. Measures based on GMMs or HMMs are shown to produce better results than existing methods based on simpler statistics or histograms of the features. A good performance with low computational cost is obtained with the proposed Euclidean distance.

1. Introduction

The enormous growth of personal and on-line multimedia content has created the need for tools of automatic database management. Such management tools include, for instance, query by humming or query by example, multimedia classification, and speaker recognition. Query by example is an audio retrieval task where a user provides an example signal and the retrieval system returns similar samples from the database. The main problem in query by example and the other content management applications above is to determine the similarity between two database items.

The fundamental problem when measuring the similarity between audio samples is the imperfect definition of similarity.
For example, a human can judge the similarity of two speech signals by the topic of the speech, by the speaker identity, or by any sounds in the background. There are retrieval approaches where the imperfect definition of similarity is circumvented differently. First, the similarity criterion can be defined beforehand. For example, query by humming [1, 2] retrieves pieces of music which have a musically similar melody to an input humming. Query-by-beat-boxing [3], on the other hand, aims at retrieving music pieces which are rhythmically similar to the example. These retrieval methods are based on extracting features which are tuned for the

Hindawi Publishing Corporation
EURASIP Journal on Advances in Signal Processing
Volume 2009, Article ID 482585, 10 pages
doi:10.1155/2009/482585

Research Article
Data Fusion Boosted Face Recognition Based on Probability Distribution Functions in Different Colour Channels

Hasan Demirel (EURASIP Member) and Gholamreza Anbarjafari
Department of Electrical and Electronic Engineering, Eastern Mediterranean University, Gazimağusa, KKTC, 10 Mersin, Turkey
Correspondence should be addressed to Hasan Demirel, hasan.demirel@emu.edu.tr
Received 20 November 2008; Revised 9 April 2009; Accepted 20 May 2009
Recommended by Satya Dharanipragada

A new, high-performance face recognition system based on combining the decisions obtained from the probability distribution functions (PDFs) of pixels in different colour channels is proposed. The PDFs of the equalized and segmented face images are used as statistical feature vectors for the recognition of faces by minimizing the Kullback-Leibler Divergence (KLD) between the PDF of a given face and the PDFs of faces in the database. Many data fusion techniques such as median rule, sum rule, max rule, product rule, and majority voting, and also feature vector fusion as a source fusion technique, have been employed to improve the recognition performance.
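The matching and fusion steps named above can be illustrated with a short Python sketch: it normalizes histograms into PDFs, picks the database face with the smallest Kullback-Leibler divergence in each channel, and combines the per-channel decisions by majority voting. All function names and the toy histograms are hypothetical, the small `eps` smoothing term is an assumption, and this is not claimed to be the authors' implementation.

```python
import math
from collections import Counter


def normalize(hist):
    """Turn a histogram of counts into a PDF that sums to one."""
    total = sum(hist)
    return [h / total for h in hist]


def kl_divergence(p, q, eps=1e-12):
    """KLD between two discrete PDFs; eps (an assumption) avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def nearest_by_kld(query_pdf, gallery):
    """Return the gallery label whose PDF minimizes the KLD to the query."""
    return min(gallery, key=lambda label: kl_divergence(query_pdf, gallery[label]))


def majority_vote(decisions):
    """Fuse per-channel decisions by picking the most frequent label."""
    return Counter(decisions).most_common(1)[0][0]


# Hypothetical 3-bin histograms for two known faces in three colour channels.
galleries = [
    {"face_a": normalize([8, 1, 1]), "face_b": normalize([1, 1, 8])},
    {"face_a": normalize([6, 3, 1]), "face_b": normalize([2, 2, 6])},
    {"face_a": normalize([5, 4, 1]), "face_b": normalize([1, 4, 5])},
]
queries = [normalize([7, 2, 1]), normalize([5, 3, 2]), normalize([2, 4, 4])]

decisions = [nearest_by_kld(q, g) for q, g in zip(queries, galleries)]
print(decisions, majority_vote(decisions))
```

Majority voting is only one of the listed rules; the sum, max, median, and product rules would combine per-channel scores rather than per-channel labels.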
The proposed system has been tested on the FERET, the Head Pose, the Essex University, and the Georgia Tech University face databases. The superiority of the proposed system has been shown by comparing it with state-of-the-art face recognition systems.

Copyright © 2009 H. Demirel and G. Anbarjafari. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

The earliest work in computer recognition of faces was reported by Bledsoe [1], where manually located feature points are used. Statistical face recognition systems such as principal component analysis- (PCA-) based eigenfaces introduced by Turk and Pentland [2] attracted a lot of attention. Belhumeur et al. [3] introduced the fisherfaces method, which is based on linear discriminant analysis (LDA). Many of these methods are based on greyscale images; however, colour images are increasingly being used since they add additional biometric information for face recognition [4]. Colour PDFs of a face image can be considered as the signature of the face, which can be used to represent the face image in a low-dimensional space. Images with small changes in translation, rotation, and illumination still possess high correlation in their corresponding PDFs, which prompts the idea of using PDFs for face recognition.

The PDF of an image is a normalized version of an image histogram. Hence, published face recognition papers that use histograms indirectly use PDFs for recognition. There is some published work on the application of histograms to the detection of objects [5]. However, there are few publications on the application of histogram- or PDF-based methods in face recognition: Yoo and Oh used chromatic histograms of faces [6]. Ahonen et al.
[7] and Rodriguez and Marcel [8] divided a face into several blocks, extracted the Local Binary Pattern (LBP) feature histograms from each block, and concatenated them into a single global feature histogram to represent the face image; the face was recognized by simple distance-based grey-level histogram matching. Demirel and Anbarjafari

Hindawi Publishing Corporation
EURASIP Journal on Audio, Speech, and Music Processing
Volume 2007, Article ID 43218, 7 pages
doi:10.1155/2007/43218

Research Article
A Semi-Continuous State-Transition Probability HMM-Based Voice Activity Detector

H. Othman and T. Aboulnasr
School of Information Technology and Engineering, Faculty of Engineering, University of Ottawa, Ontario, Canada K1N 6N5
Received 15 December 2005; Revised 13 November 2006; Accepted 28 November 2006
Recommended by Thippur V. Sreenivas

We introduce an efficient hidden Markov model-based voice activity detection (VAD) algorithm with time-variant state-transition probabilities in the underlying Markov chain. The transition probabilities vary in an exponential charge/discharge scheme and are softly merged with state conditional likelihood into a final VAD decision. Working in the domain of ITU-T G.729 parameters, with no additional cost for feature extraction, the proposed algorithm significantly outperforms G.729 Annex B VAD while providing a balanced tradeoff between clipping and false detection errors. The performance compares very favorably with the adaptive multi-rate VAD, option 2 (AMR2).

Copyright © 2007 H. Othman and T. Aboulnasr. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

Actual speech activities normally occupy 60% of the time of a regular conversation in a telecommunication system [1].
Voice activity detection (VAD) enables reallocating resources during the periods of speech absence. In modern telecommunication systems, VADs, in conjunction with comfort noise generator (CNG) and discontinuous transmission (DTX) modules, play a critical role in enhancing the system performance.

A VAD distinguishes between speech and nonspeech frames in the presence of background noise. In general, VAD errors can be categorized into two main types of errors, notably clipping errors and false detection errors. Clipping errors occur when speech frames are misclassified as noise frames, which is intolerable in speech encoders due to its effect on speech intelligibility, while false detection errors are due to misclassifying noise frames as speech frames. Echo cancellation systems are normally sensitive to this type of error because it results in incorrect parameter adaptation.

Traditional VAD algorithms rely on legacy features such as frame energy and zero-crossing rate (ZCR). In recent VAD algorithms, more features are used in different schemes. Among those are likelihood ratio (LR) that is based on complex Gaussian distribution of the signal discrete Fourier transform (DFT) in [2, 3], higher-order statistics (HOS) of the LPC residuals of the signal that include skewness and kurtosis in [4], power envelope dynamics in [5], and fractals in [6].

In this paper, we focus on voice activity detection in one of the popular standards in voice and multimedia communications, namely G.729. This voice coding standard was introduced by the International Telecommunication Union (ITU) along with a recommended VAD algorithm in G.729 Annex B [7] (G.729B) and was tested by Rockwell International in [1]. The reason we chose G.729 is that it is one of the first coder standards that implement line spectral frequencies. This facilitates integrating the proposed work in any of the newer coders that adopt the same features.
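To make the legacy features mentioned above concrete, here is a minimal Python sketch that computes frame energy and zero-crossing rate and applies a fixed-threshold speech/nonspeech decision. It is neither the G.729B algorithm nor the HMM-based detector proposed in the paper; the function names and both threshold values are hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def simple_vad(frame, energy_threshold=0.1, zcr_threshold=0.5):
    """Toy decision: speech if the frame is energetic and not noise-like.

    Both thresholds are illustrative assumptions, not values from any standard.
    """
    return (frame_energy(frame) > energy_threshold
            and zero_crossing_rate(frame) < zcr_threshold)


# Hypothetical frames: a steady voiced-like frame and a silent frame.
voiced_like = [0.5, 0.6, 0.5, 0.4, -0.4, -0.5, -0.6, -0.5]
silence = [0.0] * 8
print(simple_vad(voiced_like), simple_vad(silence))
```

A fixed threshold of this kind is exactly what degrades under background noise, which motivates the richer features and statistical models surveyed in the paragraph above.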
G.729B VAD is based on a simple piecewise linear decision boundary between the set of differential parameters and their respective long-term values. The advantage of the G.729B VAD is that it works in the parameter domain of the underlying coder with no extra load for feature

Original article
Modelling resin production distributions for Pinus Pinaster Ait using two probability functions

Nikos Nanos a,*, Wubalem Tadesse a, Gregorio Montero a, Luis Gil b and Ricardo Alia a
a Centro de Investigacion Forestal, CIFOR-INIA, Apdo 8111, 28080 Madrid, Spain
b Unit of Physiology and Genetics, ETSI Montes, 28040 Madrid, Spain
(Received 12 April 1999; accepted 10 February 2000)

Abstract – The Weibull and the Chaudhry-Ahmad probability density functions were used to model resin production distributions for maritime pine stands. Maximum likelihood was used for parameter estimation. Data were collected during one season in two sets of plots. Set 1 consisted of two 50-tree and one 100-tree plots. Bootstrap re-sampling showed that the Weibull parameters had smaller estimation errors for small sample sizes. Set 2 consisted of thirty-seven 10-tree plots. No significant differences in the fit of the density functions were detected. Parameters of both models were found to be well correlated with the mean plot production as well as with the within-plot coefficient of variation. The results did not reveal any major differences between the Weibull and the Chaudhry-Ahmad probability functions. The most appropriate model should be chosen at later stages when parameters of both functions are regressed against easily measured stand attributes.

resin production distribution / Pinus pinaster / Weibull / Chaudhry and Ahmad / modelling

Résumé – Modelling the resin production distribution for Pinus pinaster Ait using two probability laws.
The resin production distribution of maritime pine stands is modelled with the Weibull and Chaudhry-Ahmad probability density functions. The maximum likelihood method is used for parameter estimation. Data were measured in two groups of plots during one harvest season. For the first group, which is composed of two 50-tree plots and one 100-tree plot, bootstrap re-sampling showed that the parameters of the Weibull function have a smaller error for small samples than those of the Chaudhry-Ahmad function. The second group consists of 37 plots of 10 trees. No significant difference between the fit of the two probability functions is detected. The parameters of both models are correlated with the mean plot productions and with the within-plot coefficients of variation. The results show no significant differences between the Weibull and Chaudhry-Ahmad probability functions. The final model will be chosen later, once the parameters of both functions have been related to easily measured dendrometric variables.

resin production distribution / Pinus pinaster / Weibull / Chaudhry and Ahmad / modelling

1. INTRODUCTION

Resin tapping was an important rural activity in the Mediterranean basin until the 1970s, when the international crisis in natural resin prices rendered this traditional labor no longer profitable. Presently Pinus pinaster Ait (maritime pine) is the only species tapped in Spain. Resin tapping is restricted to a few areas (mainly Central Spain), where trees produce a sufficient quantity of resin and extraction is facilitated by favorable terrain. Recently, an increased demand for natural resins has pushed up prices and many of the abandoned stands are tapped again.

Ann. For. Sci.
57 (2000) 369–377
© INRA, EDP Sciences
* Correspondence and reprints. Tel. 34 91 347 68 15; Fax. 34 91 357 31 07; e-mail: nanos@inia.es

Scientific interest in this forest product has traditionally been focused on improving the extraction method and producing improved genetic material since it was noted early that resin production is under high genetic control [15]. Despite long-standing scientific interest, there is a lack of ...
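The resin production abstract above fits a Weibull density by maximum likelihood. As a rough illustration, the Python sketch below does this for the two-parameter Weibull form, which is an assumption since the excerpt does not state the parameterization used. The function names and the sample values are hypothetical, and the Chaudhry-Ahmad density is not covered.

```python
import math


def weibull_pdf(x, shape, scale):
    """Two-parameter Weibull density; returns 0 for x <= 0."""
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def log_likelihood(data, shape, scale):
    """Sum of log densities over strictly positive observations."""
    return sum(math.log(weibull_pdf(x, shape, scale)) for x in data)


def fit_weibull_mle(data, tol=1e-10):
    """Solve the shape likelihood equation by bisection, then get the scale."""
    if len(set(data)) < 2 or min(data) <= 0:
        raise ValueError("need positive data with at least two distinct values")
    logs = [math.log(x) for x in data]
    mean_log = sum(logs) / len(data)
    x_max = max(data)  # dividing by x_max keeps the powers from overflowing

    def score(k):
        w = [(x / x_max) ** k for x in data]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    mean_pow = sum((x / x_max) ** shape for x in data) / len(data)
    return shape, x_max * mean_pow ** (1.0 / shape)


# Hypothetical per-tree yields in arbitrary units (not data from the paper).
sample = [1.2, 0.7, 2.5, 1.9, 0.4, 3.1, 1.1, 2.2]
shape_hat, scale_hat = fit_weibull_mle(sample)
print(shape_hat, scale_hat, log_likelihood(sample, shape_hat, scale_hat))
```

For a fixed shape the scale estimate has a closed form, so only the shape needs a one-dimensional search; that is why bisection is enough here.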