Data Analysis, Machine Learning and Applications (Part 10)

Antonello D’Ambra, Pietro Amenta and Valentin Rousson

[...] values (a negative value, zero, and a positive value), the sum of the loadings being zero for each component (hence defining proper contrasts of categories). The goal of Simple NSCA is to find the optimal system of components among the simple ones, where optimality is calculated according to Gervini and Rousson (2004). The percentage of extracted variability $V(L)$ accounted for by a system $L$ of $m = \min(I,J) - 1$ components is given by

$$V(L) = \frac{l_1' S l_1}{\mathrm{tr}(\Delta)} + \frac{1}{\mathrm{tr}(\Delta)} \sum_{k=2}^{m} l_k' \left[ S - S L_{(k-1)} \left( L_{(k-1)}' S L_{(k-1)} \right)^{-1} L_{(k-1)}' S \right] l_k ,$$

where $l_k$ is the $k$th column of $L$ and $L_{(k-1)}$ is the matrix containing the first $(k-1)$ columns of $L$. The numerator of the first term of this sum is the variance of the first component, while the numerator of the $k$th term can be interpreted as the variance of the part of the $k$th component which is not explained by (which is independent of) the previous $(k-1)$ components. Thus, correlations are "penalized" by this criterion, which is hence uniquely maximized by PCA, i.e. by taking $L = E_m$, the matrix of the first $m$ eigenvectors of $S$ (Gervini and Rousson, 2004). The optimality of a system $L$ is then calculated as $V(L)/V(E_m)$.

In our sequential algorithms below, the $k$th simple component is obtained by regressing the original row/column categories on the previous $k-1$ simple components already in the system, by computing the first eigenvector of the residual variance hence obtained, and by shrinking this eigenvector towards the simple difference component which maximizes optimality. The following two algorithms provide simple components for the rows and for the columns.

Simple solutions for the rows

1. Let $S = \Pi D_j \Pi'$, let $L$ be an empty matrix and let $\hat S = S$.
2. Let $a = (a_1, \ldots, a_I)'$ be the first eigenvector of $\hat S$.
3. For each cut-off value $g \in \{0, |a_1|, \ldots, |a_I|\}$, consider the shrunken vector $b(g) = (b_1(g), \ldots, b_I(g))'$ with elements $b_k(g) = \mathrm{sign}(a_k)$ if $|a_k| > g$ and $b_k(g) = 0$ otherwise ($k = 1, \ldots, I$). Update and normalize it such that $\sum_k b_k(g) = 0$ and $\sum_k b_k^2(g) = 1$.
4. Include in the system the difference component $b(g)$ which maximizes $b(g)' \hat S\, b(g)$ (i.e. add the column $b(g)$ to the matrix of loadings $L$).
5. If the maximum number of components is attained, stop. Otherwise let $\hat S = S - SL(L'SL)^{-1}L'S$ and go back to step 2.

Simple solutions for the columns

1. Let $S = D_j^{1/2} \Pi' \Pi D_j^{1/2}$, let $L$ be an empty matrix and let $\hat S = S$.
2. Let $a = (a_1, \ldots, a_J)'$ be the first eigenvector of $\hat S$.
3. For each cut-off value $g \in \{0, |a_1|, \ldots, |a_J|\}$, consider the shrunken vector $b(g) = (b_1(g), \ldots, b_J(g))'$ with elements $b_k(g) = \mathrm{sign}(a_k)$ if $|a_k| > g$ and $b_k(g) = 0$ otherwise ($k = 1, \ldots, J$). Update and normalize it such that $\sum_k b_k(g) = 0$ and $\sum_k b_k^2(g) = 1$.
4. Include in the system the difference component $b(g)$ which maximizes $b(g)' \hat S\, b(g)$ (i.e. add the column $b(g)$ to the matrix of loadings $L$).
5. If the maximum number of components is attained, let $L = D_j^{-1/2} L$ and stop. Otherwise let $\hat S = S - SL(L'SL)^{-1}L'S$ and go back to step 2.

4 Father’s and son’s occupations data

To illustrate the technique of Simple NSCA, we applied it to the well-known Father’s and Son’s Occupations data. This data set (Perrin, 1904) was collected to study whether and how the professional occupation of a man depends on the occupation of his father. The occupations of 1550 men were cross-classified according to father’s and son’s occupation, each divided into 14 categories. The conclusion of the study was that such a dependence existed. Two measures of predictability, Goodman and Kruskal’s $\tau$ (1954) and Light and Margolin’s $C = (n-1)(I-1)\tau$ (1971), have been computed.
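Both predictability measures can be computed directly from the table of counts. The sketch below is a minimal NumPy illustration, not the authors' code. It assumes that the rows hold the response (son’s occupation) and the columns the predictor (father’s occupation), that every row and column total is positive, and it uses the standard form $\tau = \left(\sum_i \sum_j p_{ij}^2/p_{.j} - \sum_i p_{i.}^2\right) / \left(1 - \sum_i p_{i.}^2\right)$, whose numerator $\tau_{\mathrm{num}}$ is the quantity that NSCA decomposes into squared axis weights. The function names are hypothetical, and the 2 × 2 tables in the example are toy data, not the Perrin table.

```python
import numpy as np


def goodman_kruskal_tau(counts):
    """Goodman-Kruskal tau for predicting the row variable from the column variable.

    Returns (tau, tau_num). Assumes all row and column totals are positive.
    """
    P = np.asarray(counts, dtype=float)
    P = P / P.sum()                    # joint proportions p_ij
    p_row = P.sum(axis=1)              # row margins p_i. (response)
    p_col = P.sum(axis=0)              # column margins p_.j (predictor)
    tau_num = np.sum(P ** 2 / p_col) - np.sum(p_row ** 2)
    tau_den = 1.0 - np.sum(p_row ** 2)
    return tau_num / tau_den, tau_num


def light_margolin_c(counts):
    """Light-Margolin C = (n - 1)(I - 1) tau and its degrees of freedom (I - 1)(J - 1)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    n_rows, n_cols = counts.shape
    tau, _ = goodman_kruskal_tau(counts)
    return (n - 1) * (n_rows - 1) * tau, (n_rows - 1) * (n_cols - 1)


if __name__ == "__main__":
    independent = [[10, 20], [20, 40]]   # rows independent of columns: tau should be 0
    perfect = [[10, 0], [0, 20]]         # column determines row: tau should be 1
    print(goodman_kruskal_tau(independent)[0], goodman_kruskal_tau(perfect)[0])
    print(light_margolin_c(perfect))     # expect C close to 29 with 1 degree of freedom
```

A p-value such as the one reported below would come from the upper tail of a chi-squared distribution with `df` degrees of freedom (for example `scipy.stats.chi2.sf(C, df)`); it is left out so that the sketch depends on NumPy only.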
Note that the $C$-statistic can be used to formally test for association, since it is asymptotically chi-squared distributed with $(I-1)(J-1)$ degrees of freedom under the hypothesis of no association (Light and Margolin, 1971). The overall increase in predictability of a man’s occupation when knowing the occupation of his father was equal to 14% ($\tau = 0.14$; $C = 2880.8$; df = 169; p-value < 0.0001). According to the NSCA decomposition of the numerator of $\tau$ ($\tau_{\mathrm{num}} = \sum_{k=1}^{M} \mu_k^2 = 0.1288$), we have for the first two axes $\mu_1 \approx 0.24$ and $\mu_2 \approx 0.16$, which are the weights of the axes in the joint plot of Figure 1. The first axis accounts for $100 \times \mu_1^2 / \tau_{\mathrm{num}} = 43.7\%$ of the dependence between the two variables, while the second one accounts for 20.7%. Therefore, Figure 1 accounts for 64.4% of the total inertia. Unfortunately, the two-dimensional NSCA solution (Figure 1) gives a clear description neither of the dependence between the two variables nor of the association between rows and columns. Since this NSCA solution is difficult to interpret, a simple solution has been calculated with Simple NSCA.

From Table 1, one can see that the first component defined by Simple NSCA for the rows contrasts son’s occupation “Art” with the group of occupations {Army, Divinity, Law, Medicine, Politics & Court, Scholarship & Science}. This simple component explains 42.5% of the variance, compared with 43.7% for the optimal solution above, so the first simple row solution is about 97% optimal (97.4% in Table 1). One can conclude that the influence of father’s occupation on son’s occupation mainly contrasts these two groups of occupations. The second simple row solution provided by Simple NSCA contrasts son’s occupation “Divinity” with the group of occupations {Army, Politics & Court}.

Fig. 1. Non Symmetrical Correspondence Analysis (NSCA): joint plot.

The same table also contains the Simple NSCA solution for the columns.
The first simple column solution contrasts father’s occupation “Art” with “Divinity”, and is 81.9% optimal. The second simple column solution contrasts the group of father’s occupations {Army, Landownership, Law, Politics & Court} with {Art, Divinity}, with an optimality of 90.4%. Similarly, further simple contrasts can be defined for both the rows and the columns (see Table 1 for the first five solutions).

Table 1. Simple NSCA solutions for the first five axes.

                  SON (row)                             FATHER (column)
                  Axis1  Axis2  Axis3  Axis4  Axis5     Axis1  Axis2  Axis3  Axis4  Axis5
Army               0.15  -0.41  -0.44  -0.37  -0.50      0.00  -0.89  -1.20   3.21   0.00
Art               -0.93   0.00   0.00   0.00   0.00     -2.04   1.77  -1.20   0.00   0.00
TCCS               0.00   0.00   0.00   0.00   0.00      0.00   0.00   0.00   0.00   0.00
Crafts             0.00   0.00   0.00   0.00   0.00      0.00   0.00   0.86   0.00   0.00
Divinity           0.15   0.82  -0.44   0.00   0.00      2.04   1.77  -1.20   0.00   0.00
Agriculture        0.00   0.00   0.00   0.00   0.00      0.00   0.00   0.86   0.00   0.00
Landownership      0.00   0.00   0.00   0.00   0.00      0.00  -0.89  -1.20   0.00   0.00
Law                0.15   0.00   0.33   0.55  -0.50      0.00  -0.89   0.86  -1.61  -2.65
Literature         0.00   0.00   0.33   0.00   0.00      0.00   0.00   0.86   0.00   0.00
Commerce           0.00   0.00   0.33   0.00   0.00      0.00   0.00   0.86   0.00   0.00
Medicine           0.15   0.00   0.00  -0.37   0.50      0.00   0.00   0.86   0.00   2.65
Navy               0.00   0.00   0.00   0.00   0.00      0.00   0.00   0.00   0.00   0.00
POLCOURT           0.15  -0.41  -0.44   0.55   0.50      0.00  -0.89  -1.20  -1.61   0.00
SCSCIENCE          0.15   0.00   0.33  -0.37   0.00      0.00   0.00   0.86   0.00   0.00
Explained variance (%)
Optimal solution  43.70  64.40  75.30  83.00  89.20     43.70  64.40  75.30  83.00  89.20
Simple solution   42.50  62.20  72.30  79.70  85.70     35.80  58.20  68.50  75.10  80.30
Optimality (%)    97.40  96.60  96.10  96.10  96.10     81.90  90.40  91.00  90.50  90.00

Note: TCCS, POLCOURT and SCSCIENCE stand for “Teacher, Clerk and Civil Servant”, “Politics & Court” and “Scholarship & Science”, respectively.
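The sequential algorithm for the rows given above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The "update and normalize" of step 3 is assumed to give the $n_+$ retained positive entries the value $1/n_+$ and the $n_-$ negative entries $-1/n_-$ before scaling to unit length; this choice reproduces loadings such as 0.15 and -0.93 on row Axis 1 of Table 1. Only the numerator of $V(L)$ is computed, since both $V(L)$ and $V(E_m)$ share the denominator and the optimality ratio does not depend on it. All function names and the toy table are hypothetical.

```python
import numpy as np


def nsca_row_matrix(counts):
    """S = Pi D_j Pi' with pi_ij = p_ij / p_.j - p_i. (rows = response categories)."""
    P = np.asarray(counts, dtype=float)
    P = P / P.sum()
    p_row, p_col = P.sum(axis=1), P.sum(axis=0)
    Pi = P / p_col - p_row[:, None]          # column profiles minus row margin
    return Pi @ np.diag(p_col) @ Pi.T


def difference_component(a, g):
    """Shrink a at cut-off g into a zero-sum, unit-length contrast (None if impossible).

    Assumption: retained positive entries get 1/n_pos and negative entries -1/n_neg
    before scaling to unit length.
    """
    signs = np.where(np.abs(a) > g, np.sign(a), 0.0)
    n_pos, n_neg = np.sum(signs > 0), np.sum(signs < 0)
    if n_pos == 0 or n_neg == 0:
        return None
    b = np.where(signs > 0, 1.0 / n_pos, 0.0) - np.where(signs < 0, 1.0 / n_neg, 0.0)
    return b / np.linalg.norm(b)


def residual_matrix(S, L):
    """S - S L (L' S L)^{-1} L' S, i.e. S itself when L has no columns."""
    if L.shape[1] == 0:
        return S.copy()
    SL = S @ L
    return S - SL @ np.linalg.solve(L.T @ SL, SL.T)


def simple_components(S, m, tol=1e-12):
    """Steps 1-5: add, one at a time, the difference component maximizing b' S_hat b."""
    L = np.zeros((S.shape[0], 0))
    S_hat = S.copy()
    for _ in range(m):
        a = np.linalg.eigh(S_hat)[1][:, -1]        # first eigenvector of S_hat
        best, best_val = None, tol
        for g in np.concatenate(([0.0], np.abs(a))):
            b = difference_component(a, g)
            if b is not None and b @ S_hat @ b > best_val:
                best, best_val = b, b @ S_hat @ b
        if best is None:                           # no variance left to extract
            break
        L = np.column_stack([L, best])
        S_hat = residual_matrix(S, L)
    return L


def optimality(S, L):
    """V(L) / V(E_m): summed conditional component variances over the top-m eigenvalues."""
    m = L.shape[1]
    numerator = sum(L[:, k] @ residual_matrix(S, L[:, :k]) @ L[:, k] for k in range(m))
    top = np.sort(np.linalg.eigvalsh(S))[::-1][:m]
    return numerator / top.sum()


if __name__ == "__main__":
    toy = [[20, 5, 3], [4, 18, 6], [2, 7, 25], [10, 10, 10]]   # hypothetical counts
    S = nsca_row_matrix(toy)
    L = simple_components(S, m=min(4, 3) - 1)                  # m = min(I, J) - 1
    print(np.round(L, 2))
    print(round(float(optimality(S, L)), 3))
```

A column version would start from $S = D_j^{1/2} \Pi' \Pi D_j^{1/2}$ and rescale the final loadings by $D_j^{-1/2}$, as in step 5 of the column algorithm.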
To better summarize and visualize the relationship between father’s and son’s occupations, it is helpful to plot the row and column solutions for each axis on the same graph (Figure 2). One can see that the first Simple NSCA solution highlights the fact that a son tends to choose the same occupation as his father if this occupation is “Art”, while father’s occupation “Divinity” is linked with a son’s occupation within {Army, Divinity, Law, Medicine, Politics & Court, Scholarship & Science}. The second Simple NSCA solution can be interpreted similarly. In summary, Simple NSCA provides a clear-cut picture of the situation, the optimality of the first two axes being in this example more than 95% (for the rows) and 90% (for the columns). Thus, the price to pay for simplicity is less than 5% (for the rows) and 10% (for the columns), which is not much. In this sense, Simple NSCA may be a worthwhile alternative to NSCA.

Fig. 2. Summary of Simple NSCA solutions for axes 1 and 2.

5 Conclusions

In general, all PCA-based methods are tuned to condense information in an optimal way. However, they define abstract scores which are often not meaningful or not easily interpretable in practice. This was also the case for NSCA in our example above. To enhance interpretability, Simple NSCA focuses on simplicity and seeks “optimal simple components”, as illustrated in our example. It provides a clear-cut interpretation of the association between rows and columns, the price to pay for simplicity being relatively low. In this sense, Simple NSCA may be a worthwhile alternative to NSCA. Extensions of this approach to classical correspondence analysis and to ordinal variables are under investigation.

References

D’AMBRA, L. and LAURO, N.C. (1989): Non symmetrical analysis of three-way contingency tables. In: R. Coppi and S. Bolasco (Eds.): Multiway Data Analysis. North-Holland, 301–314.
GERVINI, D. and ROUSSON, V. (2004): Criteria for evaluating dimension-reducing components for multivariate data. The American Statistician, 58, 72–76.
GOODMAN, L.A. and KRUSKAL, W.H. (1954): Measures of association for cross-classifications. Journal of the American Statistical Association, 49, 732–764.
LIGHT, R.J. and MARGOLIN, B.H. (1971): An analysis of variance for categorical data. Journal of the American Statistical Association, 66, 534–544.
PERRIN, E. (1904): On the Contingency Between Occupation in the Case of Fathers and Sons. Biometrika, 3(4), 467–469.
ROUSSON, V. and GASSER, Th. (2004): Simple component analysis. Applied Statistics, 53, 539–555.
TENENHAUS, M. and YOUNG, F.W. (1985): An analysis and synthesis of multiple correspondence analysis, optimal scaling, dual scaling and other methods for quantifying categorical multivariate data. Psychometrika, 50, 90–104.

A Comparative Study on Polyphonic Musical Time Series Using MCMC Methods

Katrin Sommer and Claus Weihs

Lehrstuhl für Computergestützte Statistik, Universität Dortmund, 44221 Dortmund, Germany
sommer@statistik.uni-dortmund.de

Abstract. A general harmonic model for pitch tracking of polyphonic musical time series will be introduced. Based on a model of Davy and Godsill (2002), the fundamental frequencies of polyphonic sound are estimated simultaneously. To improve these results, a preprocessing step was implemented to build an extended polyphonic model. All methods are applied to real audio data from the McGill University Master Samples (Opolko and Wapnick (1987)).

1 Introduction

The automatic transcription of musical time series data is a wide research domain. There are many methods for the pitch tracking of monophonic sound (e.g. Weihs and Ligges (2006)). Distinguishing the tones of polyphonic sound is more difficult because of the properties of the time series of musical sound.
In this research paper we describe a general harmonic model for polyphonic musical time series data, based on a model of Davy and Godsill (2002). After transforming this model into a hierarchical Bayes model, the fundamental frequencies can be estimated with MCMC methods. Then we consider a preprocessing step to improve the results. For this, we introduce the design of an alphabet of artificial tones. After that we apply the polyphonic model to real audio data from the McGill University Master Samples (Opolko and Wapnick (1987)). We demonstrate the building of an alphabet on real audio data and present the results of utilising such an alphabet. Further, we show first results of combining the preprocessing step and the MCMC methods. Finally, the results are discussed and an outlook on future work is given.

2 Polyphonic model

In this section the harmonic polyphonic model will be introduced and its components will be illustrated. The model is based on the model of Davy and Godsill (2002) and has the following structure:

$$y_t = \sum_{k=1}^{K} \sum_{h=1}^{H_k} \sum_{i=0}^{I} \phi_{t,i} \left[ a_{k,h,i} \cos\left(2\pi h \frac{f_k}{f_s} t\right) + b_{k,h,i} \sin\left(2\pi h \frac{f_k}{f_s} t\right) \right] + \varepsilon_t .$$

The number of observations of the audio signal $y_t$ is $T$, $t \in \{0, \ldots, T-1\}$. Each signal is normalized to $[-1, 1]$, since the absolute overall loudness of different recordings is not relevant. The signal $y_t$ is made up of $K$ tones, each composed of $H_k$ partial tones (harmonics). In this paper the number of tones $K$ is assumed to be known. The first partial of the $k$th tone is the fundamental frequency $f_k$; the other $H_k - 1$ partials are called overtones. Further, $f_s$ is the sampling rate.

Fig. 1. Illustration of the modelling with basis functions: modelling time-variant amplitudes of a real audio signal (time in sec., 0 to 0.02, versus amplitude).
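To make the structure of the model concrete, the following minimal NumPy sketch evaluates its noise-free part for given parameters, using the Hanning basis functions $\phi_{t,i}$ defined in the next paragraph. It is an illustration only: no estimation is performed, and the function names and example values (two tones at 262 Hz and 330 Hz, three partials each, $I = 4$, decaying amplitudes) are hypothetical.

```python
import numpy as np


def hanning_basis(T, I):
    """Matrix phi[t, i], t = 0..T-1, i = 0..I: cos^2 windows of half-width Delta = (T-1)/I."""
    delta = (T - 1) / I
    t = np.arange(T)[:, None]
    centre = delta * np.arange(I + 1)[None, :]
    inside = np.abs(t - centre) <= delta          # indicator of [(i-1)Delta, (i+1)Delta]
    return np.where(inside, np.cos(np.pi * (t - centre) / (2 * delta)) ** 2, 0.0)


def harmonic_signal(a, b, f0, fs, T, I):
    """Noise-free part of y_t.

    a, b: one array per tone k, of shape (H_k, I + 1), holding a_{k,h,i} and b_{k,h,i}.
    f0: fundamental frequencies f_k in Hz; fs: sampling rate in Hz.
    """
    phi = hanning_basis(T, I)
    t = np.arange(T)
    y = np.zeros(T)
    for a_k, b_k, f_k in zip(a, b, f0):
        for h in range(1, a_k.shape[0] + 1):       # partial h has frequency h * f_k
            arg = 2 * np.pi * h * f_k / fs * t
            amp_cos = phi @ a_k[h - 1]             # time-varying amplitude sum_i phi_{t,i} a_{k,h,i}
            amp_sin = phi @ b_k[h - 1]
            y += amp_cos * np.cos(arg) + amp_sin * np.sin(arg)
    return y


if __name__ == "__main__":
    T, I, fs = 2048, 4, 11025
    decay = np.linspace(1.0, 0.2, I + 1)           # hypothetical decaying amplitudes
    a = [np.outer([1.0, 0.5, 0.25], decay)] * 2    # two tones, three partials each
    b = [np.zeros((3, I + 1))] * 2
    y = harmonic_signal(a, b, f0=[262.0, 330.0], fs=fs, T=T, I=I)
    y = y / np.max(np.abs(y))                      # normalize to [-1, 1] as in the text
    print(y[:5])
```

Because neighbouring windows satisfy $\cos^2 + \sin^2 = 1$ on their overlap, the basis functions sum to one at every $t$, so coefficients that do not vary with $i$ recover the constant-amplitude model.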
To reduce the number of parameters to be estimated, the amplitudes $a_{k,h,t}$ and $b_{k,h,t}$ of the $k$th tone and the $h$th partial tone at each time point $t$ are modelled with $I + 1$ basis functions. The basis functions $\phi_{t,i}$ are equally spaced Hanning windows with 50% overlap:

$$\phi_{t,i} := \cos^2\left[\frac{\pi (t - i\Delta)}{2\Delta}\right] \mathbf{1}_{[(i-1)\Delta,\,(i+1)\Delta]}(t), \qquad \Delta = \frac{T-1}{I}.$$

So $a_{k,h,i}$ and $b_{k,h,i}$ are the amplitudes of the $k$th tone, the $h$th partial tone and the $i$th basis function. Finally, $\varepsilon_t$ is the model error.

Figure 1 shows the necessity of using basis functions and thus of modelling time-variant amplitudes. In the figure the points are the observations of the real signal. The assumption of constant amplitudes over time (black line) cannot depict the higher amplitudes at the beginning of the tone. Modelling with time-variant amplitudes (grey line) leads to better results.

The model can be written as a hierarchical Bayes model. The parameters are estimated by a stochastic search for the best coefficients in a given region, with different prior distributions; the region and the probabilities are specified by these distributions. This leads to the implementation of MCMC methods (Gilks et al. (1996)).

For sampling the fundamental frequency $f_k$, variants of the Metropolis-Hastings algorithm are used, in which the candidate frequencies are generated in different ways. In the first variant, the candidate is sampled from a uniform distribution over the range of possible frequencies. In the second variant, the candidate is half or double the current fundamental frequency. In the third variant, a random walk is used, which allows small changes of $f_k$ to obtain a more precise result. For the determination of the number of partial tones $H_k$, a reversible jump MCMC was implemented.
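The three proposal variants for $f_k$ can be combined into one mixture of Metropolis-Hastings moves. The sketch below is a minimal illustration, not the authors' implementation: the target `log_target`, the mixture probabilities `probs`, the Gaussian random-walk step `rw_sd` and the helper names are all hypothetical. Because halving or doubling is a deterministic scale move, its acceptance ratio is assumed to include the Jacobian factor $s$ (2 for doubling, 1/2 for halving); the uniform and random-walk proposals are symmetric, so their ratio reduces to the ratio of target densities.

```python
import math
import random


def mh_frequency_step(f, log_target, f_min, f_max, rng,
                      probs=(0.2, 0.2, 0.6), rw_sd=2.0):
    """One Metropolis-Hastings update of a fundamental frequency f (in Hz)."""
    u = rng.random()
    log_jacobian = 0.0
    if u < probs[0]:                          # variant 1: uniform over the admissible range
        cand = rng.uniform(f_min, f_max)
    elif u < probs[0] + probs[1]:             # variant 2: half or double the current value
        scale = 2.0 if rng.random() < 0.5 else 0.5
        cand = scale * f
        log_jacobian = math.log(scale)
    else:                                     # variant 3: small Gaussian random walk
        cand = rng.gauss(f, rw_sd)
    if not f_min <= cand <= f_max:            # zero prior mass outside the range: reject
        return f
    log_alpha = log_target(cand) - log_target(f) + log_jacobian
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return cand
    return f


def run_chain(f_start, log_target, n_iter, f_min=50.0, f_max=2000.0, seed=0):
    """Run n_iter mixture updates from f_start and return the trace."""
    rng = random.Random(seed)
    f, trace = f_start, []
    for _ in range(n_iter):
        f = mh_frequency_step(f, log_target, f_min, f_max, rng)
        trace.append(f)
    return trace


if __name__ == "__main__":
    def log_target(f):
        # Hypothetical target: a narrow peak at 262 Hz standing in for the posterior.
        return -0.5 * ((f - 262.0) / 5.0) ** 2

    trace = run_chain(524.0, log_target, n_iter=20000)   # start one octave too high
    burned = trace[5000:]
    print(sum(burned) / len(burned))
```

In the actual model the target would be the conditional posterior of $f_k$ given the other parameters; the start at 524 Hz shows how the halving move lets the chain leave an octave error in one accepted step.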
In each iteration of the MCMC computation, one of these algorithms is chosen with a specified probability. The amplitude parameters $a_{k,h,i}$ and $b_{k,h,i}$ are computed conditional on the fundamental frequency $f_k$ and the number of partial tones $H_k$. Because of the computational burden, the posterior distributions are not fully generated. Instead, we use a stopping criterion that ends the iterations once the slope of the model error is no longer significant (Sommer and Weihs (2006)).

3 Extended polyphonic model

In this section, an extended polyphonic model with an additional preprocessing step before the MCMC algorithms is established. The results of this step could serve as starting values for the MCMC algorithm in order to improve the results. For this purpose we constructed an alphabet of artificial tones, which are compared with the audio data to be analysed.

The artificial tones are composed by evaluating the periodograms of seven time intervals of 512 observations each of a real audio signal, with 50% overlap. So a time interval of 2048 observations is regarded (512 + 6 · 256 = 2048); at a sampling rate of 11 025 Hz this corresponds to 0.186 seconds. These seven periodograms are averaged to a mean periodogram. For better comparability, all values in this periodogram which are smaller than one percent of the maximum peak are set to zero. All artificial tones together form the alphabet. Figure 2 (upper part) shows the periodogram of a c4 (262 Hz) played by an electric guitar. The lower part of Figure 2 shows the small values of the periodogram; the horizontal line marks one percent of the maximum value of the periodogram, and all values below this line are set to zero in the alphabet.

To determine the correct notes, every combination of two artificial tones of the alphabet is matched to the periodogram of the real audio signal: the modified periodograms of the two artificial tones are summed up to one periodogram.
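The construction of an artificial tone, together with the pairwise matching and voting described in this section, can be sketched as follows. This is a minimal NumPy illustration under assumptions the text does not fix: plain (untapered) periodograms, normalization of each periodogram to unit sum, a squared-error criterion for the comparison, and pairs that may repeat a note (as in the c4–c4 combinations of the data). The function names are hypothetical, and the pure sine tones in the example stand in for recorded instrument tones.

```python
from collections import Counter
from itertools import combinations_with_replacement

import numpy as np


def mean_periodogram(x, win=512, n_win=7):
    """Average periodogram of n_win windows of length win with 50% overlap (2048 samples)."""
    hop = win // 2
    needed = win + (n_win - 1) * hop
    x = np.asarray(x, dtype=float)
    if x.size < needed:
        raise ValueError(f"need at least {needed} samples")
    frames = [x[j * hop: j * hop + win] for j in range(n_win)]
    pg = np.mean([np.abs(np.fft.rfft(fr)) ** 2 / win for fr in frames], axis=0)
    return pg / pg.sum()                      # assumption: normalize to unit sum


def artificial_tone(x):
    """Mean periodogram with all values below 1% of the maximum peak set to zero."""
    pg = mean_periodogram(x)
    pg[pg < 0.01 * pg.max()] = 0.0
    return pg


def best_pair(alphabet, observed_pg):
    """Pair of notes whose summed artificial tones best match the observed periodogram."""
    best, best_err = None, np.inf
    for n1, n2 in combinations_with_replacement(sorted(alphabet), 2):
        cand = alphabet[n1] + alphabet[n2]
        err = np.sum((cand / cand.sum() - observed_pg) ** 2)   # assumption: squared error
        if err < best_err:
            best, best_err = (n1, n2), err
    return best


def vote(pairs):
    """Majority vote over the pairs estimated in several time intervals."""
    return Counter(pairs).most_common(1)[0][0]


if __name__ == "__main__":
    fs, n = 11025, 2048
    t = np.arange(n) / fs
    notes = {"c4": 262.0, "e4": 330.0, "g4": 390.0, "a4": 440.0, "c5": 523.0}
    alphabet = {name: artificial_tone(np.sin(2 * np.pi * f * t)) for name, f in notes.items()}
    mix = np.sin(2 * np.pi * 262.0 * t) + np.sin(2 * np.pi * 390.0 * t)
    print(best_pair(alphabet, mean_periodogram(mix)))
```

Applied to the pairs estimated in ten consecutive intervals, `vote` returns the most frequent pair, mirroring the voting step.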
The summed periodograms of each pair are compared with the periodogram of the audio signal. The notes corresponding to the two artificial tones which cause the minimal error are taken as estimates of the true notes. Finally, voting over ten time intervals leads to the estimation of the fundamental frequencies.

Fig. 2. Periodogram of note c4 played with an electric guitar: original (upper part) and zoomed in with cut-off line (lower part). Axes: frequency (0 to 2000) versus normalized periodogram.

4 Results

In this section, results of estimating the fundamental frequencies of real audio data are presented. First, the data used in our studies are introduced. Then first results are shown. Further, the construction of an alphabet is reconsidered and the results based on this alphabet are depicted. Finally, additional results are shown.

4.1 Data

The data used for our monophonic and polyphonic studies are real audio data from the McGill University Master Samples (Opolko and Wapnick (1987)). We chose 5 instruments (electric guitar, piano, violin, flute and trumpet), each with 5 notes (262, [...]

[...]

notes / instrument   flu   guit  pian  trum  viol
c4–c4                0     0     1     1     1
c4–e4                1     1     1     1     1∗
c4–g4                1     1     1     1     1
c4–a4                1     1     1     1     1
c4–c5                1     1     0     1∗    1

notes / instrument   flu   guit  pian  trum  viol
c4–c4                0     0     1     1     1
c4–e4                1     1     1     1     1∗
c4–g4                1     1     1     1     1
c4–a4                1     1     1     1     1
c4–c5                1     1     1∗    1∗    1

4.3 Results with extended polyphonic model

In a first study with an alphabet of artificial tones we used 30 notes from g3 (196 Hz) to c6 (1047 Hz) of the same five instruments as for [...]
[...] 1 if both notes were correctly identified, 0 otherwise. The left-hand table requires the exact note to be estimated; the right-hand table also counts octaves of the note as correct.

Left-hand table (exact note):

notes / instrument   flu   guit  pian  trum  viol
c4–c4                1     1     1     1     1
c4–e4                0     1     0     0     1
c4–g4                0     0     0     0     0
c4–a4                1     1     1     0     0
c4–c5                1     1     1     1     1

[...]
[...]

Right-hand table (octaves of the note also counted as correct):

notes / instrument   flu   guit  pian  trum  viol
c4–c4                1     1     1     1     1
c4–e4                0     1     0     1     1
c4–g4                1     1     1     1     1
c4–a4                1     1     1     0     0
c4–c5                1     1     1     1     1

[...] 330, 390, 440 and 523 Hz) out of two groups of instruments, string instruments and wind instruments. The three string instruments are played in different ways, namely picked, struck and bowed. The two wind instruments are a woodwind instrument and a brass instrument. For polyphonic data we superimposed the oscillations of two tones. The [...]

[...]
SOMMER, K. and WEIHS, C. (2007): Using MCMC as a stochastic optimization procedure for monophonic and polyphonic sound. In: R. Decker and H. Lenz (Eds.): Advances in Data Analysis. Springer, Heidelberg, 645–652.
WEIHS, C. and LIGGES, U. (2006): Parameter Optimization in Automatic Transcription of Music. In: M. Spiliopoulou, R. Kruse, A. Nürnberger, C. Borgelt and W. Gaul (Eds.): From Data and [...]
