UTILIZING EEG SIGNAL IN MUSIC INFORMATION RETRIEVAL

ZHAO WEI
B.Sc. of Engineering, University of Electronic Science and Technology of China, 2006

A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF COMPUTER SCIENCE
SCHOOL OF COMPUTING
NATIONAL UNIVERSITY OF SINGAPORE
2010

Abstract

Despite significant progress in the field of music information retrieval (MIR), grand challenges such as the intention gap and the semantic gap still exist. Inspired by recent successes in Brain Computer Interface (BCI) research, this thesis investigates how the electroencephalography (EEG) signal can be utilized to address these problems in MIR. Two scenarios are discussed: EEG-based music emotion annotation and EEG-based domain specific music recommendation. The former project addresses the problem of how to classify music clips into different emotion categories based on audiences' EEG signals recorded while they listen to the music. The latter project presents an approach to analyzing sleep quality from the EEG signal as a component of an EEG-based music recommendation system which recommends music according to the user's sleep quality.

Acknowledgement

This thesis would not have been possible without the support of many people. I wish to express my greatest gratitude to my supervisor, Dr. Wang Ye, who has offered valuable support and guidance since I started my study in the School of Computing. I also owe my gratitude to Dr. Tan from Singapore General Hospital for her professional suggestions about music therapy, and to Ms. Shi Dongxia of National University Hospital for her generous help in annotating the sleep EEG data. I would like to thank Wang Xinxi, Li Bo and Anuja for their assistance and help in the system implementation of my work. Special thanks also to all participants involved in the EEG experiments: Ye Ning, Zhang Binjun, Lu Huanhuan, Zhao Yang, Zhou Yinsheng, Shen Zhijie, Xiang Qiaoliang, Ai Zhongkai, et al. I am deeply grateful to my beloved family for their consistent support and endless love. To support my research, my wife even wore scalp electrodes during sleep for a week. Without the support of these people, I would not have been able to finish this thesis. Thank you so much!

Contents

Abstract
Acknowledgement
Contents
List of Publications
List of Figures
List of Tables
1 Introduction
  1.1 Motivation
  1.2 Organization of the Thesis
2 EEG-based Music Emotion Annotation System
  2.1 Introduction
  2.2 Emotion Recognition in Affective Computing
  2.3 Physiology-based Emotion Recognition
    2.3.1 General Structure
    2.3.2 Emotion Induction
    2.3.3 Data Acquisition
    2.3.4 Feature Extraction and Classification
  2.4 A Real-Time Music-evoked Emotion Detection System
    2.4.1 Introduction
    2.4.2 System Architecture
    2.4.3 Demonstration
  2.5 Current Challenges and Perspective
3 Automatic Sleep Scoring using EEG Signal - First Step Towards a Domain Specific Music Recommendation System
  3.1 Introduction
    3.1.1 Music Recommendation according to Sleep Quality
    3.1.2 Normal Sleep Physiology
    3.1.3 Paper Objectives
    3.1.4 Organization of the Thesis
  3.2 Literature Review
    3.2.1 Manual PSG Analysis
    3.2.2 Computerized PSG Analysis
  3.3 Methodology
    3.3.1 Feature Extraction
    3.3.2 Classification
    3.3.3 Post Processing
  3.4 Experiment Results
  3.5 Conclusions
4 Conclusion and Future Work
  4.1 Content-based Music Similarity Measurement
Bibliography

List of Publications

Automated Sleep Quality Measurement using EEG Signal - First Step Towards a Domain Specific Music Recommendation System, Wei Zhao, Xinxi Wang and Ye Wang, ACM Multimedia International Conference (ACM MM), 25-29 October 2010, Firenze, Italy.

List of Figures

2.1 Recognize Musical Emotion from Acoustic Features of Music
2.2 Recognize Musical Emotion from Audience's EEG Signal
2.3 System Architecture
2.4 Human Nervous System
2.5 EEG Signal Acquisition Experiments
2.6 Physiology-based Music-evoked Emotion Detection System
2.7 Electrode Position in the 10/20 International System
2.8 Feature Extraction and Classification Module
2.9 Music Game Module
2.10 3D Visualization Module
3.1 Physiology-based Music Rating Component
3.2 Typical Sleep Cycles
3.3 Traditional PSG System with Three Physiological Signals
3.4 Band Power Features and Sleep Stages
3.5 Position of Fpz and Cz in the 10/20 System
3.6 Experiment Over the Recording st7052j0
4.1 Content-based Music Recommendation Component

List of Tables

2.1 Targeted Emotion and Associated Stimuli
2.2 Physiological Signals related to Emotion
2.3 Extracted Feature and Classification Algorithm
3.1 Accuracy of SVM Classifier in 10-fold Cross-validation
3.2 Confusion Matrix on st7022j0
3.3 Confusion Matrix on st7052j0
3.4 Confusion Matrix on st7121j0
3.5 Confusion Matrix on st7132j0
3.6 Accuracy of SVM and SVM with Post-processing
Chapter 1 Introduction

1.1 Motivation

With the rapid development of the digital music industry, music information retrieval (MIR) has received much attention in recent decades. Despite years of development, however, critical problems remain, such as the intention gap between users and systems and the semantic gap between low-level features and high-level music semantics. These problems significantly limit the performance of current MIR systems.

User feedback plays an important role in Information Retrieval (IR) systems. It has been shown to be an efficient method for improving the performance of an IR system through relevance assessment [1]. This technique is also useful for MIR systems. Recently, physiological signals have been presented as a new approach to continuously collecting reliable information from users without interrupting them [2]. However, physiological signals have received little attention in the MIR community.

For the last two years, I have been conducting research on Electroencephalography (EEG) signal analysis and its applications in MIR. My decision to choose this topic was also inspired by the success stories of Brain Computer Interface (BCI) research [3]. Two years ago, I was impressed by applications of BCI technology such as the P300 speller [4] and the motor-imagery-controlled robot [5]. At that time I came up with the idea of utilizing EEG signals in traditional MIR systems, and I have been trying to find scenarios where the EEG signal can be integrated into a MIR system. So far two projects have been conducted: EEG-based musical emotion recognition and an EEG-assisted music recommendation system.

The first project is musical emotion recognition from the audience's EEG feedback. Music emotion recognition is an important but challenging task in music information retrieval. Due to the well-known semantic gap problem, musical emotion cannot be accurately recognized from low-level features extracted from music items. Consequently, I try to recognize musical emotion from the audience's EEG signal instead of the music item itself. An online system was built to demonstrate this concept. The audience's EEG signal is captured while he or she listens to the music items. An alpha frontal power feature is then extracted from the EEG signal, and an SVM classifier is used to classify each music item into one of three musical emotions: happy, sad, and peaceful.

In the second project, an EEG-assisted music recommendation system is proposed. This work addresses a healthcare scenario, music therapy, which utilizes music to heal people who suffer from sleep disorders. Music therapy research has indicated that music does have beneficial effects on sleep. During the process of music therapy, people are asked to listen to a list of music pre-selected by a music therapist. In spite of its clear benefits for sleep quality, the current approach is difficult to use widely because producing a personalized music list is a time-consuming task for the music therapist. Based on this observation, an EEG-assisted music recommendation system is proposed which automatically recommends music to the user according to his sleep quality, as estimated from the EEG signal. As a first attempt, how to measure sleep quality from the EEG signal is investigated. This work was recently selected for poster presentation at ACM Multimedia 2010.

1.2 Organization of the Thesis

The thesis is organized as follows.
The EEG-based music emotion annotation system is presented in detail in Chapter 2. Chapter 3 discusses the EEG-assisted music recommendation system. Future work and perspectives are summarized in Chapter 4.

Chapter 2 EEG-based Music Emotion Annotation System

2.1 Introduction

Like genre and culture, emotion is an important factor of music which has attracted much attention in the MIR community. Musical emotion recognition was usually regarded as a classification problem in earlier studies: to recognize the emotion of a music clip, low-level features are extracted and fed into a classifier trained on labeled music clips [6], as presented in Figure 2.1. Due to the semantic gap problem, low-level features such as MFCC cannot reliably describe the high-level factors of music. In this chapter I explore an alternative approach which recognizes music emotion from the listener's physiological signal instead of low-level features of the music item, as described in Figure 2.2.

Figure 2.1: Recognize Musical Emotion from Acoustic Features of Music
Figure 2.2: Recognize Musical Emotion from Audience's EEG Signal

A physiology-based music emotion annotation approach is investigated in this part. The research problem is how to recognize a human's perceived emotion from physiological signals while he or she listens to emotional music. As human emotion detection was first emphasized in the affective computing community [7], we briefly introduce affective computing in Section 2.2. A survey of emotion detection from physiological signals is given in Section 2.3. Our research prototype, an online music-evoked emotion detection system, is presented in Section 2.4. Challenges and perspectives are discussed in Section 2.5.

2.2 Emotion Recognition in Affective Computing

Emotion is regarded as a complex mental and physiological state associated with a wide range of feelings and thoughts. When humans communicate with each other, their behavior depends considerably on their emotional state. Different emotional states, such as happiness, sadness and disgust, influence human decisions and the efficiency of communication. To cooperate efficiently with others, people need to take this subjective human factor, emotion, into account. For example, a salesman talks with many people every day; to promote his product, he has to adjust his communication strategy according to the emotional responses of consumers. The implication is clear to all of us: emotion plays a key role in our daily communication.

Since humans are subject to their emotional states, the efficiency of communication between human and machine is also affected by the user's emotion. It would clearly be beneficial if the machine could respond differently according to the user's emotion, as a salesman has to do. There is no doubt that taking human emotion into account can considerably improve the performance of human-machine interaction [7, 8, 9]. Yet so far few emotion-sensitive systems have been built. The underlying problem is that emotion is generated by mental activity hidden in our brain; because of the ambiguous definition of emotion, it is difficult to recognize emotion fluctuations accurately.
Since automated recognition of human emotion would have a big impact and implies many applications in Human-Computer Interaction, it has attracted a great deal of attention from researchers in computer science, psychology, and neuroscience. There are two main approaches to emotion recognition: physiology-based emotion recognition and facial/vocal-based emotion recognition. On the one hand, some researchers have obtained good results in detecting emotion from facial images and the human voice [10]. These face and voice signals, however, depend on the human's explicit and deliberate expression of emotion [11]. On the other hand, with advances in sensor technology, physiological signals have been introduced to recognize emotion. Since emotion is a result of human intelligence, it is believed that emotion can be recognized from physiological signals, which are generated by the human nervous system, the source of human intelligence [12]. In contrast with face and voice, the main advantage of the physiological approach is that emotion can be analyzed from physiological signals without the subject's deliberate expression of emotion.

2.3 Physiology-based Emotion Recognition

Current approaches to physiology-based emotion detection are surveyed in this part. As discussed in Section 2.3.1, a typical emotion detection system consists of four components: emotion induction, data acquisition, feature extraction, and classification. The methods and algorithms employed in these components are summarized in Sections 2.3.2, 2.3.3, and 2.3.4 respectively.

Figure 2.3: System Architecture

2.3.1 General Structure

To detect emotional states from physiological signals, the general approach can be summarized as the answers to four questions:

a. What emotional states are to be detected?
b. What stimuli are used to evoke the specific emotional states?
c. What physiological signals are collected while the subject receives the stimuli?
d. Given the signals, how are feature vectors extracted and classified?

As described in Figure 2.3, a typical physiology-based emotion recognition system consists of four components: the emotion induction module, the data acquisition module, the feature extraction module, and the classification module. Each component addresses one of the questions above.

The emotion induction component is responsible for evoking the specific emotion using emotional stimuli. For example, the emotion induction component may play back peaceful music or display a picture of a traffic accident to help the subject reach the specific emotional state. While the subject receives the stimuli, the data acquisition module keeps collecting signals from the subject. Sensors attached to the subject's body are used to collect physiological signals during the experiment; different kinds of sensors collect specific physiological signals such as Electroencephalography (EEG), Electromyogram (EMG), Skin Conductivity Response (SCR), and Blood Volume Pressure (BVP). For example, to collect the EEG signal, the subject is usually required to wear an electrode cap during the experiment. After several runs of the experiment, many physiological signal fragments are collected to build a signal data set. Given such a data set, the feature extraction and classification component is applied to classify each EEG segment into an emotion category.
First, the data set is divided into two parts: a training set and a testing set. Then the classifier is built based on the training set.

2.3.2 Emotion Induction

Emotion can be categorized into several basic states such as fear, anger, sadness, disgust, happiness, and surprise [13]. To recognize emotional states, the emotions have to be defined clearly at the outset. The categorization of emotion varies across papers; in our system, we recognize three emotional states: sad, happy, and peaceful.

Once the emotion categorization is defined, another problem arises: how to induce the specific emotional states in the subject. Currently, the popular solution is to provide emotional cues to help the subject experience the emotion. Many stimuli have been presented for this purpose, such as sound clips, music items, pictures and even movie clips. These stimuli can be categorized into four main types:

a. The subject induces the emotion by imagination.
b. Visual stimuli.
c. Auditory stimuli.
d. Combinations of visual and auditory stimuli.

The emotions and stimuli presented in earlier papers are summarized in Table 2.1.

Table 2.1: Targeted Emotion and Associated Stimuli

Categorization of Emotion | Stimuli to Evoke Emotion | Authors
Disgust, Happiness, Neutral | Images from the International Affective Picture System (IAPS) | [14], [15]
Disgust, Happiness, Neutral | (1) Images (2) Self-induced emotion (3) Computer game | [16]
Positive valence & high arousal; positive valence & low arousal; negative valence & high arousal; negative valence & low arousal | (1) Self-induced emotion by imagining past experience (2) Images from IAPS (3) Sound clips from IADS (4) Combinations of the above stimuli | [17]
Joy, Anger, Sadness, Pleasure | Music selected from Oscar-winning movie soundtracks | [18], [19], [20]
No Emotion, Anger, Hate, Grief, Platonic Love, Romantic Love, Joy, Reverence | Self-induced emotion | [21], [22], [23], [24]
Joy, Anger, Sadness, Pleasure | Music selected by subjects | [25], [26], [27], [28], [29]
Amusement, Contentment, Disgust, Fear, No Emotion (Neutrality), Sadness | Images from IAPS | [30]
5 emotions on two emotional dimensions, valence and arousal | Images from IAPS | [31]

2.3.3 Data Acquisition

The human nervous system can be divided into two parts: the Central Nervous System (CNS) and the Peripheral Nervous System (PNS). As described in Figure 2.4, the CNS contains the majority of the nervous system and consists of the brain and spinal cord. The PNS extends the CNS and connects it to the limbs and other organs. The human nervous system is the source of physiological signals, and thus physiological signals can be categorized into two classes: CNS-generated signals and PNS-generated signals. The details of these two kinds of physiological signals are discussed in the following part.

Figure 2.4: Human Nervous System [32]

Electromyogram (EMG) is the electrical signal generated by muscle cells when these cells are active or at rest. The EMG potential typically ranges from 50 uV to 30 mV, and the typical frequency of EMG is about 7-20 Hz. Because facial activity is abundant and indicates human emotion, some researchers capture the EMG signal from facial muscles and employ it in emotion detection systems [33].

Skin Conductivity Response (SCR), also known as Galvanic Skin Response (GSR), is one of the most widely studied physiological signals.
It describes the change in sweat levels in the sweat glands. SCR is generated by the sympathetic nervous system (SNS), which is part of the peripheral nervous system. Since the SNS becomes active when a human feels stress, SCR is also related to emotion.

Blood Volume Pressure (BVP) is an indicator of blood flow; it measures the force of blood pushing against the blood vessel walls. BVP is measured in mmHg (millimeters of mercury). Each time the heart pumps blood into the blood vessels, a peak appears in the BVP signal. The heart rate (HR) signal can easily be extracted from BVP. BVP is also influenced by emotion and stress: active feelings such as anger, fear or happiness tend to increase the BVP signal.

Electroencephalography (EEG), the electrical signal generated by neurons, can be captured by placing electrodes on the scalp, as described in Figure 2.5. It has been shown that the difference in spectral power between the left and right brain hemispheres is an indicator of fluctuations in emotion [34]. Specifically, pleasant music causes a decrease in left frontal alpha power, whereas unpleasant music elicits a decline in right frontal alpha power. Based on this phenomenon, a feature called Asymmetric Frontal Alpha Power is extracted from EEG to recognize emotion [35, 36, 37].

In addition to the physiological signals discussed above, skin temperature, respiration, and functional Near-Infrared Spectroscopy (fNIRS) are also used to detect emotion. The varieties of physiological signals employed to detect emotional states in earlier works are summarized in Table 2.2.

Table 2.2: Physiological Signals related to Emotion

Physiological Signal | Authors
EEG | [17], [18], [19], [20], [31]
(1) EMG (2) GSR (3) Respiration (4) Blood volume pressure | [21], [22], [23], [24]
(1) EMG (2) ECG/EKG (3) Skin conductivity (4) Respiration | [25], [26], [27], [28], [29]
(1) Blood volume pulse (2) EMG (3) Skin conductance response (4) Skin temperature (5) Respiration | [30]
(1) EEG (2) GSR (3) Blood pressure (4) Respiration (5) Temperature | [15]
(1) Video recording (2) fNIRS (3) EEG (4) GSR (5) Blood pressure (6) Respiration | [14]
(1) EEG (2) GSR (3) Respiration (4) BVP (5) Finger temperature | [16]

2.3.4 Feature Extraction and Classification

To decode emotion from physiological signals, many features have been presented. Two popular families of features are spectral density in the frequency domain and statistical information in the time domain.

Figure 2.5: EEG Signal Acquisition Experiments. (a) EEG electrode cap and EEG amplifier; (b) experiment conducted on Zhao Wei; (c) experiment conducted on Yi Yu; (d) experiment conducted on Zhao Yang.

EEG signals are usually divided into 5 frequency bands: delta (1-3 Hz), theta (4-7 Hz), alpha (8-13 Hz), beta (14-30 Hz), and gamma (31-50 Hz). One common EEG feature is the average spectral density in a specific frequency band. Furthermore, differences between channels and ratios between bands are also used as feature vectors.

In contrast, the signals generated by the PNS cover only a small frequency range, so signals such as blood pressure, respiration, and skin conductivity cannot usefully be divided into several frequency bands. Instead, time-domain features are usually extracted from these signals, such as peak rate, statistical mean, and variance.
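To make the time-domain side concrete, the following minimal sketch computes six such statistics for a 1-D physiological signal segment; they correspond to the statistical features summarized later in Table 2.3. This is an illustration under those definitions, not code from any of the surveyed systems.

```python
# Illustrative sketch: six time-domain statistics commonly extracted
# from low-frequency physiological signals (cf. Table 2.3).
import numpy as np

def time_domain_features(x):
    """x: 1-D array holding one segment of a physiological signal."""
    x = np.asarray(x, dtype=float)
    xn = (x - x.mean()) / (x.std() + 1e-12)      # normalized signal
    d1, d1n = np.diff(x), np.diff(xn)            # first differences
    d2, d2n = np.diff(x, n=2), np.diff(xn, n=2)  # second differences
    return np.array([
        x.mean(),           # mean of the raw signal
        x.std(),            # standard deviation of the raw signal
        np.abs(d1).mean(),  # mean |first difference|, raw
        np.abs(d1n).mean(), # mean |first difference|, normalized
        np.abs(d2).mean(),  # mean |second difference|, raw
        np.abs(d2n).mean(), # mean |second difference|, normalized
    ])
```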
The extracted features and classification algorithms used in previous papers are summarized in Table 2.3.

2.4 A Real-Time Music-evoked Emotion Detection System

2.4.1 Introduction

Advances in sensor and computing technologies have made it possible to capture and analyze human physiological signals in different applications. These capabilities open up a new scenario wherein the subject's emotions evoked by external stimuli such as music can be detected and visualized in real-time.

Table 2.3: Extracted Feature and Classification Algorithm

Features | Classification | Authors
(1) Averaged spectral power of 6 frequency bands (2) wavelet coefficients (CWT) of heart rate (3) the mean, variance, minimum and maximum of peripheral signals | (1) Naive Bayesian classifier (2) Fisher Discriminant Analysis | [15]
To select the best features, several methods are applied: (1) filter (ANOVA, Fisher and FCBF) (2) wrapper (SFFS) feature selection algorithms | (1) Naive Bayesian classifier (2) Discriminant Analysis (3) SVM (4) Relevance Vector Machines (RVM) | [16]
Based on three EEG channels, Fpz and F3/F4, the following features are extracted: (1) alpha, beta, alpha and beta power (2) beta power / alpha power | Binary linear FDA (Fisher's Discriminant Analysis) classifier | [17]
(1) Mean of the raw signal (2) standard deviation of the raw signal (3) mean of the absolute values of the first differences of the raw signal (4) mean of the absolute values of the first differences of the normalized signal (5) mean of the absolute values of the second differences of the raw signal (6) mean of the absolute values of the second differences of the normalized signal | SFFS is used to select the best features from the feature space; three classification strategies: (1) SFFS feature selection with K-NN (2) Fisher Projection (FP) with MAP classification (3) a hybrid SFFS-FP method | [21], [24]
Eleven features extracted from the signal | (1) Fisher projection matrix (2) SFFS (3) K-NN | [22], [23]
Several methods are investigated to find the best features: (1) analysis of variance (ANOVA) (2) sequential forward selection (SFS) (3) sequential backward selection (SBS) (4) PCA (5) Fisher projection | (1) Linear discriminant function (LDF) (2) k-nearest neighbors (KNN) (3) multilayer perceptron (MLP) | [25], [26], [27], [28], [29]
30 feature values extracted from five signals | (1) Fisher discriminant classifier (2) SVM | [30]
Asymmetric frontal alpha power over 12 electrode pairs | Multilayer perceptron classifier (MLP) | [18]
(1) Asymmetric frontal alpha power over 12 EEG electrode pairs (2) spectral power density of 24 EEG channels | SVM | [19]
(1) Asymmetric frontal alpha power over 12 EEG electrode pairs (2) spectral power density of 24 EEG channels | Hierarchical SVM | [20]

Two approaches have been identified to make use of these physiological signals in multimedia systems. First, physiological signals can be visualized continuously while the subject interacts with a multimedia system. Second, the physiological signals can be used as a control message in applications such as games. Our system is designed to combine these two approaches and to demonstrate an application scenario of real-time emotion detection.
EEG signals generated in response to the musical stimuli are captured to detect the subject's emotional state. This information is then used to control a simple emotion-based music game. While the subject plays the music game, his EEG is visualized on a 3D head model which serves as synchronized feedback for monitoring the subject. In such a case, our system provides a real-time tool to monitor the subject and can serve as a useful input for music therapists, for example.

2.4.2 System Architecture

The proposed system is shown in Figure 2.6, which shows how the four modules are connected. The data acquisition and analysis modules together constitute the music-evoked emotion detection subsystem. Before the stimuli are provided to evoke the subject's emotion, the subject is asked to wear an electrode cap which consists of 40 electrode channels. Each channel captures EEG signals continuously. To perform real-time analysis, the EEG signals are collected by the signal acquisition module, which buffers the continuous signals and feeds them, as smaller EEG segments of 1 s duration, into the analysis module.

The analysis module calculates the spectral power density and the frontal alpha power feature from each EEG segment. The frontal alpha power feature is discussed in detail in the following paragraphs. Using these features followed by an SVM classifier, subject emotions are classified into three states: happy, sad and peaceful. Finally, the classification result is sent to the music game module to drive the game, and the spectral powers of each channel together with the emotions are fed into the 3D module for visualization.

Figure 2.6: Physiology-based Music-evoked Emotion Detection System

Data Acquisition Module

To capture EEG signals, we have used NeuroScan products: Quik-Caps and NuAmps. Quik-Caps consist of 40 electrodes located on the head of the subject in accordance with the 10-20 system standard [38]. The electrical signals captured by the electrodes are amplified by NuAmps. The sampling rate of the EEG signal is 500 Hz. Since the effective frequency range of EEG is from 1 to 50 Hz, the EEG signals are first band-pass filtered to retain only components between 1 and 200 Hz. The filtered EEG signals are continuously sent from the data acquisition module to the analysis module.

Frontal Alpha Power Feature

The analysis module consists of two main components: feature extraction and classification. To detect music-evoked emotion from EEG, we have used the asymmetry features commonly used in the physiological community [39]. It has been shown that the difference in spectral power between the left and right brain hemispheres is an indicator of fluctuations in emotion. Specifically, pleasant music causes a decrease in left frontal alpha power, whereas unpleasant music elicits a decline in right frontal alpha power [40].

In contrast to most existing BCI systems, we have not used any artifact rejection/removal method in our system. The rationale is that artifacts usually have very similar effects on both electrodes of a pair that is symmetrically located across the two hemispheres. Asymmetric features are differences between symmetric electrode pairs, thus compensating for artifacts caused by eye blinking, for example [40].
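As a concrete illustration, the sketch below computes this kind of asymmetry feature: the average alpha-band (8-13 Hz) power per channel via a Welch periodogram, then the left-minus-right difference for each symmetric frontal pair (the four pairs used in our system are enumerated in the next paragraph). This is a minimal sketch assuming 1-second segments at the 500 Hz sampling rate described above; the channel-layout convention is an illustrative assumption, not the system's exact code.

```python
# Minimal sketch (not the exact system code) of the frontal alpha
# asymmetry feature: Welch PSD per channel, average power in the
# 8-13 Hz alpha band, then left-minus-right differences per pair.
import numpy as np
from scipy.signal import welch

FS = 500  # sampling rate (Hz), as used by the NuAmps amplifier
PAIRS = [("FP1", "FP2"), ("F7", "F8"), ("F3", "F4"), ("FC3", "FC4")]

def alpha_power(segment, fs=FS, band=(8.0, 13.0)):
    """Average spectral power of one channel in the alpha band."""
    freqs, psd = welch(segment, fs=fs, nperseg=fs)  # 1-s window
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return psd[mask].mean()

def asymmetry_feature(eeg, channel_index):
    """4-dim feature from a 1-s multi-channel EEG segment.

    eeg: array of shape (n_channels, n_samples);
    channel_index: dict mapping channel name -> row index (assumed layout).
    """
    feat = []
    for left, right in PAIRS:
        p_left = alpha_power(eeg[channel_index[left]])
        p_right = alpha_power(eeg[channel_index[right]])
        feat.append(p_left - p_right)  # differential alpha power
    return np.asarray(feat)
```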
Since 8 of the selected electrodes in our electrode cap are located symmetrically on the frontal lobe, 4 pairs of electrodes can be used to calculate the asymmetry features: Fp1-Fp2, F7-F8, F3-F4 and FC3-FC4. The positions of these 8 electrodes are illustrated in Figure 2.7. EEG signals are usually divided into 5 frequency bands: delta (1-3 Hz), theta (4-7 Hz), alpha (8-13 Hz), beta (14-30 Hz) and gamma (31-50 Hz). The averaged differential spectral power over the alpha band is calculated as a feature from each electrode pair, so the resulting feature vector has dimension 4. Using this asymmetric feature vector, emotion detection becomes a multi-class classification problem.

Figure 2.7: Electrode Position in the 10/20 International System

SVM Classifier

At the beginning of the music game, the subject is required to listen to 3 music clips associated with the 3 emotional states (happy, sad and peaceful). The evoked EEG features are used as training data to build an SVM model, which is then used to predict the emotional state of incoming EEG features in real-time. LibSVM was used to implement the training and prediction [41]. A four-dimensional feature is extracted from each 1-second EEG segment, and the existing kernels in the LibSVM package are used to conduct the experiment. The classifier is trained on a 6-minute EEG recording which covers each emotion for 2 minutes. We noticed that other offline emotion classification systems (e.g. [42]) use much higher-dimensional feature vectors; unfortunately, those offline systems cannot simply be extended to a real-time system with acceptable performance. To reduce the delay between training data collection and real-time prediction, we implemented a simple GUI (see Figure 2.8) to make the training process more convenient and efficient. This has mitigated the performance degradation of the real-time system, although it cannot solve the problem completely.

Figure 2.8: Feature Extraction and Classification Module

Music Game Module

As shown in Figure 2.9, the game module has two main functions: to play back music for evoking the required emotional state, and to visualize emotional state transitions in real-time. The interface of the game is simple yet functional. This module, however, needs to be improved for real-life applications such as music therapy.

Figure 2.9: Music Game Module

3D Visualization Module

As shown in Figure 2.10, the 3D visualization module displays the spectral power of each EEG channel in different colors on a 3D head model, adopted and modified from an open source project, Tempo [43]. We believe that 3D visualization is more intuitive to human beings and can be useful feedback for experiment conductors. Since visual patterns are friendlier to human eyes than decimal numbers, an intuitive illustration of the EEG changes is also gained. By observing the classification results and the EEG energy visualized in the 3D module, we can monitor the performance of the proposed approach throughout the experiment. For example, the classifier might produce a wrong label after the subject moves his head slightly, so the events that influence the accuracy of the proposed system can be identified. This kind of information could be useful for improving the proposed system in future work.

Figure 2.10: 3D Visualization Module
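To make the training and real-time prediction loop concrete, here is a minimal sketch using scikit-learn's SVC in place of raw LibSVM (scikit-learn wraps LibSVM internally). The data layout and helper names are illustrative assumptions; asymmetry_feature refers to the earlier sketch, and channel_index is the same assumed channel-name mapping.

```python
# Illustrative sketch: calibrate an SVM on labeled 1-s segments from
# the 6-minute training recording, then classify a live segment stream.
import numpy as np
from sklearn.svm import SVC

def train_emotion_model(segments, labels, channel_index):
    """segments: list of (n_channels, n_samples) 1-s arrays;
    labels: 'happy' / 'sad' / 'peaceful' per segment."""
    X = np.array([asymmetry_feature(s, channel_index) for s in segments])
    model = SVC(kernel="rbf")  # RBF is one of the stock LibSVM kernels
    model.fit(X, labels)
    return model

def predict_stream(model, segment_iter, channel_index):
    """Yield one emotion label per incoming 1-s EEG segment."""
    for segment in segment_iter:
        feat = asymmetry_feature(segment, channel_index).reshape(1, -1)
        yield model.predict(feat)[0]
```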
2.4.3 Demonstration

We have built a research prototype to detect music-evoked emotional states in real-time using EEG signals, synchronized with two visualization modules, and we have shown its potential applications such as music therapy.

As a start of the project, we re-implemented an offline system described in [42] and achieved an accuracy of up to 93% in k-fold cross-validation, which is similar to the reported performance. However, the accuracy drops to random guessing (about 35% in 3-class classification) in online prediction. The original offline system employs a 60-dimensional feature extracted from the whole head area. We then modified the approach by extracting features only from the frontal lobe, so that a 4-dimensional feature is extracted from 8 EEG channels of the frontal lobe, as described in Figure 2.7. With the reduced features, we have managed to improve the prediction accuracy in our preliminary evaluations, as discussed in detail in Section 2.5.

2.5 Current Challenges and Perspective

Many papers have been published on recognizing emotion from physiological signals. To the best of our knowledge, however, no one has succeeded in extending their algorithm into a practical application. Although the accuracy of emotion recognition reaches 90% in cross-validation experiments, few works obtain acceptable accuracy in prediction. Based on the results of our experiments, the accuracy of emotion recognition varies considerably under different validation strategies such as prediction, k-fold cross-validation, and leave-one-out cross-validation.

In our preliminary work described in Section 2.4, asymmetric frontal alpha power features are extracted from 8 EEG channels to detect emotion, and these feature vectors are fed into an SVM classifier for 3-class classification. Under cross-validation the accuracy reaches 90%, yet it drops to random guessing (35%) in prediction. This striking gap between prediction accuracy and cross-validation accuracy implies that cross-validation might not have been used correctly. An EEG signal is consistent over short time periods, which results in high similarity between feature vectors extracted from neighboring EEG segments. Meanwhile, the soundness of k-fold cross-validation rests partially on the independence of feature vectors. Consequently, the dependency between frontal alpha power features results in considerable distortion in k-fold cross-validation (with randomly selected feature vectors), where training and testing feature vectors are extracted from neighboring EEG segments.
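To make this evaluation pitfall concrete, the sketch below contrasts a randomly shuffled k-fold split with a contiguous (blocked) split on temporally ordered feature vectors. It is an illustrative sketch with synthetic, temporally correlated data, not the evaluation code of our experiments.

```python
# Illustrative sketch: random k-fold lets neighboring (highly similar)
# EEG segments fall into both train and test folds, inflating accuracy;
# a contiguous (unshuffled) split keeps adjacent segments together.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

# X: one feature vector per 1-s segment, in time order (synthetic
# random-walk data stands in for slowly varying EEG features).
rng = np.random.default_rng(0)
X = rng.standard_normal((360, 4)).cumsum(axis=0)
y = np.repeat([0, 1, 2], 120)  # happy / sad / peaceful blocks

clf = SVC(kernel="rbf")
optimistic = cross_val_score(clf, X, y, cv=KFold(10, shuffle=True, random_state=0))
realistic = cross_val_score(clf, X, y, cv=KFold(10, shuffle=False))  # blocked
print(f"shuffled k-fold: {optimistic.mean():.2f}, blocked: {realistic.mean():.2f}")
```

On data like this, the shuffled estimate is far higher than the blocked one, which mirrors the 90% versus 35% gap reported above.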
Another issue which might influence the accuracy is the ground truth problem: how can we guarantee that the subject experiences the specific emotion during the experiment? Many stimuli have been introduced to help the subject experience the emotion; for example, music from scary movies, sound clips such as a baby laughing, and pictures of car accidents are used as stimuli. However, no matter how strong the stimuli used to evoke the emotion, it is still impossible to verify that the subject has indeed experienced that emotion.

In addition, current systems do not consider the differences caused by the stimuli. Many researchers use similar methods to detect emotion evoked by different stimuli. Yet different stimuli, auditory or visual, induce signals in different brain areas and evoke emotion in different ways. More attention should be focused on how emotion is generated in our brain, and on the differences between emotional states evoked by different stimuli such as a happy image and happy music. Further effort is needed to employ this knowledge to improve the accuracy of emotion detection from physiological signals.

Furthermore, since physiological signals are quite ambiguous, more attention needs to be paid to how to extract features from these ambiguous signals. Unfortunately, unlike facial images or the human voice, there is no gold standard to verify which patterns of physiological signals are good or bad.

To conclude, how to accurately recognize human emotion from physiological signals and employ this technique to annotate music emotion is still an open problem.

Chapter 3 Automatic Sleep Scoring using EEG Signal - First Step Towards a Domain Specific Music Recommendation System

With the rapid pace of modern life, millions of people suffer from sleep problems. Music therapy, as a non-medication approach to mitigating sleep problems, has attracted increasing attention recently. However, the applicability of music therapy is limited by the time-consuming task of choosing suitable music for users. Inspired by this observation, we discuss the concept of a domain specific music recommendation system which automatically recommends music for users according to their sleep quality. The proposed system requires multidisciplinary efforts, including automated sleep quality measurement and content-based music similarity measurement. As a first step towards the proposed recommendation system, I focus on automated sleep quality measurement in this chapter. A literature survey of content-based music similarity measurement is given in Section 4.1.

To measure sleep quality, the standard Polysomnography (PSG) approach requires various signals such as EEG, ECG and EMG. Although satisfactory accuracy can be obtained from the analysis of multiple signals, such systems can make subjects uncomfortable because they are asked to wear many sensors to collect the signals during sleep. The situation is even worse for home-based healthcare applications, in which user experience is relatively more important than system accuracy. To improve the user experience, an important problem is how to measure sleep quality from few signals while still achieving reasonable accuracy. Motivated by this, an EEG-based approach is proposed to measure the user's sleep quality. The advantages of the proposed approach over the standard PSG method are: 1) it measures sleep quality by recognizing three sleep categories rather than six sleep stages, so higher accuracy can be expected; 2) the three sleep categories are recognized by analyzing the Electroencephalography (EEG) signal only, so the user experience is improved because fewer sensors are attached during sleep. In experiments conducted on a standard data set, the proposed approach achieves high accuracy and hence shows promising potential for the music recommendation system.
3.1 Introduction

People spend about one-third of their lives asleep. Although it is still an open question why humans need sleep, there is no doubt that sleep is necessary to maintain our overall health. A deficiency of good sleep can result in severe physical and mental effects such as fatigue, tiredness, and depression. Nowadays millions of people are affected by sleep problems, many of whom remain undiagnosed. Although various medicines have been developed to treat sleep problems, they are not recommended because of their negative side effects. At present, music therapy offers an alternative healing method, which improves sleep quality by playing back music at bed time. Since its development in the 2000s, music therapy research has indicated that music does have beneficial effects on sleep for children [44], young people [45] and older adults [46]. During the process of music therapy, people are asked to listen to a list of music pre-selected by a music therapist. In spite of its clear benefits for sleep quality, the current approach is difficult to use widely because producing a personalized music list is a time-consuming task for the music therapist. Based on this observation, a domain specific music recommendation system is introduced to automatically recommend music to the user according to his sleep quality.

3.1.1 Music Recommendation according to Sleep Quality

The proposed recommendation system consists of two main components: EEG-based music rating and content-based music recommendation. After music is played back at bed time, the former component monitors the user's sleep quality, and music items are rated according to that sleep quality, as Figure 3.1 shows.

Figure 3.1: Physiology-based Music Rating Component

It is our contention that similar music pieces have a similar influence on sleep quality. The music items associated with good sleep quality are used as queries in the music recommendation component; music similar to the query items is then recommended to the user through music similarity analysis, as illustrated in Figure 4.1. As the first step towards the proposed recommendation system, in this thesis I focus on automated sleep quality measurement, which is the major task in the physiology-based music rating component.

3.1.2 Normal Sleep Physiology

According to the current sleep analysis standard [47], normal sleep physiology consists of Non Rapid Eye Movement (NREM) and Rapid Eye Movement (REM) sleep. NREM is subdivided into four stages: stage 1 (S1), stage 2 (S2), stage 3 (S3) and stage 4 (S4). S3 and S4 are also called deep sleep. A healthy person experiences about 5 complete sleep cycles per night [48]. S1 is the first stage in a sleep cycle and S2 the second; S3 and S4 then occur consecutively. After the completion of the four NREM stages, these stages reverse rapidly and are followed by REM, as Figure 3.2 shows.

Given the sleep cycles over a night, three main parameters can be calculated to measure sleep quality: sleep latency, sleep efficiency and percentage of deep sleep. Specifically, sleep latency is the time taken to transition from wakefulness to the first sleep stage. Sleep efficiency is the ratio of time spent asleep to time spent in bed. Percentage of deep sleep is the ratio of deep sleep to all sleep stages. To calculate these parameters, we do not need to recognize every sleep stage in the sleep cycles; it is enough to distinguish three sleep categories: wakefulness, deep sleep (S3, S4), and other sleep stages (S1, S2, REM). Consequently, the problem of sleep quality measurement is converted into how to recognize these three sleep categories.
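To illustrate how the three parameters follow directly from per-epoch category labels, here is a minimal sketch; the 30-second epoch length matches the scoring convention described below, while the single-letter label names are illustrative assumptions.

```python
# Minimal sketch: sleep quality parameters from a sequence of 30-s
# epoch labels ("W" = wakefulness, "D" = deep sleep, "O" = other stages).
EPOCH_SEC = 30

def sleep_quality(labels):
    n = len(labels)
    asleep = [x for x in labels if x != "W"]
    # Sleep latency: time from the start of recording to the first sleep epoch.
    first_sleep = next((i for i, x in enumerate(labels) if x != "W"), n)
    latency_min = first_sleep * EPOCH_SEC / 60
    # Sleep efficiency: time asleep over time in bed.
    efficiency = len(asleep) / n if n else 0.0
    # Percentage of deep sleep among all sleep epochs.
    deep_pct = labels.count("D") / len(asleep) if asleep else 0.0
    return latency_min, efficiency, deep_pct

# Example: 10 min awake, then sleep including some deep-sleep epochs.
labels = ["W"] * 20 + ["O"] * 40 + ["D"] * 30 + ["O"] * 30
print(sleep_quality(labels))  # (10.0, 0.833..., 0.3)
```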
Figure 3.2: Typical Sleep Cycles

3.1.3 Paper Objectives

The Polysomnography (PSG) technique is now widely used in hospitals to monitor patients' sleep cycles. First developed in the 1960s [49], this method, also known as sleep scoring, has become the gold standard in sleep studies. While users are sleeping, three physiological signals are monitored: Electroencephalography (EEG), Electrooculography (EOG), and Electromyography (EMG). Based on the analysis of these three signals, six sleep stages can be recognized by a human expert: wakefulness, S1, S2, S3, S4 and REM [50]. These stages are scored in 30-second epochs.

The standard PSG approach utilizes 9 sensors to monitor the required signals: EEG, EOG and EMG [51]. Although satisfactory accuracy can be obtained from the analysis of multiple signals, standard PSG systems can make subjects uncomfortable because they are asked to wear many sensors to collect the signals during sleep, as described in Figure 3.3. The situation is even worse for home-based healthcare applications, in which user experience is relatively more important than system accuracy. To improve the user experience, we use fewer signal channels to recognize sleep stages. As discussed above, to calculate the sleep quality parameters we only need to recognize three categories: wakefulness, deep sleep and other stages. Given that EOG and EMG are mainly used to distinguish between REM and S1, these two signals may not contribute much to the calculation of sleep quality. Consequently, we recognize the three sleep categories from the EEG signal only.

3.1.4 Organization of the Thesis

Based on the analysis of current PSG techniques in Section 3.2, we propose to recognize the three sleep categories from the EEG signal only. The details of our approach are presented in Section 3.3. Section 3.4 discusses the experiment results. Conclusions are given in Section 3.5.

Figure 3.3: Traditional PSG System with Three Physiological Signals. (a) PSG experiment for adults [52]; (b) PSG experiment for children [53].

3.2 Literature Review

3.2.1 Manual PSG Analysis

In 1968, Rechtschaffen and Kales first proposed a standard method, called the R&K scoring system [49], for recognizing human sleep stages manually.
Over 40 years of development, the manual scoring system has been modified and improved [54, 55, 56], especially with respect to the recording procedure [51], signal amplification [57] and artifact handling [58]. The R&K method is still the most popular approach in clinical practice today.

3.2.2 Computerized PSG Analysis

Intense research has been conducted on computerized PSG analysis, that is, how to automatically recognize sleep stages from physiological signals. Two good survey papers have been published on computerized PSG analysis [59, 60]. Based on the literature, automated PSG analysis can achieve reasonable accuracy compared with human scoring. However, its performance is influenced by many factors, such as subject selection, electrode application and recording quality. Current automated scoring systems fall mainly into two families: rule-based expert systems and classifier-based systems. Intuitively, a rule-based expert system imitates the process of human scoring by encoding domain knowledge such as the R&K scoring method, so sleep stages are recognized by the expert system in a similar way to a human expert. Alternatively, a classifier-based system treats sleep scoring as an ordinary classification problem. Typical computerized PSG systems are discussed in detail below.

Based on the literature survey, most existing works aim to provide a sleep scoring service for clinical applications. The main difference between our approach and these works is that ours is specifically designed for a multimedia application rather than a clinical one. In particular, to balance user experience and system accuracy, an EEG-based approach is proposed to recognize three sleep categories.

Rule-based Expert Systems

Peter Anderer et al. [61, 62] developed an expert system, called Somnolyzer 24 x 7, to recognize six sleep stages (W, S1, S2, S3, S4 and REM) from 30-second epochs of three signals (EEG, EMG and EOG). The Somnolyzer 24 x 7 system consists of two main components: a feature generator and a rule-based expert system. First, feature vectors are extracted from EEG, EMG and EOG respectively. The EEG signal is filtered into five frequency bands: delta (0.5-2 Hz), theta (2-7 Hz), alpha (7-12 Hz), beta (12-20 Hz) and fast beta (20-40 Hz). Then density, mean amplitude, mean frequency and variability are calculated from each 30-second epoch in the different frequency bands as well as the full band (0.5-40 Hz). For the EMG signal, the squared amplitude is calculated for each 1-second epoch, and then the minimum, maximum and mean of the squared amplitude are determined for each 30-second EMG epoch.

A decision tree is built on top of 10 linear discriminant analysis classifiers; 80% agreement between the expert system's scoring and human scoring was achieved.

Classifier-based Systems

The commercial package BioSleep, from Oxford BioSignals, is a typical classifier-based system [63]. Autoregressive coefficients are extracted as feature vectors from each EEG segment, and a neural network classifier is used to classify each segment into a sleep stage. The BioSleep package obtains reasonable results in comparison with human scoring in a third-party evaluation [64].
Other Methods

Besides expert system and classifier approaches, additional solutions have also been proposed, such as fuzzy reasoning [65], higher order statistical analysis [66], and hidden Markov models [67].

3.3 Methodology

To measure sleep quality, we need to recognize three sleep stage categories: wakefulness, deep sleep, and other sleep stages. To recognize these three categories, we extract spectral power features from each 30-second EEG epoch. The LibSVM package [41] is used to build an SVM classifier that assigns each EEG epoch to one of the three categories. Additionally, as discussed in Section 3.1, the transition of sleep stages follows the trend of the sleep cycles; for example, if the current epoch belongs to deep sleep, the next epoch probably also belongs to deep sleep. The SVM classifier, however, treats each 30-second epoch independently and thus does not take advantage of the correlation between nearby epochs. Consequently, we further model the sleep stage transitions as a Markov chain. A matrix indicating the transition probabilities between sleep stages is learned from training data, as Figure 3.6 illustrates, and the classification results are refined based on this matrix: the probabilities estimated by the SVM classifier for each epoch are processed by a dynamic programming algorithm.

3.3.1 Feature Extraction

Band Power Features

EEG is the summation of the electrical signals generated by millions of neurons in the brain. It was first recorded by Richard Caton in 1875 [68] and is now widely used in PSG studies. The EEG signal is usually divided into five frequency bands: delta (0.5-2 Hz), theta (2-7 Hz), alpha (8-12 Hz), beta (12-20 Hz) and fast beta (20-40 Hz). According to previous studies, the spectral power of EEG in the different frequency bands is highly related to sleep stages; in particular, some sleep stages are recognized by the presence of EEG activity in a specific frequency band, a well-known pattern in PSG studies [50]. For example, deep sleep is recognized by the presence of high-amplitude delta waves, and an epoch is scored as wakefulness when alpha waves are present in the EEG. Consequently, we use the spectral power extracted from the five frequency bands as the feature vector. As sleep stages are usually scored in 30-second epochs, a feature vector is generated for each 30-second EEG epoch.

Figure 3.4: Band Power Features and Sleep Stages. (a) Band power features extracted from 30-second EEG epochs; (b) sleep stages annotated by a human expert.
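As a concrete illustration of this feature, the sketch below computes the five band powers of one 30-second single-channel epoch via a Welch periodogram. The sampling rate and the integration method are illustrative assumptions, not the exact implementation.

```python
# Minimal sketch: five-band spectral power feature for one 30-s epoch.
import numpy as np
from scipy.signal import welch

FS = 100  # assumed sampling rate of the EEG channel (Hz)
BANDS = {"delta": (0.5, 2), "theta": (2, 7), "alpha": (8, 12),
         "beta": (12, 20), "fast_beta": (20, 40)}

def band_power_feature(epoch, fs=FS):
    """5-dim feature vector from one 30-second single-channel epoch."""
    freqs, psd = welch(epoch, fs=fs, nperseg=4 * fs)  # 0.25 Hz resolution
    df = freqs[1] - freqs[0]
    feat = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        feat.append(psd[mask].sum() * df)  # integrated band power
    return np.asarray(feat)

# Example: a 30-s epoch of synthetic data.
epoch = np.random.default_rng(0).standard_normal(30 * FS)
print(band_power_feature(epoch))
```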
3.3.2 Classification

Support Vector Machine

Platt et al. [69] presented a method to convert the outputs of a binary SVM into posterior probabilities. This work was later extended to estimate multi-class SVM probabilities by combining the outputs of multiple one-against-one binary classifiers [70]. With this technique, we use a multi-class SVM classifier to estimate P(x | F_t), the probability of epoch t belonging to sleep stage x, where F_t is the feature vector extracted from epoch t. We also define C(t, x) = P(x | F_t), which will be used in the following section.

In SVM classification, epoch t is scored as the stage which maximizes the probability C(t, x), as the following equation shows:

    \psi(t) = \arg\max_x C(t, x),

where \psi(t) is the label generated by the classifier for epoch t.

3.3.3 Post Processing

Sleep Stage Transition Modeling

As the SVM classifier treats each epoch independently, it does not utilize the correlation between epochs. To take advantage of this sequential information, we model the transition of sleep stages as a discrete-time Markov chain. The effectiveness of this modeling has been demonstrated by Gibellato [71]. The Markov property is formalized in Equation 3.1; this first-order Markov assumption states that the stage of the current epoch depends only on the stage of the previous epoch:

    \Pr(X_t = x_t \mid X_{t-1} = x_{t-1}, X_{t-2} = x_{t-2}, \dots, X_1 = x_1) = \Pr(X_t = x_t \mid X_{t-1} = x_{t-1}) = P_{x_{t-1} x_t},    (3.1)

where X_t indicates the sleep stage of epoch t, and P_{x_{t-1} x_t} indicates the transition probability from stage x_{t-1} to stage x_t. As x has three possible stages, there are nine possible values of P_{x_{t-1} x_t}. These nine values form a 3-by-3 matrix indicating the transition probabilities between the three stages. This matrix is learned from training data as follows:

    P_{ij} = \frac{\mathrm{Count}(i, j)}{\sum_{k=1}^{3} \mathrm{Count}(i, k)},

where Count(i, j) counts the transitions from stage i to stage j. The calculation of this matrix is also illustrated in Figure 3.6.

Dynamic Programming for Post-processing

Based on the probabilities generated by the SVM and the transition matrix learned from training data, a dynamic programming (DP) algorithm is designed to score each epoch such that the optimal overall posterior probability is obtained. The subproblem of the dynamic programming is defined in Equation 3.2, which gives the maximum overall probability from the first epoch to the current epoch t, where the current epoch is scored as stage v:

    L(t, v) = \max_{x_1, \dots, x_{t-1}} \left\{ \Pr(X_1 = x_1, \dots, X_{t-1} = x_{t-1}, X_t = v) \prod_{i=1}^{t} C(i, X_i) \right\}.    (3.2)

Because the first-order Markov assumption holds, Equation 3.2 can be simplified as follows:

    L(t, v) = \max_u \{ L(t-1, u) \Pr(X_t = v \mid X_{t-1} = u) \} \, C(t, v).

Thus L(t, v) can be calculated on top of L(t-1, \cdot), which forms an optimal substructure for the subproblem; a dynamic programming algorithm can therefore be designed to find the optimal solution. To eliminate floating-point underflow, logarithmic probabilities are used, S(t, v) = \ln L(t, v), with S(1, v) = \ln C(1, v) for the first epoch:

    S(t, v) = \begin{cases} \ln C(1, v) & t = 1 \\ \max_u \{ S(t-1, u) + \ln P_{uv} \} + \ln C(t, v) & t > 1 \end{cases}

For backtracing, the variable \alpha(t, v) is defined to record the optimal sleep stage of the previous epoch:

    \alpha(t, v) = \arg\max_u \{ S(t-1, u) + \ln P_{uv} + \ln C(t, v) \}.

The process of backtracing is shown in the following formulation:

    \phi(t) = \begin{cases} \alpha(t+1, \phi(t+1)) & t < n \\ \arg\max_v S(n, v) & t = n \end{cases}

where \phi gives the refined classification labels which achieve the maximum overall posterior probability.
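The sketch below illustrates this post-processing step end to end: learning the transition matrix from training labels and running the Viterbi-style DP over per-epoch class posteriors C(t, x). It is an illustrative sketch, not the original implementation; for instance, the posteriors could come from scikit-learn's SVC(probability=True), which uses Platt scaling internally.

```python
# Illustrative sketch: Markov transition matrix + Viterbi-style DP
# smoothing of per-epoch class posteriors C(t, x).
import numpy as np

def learn_transitions(labels, n_states=3, eps=1e-6):
    """Row-normalized count matrix P[i, j] = Pr(next = j | current = i)."""
    counts = np.full((n_states, n_states), eps)  # eps avoids log(0)
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def viterbi_smooth(posteriors, trans):
    """posteriors: (n_epochs, n_states) array of C(t, x); returns labels."""
    log_c = np.log(np.clip(posteriors, 1e-12, None))
    log_p = np.log(trans)
    n, k = posteriors.shape
    score = np.empty((n, k))             # S(t, v)
    back = np.zeros((n, k), dtype=int)   # alpha(t, v)
    score[0] = log_c[0]
    for t in range(1, n):
        cand = score[t - 1][:, None] + log_p  # S(t-1, u) + ln P_uv
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_c[t]
    path = np.empty(n, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(n - 2, -1, -1):        # backtracing
        path[t] = back[t + 1, path[t + 1]]
    return path
```

On a held-out night, one would call something like viterbi_smooth(clf.predict_proba(features), learn_transitions(train_labels)), where clf, features and train_labels are hypothetical names.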
3.4 Experiment Results

The automated sleep quality measurement consists of two phases. First, the SVM classifier categorizes each EEG epoch into one of three classes: wakefulness, deep sleep, and other sleep stages. Second, a dynamic programming algorithm refines the results of the SVM by utilizing the sequential information. Two experiments are conducted to evaluate the classifier and the post-processing algorithm, respectively.

The experiments are based on the Sleep-EDF Database, which is selected from PhysioBank, a large archive of digital recordings for the biomedical research community [72]. Four recordings obtained from healthy subjects in the hospital are used: st7022j0, st7052j0, st7121j0, and st7132j0. Three signals (EEG, EOG, and EMG) were monitored overnight with good signal quality; only the EEG channel (Fpz-Cz) is utilized in our experiment. The positions of the Fpz and Cz channels are illustrated in Figure 3.5. This data set was annotated by a human expert according to the PSG standard [49], and the human annotation is used as the ground truth in our experiments.

Figure 3.5: Position of Fpz and Cz in the 10/20 System

Sleep Stage Classification

In the first experiment, we evaluate the accuracy of the SVM classifier. The four sleep recordings of the Sleep-EDF Database are divided into 30-second epochs, yielding 3874 epochs in total, and the spectral power feature is extracted from each epoch. 10-fold cross-validation is conducted on the band power feature vectors of each recording. The classifier achieves frame-level accuracy of up to 93.9%, as described in Table 3.1. The confusion matrices of the SVM are shown in Tables 3.2, 3.3, 3.4, and 3.5.

Table 3.1: Accuracy of SVM Classifier in 10-fold Cross-validation

| Recording | st7022j0 | st7052j0 | st7121j0 | st7132j0 |
| SVM       | 88.4%    | 93.9%    | 90.9%    | 92.7%    |

Table 3.2: Confusion Matrix on st7022j0

|        | Deep | Others | Wake |
| Deep   | 238  | 45     | 1    |
| Others | 26   | 545    | 15   |
| Wake   | 0    | 22     | 53   |

Table 3.3: Confusion Matrix on st7052j0

|        | Deep | Others | Wake |
| Deep   | 158  | 18     | 4    |
| Others | 5    | 724    | 14   |
| Wake   | 0    | 22     | 105  |

Table 3.4: Confusion Matrix on st7121j0

|        | Deep | Others | Wake |
| Deep   | 172  | 30     | 1    |
| Others | 26   | 718    | 9    |
| Wake   | 1    | 26     | 43   |

Table 3.5: Confusion Matrix on st7132j0

|        | Deep | Others | Wake |
| Deep   | 91   | 10     | 2    |
| Others | 22   | 664    | 3    |
| Wake   | 0    | 25     | 35   |

Post Processing

In the second experiment, we evaluate the performance of the DP algorithm. As the DP algorithm utilizes sequential information, it operates on continuous sequences of epochs. Consequently, each recording is divided into two continuous parts, one for training and one for testing: the first 400 epochs of each recording are used to train the SVM model and learn the transition matrix, and the remaining epochs are used for testing. The experiment conducted on one recording is illustrated in Figure 3.6. Based on the experiment results shown in Table 3.6, the DP algorithm consistently improves the accuracy on each recording. Comparing the sleep cycles generated by the classifier and by the DP algorithm, besides the improvement in accuracy, the DP algorithm also produces a smoother sleep cycle than the SVM classifier, as Figure 3.6 shows.

In this early stage of the work, the proposed approach is evaluated on a data set of four EEG recordings. Due to the limited size of the data set, training and testing are conducted on the same subject in the same session. To evaluate the generalization of the proposed approach, more recordings should be collected and further evaluation conducted in future work.

Table 3.6: Accuracy of SVM and SVM with Post-processing

| Recording | st7022j0 | st7052j0 | st7121j0 | st7132j0 |
| SVM       | 87.5%    | 92.7%    | 93.4%    | 94.2%    |
| SVM+DP    | 89.3%    | 95.8%    | 96.8%    | 99.3%    |
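For illustration, a compact sketch of this chronological train/test protocol, reusing the two helpers sketched earlier; scikit-learn's SVC (which wraps LibSVM and applies Platt scaling when probability=True) stands in for the thesis implementation, and the integer label coding is assumed to match the state order used by the transition matrix.

```python
import numpy as np
from sklearn.svm import SVC

def evaluate_recording(features, labels, n_train=400):
    """Train on the first n_train epochs of one recording, test on the rest,
    and report frame accuracy with and without DP smoothing."""
    X_tr, y_tr = features[:n_train], labels[:n_train]
    X_te, y_te = features[n_train:], labels[n_train:]
    svm = SVC(probability=True).fit(X_tr, y_tr)   # Platt-scaled posteriors, as in [69, 70]
    probs = svm.predict_proba(X_te)               # columns follow sorted class labels 0, 1, 2
    trans = learn_transition_matrix(y_tr)         # helper from the earlier sketch
    acc_svm = float(np.mean(svm.predict(X_te) == y_te))
    acc_dp = float(np.mean(viterbi_smooth(probs, trans) == y_te))
    return acc_svm, acc_dp
```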
Figure 3.6: Experiment Over the Recording st7052j0

3.5 Conclusions

In this thesis, we discussed the concept of a domain-specific music recommendation system which automatically recommends music for users according to their sleep quality. One important problem in the proposed recommendation system is how to automatically monitor users' sleep quality at night. To address this problem, we first investigated sleep physiology and the traditional PSG approach in the literature. Considering that a standard PSG system may make users feel uncomfortable, we specifically designed an approach that recognizes three sleep categories from the EEG signal alone. Then three parameters (sleep latency, sleep efficiency, and percentage of deep sleep) can be calculated to measure sleep quality; a sketch of this computation closes the chapter. The experiment results demonstrate that our approach achieves high accuracy even though only one EEG channel is used. These results motivate us to further develop the content-based music recommendation components and to evaluate the full functionality of the EEG-based music recommendation system in the future. As future work, content-based music similarity, the major task in a content-based music recommendation system, is discussed in Chapter 4.
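As a closing illustration, a sketch of how the three sleep-quality parameters could be computed from a night of refined epoch labels; the exact clinical definitions (e.g., taking the first non-wake epoch as sleep onset) are assumptions here, not specifications from the thesis.

```python
import numpy as np

WAKE, DEEP, OTHER = 0, 1, 2   # illustrative label coding for the three categories

def sleep_quality(stages, epoch_sec=30):
    """Compute sleep latency (minutes), sleep efficiency, and percentage of
    deep sleep from one night's sequence of 30-second epoch labels."""
    stages = np.asarray(stages)
    asleep = stages != WAKE
    onset = int(np.argmax(asleep)) if asleep.any() else len(stages)
    latency_min = onset * epoch_sec / 60.0     # time from lights-off to first sleep epoch
    efficiency = float(asleep.mean())          # fraction of the night spent asleep
    deep_pct = float((stages == DEEP).mean())  # fraction of epochs scored as deep sleep
    return latency_min, efficiency, deep_pct
```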
Chapter 4

Conclusion and Future Work

How to utilize the EEG signal in music retrieval systems is investigated in this thesis. Two projects are conducted: EEG-based music emotion annotation and EEG-based music recommendation.

In the first project, an online system is built to recognize the audience's music-evoked emotion from the EEG signal. The EEG signal is recorded and processed while the subject listens to music clips. The frontal alpha power feature is extracted from four pairs of EEG channels: Fp1-Fp2, F7-F8, F3-F4, and FC3-FC4. An SVM classifier is built to classify the feature vectors into three emotion categories: happy, sad, and peaceful. The proposed approach achieves accuracy of around 90% in 10-fold cross-validation (with randomly selected feature vectors), but only around 35% in prediction. To the best of our knowledge, a few other works also report accuracy up to 90% in cross-validation, but none achieves reasonable accuracy in prediction. As discussed in Section 2.5, two conclusions can be derived from this observation. First, special attention needs to be paid to the evaluation of EEG-related systems, because the dependency between feature vectors extracted from neighboring EEG segments can considerably distort k-fold cross-validation results. Second, how to recognize human emotion from the EEG signal remains an open question, because no one has achieved reasonable accuracy in prediction.

In the second project, a technique is presented to analyze sleep quality from the EEG signal, as a component of an EEG-based music recommendation system. The system consists of two components: EEG-based music rating and content-based music recommendation. Band power features are extracted from one EEG channel (Fpz-Cz) to recognize three sleep categories: wakefulness, deep sleep, and other stages. The proposed approach achieves accuracy above 90% and shows considerable promise for the EEG-based music rating component. The next step is to implement the content-based music recommendation component and to evaluate the full functionality of the whole system. As an initial step towards the content-based music recommendation component, a literature survey on music similarity measurement is given in Section 4.1. Three approaches are discussed: distance-based, cluster-based, and model-based approaches. Based on the results reported in the literature, model-based approaches produce the best results and are the most suitable solution for the content-based music recommendation system.

4.1 Content-based Music Similarity Measurement

As shown in Figure 4.1, music similarity is the major task in the content-based music recommendation component.

Figure 4.1: Content-based Music Recommendation Component

As one of the most fundamental problems in MIR, how to measure the similarity between music pieces by analyzing content information is still a challenging research topic, and extensive work has been conducted in this field. Because the semantics of music can be ambiguous and complex, a music piece is first represented as a set of feature vectors; music similarity measurement is thereby converted into measuring the similarity of two sets of feature vectors in a multi-dimensional space. Music similarity measurement can therefore be divided into two phases: feature extraction from music, and similarity measurement in a multi-dimensional space.

In the feature extraction phase, feature vectors are extracted from music pieces and placed in a multi-dimensional space, so that each music piece is represented by points in a high-dimensional space. How to extract reliable and effective features from a music piece is a traditional pattern recognition problem. A large number of features have been proposed to describe the semantics of music or audio, such as temporal, spectral, rhythmic, and timbral features [73]. Over years of development, many convenient tools have been implemented to extract those features effectively, such as the Marsyas framework [74].

Given the feature vectors of music pieces, many methods have been proposed to measure their similarity. These methods can be mainly categorized into three families: distance-based, cluster-based, and model-based approaches.

The most generic approach is to calculate the Euclidean distance in the raw feature space, so that the similarity between two music pieces is measured by the Euclidean distance between their feature vectors. This approach implies that the feature values in different dimensions are equally important: one unit of distance in the X direction equals one unit of distance in the Y direction. However, this is not always the case. Addressing this problem, Slaney et al. described and evaluated five approaches (whitening, LDA, NCA, LMNN, and RCA) that rotate and scale the raw feature space with a linear transform [75]. To evaluate the performance of the proposed approaches, a straightforward distance-based classifier, kNN, is employed to perform a classification task in the transformed space.
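A minimal sketch of this distance-based family: the plain Euclidean baseline, plus whitening, the simplest of the five linear transforms evaluated in [75], which rescales the feature space so that all dimensions contribute comparably. The regularization constant is an illustrative detail, not taken from [75].

```python
import numpy as np

def whitening_transform(X, eps=1e-6):
    """Learn a mean and matrix W so that (X - mean) @ W has ~identity covariance."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)                        # eigendecomposition of covariance
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T  # ZCA-style whitening
    return mu, W

def music_distance(fx, fy, mu=None, W=None):
    """Euclidean distance between two feature vectors; smaller means more similar.
    If a whitening transform is supplied, distance is measured in the whitened space."""
    if W is not None:
        fx, fy = (fx - mu) @ W, (fy - mu) @ W
    return float(np.linalg.norm(fx - fy))
```

The same transformed space can then feed a kNN classifier, mirroring the evaluation protocol described above.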
A cluster-based approach, which regards the set of feature vectors as an entity, was proposed by Logan and Salomon [76]. The proposed method consists of two steps: frame clustering and cluster model similarity. First, feature vectors are clustered into groups using K-means, and the distribution of feature vectors is regarded as a signature of the music piece. Second, the similarity is calculated by comparing signatures using the Earth Mover's Distance (EMD) [77]. This approach was later improved by Aucouturier and Pachet [78, 79], who introduced Gaussian mixture models (GMMs) to cluster the feature vectors; Monte Carlo (MC) sampling is employed to measure the similarity of the cluster models, i.e., the likelihood that samples drawn from model A are generated by model B. This approach was implemented by Elias Pampalk in an open-source package [80] and used in a genre classification task by Pampalk et al. [81].

Besides the distance-based and cluster-based approaches, West and Lamere [82] proposed a model-based approach to constructing music similarity functions. The authors argue that high-level musical factors, such as genre, are also important for measuring similarity. Instead of directly calculating the distance between feature vectors, the feature vectors are first classified into different categories, and an internal profile is then built to represent the music. Suppose that \(P_x = \{c^x_1, c^x_2, \ldots, c^x_n\}\) is the profile of music piece \(x\) in the genre dimension, where \(c^x_i\) indicates the probability that \(x\) belongs to genre \(i\). The similarity \(S_{x,y}\) between music pieces \(x\) and \(y\) is calculated from the Euclidean distance between \(P_x\) and \(P_y\) as follows:
\[
S_{x,y} = 1 - \sum_{i=1}^{n} (c^x_i - c^y_i)^2 \tag{4.1}
\]
Different classification methods can be used, such as LDA or SVM, and the genre label can be augmented with other music semantic factors such as mood, tempo, or melody. Zhang et al. [83] extended this approach to represent a music piece as a fuzzy music semantic vector by considering the profiles of different music dimensions (genre, mood, tempo, etc.) together; they also evaluated the extended approach on a large-scale data set. The distance in each dimension \(j\) is first calculated as follows:
\[
Dis^j_{x,y} = \frac{1}{n} \sum_{i=1}^{n} (c^x_i - c^y_i)^2 \tag{4.2}
\]
The distance values are scaled by a sigmoid function, multiplied by weight factors \(w_{p_j}\), summed together, and normalized by factors \(a\) and \(b\), as in the following equation:
\[
S_{x,y} = a \sum_{j=1}^{N_p} \frac{w_{p_j}}{1 + e^{Dis^j_{x,y}}} - b, \tag{4.3}
\]
where \(a = \frac{2(e+1)}{e-1}\), \(b = \frac{2}{e-1}\), and \(\sum_{j=1}^{N_p} w_{p_j} = 1\). Hence the value of \(S_{x,y}\) is in the range [0, 1].

As discussed above, current approaches to music similarity measurement can be mainly categorized into three families: distance-based, cluster-based, and model-based approaches. Based on the evaluation results reported in the literature [82, 79], model-based approaches likely produce the best performance. Consequently, the model-based approach is a competent option for our content-based music recommendation system, and methods similar to those reported in [82, 83] can be employed in the implementation of the content-based music recommendation component.
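A sketch of the model-based similarity of Equations 4.1-4.3, assuming each piece has already been summarized by per-dimension probability profiles (e.g., genre, mood, and tempo posteriors produced by some classifier); the function names are illustrative and do not correspond to any released implementation.

```python
import numpy as np
from math import e

def profile_similarity(px, py):
    """Equation 4.1: S_{x,y} = 1 - sum_i (c_i^x - c_i^y)^2 for a single profile."""
    px, py = np.asarray(px, float), np.asarray(py, float)
    return 1.0 - float(np.sum((px - py) ** 2))

def composite_similarity(profiles_x, profiles_y, weights):
    """Equations 4.2-4.3: per-dimension distances, sigmoid-scaled, weighted, and
    normalized by a and b so the result lies in [0, 1] (weights must sum to 1)."""
    a, b = 2.0 * (e + 1.0) / (e - 1.0), 2.0 / (e - 1.0)
    s = 0.0
    for px, py, w in zip(profiles_x, profiles_y, weights):
        px, py = np.asarray(px, float), np.asarray(py, float)
        dis = float(np.mean((px - py) ** 2))   # Eq. 4.2: mean squared profile difference
        s += w / (1.0 + e ** dis)              # Eq. 4.3: sigmoid-scaled, weighted term
    return a * s - b
```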
Bibliography

[1] Donna Harman. Relevance feedback revisited. In Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1-10, New York, NY, USA, 1992. ACM.

[2] Ioannis Arapakis, Ioannis Konstas, and Joemon M. Jose. Using facial expressions and peripheral physiological signals as implicit indicators of topical relevance. In ACM Multimedia, 2009.

[3] Mikhail A. Lebedev and Miguel A. L. Nicolelis. Brain-machine interfaces: past, present and future. Trends in Neurosciences, 29(9):536-546, September 2006.

[4] Jonathan R. Wolpaw, Niels Birbaumer, Dennis J. McFarland, Gert Pfurtscheller, and Theresa M. Vaughan. Brain-computer interfaces for communication and control. Clinical Neurophysiology, 113(6):767-791, 2002.

[5] Robert Leeb et al. Navigation in virtual environments through motor imagery. In Proceedings of the 9th Computer Vision Winter Workshop, ed. D. Skocaj, Slovenian Pattern Recognition Society, Piran, Slovenia, pages 99-108, 2004.

[6] Lie Lu, Dan Liu, and Hong-Jiang Zhang. Automatic mood detection and tracking of music audio signals. IEEE Transactions on Audio, Speech, and Language Processing, 14:5-19, 2006.

[7] Rosalind W. Picard. Affective Computing. The MIT Press, Cambridge, September 1997.

[8] Yashar Moshfeghi. Affective adaptive retrieval: Study of emotion in adaptive retrieval. In ACM SIGIR, 2008.

[9] Ioannis Arapakis and Joemon M. Jose. Affective feedback: An investigation into the role of emotions in the information seeking process. In ACM SIGIR, 2008.

[10] Zhihong Zeng, Jilin Tu, Ming Liu, Thomas S. Huang, Brian Pianfetti, Dan Roth, and Stephen Levinson. Audio-visual affect recognition. IEEE Transactions on Multimedia, 9(2):424-428, 2007.

[11] Zhihong Zeng, Maja Pantic, Glenn I. Roisman, and Thomas S. Huang. A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1):39-58, 2009.

[12] Jonghwa Kim and Elisabeth André. Emotion recognition based on physiological changes in music listening. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30:2067-2083, 2008.

[13] Paul Ekman, Robert W. Levenson, and Wallace V. Friesen. Autonomic nervous system activity distinguishing among emotions. Science, 221, 1983.

[14] Guillaume Chanel, Koray Ciftci, Javier C. Mota, Arman Savran, Luong H. Viet, Lale Akarun, Alice Caplier, Michele Rombaut, and Bulent Sankur. Emotion detection in the loop from brain signals and facial images. Workshop on Emotion-Based Agent Architectures, Third International Conference on Autonomous Agents, 2006.

[15] Guillaume Chanel, Julien Kronegg, Didier Grandjean, and Thierry Pun. Emotion assessment: Arousal evaluation using EEG's and peripheral physiological signals. Lecture Notes in Computer Science, Multimedia Content Representation, Classification and Security, 2006.

[16] Guillaume Chanel. Emotion Assessment for Affective Computing Based on Brain and Peripheral Signals. PhD thesis, University of Geneva, Switzerland, 2009.

[17] Danny Oude Bos. EEG-based emotion recognition. 2007.

[18] Yuan-Pin Lin, Chi-Hong Wang, Tien-Lin Wu, Shyh-Kang Jeng, and Jyh-Horng Chen. Multilayer perceptron for EEG signal classification during listening to emotional music. In TENCON - IEEE Region 10 Conference, 2007.

[19] Yuan-Pin Lin, Chi-Hong Wang, Tien-Lin Wu, Shyh-Kang Jeng, and Jyh-Horng Chen. Support vector machine for EEG signal classification during listening to emotional music. In IEEE 10th Workshop on Multimedia Signal Processing, 2008.

[20] Yuan-Pin Lin, Chi-Hong Wang, Tien-Lin Wu, Shyh-Kang Jeng, and Jyh-Horng Chen. EEG-based emotion recognition in music listening: A comparison of schemes for multiclass support vector machine. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'09), 2009.
[21] Elias Vyzas. Recognition of emotional and cognitive states using physiological data. Master's thesis, MIT, 1999.

[22] Elias Vyzas and Rosalind W. Picard. Offline and online recognition of emotion expression from physiological data. Workshop on Emotion-Based Agent Architectures, Third International Conference on Autonomous Agents, 1999.

[23] Jennifer Healey. Wearable and Automotive Systems for the Recognition of Affect from Physiology. PhD thesis, MIT, 2000.

[24] Rosalind W. Picard, Elias Vyzas, and Jennifer Healey. Toward machine emotional intelligence: Analysis of affective physiological state. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10):1175-1191, 2001.

[25] Jonghwa Kim. From physiological signals to emotions: Implementing and comparing selected methods for feature extraction and classification. In IEEE International Conference on Multimedia & Expo (ICME 2005), pages 940-943, 2005.

[26] Jonghwa Kim and Elisabeth André. Emotion-specific dichotomous classification and feature-level fusion of multichannel biosignals for automatic emotion recognition. In Multisensor Fusion and Integration for Intelligent Systems (MFI 2008), IEEE International Conference on, 2008.

[27] Jonghwa Kim and Elisabeth André. Emotion recognition based on physiological changes in music listening. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30:2067-2083, 2008.

[28] Jonghwa Kim and Elisabeth André. Biomedical Engineering Systems and Technologies - Four-Channel Biosignal Analysis and Feature Extraction for Automatic Emotion Recognition. Springer Berlin Heidelberg, 2009.

[29] Jonghwa Kim and Elisabeth André. Multisensor Fusion and Integration for Intelligent Systems - Fusion of Multichannel Biosignals Towards Automatic Emotion Recognition. Springer Berlin Heidelberg, 2009.

[30] C. Maaoui, A. Pruski, and F. Abdat. Emotion recognition for human-machine communication. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2008), pages 1210-1215, 2008.

[31] Robert Horlings, Dragos Datcu, and Leon J. M. Rothkrantz. Emotion recognition using brain activity. In CompSysTech'08, ACM International Conference Proceedings Series, 2008.

[32] Human nervous system. http://en.wikipedia.org/wiki/Nervous_system.

[33] M. Lyons and C. Bartneck. HCI and the face. In Proceedings of the Conference on Human Factors in Computing Systems (CHI 2006), Extended Abstracts, 2006.

[34] John J. B. Allen. Frontal EEG asymmetry, emotion, and psychopathology: the first, and the next 25 years. Biological Psychology, 67:1-5, 2004.

[35] John J. B. Allen, James A. Coan, and Maria Nazarian. Issues and assumptions on the road from raw signals to metrics of frontal EEG asymmetry in emotion. Biological Psychology, 67:183-218, 2004.

[36] James A. Coan and John J. B. Allen. Frontal EEG asymmetry as a moderator and mediator of emotion. Biological Psychology, 67:7-49, 2004.

[37] David N. Towers and John J. B. Allen. A better estimate of the internal consistency reliability of frontal EEG asymmetry scores. Psychophysiology, 46:132-142, 2009.

[38] http://en.wikipedia.org/wiki/10-20_system_(EEG).

[39] John T. Cacioppo, Louis G. Tassinary, and Gary Berntson, editors. Handbook of Psychophysiology. Cambridge University Press, March 2007.

[40] J. Allen. Issues and assumptions on the road from raw signals to metrics of frontal EEG asymmetry in emotion. Biological Psychology, 67(1-2):183-218, October 2004.

[41] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines, 2001.
[42] Yuan-Pin Lin, Chi-Hong Wang, Tien-Lin Wu, Shyh-Kang Jeng, and Jyh-Horng Chen. Support vector machine for EEG signal classification during listening to emotional music. pages 127-130, October 2008.

[43] http://code.google.com/p/tempo.

[44] Leepeng P. Tan. The effects of background music on quality of sleep in elementary school children. Journal of Music Therapy, pages 128-150, 2004.

[45] László Harmat, Johanna Takács, and Róbert Bódizs. Music improves sleep quality in students. Journal of Advanced Nursing, 62:327-335, 2008.

[46] Hui-Ling Lai and Marion Good. Music improves sleep quality in older adults. Journal of Advanced Nursing, 53(1):134, 2006.

[47] Conrad Iber et al. The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications. 2007.

[48] http://en.wikipedia.org/wiki/Sleep, retrieved on 29 April 2010.

[49] A. Rechtschaffen and A. Kales. A Manual of Standardized Terminology, Techniques and Scoring System for Sleep Stages of Human Subjects. US Government Printing Office, US Public Health Service, Washington DC, 1968.

[50] Michael H. Silber. Staging sleep. Sleep Medicine Clinics, 4(3):343-352, September 2009.

[51] Kelly A. Carden. Recording sleep: The electrodes, 10/20 recording system, and sleep system specifications. Sleep Medicine Clinics, 4(3):333-341, September 2009.

[52] http://www.clevemed.com/crystalmonitor/overview.shtml, retrieved on 20 July 2010.

[53] http://en.wikipedia.org/wiki/Polysomnography, retrieved on 19 July 2010.

[54] M. Hirshkowitz. Commentary - standing on the shoulders of giants: the standardized sleep manual after 30 years. Sleep Medicine Reviews, 4(2):169-179, April 2000.

[55] S. Himanen. Response to "Standing on the shoulders of giants: The standardized sleep manual after 30 years". Sleep Medicine Reviews, 4(2):181-182, April 2000.

[56] S. Himanen. Limitations of Rechtschaffen and Kales. Sleep Medicine Reviews, 4(2):149-167, April 2000.

[57] Patrick Sorenson. Generating a signal: Biopotentials, amplifiers, and filters. Sleep Medicine Clinics, 4(3):323-331, September 2009.

[58] Elise Maher and Lawrence J. Epstein. Artifacts and troubleshooting. Sleep Medicine Clinics, 4(3):421-434, September 2009.

[59] T. Penzel. Computer based sleep recording and analysis. Sleep Medicine Reviews, 4(2):131-148, April 2000.

[60] T. Penzel, M. Hirshkowitz, J. Harsh, R. D. Chervin, N. Butkov, M. Kryger, B. Malow, M. V. Vitiello, M. H. Silber, C. A. Kushida, and A. L. Chesson. Digital analysis and technical specifications. Journal of Clinical Sleep Medicine, 3(2):109-120, March 2007.

[61] Peter Anderer, Georg Gruber, Silvia Parapatics, Michael Woertz, Tatiana Miazhynskaia, Gerhard Klösch, Bernd Saletu, Josef Zeitlhofer, Manuel J. Barbanoj, Heidi Danker-Hopfe, Sari-Leena Himanen, Bob Kemp, Thomas Penzel, Michael Grözinger, Dieter Kunz, Peter Rappelsberger, Alois Schlögl, and Georg Dorffner. An e-health solution for automatic sleep classification according to Rechtschaffen and Kales: validation study of the Somnolyzer 24x7 utilizing the SIESTA database. Neuropsychobiology, 51(3):115-133, 2005.

[62] Peter Anderer, Georg Gruber, Silvia Parapatics, and Georg Dorffner. Automatic sleep classification according to Rechtschaffen and Kales. In Annual International Conference of the IEEE Engineering in Medicine and Biology Society, volume 2007, pages 3994-3997. IEEE, August 2007.
[63] N. McGrogan, E. Braithwaite, and L. Tarassenko. BioSleep: a comprehensive sleep analysis system. In Engineering in Medicine and Biology Society, Annual International Conference of the IEEE, November 2002.

[64] Jennifer Caffarel, G. John Gibson, J. Phil Harrison, Clive J. Griffiths, and Michael J. Drinnan. Comparison of manual sleep staging with automated neural network-based analysis in clinical practice. Medical & Biological Engineering & Computing, pages 105-110, 2006.

[65] D. Alvarez-Estevez, J. M. Fernandez-Pastoriza, and V. Moret-Bonillo. A continuous evaluation of the awake sleep state using fuzzy reasoning. In Engineering in Medicine and Biology Society, Annual International Conference of the IEEE, 2009.

[66] U. R. Abeyratne, S. Vinayak, C. Hukins, and B. Duce. A new measure to quantify sleepiness using higher order statistical analysis of EEG. In Engineering in Medicine and Biology Society, Annual International Conference of the IEEE, 2009.

[67] Arthur Flexer et al. An automatic, continuous and probabilistic sleep stager based on a hidden Markov model. Applied Artificial Intelligence, pages 199-207, 2002.

[68] Maryann Deak and Lawrence J. Epstein. The history of polysomnography. Sleep Medicine Clinics, 4(3):313-321, September 2009.

[69] John C. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers, pages 61-74, 1999.

[70] Ting-Fan Wu, Chih-Jen Lin, and Ruby C. Weng. Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research, 5:975-1005, 2004.

[71] Marilisa G. Gibellato. Stochastic Modeling of the Sleep Process. PhD thesis, Ohio State University, 2005.

[72] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. Ch. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, and H. E. Stanley. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23):e215-e220, June 2000.

[73] Beth Logan. Mel frequency cepstral coefficients for music modeling. In International Symposium on Music Information Retrieval (ISMIR), 2000.

[74] George Tzanetakis and Perry Cook. MARSYAS: a framework for audio analysis. Organised Sound, 4(3):169-175, December 1999.

[75] M. Slaney, K. Weinberger, and W. White. Learning a metric for music similarity. In ISMIR, pages 313-318, 2008.

[76] Beth Logan and Ariel Salomon. A music similarity function based on signal analysis. In International Conference on Multimedia and Expo, pages 190+, Los Alamitos, CA, USA, 2001. IEEE Computer Society.

[77] Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. The earth mover's distance as a metric for image retrieval. International Journal of Computer Vision, 40, 2000.

[78] Jean-Julien Aucouturier and Francois Pachet. Music similarity measures: What's the use? In Proceedings of the 3rd International Symposium on Music Information Retrieval, pages 157-163, Paris, France, October 2002. IRCAM.

[79] Jean-Julien Aucouturier and Francois Pachet. Improving timbre similarity: How high's the sky? Journal of Negative Results in Speech and Audio Sciences, 1, 2004.

[80] E. Pampalk. A Matlab toolbox to compute music similarity from audio. In Proceedings of the 5th International Conference on Music Information Retrieval, Barcelona, Spain, 2004.

[81] Elias Pampalk, Arthur Flexer, and Gerhard Widmer. Improvements of audio-based music similarity and genre classification. In Proceedings of the International Symposium on Music Information Retrieval, 2005.
[82] Kris West and Paul Lamere. A model-based approach to constructing music similarity functions. EURASIP Journal on Applied Signal Processing, 2007(1):149, January 2007.

[83] Bingjun Zhang, Jialie Shen, Qiaoliang Xiang, and Ye Wang. CompositeMap: a novel framework for music similarity measure. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 403-410, New York, NY, USA, 2009. ACM.
