UTILIZING EEG SIGNAL IN MUSIC
INFORMATION RETRIEVAL
ZHAO WEI
B.Sc. OF ENGINEERING
UNIVERSITY OF ELECTRONIC SCIENCE AND TECHNOLOGY OF CHINA
2006
A THESIS SUBMITTED
FOR THE DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF COMPUTER SCIENCE
SCHOOL OF COMPUTING
NATIONAL UNIVERSITY OF SINGAPORE
2010
Abstract
Despite significant progress in the field of music information retrieval (MIR), grand challenges such as the intention gap and the semantic gap still exist. Inspired by recent successes in Brain Computer Interfaces (BCI), this thesis investigates how electroencephalography (EEG) signals can be utilized to address these problems in MIR. Two scenarios are discussed: EEG-based music emotion annotation and EEG-based domain-specific music recommendation. The former project addresses the problem of how to classify music clips into different emotion categories based on audiences' EEG signals recorded while they listen to the music. The latter project presents an approach to analyzing sleep quality from EEG signals as a component of an EEG-based music recommendation system which recommends music according to the user's sleep quality.
Acknowledgement
This thesis would not have been possible without the support of many people. I wish to express my greatest gratitude to my supervisor, Dr. Wang Ye, who has offered valuable support and guidance since I started my study in the School of Computing. I also owe my gratitude to Dr. Tan from Singapore General Hospital for her professional suggestions about music therapy, and to Ms. Shi Dongxia of National University Hospital for her generous help in annotating the sleep EEG data.

I would like to thank Wang Xinxi, Li Bo and Anuja for their assistance in the system implementation of my work. Special thanks also to all participants involved in the EEG experiments: Ye Ning, Zhang Binjun, Lu Huanhuan, Zhao Yang, Zhou Yinsheng, Shen Zhijie, Xiang Qiaoliang, Ai Zhongkai, et al.

I am deeply grateful to my beloved family for their consistent support and endless love. To support my research, my wife even wore electrodes on her scalp during sleep for a week.

Without the support of these people, I would not have been able to finish this thesis. Thank you so much!
Contents

Abstract
Acknowledgement
Contents
List of Publications
List of Figures
List of Tables

1 Introduction
  1.1 Motivation
  1.2 Organization of the Thesis

2 EEG-based Music Emotion Annotation System
  2.1 Introduction
  2.2 Emotion Recognition in Affective Computing
  2.3 Physiology-based Emotion Recognition
    2.3.1 General Structure
    2.3.2 Emotion Induction
    2.3.3 Data Acquisition
    2.3.4 Feature Extraction and Classification
  2.4 A Real-Time Music-evoked Emotion Detection System
    2.4.1 Introduction
    2.4.2 System Architecture
    2.4.3 Demonstration
  2.5 Current Challenges and Perspective

3 Automatic Sleep Scoring using EEG Signal - First Step Towards a Domain Specific Music Recommendation System
  3.1 Introduction
    3.1.1 Music Recommendation according to Sleep Quality
    3.1.2 Normal Sleep Physiology
    3.1.3 Paper Objectives
    3.1.4 Organization of the Thesis
  3.2 Literature review
    3.2.1 Manual PSG analysis
    3.2.2 Computerized PSG Analysis
  3.3 Methodology
    3.3.1 Feature Extraction
    3.3.2 Classification
    3.3.3 Post Processing
  3.4 Experiment Results
  3.5 Conclusions

4 Conclusion and Future work
  4.1 Content-based Music Similarity Measurement

Bibliography
List of Publications

Automated Sleep Quality Measurement using EEG Signal - First Step Towards a Domain Specific Music Recommendation System, Wei Zhao, Xinxi Wang and Ye Wang, ACM Multimedia International Conference (ACM MM), 25-29 October 2010, Firenze, Italy.
List of Figures

2.1  Recognize Musical Emotion from Acoustic Features of Music
2.2  Recognize Musical Emotion from Audience's EEG Signal
2.3  System Architecture
2.4  Human Nervous System
2.5  EEG signal Acquisition Experiments
2.6  Physiology-based Music-evoked Emotion Detection System
2.7  Electrode Position in the 10/20 International System
2.8  Feature Extraction and Classification Module
2.9  Music Game Module
2.10 3D Visualization Module

3.1  Physiology-based Music Rating Component
3.2  Typical Sleep Cycles
3.3  Traditional PSG system with Three Physiological Signals
3.4  Band Power Features and Sleep Stages
3.5  Position of Fpz and Cz in the 10/20 System
3.6  Experiment Over the Recording, st7052j0

4.1  Content-based Music Recommendation Component
List of Tables

2.1 Targeted Emotion and Associated Stimuli
2.2 Physiological Signals related to Emotion
2.3 Extracted Feature and Classification Algorithm

3.1 Accuracy of SVM Classifier in 10-fold Cross-validation
3.2 Confusion Matrix on st7022j0
3.3 Confusion Matrix on st7052j0
3.4 Confusion Matrix on st7121j0
3.5 Confusion Matrix on st7132j0
3.6 Accuracy of SVM and SVM with Post-processing
Chapter 1
Introduction
1.1 Motivation

With the rapid development of the digital music industry, music information retrieval (MIR) has received much attention in the last decade. Despite years of development, however, critical problems still remain, such as the intention gap between users and systems and the semantic gap between low-level features and high-level music semantics. These problems significantly limit the performance of current MIR systems.
User feedback plays an important role in Information Retrieval (IR) systems. It has been shown to be an efficient method to improve the performance of IR systems by conducting relevance assessment [1]. This technique is also useful for MIR systems. Recently, physiological signals have been proposed as a new approach to continuously collect reliable information from users without interrupting them [2]. However, physiological signals have received little attention in the MIR community.
For the last two years, I have been conducting research on Electroencephalography (EEG) signal analysis and its applications in MIR. My decision to choose this topic was also inspired by the success stories of Brain Computer Interfaces (BCI) [3]. Two years ago, I was surprised by amazing applications of BCI technology such as the P300 speller [4] and the motor-imagery-controlled robot [5]. At that time I came up with the idea of utilizing EEG signals in traditional MIR systems, and I have been trying to find a scenario where EEG signals can be integrated into an MIR system. So far two projects have been conducted: EEG-based musical emotion recognition and an EEG-assisted music recommendation system.
The first project is musical emotion recognition from the audience's EEG feedback. Music emotion recognition is an important but challenging task in music information retrieval. Due to the well-known semantic gap problem, musical emotion cannot be accurately recognized from the low-level features extracted from music items. Consequently, I try to recognize musical emotion from the audience's EEG signal instead of the music item. An online system was built to demonstrate this concept. The audience's EEG signal is captured while he or she listens to the music items. An alpha frontal power feature is then extracted from the EEG signal, and an SVM classifier is used to classify each music item into one of three musical emotions: happy, sad, and peaceful.
In the second project, an EEG-assisted music recommendation system is proposed. This work addresses a healthcare scenario, music therapy, which uses music to heal people who suffer from sleep disorders. Music therapy research has indicated that music does have beneficial effects on sleep. During the process of music therapy, people are asked to listen to a list of music pre-selected by a music therapist. In spite of its clear benefits towards sleep quality, the current approach is difficult to deploy widely because producing a personalized music list is a time-consuming task for the music therapist. Based on this observation, an EEG-assisted music recommendation system was proposed, which automatically recommends music for a user according to his sleep quality estimated from the EEG signal. As a first attempt, how to measure sleep quality from the EEG signal is investigated. This work was recently selected for poster presentation at ACM Multimedia 2010.
1.2 Organization of the Thesis

The thesis is organized as follows. The EEG-based music emotion annotation system is presented in detail in Chapter 2. Chapter 3 discusses the EEG-assisted music recommendation system. Future work and perspectives are summarized in Chapter 4.
Chapter 2
EEG-based Music Emotion Annotation System

2.1 Introduction

Like genre and culture, emotion is an important factor of music which has attracted much attention in the MIR community. In earlier studies, musical emotion recognition is usually regarded as a classification problem: to recognize the emotion of a music clip, low-level features are extracted and fed into a classifier trained on labeled music clips [6], as presented in Figure 2.1. Due to the semantic gap problem, low-level features such as MFCC cannot reliably describe the high-level factors of music. In this chapter I explore an alternative approach which recognizes music emotion from the listener's physiological signal instead of the low-level features of the music item, as described in Figure 2.2.

Figure 2.1: Recognize Musical Emotion from Acoustic Features of Music

Figure 2.2: Recognize Musical Emotion from Audience's EEG Signal

A physiology-based music emotion annotation approach is investigated in this part. The research problem is how to recognize a human's perceived emotion from physiological signals while he or she listens to emotional music. As human emotion detection was first emphasized in the affective computing community [7], we briefly introduce affective computing in Section 2.2. A survey of emotion detection from physiological signals is given in Section 2.3. Our research prototype, an online music-evoked emotion detection system, is presented in Section 2.4. Current challenges and perspectives are discussed in Section 2.5.
2.2 Emotion Recognition in Affective Computing
Emotion is regarded as a complex mental and physiological state associated with a wide range of feelings and thoughts. When humans communicate with each other, their behavior depends considerably on their emotional state. Different emotional states such as happiness, sadness and disgust influence human decisions and the efficiency of communication. To cooperate efficiently with others, people need to take account of this subjective human factor, the emotion. For example, a salesman talks with many people every day. To promote his product, he has to adjust his communication strategy in accordance with the consumers' emotional responses. The implication is clear to all of us: emotion plays a key role in our daily communication.

Since humans are subject to their emotional states, the efficiency of communication between human and machine is also affected by the user's emotion. Obviously it would be beneficial if the machine could respond differently according to the user's emotion, as a salesman has to do. There is no doubt that taking account of human emotion can considerably improve the performance of human-machine interaction [7, 8, 9]. But so far few emotion-sensitive systems have been built. The underlying problem is that emotion is generated by mental activity hidden in our brain. Because of the ambiguous definition of emotion, it is difficult to recognize emotional fluctuations accurately. Since automated recognition of human emotion has a big impact and implies many applications in Human Computer Interaction, it has attracted a large body of attention from researchers in computer science, psychology, and neuroscience.
There are two main approaches to recognizing emotion: physiology-based emotion recognition and facial- and vocal-based emotion recognition. On the one hand, researchers have obtained many results in detecting emotion from facial images and the human voice [10]. These face and voice signals, however, depend on humans' explicit and deliberate expression of emotion [11]. With advances in sensor technology, on the other hand, physiological signals have been introduced to recognize emotion. Since emotion is a result of human intelligence, it is believed that emotion can be recognized from physiological signals, which are generated by the human nervous system, the source of human intelligence [12]. In contrast with face and voice, the main advantage of the physiological approach is that emotion can be analyzed from physiological signals without the subject's deliberate expression of emotion.
2.3 Physiology-based Emotion Recognition

Current approaches to physiology-based emotion detection are investigated in this part. As discussed in Section 2.3.1, a typical emotion detection system consists of four components: emotion induction, data acquisition, feature extraction, and classification. The methods and algorithms employed in these components are summarized in Sections 2.3.2, 2.3.3, and 2.3.4 respectively.

Figure 2.3: System Architecture

2.3.1 General Structure
To detect emotional states from physiological signals, the general approach can be summarized as the answers to the following four questions:

a. What emotional states are going to be detected?
b. What stimuli are used to evoke the specific emotional states?
c. What physiological signals are collected while the subject receives the stimuli?
d. Given the signals, how are feature vectors extracted and classified?

As described in Figure 2.3, a typical physiology-based emotion recognition system consists of an emotion induction module, a data acquisition module, and a feature extraction and classification module. Together these components address the four questions given above.
The emotion induction component is responsible for evoking a specific emotion using emotional stimuli. For example, the emotion induction component may play back peaceful music or display a picture of a traffic accident to help the subject reach the specific emotional state.
While the subject receives the stimuli, the data acquisition module keeps collecting signals from the subject. Sensors attached to the subject's body are used to collect physiological signals during the experiment. Different kinds of sensors are used to collect specific physiological signals such as Electroencephalography (EEG), Electromyogram (EMG), Skin Conductivity Response (SCR), and Blood Volume Pressure (BVP). For example, to collect the EEG signal, the subject is usually required to wear an electrode cap during the experiment.

After several runs of the experiment, many physiological signal fragments can be collected to build a signal data set. Given such a data set, the feature extraction and classification component is applied to classify each EEG segment into different emotion categories. First the data set is divided into two parts: a training set and a testing set. Then the classifier is built based on the training set.
2.3.2 Emotion Induction

Emotion can be categorized into several basic states such as fear, anger, sadness, disgust, happiness, and surprise [13]. To recognize emotional states, the emotions have to be defined clearly at the beginning. The categorization of emotion varies across papers. In our system, we recognize three emotional states: sad, happy, and peaceful.
Once the emotion categorization is defined, another problem arises: how to induce the specific emotional states in the subject. Currently, the popular solution is to provide emotional cues to help the subject experience the emotion. Many stimuli have been presented for this purpose, such as sound clips, music items, pictures and even movie clips. These stimuli can be categorized into four main types:

a. The subject obtains the emotion through imagination.
b. Visual stimuli.
c. Auditory stimuli.
d. A combination of visual and auditory stimuli.

The emotions and stimuli presented in earlier papers are summarized in Table 2.1.
2.3.3 Data Acquisition

The human nervous system can be divided into two parts: the Central Nervous System (CNS) and the Peripheral Nervous System (PNS). As described in Figure 2.4, the CNS contains the majority of the nervous system and consists of the brain and spinal cord. The PNS extends from the CNS and connects it to the limbs and other organs. The human nervous system is the source of physiological signals, and thus physiological signals can be categorized into two categories: CNS-generated signals and PNS-generated signals. The details of these two kinds of physiological signals are discussed in the following part.
Table 2.1: Targeted Emotion and Associated Stimuli

| Categorization of Emotion | Stimuli to Evoke Emotion | Authors |
|---|---|---|
| Disgust, Happiness, Neutral | Images from the International Affective Picture System (IAPS) | [14], [15] |
| Disgust, Happiness, Neutral | (1) Images (2) Self-induced Emotion (3) Computer Game | [16] |
| Positive Valence / High Arousal; Positive Valence / Low Arousal; Negative Valence / High Arousal; Negative Valence / Low Arousal | (1) Self-induced Emotion by imagining past experience (2) Images from IAPS (3) Sound clips from IADS (4) Combinations of the above stimuli | [17] |
| Joy, Anger, Sadness, Pleasure | Music selected from Oscar-winning movie soundtracks | [18], [19], [20] |
| No Emotion, Anger, Hate, Grief, Platonic Love, Romantic Love, Joy, Reverence | Self-induced Emotion | [21], [22], [23], [24] |
| Joy, Anger, Sadness, Pleasure | Music selected by subjects | [25], [26], [27], [28], [29] |
| Amusement, Contentment, Disgust, Fear, No emotion (Neutrality), Sadness | Images from IAPS | [30] |
| 5 emotions on two emotional dimensions, valence and arousal | Images from IAPS | [31] |

Figure 2.4: Human Nervous System [32]
Electromyogram (EMG) is the electrical signal generated by muscle cells when these cells are active or at rest. The EMG potential usually ranges from 50 μV to 30 mV, and its typical frequency content is about 7-20 Hz. Because facial activity is abundant and indicative of human emotion, some researchers capture the EMG signal from the facial muscles and employ it in emotion detection systems [33].
Skin Conductivity Response (SCR), also known as Galvanic Skin Response (GSR), is one of the most well-studied physiological signals. It describes the change in the level of sweat in the sweat glands. SCR is generated by the sympathetic nervous system (SNS), which is part of the peripheral nervous system. Since the SNS becomes active when a human feels stress, SCR is also related to emotion.
Blood Volume Pressure (BVP) is an indicator of blood flow; it measures the force of blood pushing against the blood vessels and is measured in mmHg (millimeters of mercury). Each time the heart pumps blood into the vessels, a peak results in the BVP signal, so the heart rate (HR) signal can be extracted from BVP easily. BVP is also influenced by emotions and stress: active feelings such as anger, fear or happiness tend to increase the BVP signal.
Electroencephalography (EEG) is the electrical signal generated by neuron cells; it can be captured by placing electrodes on the scalp, as described in Figure 2.5. It has been shown that the difference in spectral power between the left and right brain hemispheres is an indicator of fluctuations in emotion [34]. Specifically, pleasant music causes a decrease in left frontal alpha power, whereas unpleasant music elicits a decline in right frontal alpha power. Based on this phenomenon, a feature called asymmetric frontal alpha power is extracted from EEG to recognize emotion [35, 36, 37].

Table 2.2: Physiological Signals related to Emotion

| Physiological signal | Authors |
|---|---|
| EEG | [17], [18], [19], [20], [31] |
| (1) EMG (2) GSR (3) Respiration (4) Blood volume pressure | [21], [22], [23], [24] |
| (1) EMG (2) ECG/EKG (3) Skin conductivity (4) Respiration | [25], [26], [27], [28], [29] |
| (1) Blood volume pulse (2) EMG (3) Skin Conductance Response (4) Skin Temperature (5) Respiration | [30] |
| (1) EEG (2) GSR (3) Blood pressure (4) Respiration (5) Temperature | [15] |
| (1) Video recording (2) fNIRS (3) EEG (4) GSR (5) Blood pressure (6) Respiration | [14] |
| (1) EEG (2) GSR (3) Respiration (4) BVP (5) Finger temperature | [16] |
In addition to the physiological signals discussed above, skin temperature, respiration, and functional Near-Infrared Spectroscopy (fNIRS) have also been used to detect emotion. The varieties of physiological signals employed to detect emotional states in earlier works are summarized in Table 2.2.
2.3.4 Feature Extraction and Classification

To decode emotion from physiological signals, many features have been presented. Two popular types of feature are spectral density in the frequency domain and statistical information in the time domain.

Figure 2.5: EEG signal Acquisition Experiments. (a) EEG Electrode Cap and EEG Amplifier; (b) Experiment conducted on Zhao Wei; (c) Experiment conducted on Yi Yu; (d) Experiment conducted on Zhao Yang
EEG signals are usually divided into five frequency bands: delta (1-3 Hz), theta (4-7 Hz), alpha (8-13 Hz), beta (14-30 Hz), and gamma (31-50 Hz). One common EEG feature is the average spectral density in a specific frequency band. Furthermore, differences between channels and ratios between bands are also used as feature vectors.

The signals generated by the PNS, by contrast, cover only a small frequency range. Consequently, signals such as blood pressure, respiration, and skin conductivity are not usefully divided into frequency bands; instead, time-domain features such as peak rate, statistical mean, and variance are usually extracted from these signals.
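As a concrete illustration of these band-power features, the following is a minimal Python sketch; the band boundaries follow the text, while the sampling rate and window length are illustrative assumptions rather than values fixed by the thesis.

```python
import numpy as np
from scipy.signal import welch

# Frequency bands from the text (Hz).
BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (31, 50)}

def band_powers(segment, fs=500):
    """Average spectral power per band for one EEG channel segment.

    fs=500 matches the sampling rate used later in this chapter;
    nperseg=fs gives roughly 1 Hz frequency resolution.
    """
    freqs, psd = welch(segment, fs=fs, nperseg=min(fs, len(segment)))
    return {name: psd[(freqs >= lo) & (freqs <= hi)].mean()
            for name, (lo, hi) in BANDS.items()}

# Example: one second of synthetic EEG-like data.
rng = np.random.default_rng(0)
print(band_powers(rng.normal(size=500)))
```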
The extracted features and classification algorithms used in previous papers are summarized in Table 2.3.
2.4 A Real-Time Music-evoked Emotion Detection System

2.4.1 Introduction

Advances in sensor and computing technologies have made it possible to capture and analyze human physiological signals in different applications. These capabilities open up a new scenario wherein a subject's emotions evoked by external stimuli such as music can be detected and visualized in real time.
Table 2.3: Extracted Feature and Classification Algorithm

| Features | Classification | Authors |
|---|---|---|
| (1) averaged spectral power of 6 frequency bands (2) wavelet coefficients (CWT) of heart rate (3) the mean, variance, minimum and maximum of peripheral signals | (1) Naive Bayesian classifier (2) Fisher Discriminant Analysis | [15] |
| To select the best features, several methods are applied: (1) filter (ANOVA, Fisher-based and FCBF) (2) wrapper (SFFS) feature selection algorithms | (1) Naive Bayesian classifier (2) Discriminant Analysis (3) SVM (4) Relevance Vector Machines (RVM) | [16] |
| Based on three EEG channels, Fpz and F3/F4, the following features are extracted: (1) alpha, beta, and combined alpha and beta power (2) beta power / alpha power | binary linear FDA (Fisher's Discriminant Analysis) classifier | [17] |
| (1) the means of the raw signal; (2) the standard deviation of the raw signal; (3) the means of the absolute values of the first differences of the raw signal; (4) the means of the absolute values of the first differences of the normalized signal; (5) the means of the absolute values of the second differences of the raw signal; (6) the means of the absolute values of the second differences of the normalized signal | Sequential Floating Forward Search (SFFS) is used to select the best features from the feature space. Three classification strategies are presented: (1) SFFS feature selection with K-NN; (2) Fisher Projection (FP) with MAP classification; (3) a hybrid SFFS-FP method | [21], [24] |
| Eleven features extracted from the signal | (1) Fisher projection matrix (2) SFFS (3) K-NN | [22], [23] |
| Many methods are investigated to find the best features from the feature space: (1) analysis of variance (ANOVA) (2) sequential forward selection (SFS) (3) sequential backward selection (SBS) (4) PCA (5) Fisher projection | (1) linear discriminant function (LDF) (2) k-nearest neighbors (KNN) (3) multilayer perceptron (MLP) | [25], [26], [27], [28], [29] |
| 30 feature values extracted from five signals | (1) Fisher discriminant classifier (2) SVM | [30] |
| asymmetric frontal alpha power over 12 electrode pairs | multilayer perceptron (MLP) classifier | [18] |
| (1) asymmetric frontal alpha power over 12 EEG electrode pairs (2) spectral power density of 24 EEG channels | SVM | [19] |
| (1) asymmetric frontal alpha power over 12 EEG electrode pairs (2) spectral power density of 24 EEG channels | hierarchical SVM | [20] |
Two approaches have been identified to make use of these physiological signals in multimedia systems. First, physiological signals can be visualized continuously while the subject interacts with a multimedia system. Second, physiological signals can be used as control messages in applications such as games. Our system is designed to combine these two approaches and to demonstrate an application scenario of real-time emotion detection. EEG signals generated in response to musical stimuli are captured to detect the subject's emotional state. This information is then used to control a simple emotion-based music game. While the subject plays the music game, his EEG is visualized on a 3D head model which serves as synchronized feedback for monitoring the subject. In this way, our system provides a real-time tool to monitor the subject and can serve as a useful input for music therapists, for example.
2.4.2 System Architecture

The proposed system is shown in Figure 2.6, which illustrates how the four modules are connected. The data acquisition and analysis modules together constitute the music-evoked emotion detection subsystem.

Before the stimuli are provided to evoke the subject's emotion, the subject is asked to wear an electrode cap which provides 40 electrode channels. Each channel captures EEG signals continuously. To perform real-time analysis, the EEG signals are collected by the signal acquisition module, which buffers the continuous signals and feeds them, as smaller EEG segments of 1 s duration, into the analysis module. The analysis module calculates the spectral power density and the frontal alpha power feature from each EEG segment; the frontal alpha power feature is discussed in detail in the following paragraphs.

Figure 2.6: Physiology-based Music-evoked Emotion Detection System

Using these features followed by an SVM classifier, subject emotions are classified into three states: happy, sad and peaceful. Finally, the classification result is sent to the music game module to drive the game, and the spectral powers of each channel together with the emotions are fed into the 3D module for visualization.
Data Acquisition Module

To capture EEG signals, we have used NeuroScan products, Quik-Caps and NuAmps. The Quik-Cap consists of 40 electrodes located on the head of the subject in accordance with the 10-20 system standard [38]. The electrical signals captured by the electrodes are amplified by the NuAmps amplifier. The sampling rate of the EEG signal is 500 Hz. The EEG signals are first band-pass filtered to retain only components between 1 and 200 Hz; the effective frequency range of EEG used for analysis is 1 to 50 Hz. The filtered EEG signals are continuously sent from the data acquisition module to the analysis module.
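A sketch of the band-pass filtering step, assuming a standard zero-phase Butterworth design; the filter order and implementation are illustrative, as the text does not specify them.

```python
from scipy.signal import butter, filtfilt

def bandpass(eeg, fs=500, lo=1.0, hi=200.0, order=4):
    """Zero-phase band-pass filter retaining 1-200 Hz, as in the text."""
    nyq = fs / 2.0
    b, a = butter(order, [lo / nyq, hi / nyq], btype="band")
    return filtfilt(b, a, eeg, axis=-1)
```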
Frontal Alpha Power Feature

The analysis module consists of two main components: feature extraction and classification. To detect music-evoked emotion from EEG, we have used the asymmetry features commonly used in the physiological community [39]. It has been shown that the difference in spectral power between the left and right brain hemispheres is an indicator of fluctuations in emotion. Specifically, pleasant music causes a decrease in left frontal alpha power, whereas unpleasant music elicits a decline in right frontal alpha power [40]. In contrast to most existing BCI systems, we have not used any artifact rejection/removal method in our system. The rationale is that artifacts usually have very similar effects on both electrodes of a pair symmetrically located on the two hemispheres; asymmetric features are differences between symmetric electrode pairs and thus compensate for artifacts caused, for example, by eye blinking [40]. Since 8 of the electrodes in our cap are symmetrically located on the frontal lobe, 4 pairs of electrodes can be used to calculate the asymmetry features: Fp1-Fp2, F7-F8, F3-F4 and FC3-FC4. The positions of these 8 electrodes are illustrated in Figure 2.7.

Figure 2.7: Electrode Position in the 10/20 International System

EEG signals are usually divided into 5 frequency bands: delta (1-3 Hz), theta (4-7 Hz), alpha (8-13 Hz), beta (14-30 Hz) and gamma (31-50 Hz). The averaged differential spectral power over the alpha band is calculated as a feature from each electrode pair, so the dimension of the resulting feature vector is 4. Using this asymmetric feature vector, emotion detection becomes a multi-class classification problem.
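The resulting 4-dimensional asymmetry feature can be sketched as follows; the channel-to-index mapping is hypothetical (it depends on the Quik-Cap montage), and Welch's method stands in for whatever spectral estimator the system actually uses.

```python
import numpy as np
from scipy.signal import welch

# The four frontal pairs named in the text.
PAIRS = [("Fp1", "Fp2"), ("F7", "F8"), ("F3", "F4"), ("FC3", "FC4")]

def alpha_power(x, fs=500, lo=8.0, hi=13.0):
    """Average spectral power in the alpha band (8-13 Hz)."""
    freqs, psd = welch(x, fs=fs, nperseg=min(fs, len(x)))
    return psd[(freqs >= lo) & (freqs <= hi)].mean()

def asymmetry_features(segment, chan_idx, fs=500):
    """Left-minus-right alpha power for each frontal pair (4-D vector).

    segment: channels x samples array for one 1-second EEG segment.
    chan_idx: hypothetical map from electrode name to row index.
    """
    return np.array([alpha_power(segment[chan_idx[l]], fs) -
                     alpha_power(segment[chan_idx[r]], fs)
                     for l, r in PAIRS])
```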
SVM Classifier

At the beginning of the music game, the subject is required to listen to 3 music clips associated with the 3 emotional states (happy, sad and peaceful). The evoked EEG features are used as training data to build an SVM model. This model is then used to predict the emotional state of incoming EEG features in real time. LibSVM was used to implement the training and prediction [41]. A four-dimensional feature vector is extracted from each 1-second EEG segment, and the existing kernels in the LibSVM package are used to conduct the experiment. The classifier is trained on a 6-minute EEG recording, which covers each emotion for 2 minutes.
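The calibration-then-predict flow could be sketched as follows; scikit-learn is used here in place of the LibSVM bindings, and the synthetic arrays merely stand in for the 6-minute calibration recording.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for the calibration data: one 4-D asymmetry vector per
# 1-second segment, 120 segments (2 minutes) per emotion.
# Labels: 0 = happy, 1 = sad, 2 = peaceful.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(360, 4))
y_train = np.repeat([0, 1, 2], 120)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)

# Real-time use: classify each incoming 1-second feature vector.
incoming = rng.normal(size=(1, 4))
print(model.predict(incoming))
```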
We noticed that other, offline emotion classification systems (e.g. [42]) use feature vectors of much higher dimension. Unfortunately, those offline systems cannot simply be extended to a real-time system with acceptable performance. To reduce the delay between training data collection and real-time prediction, we have implemented a simple GUI (see Figure 2.8) to make the training process more convenient and efficient. This has mitigated the performance degradation of the real-time system, although it cannot solve the problem completely.

Figure 2.8: Feature Extraction and Classification Module
Music Game Module

As shown in Figure 2.9, the game module has two main functions: to play back music for evoking the required emotional state and to visualize emotional state transitions in real time. The interface of the game is simple yet functional. This module, however, needs to be improved for real-life applications such as music therapy.

Figure 2.9: Music Game Module
3D Visualization Module

As shown in Figure 2.10, the 3D visualization module displays the spectral power of each EEG channel in different colors on a 3D head model, which was adopted and modified from an open source project, Tempo [43]. We believe that 3D visualization is intuitive to human beings and could be useful feedback for experiment conductors. Since visual patterns are friendlier to human eyes than decimal numbers, an intuitive illustration of the EEG changes can also be gained.

By observing the classification results and the EEG energy visualized in the 3D module, we can monitor the performance of the proposed approach during the whole experiment. For example, the classifier might produce a wrong label after the subject moves his head slightly; in this way, the events that influence the accuracy of the proposed system can be identified. This kind of information could be useful for improving the proposed system in future work.

Figure 2.10: 3D Visualization Module
2.4.3 Demonstration

We proposed a research prototype to detect music-evoked emotional states in real time using EEG signals, synchronized with two visualization modules, and showed its potential applications such as music therapy. As a start of the project, we re-implemented the offline system described in [42] and achieved an accuracy of up to 93% in k-fold cross-validation, which is similar to the reported performance. However, the accuracy drops to random guessing (about 35% for 3-class classification) in online prediction.
The original offline system employs a 60-dimensional feature vector extracted from the whole head area. We then modified the approach by extracting features only from the frontal lobe, so that a 4-dimensional feature vector is extracted from the 8 frontal EEG channels described in Figure 2.7. With the reduced features, we managed to improve the prediction accuracy in our preliminary evaluations, as discussed in detail in Section 2.5.
2.5 Current Challenges and Perspective

Many papers have been published on recognizing emotion from physiological signals. To the best of our knowledge, however, no one has succeeded in extending their algorithm into a practical application. Although the accuracy of emotion recognition reaches 90% in cross-validation experiments, few works obtain acceptable accuracy in prediction. Based on the results of our experiments, the accuracy of emotion recognition varies considerably under different validation strategies such as prediction, k-fold cross-validation, and leave-one-out cross-validation.
In our preliminary work described in Section 2.4, asymmetric frontal alpha power features are extracted from 8 EEG channels to detect emotion from EEG. These feature vectors are fed into an SVM classifier for 3-class classification. Under cross-validation the accuracy reaches 90%, yet it drops to random guessing (35%) in prediction. This striking difference between the accuracies obtained in prediction and in cross-validation implies that cross-validation might not have been used correctly.
EEG signals are consistent over short time periods, which results in high similarity between feature vectors extracted from neighboring EEG segments. Meanwhile, the soundness of k-fold cross-validation partially rests on the independence of the feature vectors. Consequently, the dependency between frontal alpha power features causes a considerable distortion in k-fold cross-validation with randomly selected feature vectors, where training and testing feature vectors are extracted from neighboring EEG segments.
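One way to avoid this distortion is to keep training and testing data temporally separated, for example with a forward-chaining split instead of a shuffled k-fold; a minimal illustration with synthetic stand-in features:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for temporally ordered EEG feature vectors;
# labels come in contiguous blocks, as they do in a real recording.
rng = np.random.default_rng(0)
X = rng.normal(size=(360, 4))
y = np.tile(np.repeat([0, 1, 2], 10), 12)

# Forward-chaining CV: every training fold precedes its test fold,
# so neighboring (correlated) segments cannot leak across the split.
scores = cross_val_score(SVC(), X, y, cv=TimeSeriesSplit(n_splits=5))
print(scores.mean())
```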
Another issue which might influence the accuracy is the ground truth problem: how to guarantee that the subject actually experiences the specific emotion during the experiment. Many stimuli are introduced to help the subject experience the emotion; for example, music from horror movies, sound clips such as a baby laughing, and pictures of car accidents are used as stimuli. However, no matter how strong the stimuli used to evoke the emotion, it is still impossible to verify that the subject indeed experienced the emotion.
In addition, current systems do not consider the differences caused by the stimuli. Many researchers use similar methods to detect emotions evoked by different stimuli, yet different stimuli, auditory or visual, induce signals in different brain areas and evoke emotion in different ways. More attention should be focused on how emotion is generated in the brain and on the differences between emotional states evoked by different stimuli, such as a happy image versus happy music. Further effort is needed to employ this knowledge to improve the accuracy of emotion detection from physiological signals.
Furthermore, since physiological signals are quite ambiguous, more attention needs to be focused on how to extract features from these ambiguous signals. Unfortunately, unlike facial images or the human voice, there is no gold standard to verify which patterns of physiological signals are good or bad.
To conclude, how to accurately recognize human emotion from physiological signals and employ this technique to annotate music emotion is still an open problem.
Chapter 3
Automatic Sleep Scoring using EEG Signal - First Step Towards a Domain Specific Music Recommendation System

With the rapid pace of modern life, millions of people suffer from sleep problems. Music therapy, as a non-medication approach to mitigating sleep problems, has recently attracted increasing attention. However, the adoption of music therapy is limited by the time-consuming task of choosing suitable music for users. Inspired by this observation, we discuss the concept of a domain-specific music recommendation system which automatically recommends music for users according to their sleep quality. The proposed system requires multidisciplinary efforts, including automated sleep quality measurement and content-based music similarity measurement. As a first step towards the proposed recommendation system, I focus on automated sleep quality measurement in this chapter. A literature survey of content-based music similarity measurement is given in Section 4.1.
To measure sleep quality, the standard Polysomnography (PSG) approach requires various signals such as EEG, ECG and EMG. Although satisfactory accuracy can be obtained from the analysis of multiple signals, such systems may make subjects feel uncomfortable because they are asked to wear many sensors to collect the signals during sleep. The situation is even worse for home-based healthcare applications, in which the user's experience is relatively more important than the system's accuracy. To improve the user experience, an important problem is how to measure sleep quality from few signals while still achieving reasonable accuracy.
Motivated by this, an EEG-based approach is proposed to measure the user's sleep quality. The advantages of the proposed approach over the standard PSG method are: 1) it measures sleep quality by recognizing three sleep categories rather than six sleep stages, so higher accuracy can be expected; 2) the three sleep categories are recognized by analyzing the Electroencephalography (EEG) signal only, so the user experience is improved because fewer sensors are attached during sleep. In experiments conducted on a standard data set, the proposed approach achieves high accuracy and hence shows promising potential for the music recommendation system.
3.1 Introduction

People spend about one-third of their lives asleep. Although it is still an open question why humans need sleep, there is no doubt that sleep is necessary to maintain our overall health. A deficiency of good sleep can result in severe physical and mental effects such as fatigue, tiredness, and depression.

Nowadays millions of people are affected by sleep problems, many of whom remain undiagnosed. Although various medicines have been developed to treat sleep problems, medication is not recommended because of its negative side effects. Music therapy offers an alternative healing method, which improves sleep quality by playing back music at bed time. Music therapy research since the 2000s has indicated that music does have beneficial effects on sleep for children [44], young people [45] and older adults [46].
During the process of music therapy, people are asked to listen to a list of music pre-selected by a music therapist. In spite of its clear benefits towards sleep quality, the current approach is difficult to deploy widely because producing a personalized music list is a time-consuming task for the music therapist. Based on this observation, a domain-specific music recommendation system is introduced to automatically recommend music for users according to their sleep quality.

Figure 3.1: Physiology-based Music Rating Component
3.1.1 Music Recommendation according to Sleep Quality

The proposed recommendation system consists of two main components: EEG-based music rating and content-based music recommendation. After music is played back at bed time, the former component monitors the user's sleep quality, and music items are rated according to that sleep quality, as Figure 3.1 shows.

It is our contention that similar music pieces have a similar influence on sleep quality. The music items associated with good sleep quality are used as queries in the music recommendation component, and music similar to the query items is recommended to the user through music similarity analysis, as illustrated in Figure 4.1. As the first step towards the proposed recommendation system, in this thesis I focus on automated sleep quality measurement, which is the major task of the physiology-based music rating component.
3.1.2 Normal Sleep Physiology

According to the current sleep analysis standard [47], normal sleep physiology consists of Non Rapid Eye Movement (NREM) and Rapid Eye Movement (REM) sleep. NREM is subdivided into four stages: stage 1 (S1), stage 2 (S2), stage 3 (S3) and stage 4 (S4); S3 and S4 are also called deep sleep. A healthy person experiences about 5 complete sleep cycles per night [48]. S1 is the first stage in a sleep cycle and S2 the second; S3 and S4 then occur consecutively. After the completion of the four NREM stages, these stages reverse rapidly and are followed by REM, as Figure 3.2 shows.

Figure 3.2: Typical Sleep Cycles

Given the sleep cycles over a night, three main parameters can be calculated to measure sleep quality: sleep latency, sleep efficiency and percentage of deep sleep. Specifically, sleep latency is the time it takes to transition from wakefulness to the first sleep stage. Sleep efficiency is the ratio of time spent asleep to time spent in bed. Percentage of deep sleep is the ratio of deep sleep to all sleep stages. To calculate these parameters, we do not need to recognize every sleep stage in the sleep cycles; it is enough to distinguish three sleep categories: wakefulness, deep sleep (S3, S4), and other sleep stages (S1, S2, REM). Consequently, the problem of sleep quality measurement is converted into recognizing these three sleep categories, as the sketch below illustrates.
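Given a night of per-epoch category labels, the three parameters reduce to a few lines of arithmetic; a minimal sketch, with the category codes and the 30-second epoch length as the only assumptions:

```python
import numpy as np

WAKE, DEEP, OTHER = 0, 1, 2  # category codes used in this sketch

def sleep_quality(stages, epoch_sec=30):
    """Sleep latency (min), efficiency, and deep-sleep percentage
    from one night of per-epoch category labels."""
    stages = np.asarray(stages)
    asleep = stages != WAKE
    latency_min = np.argmax(asleep) * epoch_sec / 60  # first sleep epoch
    efficiency = asleep.mean()            # time asleep / time in bed
    deep_pct = (stages == DEEP).sum() / max(asleep.sum(), 1)
    return latency_min, efficiency, deep_pct

print(sleep_quality([WAKE] * 20 + [OTHER] * 40 + [DEEP] * 30 + [OTHER] * 60))
```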
3.1.3 Paper Objectives

The Polysomnography (PSG) technique is now widely used in hospitals to monitor patients' sleep cycles. First developed in the 1960s [49], this method, also known as sleep scoring, has become the gold standard in sleep studies. While users are sleeping, three physiological signals are monitored: Electroencephalography (EEG), Electrooculography (EOG), and Electromyography (EMG). Based on the analysis of these three signals, six sleep stages can be recognized by a human expert: wakefulness, S1, S2, S3, S4 and REM [50]. These stages are scored on 30-second epochs.
The standard PSG approach utilizes 9 sensors to monitor the required signals: EEG, EOG and EMG [51]. Although satisfactory accuracy can be obtained from the analysis of multiple signals, standard PSG systems may make subjects feel uncomfortable because they are asked to wear many sensors during sleep, as described in Figure 3.3. The situation is even worse for home-based healthcare applications, in which the user's experience is relatively more important than the system's accuracy. To improve the user experience, we use fewer signal channels to recognize sleep stages. As discussed above, to calculate the sleep quality parameters we only need to recognize three categories: wakefulness, deep sleep and other stages. Given that EOG and EMG are mainly used to distinguish between REM and S1, these two signals may not contribute much to the calculation of sleep quality. Consequently, we recognize the three sleep categories from the EEG signal alone.
3.1.4 Organization of the Thesis

Based on the analysis of current PSG techniques in Section 3.2, we propose to recognize the three sleep categories from the EEG signal only. The details of our approach are presented in Section 3.3. Section 3.4 discusses the experimental results, and conclusions are given in Section 3.5.

Figure 3.3: Traditional PSG system with Three Physiological Signals. (a) PSG experiment for Adults [52]; (b) PSG experiment for Children [53]
3.2 Literature review

3.2.1 Manual PSG analysis
In 1968, Rechtschaffen and Kales first proposed a standard method, called the R&K scoring system [49], for recognizing human sleep stages manually. Over 40 years of development, the manual scoring system has been modified and improved [54, 55, 56], especially regarding the recording procedure [51], signal amplification [57] and artifact handling [58]. The R&K method is still the most popular approach in clinical practice today.
3.2.2 Computerized PSG Analysis

Intense research has been conducted on computerized PSG analysis, that is, how to automatically recognize sleep stages from physiological signals. Two good survey papers have been published on computerized PSG analysis [59, 60]. Based on the literature, automated PSG analysis can achieve reasonable accuracy compared with human scoring; however, its performance is influenced by many factors, such as subject selection, electrode application and recording quality.

Current automated scoring systems fall into two main families: rule-based expert systems and classifier-based systems. Intuitively, a rule-based expert system imitates the process of human scoring by encoding domain knowledge such as the R&K scoring method; sleep stages are thus recognized by the expert system in a similar way as by a human expert. Alternatively, a classifier-based system treats sleep scoring as a standard classification problem. Typical computerized PSG systems are discussed in detail below.
Based on our survey of the literature, most existing works aim to provide a sleep scoring service for clinical applications. The main difference between our approach and these works is that our approach is specifically designed for a multimedia application rather than a clinical one. In particular, to balance user experience and system accuracy, an EEG-based approach is proposed to recognize three sleep categories.
Rule-based Expert System
Peter Anderer et al. [61, 62] developed an expert system called Somnolyzer 24 x 7 to recognize six sleep stages (W, S1, S2, S3, S4 and REM) from 30-second epochs of three signals (EEG, EMG and EOG). The Somnolyzer 24 x 7 system consists of two main components: a feature generator and a rule-based expert system. First, feature vectors are extracted from EEG, EMG and EOG respectively. The EEG signal is filtered into five frequency bands: delta (0.5-2 Hz), theta (2-7 Hz), alpha (7-12 Hz), beta (12-20 Hz) and fast beta (20-40 Hz). Then density, mean amplitude, mean frequency and variability are calculated from each 30-second epoch in the different frequency bands as well as the full band (0.5-40 Hz). For the EMG signal, the squared amplitude is calculated for each 1-second epoch, and the minimum, maximum and mean of the squared amplitude are determined for each 30-second EMG epoch.
A decision tree is built on top of 10 linear discriminant analysis classifiers. An agreement of 80% between expert system scoring and human scoring was achieved.
Classifier-based System
The commercial package BioSleep, from the Oxford BioSignals company, is a typical classifier-based system [63]. Autoregressive coefficients are extracted as feature vectors from each EEG segment, and a neural network classifier is used to classify each segment into the different sleep stages. The BioSleep package obtained reasonable results in comparison with human scoring in a third-party evaluation [64].
Other methods

Besides expert systems and classifier approaches, additional solutions have been proposed, such as fuzzy reasoning [65], higher-order statistical analysis [66], and hidden Markov models [67].
3.3 Methodology

To measure sleep quality, we need to recognize three sleep stage categories: wakefulness, deep sleep, and other sleep stages. To recognize these three categories, we extract spectral power features from each 30-second EEG epoch. The LibSVM package [41] is used to build an SVM classifier that assigns each EEG epoch to one of the three categories. Additionally, as discussed in Section 3.1, the transition of sleep stages follows the trend of the sleep cycles: for example, if the current epoch belongs to deep sleep, then the next epoch probably also belongs to deep sleep. The SVM classifier, however, treats each 30-second epoch independently and thus does not take advantage of the correlation between nearby epochs. Consequently, we further model the sleep stage transitions as a Markov chain. A matrix indicating the transition probabilities between the sleep stages is learned from training data, as Figure 3.6 illustrates, and the classification results are refined based on this matrix: the probabilities estimated by the SVM classifier for each epoch are processed by a dynamic programming algorithm.
3.3.1 Feature Extraction

Band power features

EEG is the summation of the electrical signals generated by millions of neurons in the brain. It was first recorded by Richard Caton in 1875 [68] and is now widely used in PSG studies. The EEG signal is usually divided into five frequency bands: delta (0.5-2 Hz), theta (2-7 Hz), alpha (8-12 Hz), beta (12-20 Hz) and fast beta (20-40 Hz). According to previous studies, the spectral power of EEG in the different frequency bands is highly related to the sleep stages; band-specific spectral power is a well-known pattern in PSG studies [50], and some sleep stages are recognized by the presence of EEG activity in a specific band. For example, deep sleep is recognized by the presence of high-amplitude delta waves, and an epoch is scored as wakefulness when alpha waves are present.

Figure 3.4: Band Power Features and Sleep Stages. (a) Band Power Features extracted from 30-second EEG epochs; (b) Sleep Stages annotated by a Human Expert

Consequently, we use the spectral power extracted from the five frequency bands as the feature vector. As sleep stages are usually scored on 30-second epochs, one feature vector is generated for each 30-second EEG epoch, as sketched below.
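A sketch of this epoch-wise feature extraction; the 100 Hz sampling rate matches the Sleep-EDF Fpz-Cz channel used in Section 3.4, while the Welch window length is an illustrative choice:

```python
import numpy as np
from scipy.signal import welch

# The five sleep-scoring bands from the text (Hz).
SLEEP_BANDS = [(0.5, 2), (2, 7), (8, 12), (12, 20), (20, 40)]

def epoch_features(eeg, fs=100, epoch_sec=30):
    """One five-band power vector per 30-second epoch of one channel."""
    n = fs * epoch_sec
    feats = []
    for start in range(0, len(eeg) - n + 1, n):
        freqs, psd = welch(eeg[start:start + n], fs=fs, nperseg=4 * fs)
        feats.append([psd[(freqs >= lo) & (freqs < hi)].mean()
                      for lo, hi in SLEEP_BANDS])
    return np.array(feats)
```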
3.3.2 Classification

Support Vector Machine

Platt et al. [69] presented a method to convert the outputs of a binary SVM into posterior probabilities. This work was later extended to estimate multi-class SVM probabilities by combining the outputs of multiple one-against-one binary classifiers [70]. With this technique, we use a multi-class SVM classifier to estimate $P(x \mid F_t)$, the probability of epoch $t$ belonging to sleep stage $x$, where $F_t$ is the feature vector extracted from epoch $t$. We also define $C(t, x) = P(x \mid F_t)$, which will be used in the following section.

In SVM classification, epoch $t$ is scored as the stage that maximizes the probability $C(t, x)$:

$$\psi(t) = \arg\max_x C(t, x),$$

where $\psi(t)$ is the label generated by the classifier for epoch $t$.
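In code, with scikit-learn's probability-enabled SVC standing in for the LibSVM probability interface, $C(t, x)$ and $\psi(t)$ can be sketched as follows (the arrays are synthetic stand-ins):

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for epoch features and expert labels
# (0 = wakefulness, 1 = deep sleep, 2 = other stages).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = rng.integers(0, 3, size=300)

clf = SVC(probability=True).fit(X, y)  # Platt-style probabilities
C = clf.predict_proba(X)               # C[t, x] ~ P(x | F_t)
psi = C.argmax(axis=1)                 # per-epoch label, no smoothing yet
```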
3.3.3 Post Processing

Sleep Stage Transition Modeling

As the SVM classifier treats each epoch independently, it does not utilize the correlation between epochs. To take advantage of this sequential information, we model the transition of sleep stages as a discrete-time Markov chain; the effectiveness of this modeling has been demonstrated by Gibellato [71]. The Markov chain property, also called the first-order Markov assumption, is that the stage of the current epoch depends only on the stage of the previous epoch. It is formalized in Equation 3.1:

$$\Pr(X_t = x_t \mid X_{t-1} = x_{t-1}, X_{t-2} = x_{t-2}, \ldots, X_1 = x_1) = \Pr(X_t = x_t \mid X_{t-1} = x_{t-1}) = P_{x_{t-1} x_t}, \tag{3.1}$$

where $X_t$ denotes the sleep stage of epoch $t$ and $P_{x_{t-1} x_t}$ denotes the transition probability from stage $x_{t-1}$ to stage $x_t$.
As x has three possible stages, there are nine possible values of P_{x_{t-1} x_t}. These nine values form a 3-by-3 matrix of the transition probabilities between the three stages. This matrix is learned from the training data as follows:

$$P_{ij} = \frac{\mathrm{Count}(i, j)}{\sum_{k=1}^{3} \mathrm{Count}(i, k)},$$

where Count(i, j) counts the transitions from stage i to stage j. The calculation of this matrix is also illustrated in Figure 3.6.
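A minimal sketch of learning this matrix from a training label sequence is shown below; the integer stage coding (0 = wake, 1 = deep, 2 = others) is an assumption for illustration.

```python
import numpy as np

def learn_transition_matrix(stages, n_stages=3):
    count = np.zeros((n_stages, n_stages))
    for i, j in zip(stages[:-1], stages[1:]):
        count[i, j] += 1                     # Count(i, j)
    rows = count.sum(axis=1, keepdims=True)
    # P_ij = Count(i, j) / sum_k Count(i, k); guard rows with no counts.
    return count / np.maximum(rows, 1)

P = learn_transition_matrix([0, 0, 2, 2, 1, 1, 2, 0])
print(P)  # 3 x 3 matrix of transition probabilities
```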
Dynamic Programming for Post-processing
Based on the probabilities generated by the SVM and the transition matrix learned from the training data, a dynamic programming (DP) algorithm is designed to score each epoch such that the optimal overall posterior probability is obtained. The subproblem of the dynamic programming is defined in Equation 3.2, which represents the maximum overall probability from the first epoch to the current epoch t, where the current epoch is scored as stage v:
$$L(t, v) = \max_{x_1, \ldots, x_{t-1}} \left\{ \Pr(X_1 = x_1, \ldots, X_{t-1} = x_{t-1}, X_t = v) \prod_{i=1}^{t} C(i, x_i) \right\}, \quad x_t = v \quad (3.2)$$
Because the first-order Markov assumption holds, Equation 3.2 can be simplified as follows:

$$L(t, v) = \max_{u} \{ L(t-1, u) \Pr(X_t = v \mid X_{t-1} = u) \} \, C(t, v)$$

Thus L(t, v) can be calculated from L(t − 1, ·), which gives the subproblem an optimal substructure. Therefore, a dynamic programming algorithm can be designed to find the optimal solution.
To avoid floating-point underflow when many probabilities are multiplied, log probabilities are used: S(t, v) = ln L(t, v). For the first epoch, we define S(1, v) = ln C(1, v), which gives

$$S(t, v) = \begin{cases} \ln C(1, v) & t = 1 \\ \max_{u} \{ S(t-1, u) + \ln P_{uv} \} + \ln C(t, v) & t > 1 \end{cases}$$
In backtracing, the variable α(t, v) is defined to record the optimal sleep stage of the previous epoch:

$$\alpha(t, v) = \arg\max_{u} \{ S(t-1, u) + \ln P_{uv} + \ln C(t, v) \}$$
The backtracing process is given by the following formulation:

$$\varphi(t) = \begin{cases} \arg\max_{v} S(n, v) & t = n \\ \alpha(t+1, \varphi(t+1)) & t < n \end{cases}$$

where ϕ gives the refined classification labels that result in the maximum overall posterior probability.
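The complete post-processing step can be sketched as a log-domain Viterbi-style dynamic program; the small probability floor used to avoid taking the logarithm of zero is an implementation detail assumed here.

```python
import numpy as np

def refine(C, P):
    """C: T x 3 matrix of SVM probabilities C(t, v); P: 3 x 3 transition
    matrix learned from training data. Returns the refined labels phi."""
    T, n = C.shape
    logC = np.log(np.clip(C, 1e-12, None))  # floor avoids log(0)
    logP = np.log(np.clip(P, 1e-12, None))
    S = np.empty((T, n))                    # S(t, v) = ln L(t, v)
    alpha = np.zeros((T, n), dtype=int)     # backpointers alpha(t, v)
    S[0] = logC[0]                          # S(1, v) = ln C(1, v)
    for t in range(1, T):
        for v in range(n):
            scores = S[t - 1] + logP[:, v]  # S(t-1, u) + ln P_uv
            alpha[t, v] = int(np.argmax(scores))
            S[t, v] = scores[alpha[t, v]] + logC[t, v]
    # Backtracing: phi(n) = argmax_v S(n, v), then follow the pointers.
    phi = np.zeros(T, dtype=int)
    phi[-1] = int(np.argmax(S[-1]))
    for t in range(T - 2, -1, -1):
        phi[t] = alpha[t + 1, phi[t + 1]]
    return phi
```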
3.4 Experiment Results
The automated sleep quality measurement consists of two phases. First, an SVM classifier categorizes each EEG epoch into one of three classes: wakefulness, deep sleep, and other sleep stages. Second, a dynamic programming algorithm refines the results of the SVM by utilizing the sequential information. Two experiments are conducted to evaluate the classifier and the post-processing algorithm, respectively.

The experiments are conducted on the Sleep EDF Database, which is selected from PhysioBank, a large archive of digital recordings for the biomedical research community [72]. Four recordings obtained from healthy subjects in the hospital are used: st7022j0, st7052j0, st7121j0, and st7132j0. Three signals (EEG, EOG and EMG) were monitored overnight with good signal quality. Only the EEG channel (Fpz-Cz) is utilized in our experiment; the positions of Fpz and Cz are illustrated in Figure 3.5. This data set is annotated by a human expert according to the PSG standard [49], and the human annotation is used as the ground truth in our experiments.
[Figure 3.5: Position of Fpz and Cz in the 10/20 System]
Table 3.1: Accuracy of SVM Classifier in 10-fold Cross-validation

Recording  st7022j0  st7052j0  st7121j0  st7132j0
SVM        88.4%     93.9%     90.9%     92.7%
Table 3.2: Confusion Matrix on st7022j0

         Deep  Others  Wake
Deep      238      45     1
Others     26     545    15
Wake        0      22    53
Sleep Stage Classification
In the first experiment, we evaluate the accuracy of the SVM classifier. The four sleep recordings of the Sleep EDF Database are divided into 30-second epochs, yielding 3874 epochs in total. The spectral power feature is extracted from each epoch, and 10-fold cross-validation is conducted on the band power feature vectors of each recording. The classifier achieves frame-level accuracies of up to 93.9%, as shown in Table 3.1. The confusion matrices of the SVM classifier are shown in Tables 3.2, 3.3, 3.4 and 3.5.
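A hedged sketch of this evaluation protocol is given below with randomly generated placeholder features and labels; note that randomly assigned folds over dependent neighboring epochs can give optimistic estimates, a caveat discussed further in Chapter 4.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(900, 5))      # band power features of ~900 epochs
y = rng.integers(0, 3, size=900)   # expert stage labels (placeholder)

scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=10)
print("mean frame-level accuracy: %.1f%%" % (100 * scores.mean()))
```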
Post Processing
In the second experiment, we evaluate the performance of DP algorithm. As
DP algorithm utilizes sequential information, it processes on continuous epochs.
Table 3.3: Confusion Matrix on st7052j0

         Deep  Others  Wake
Deep      158      18     4
Others      5     724    14
Wake        0      22   105
Table 3.4: Confusion Matrix on st7121j0

         Deep  Others  Wake
Deep      172      30     1
Others     26     718     9
Wake        1      26    43
Table 3.5: Confusion Matrix on st7132j0

         Deep  Others  Wake
Deep       91      10     2
Others     22     664     3
Wake        0      25    35
Consequently, each recording is divided into two continuous parts, one for training and one for testing. The first 400 epochs of each recording are used to train the SVM model and learn the transition matrix; the remaining epochs are used for testing. The experiment conducted on one recording is illustrated in Figure 3.6. As the results in Table 3.6 show, the DP algorithm consistently improves the accuracy on each recording. Besides the improvement in accuracy, the DP algorithm also generates a smoother sleep cycle than the SVM classifier alone, as Figure 3.6 shows.

In this early stage of the work, the proposed approach is evaluated on a data set of four EEG recordings. Due to the limited size of the data set, training and testing are conducted on the same subject in the same session. To evaluate the generalization of the proposed approach, more recordings should be collected and further evaluation conducted in future work.
Table 3.6: Accuracy of SVM and SVM with Post-processing

Recording  st7022j0  st7052j0  st7121j0  st7132j0
SVM        87.5%     92.7%     93.4%     94.2%
SVM+DP     89.3%     95.8%     96.8%     99.3%
[Figure 3.6: Experiment Over the Recording st7052j0]
3.5 Conclusions
In this chapter, we discussed the concept of a domain specific music recommendation system, which automatically recommends music for users according to their sleep quality. One important problem in the proposed recommendation system is how to automatically monitor users' sleep quality at night. To address this problem, we first investigated sleep physiology and the traditional PSG approach in the literature. Considering that a standard PSG system may make users feel uncomfortable, we specifically designed an approach to recognize three sleep categories from the EEG signal alone. Three parameters, sleep latency, sleep efficiency and percentage of deep sleep, can then be calculated to measure sleep quality (a small computational sketch is given at the end of this section). The experiment results demonstrate that our approach achieves high accuracy even though only a single EEG channel is used. These results motivate us to further develop the content-based music recommendation component and to evaluate the full functionality of the EEG-based music recommendation system in the future. As future work, content-based music similarity, the major task in a content-based music recommendation system, is discussed in Section 4.1.
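As an illustration of how the three parameters follow from the per-epoch stages, a small sketch is given below. The exact clinical definitions vary, so the formulas here are simple approximations assumed for this sketch, not taken from a scoring standard.

```python
import numpy as np

def sleep_quality(stages, epoch_sec=30):
    """stages: per-epoch labels, 0 = wake, 1 = deep sleep, 2 = other sleep."""
    stages = np.asarray(stages)
    asleep = stages != 0
    latency_min = np.argmax(asleep) * epoch_sec / 60.0     # time to first sleep
    efficiency = asleep.mean()                             # fraction of night asleep
    deep_pct = (stages == 1).sum() / max(asleep.sum(), 1)  # share of deep sleep
    return latency_min, efficiency, deep_pct

print(sleep_quality([0, 0, 2, 2, 1, 1, 2, 0, 2]))
```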
Chapter 4
Conclusion and Future work
How to utilize the EEG signal in music retrieval systems is investigated in this thesis. Two projects were conducted: EEG-based music emotion annotation and EEG-based music recommendation.

In the first project, an online system was built to recognize the audience's music-evoked emotion from the EEG signal. The EEG signal is recorded and processed while the subject listens to music clips. The frontal alpha power feature is extracted from four pairs of EEG channels: Fp1-Fp2, F7-F8, F3-F4, and FC3-FC4. An SVM classifier is built to classify the feature vectors into three emotion categories: happy, sad, and peaceful. The proposed approach achieves an accuracy around 90% in 10-fold cross-validation (with randomly selected feature vectors), but only around 35% in prediction. To the best of our knowledge, a few existing works also report accuracies up to 90% in cross-validation, but none achieves reasonable accuracy in prediction. As discussed in Section 2.5, two conclusions can be derived from this observation. First, special attention needs to be paid to the evaluation of EEG-related systems, because the dependency between feature vectors extracted from neighboring EEG segments can result in considerable distortion in k-fold cross-validation. Second, how to recognize human emotion from the EEG signal is still an open question, because no existing work achieves reasonable accuracy in prediction.
In the second project, a technique is presented to analyze sleep quality from the EEG signal, as a component of an EEG-based music recommendation system. The system consists of two components: EEG-based music rating and content-based music recommendation. Band power features are extracted from a single EEG channel (Fpz-Cz) to recognize three sleep categories: wakefulness, deep sleep, and other stages. The proposed approach achieves an accuracy above 90% and shows great potential for the EEG-based music rating component. The next step is to implement the content-based music recommendation component and to evaluate the full functionality of the whole system. As an initial step towards this component, a literature survey on music similarity measurement is given in Section 4.1. Three approaches are discussed: distance-based approaches, cluster-based approaches and model-based approaches. Based on the results reported in the literature, the model-based approach produces the best results and is the most promising solution for the content-based music recommendation system.
4.1 Content-based Music Similarity Measurement
As shown in Figure 4.1, music similarity is the major task in the content-based music recommendation component. As one of the most fundamental problems in MIR, how to measure the similarity between music pieces by analyzing content information is still a challenging research topic, and intensive work has been conducted in this field. Because the semantics of music can be ambiguous and complex, music is first represented as a set of feature vectors; music similarity measurement then becomes the problem of measuring the similarity of two sets of feature vectors in a multi-dimensional space. Therefore, music similarity measurement can be divided into two phases: feature extraction from music, and similarity measurement in a multi-dimensional space.
In the feature extraction phase, feature vectors are extracted from music pieces and placed in a multi-dimensional space, so that each music piece is represented by points in a high-dimensional space. How to extract reliable and effective features from music pieces is a traditional pattern recognition problem. So far, a large number of features have been proposed to describe the semantics of music or audio pieces, such as temporal, spectral, rhythmic, and timbral features [73]. Over years of development, many convenient tools have been implemented to extract those features efficiently, such as the Marsyas package [74].

Given the feature vectors of music, many methods have been proposed to measure their similarity. These methods can be mainly categorized into three families: distance-based approaches, cluster-based approaches and model-based approaches.
[Figure 4.1: Content-based Music Recommendation Component]
The most generic approach is to calculate the Euclidean distance in the raw feature space, so that the similarity between two music pieces is measured by the Euclidean distance between their feature vectors. This approach implies that feature values in different dimensions are equally important: one unit of distance in the X direction equals one unit of distance in the Y direction. However, this is not always the case. Addressing this problem, Slaney et al. described and evaluated five approaches (whitening, LDA, NCA, LMNN and RCA) that rotate and scale the raw feature space with a linear transform [75]. To evaluate the performance of the proposed approaches, a straightforward distance-based classifier, kNN, is employed to perform a classification task in the transformed space.
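As a concrete illustration, a minimal sketch of the simplest of these transforms, whitening, is given below; the random data is a placeholder, and the whitening here is implemented via an eigendecomposition of the feature covariance (with a small ridge term assumed for numerical stability).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))        # one feature vector per music piece

# Whitening: rotate and scale the space so features have unit covariance.
cov = np.cov(X, rowvar=False)
eigval, eigvec = np.linalg.eigh(cov)
W = eigvec / np.sqrt(eigval + 1e-9)   # whitening transform
Xw = (X - X.mean(axis=0)) @ W         # transformed feature space

def distance(i, j):
    # Euclidean distance in the whitened space; smaller means more similar.
    return np.linalg.norm(Xw[i] - Xw[j])

print(distance(0, 1))
```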
A cluster-based approach, which regards the set of feature vectors as an entity, was proposed by Logan and Salomon [76]. The proposed method consists of two steps: frame clustering and cluster model similarity. First, feature vectors are clustered into different groups using K-means, and the distribution of feature vectors is regarded as a signature of the music piece. Second, the similarity is calculated by comparing the signatures using the Earth Mover's Distance (EMD) [77]. This approach was later improved by Aucouturier and Pachet [78, 79]: Gaussian mixture models (GMM) were introduced to cluster the feature vectors, and Monte Carlo (MC) sampling is employed to measure the similarity of the cluster models, i.e., the likelihood that samples drawn from model A are generated by model B. This approach was implemented by Elias Pampalk [80] in an open-source package and used in a genre classification task by Pampalk et al. [81].
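A hedged sketch of this GMM-plus-Monte-Carlo scheme follows, using scikit-learn's GaussianMixture; the number of components, the sample count, and the 13-dimensional MFCC-like frames are illustrative assumptions rather than the settings of [78, 79].

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_model(frames, k=8):
    # One GMM per music piece, fitted on its frame-level feature vectors.
    return GaussianMixture(n_components=k, random_state=0).fit(frames)

def mc_similarity(gmm_a, gmm_b, n=2000):
    # Symmetrized Monte Carlo likelihood: how well each model explains
    # samples drawn from the other. Higher values mean more similar pieces.
    samples_a, _ = gmm_a.sample(n)
    samples_b, _ = gmm_b.sample(n)
    return gmm_b.score(samples_a) + gmm_a.score(samples_b)

rng = np.random.default_rng(0)
gmm_x = fit_model(rng.normal(size=(500, 13)))           # MFCC-like frames
gmm_y = fit_model(rng.normal(loc=0.5, size=(500, 13)))
print(mc_similarity(gmm_x, gmm_y))
```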
Besides the distance-based and cluster-based approaches, West and Lamere [82] proposed a model-based approach to constructing music similarity functions. The authors argued that high-level musical factors, such as genre, are also important for measuring similarity. Instead of directly calculating the distance between feature vectors, the feature vectors are first classified into different categories, and an internal profile is then built to represent the music. Suppose that P_x = {c_0^x, c_1^x, ..., c_n^x} is the profile of music piece x in the genre dimension, where c_i^x indicates the probability that x belongs to genre i. The similarity S_{x,y} between music pieces x and y is calculated based on the Euclidean distance between P_x and P_y as follows:
$$S_{x,y} = 1 - \sum_{i=1}^{n} (c_i^x - c_i^y)^2 \quad (4.1)$$
Different classification methods can be used, such as LDA or SVM. The genre labels can also be augmented with other musical semantic factors such as mood, tempo or melody. A small sketch of Equation 4.1 is given below.
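A direct sketch of Equation 4.1 over two hypothetical genre profiles; the profile values are invented for illustration.

```python
import numpy as np

def profile_similarity(p_x, p_y):
    # Equation 4.1: similarity from two genre-profile vectors.
    p_x, p_y = np.asarray(p_x), np.asarray(p_y)
    return 1.0 - np.sum((p_x - p_y) ** 2)

print(profile_similarity([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))  # -> 0.98
```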
Zhang et al. [83] extended this approach to represent a music piece as a fuzzy music semantic vector by considering the profiles of different musical dimensions (genre, mood, tempo, etc.) together, and evaluated the extended approach on a large-scale data set. The distance in each dimension j is first calculated as follows:

$$Dis_{x,y}^{j} = \frac{1}{n} \sum_{i=1}^{n} (c_i^x - c_i^y)^2 \quad (4.2)$$
The distance values are scaled by a sigmoid function, multiplied by weight factors w_{p_j}, then summed together and normalized by factors a and b, as the following equation shows:

$$S_{x,y} = a \sum_{j=1}^{N_p} \frac{w_{p_j}}{1 + e^{Dis_{x,y}^{j}}} - b, \quad (4.3)$$

where $a = 2\,\frac{e+1}{e-1}$, $b = \frac{2}{e-1}$, and $\sum_{j=1}^{N_p} w_{p_j} = 1$. Hence the value of S_{x,y} lies in the range [0, 1].
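A hedged sketch that combines Equations 4.2 and 4.3 is given below; the two semantic dimensions and the equal weights are illustrative assumptions, not the configuration of [83].

```python
import numpy as np

def fuzzy_similarity(profiles_x, profiles_y, weights):
    e = np.e
    a, b = 2 * (e + 1) / (e - 1), 2 / (e - 1)    # normalization factors
    s = 0.0
    for p_x, p_y, w in zip(profiles_x, profiles_y, weights):
        p_x, p_y = np.asarray(p_x), np.asarray(p_y)
        dis = np.mean((p_x - p_y) ** 2)          # Equation 4.2
        s += w / (1 + np.exp(dis))               # sigmoid-scaled, weighted
    return a * s - b                             # Equation 4.3, in [0, 1]

# Two semantic dimensions (e.g. genre and mood), weights summing to 1.
print(fuzzy_similarity([[0.7, 0.3], [0.5, 0.5]],
                       [[0.6, 0.4], [0.4, 0.6]],
                       [0.5, 0.5]))
```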
As the above discussion shows, current approaches to music similarity measurement can be mainly categorized into three families: distance-based approaches, cluster-based approaches, and model-based approaches. Based on the evaluation results reported in the literature [82, 79], model-based approaches likely produce the best performance. Consequently, the model-based approach is a promising option for our content-based music recommendation system, and methods similar to those reported in [82, 83] can be employed in the implementation of the content-based music recommendation component.
Bibliography
[1] Donna Harman. Relevance feedback revisited. In Proceedings of the 15th
annual international ACM SIGIR conference on Research and development
in information retrieval, pages 1–10, New York, NY, USA, 1992. ACM.
[2] Ioannis Arapakis et al. Using facial expressions and peripheral physiological signals as implicit indicators of topical relevance. In ACM Multimedia, 2009.
[3] Mikhail A. Lebedev and Miguel A. L. Nicolelis. Brain-machine interfaces: past, present and future. Trends in Neurosciences, 29(9):536–546, September 2006.
[4] Jonathan R. Wolpaw, Niels Birbaumer, Dennis J. McFarland, Gert Pfurtscheller, and Theresa M. Vaughan. Brain-computer interfaces for communication and control. Clinical Neurophysiology, 113(6):767–791, 2002.
[5] R. Leeb et al. Navigation in virtual environments through motor imagery. In Proceedings of the 9th Computer Vision Winter Workshop (D. Skocaj, ed.), Slovenian Pattern Recognition Society, Piran, Slovenia, pages 99–108, 2004.
[6] Lie Lu, Dan Liu, and Hong-Jiang Zhang. Automatic mood detection and tracking of music audio signals. IEEE Transactions on Audio, Speech, and Language Processing, 14:5–19, 2006.
[7] Rosalind W. Picard. Affective Computing. The MIT Press, Cambridge, September 1997.
[8] Yashar Moshfeghi. Affective adaptive retrieval: Study of emotion in adaptive retrieval. In SIGIR, 2008.
[9] Ioannis Arapakis and Joemon M. Jose. Affective feedback: An investigation in the role of emotions in the information seeking process. In ACM SIGIR, 2008.
[10] Zhihong Zeng, Jilin Tu, Ming Liu, Thomas S. Huang, Brian Pianfetti, Dan
Roth, and Stephen Levinson. Audio-visual affect recognition. Multimedia,
IEEE Transactions on, 9(2):424–428, 2007.
[11] Zhihong Zeng, Maja Pantic, Glenn I. Roisman, and Thomas S. Huang. A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[12] Jonghwa Kim and Elisabeth André. Emotion recognition based on physiological changes in music listening. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30:2067–2083, 2008.
[13] Paul Ekman, Robert W. Levenson, and Wallace V. Friesen. Autonomic nervous system activity distinguishing among emotions. Science, 221, 1983.
[14] Guillaume Chanel, Koray Ciftci, Javier C. Mota, Arman Savran, Luong H. Viet, Lale Akarun, Alice Caplier, Michele Rombaut, and Bulent Sankur. Emotion detection in the loop from brain signals and facial images. Workshop on Emotion-Based Agent Architectures, Third International Conference on Autonomous Agents, 2006.
[15] Guillaume Chanel, Julien Kronegg, Didier Grandjean, and Thierry Pun. Emotion assessment: Arousal evaluation using EEGs and peripheral physiological signals. Lecture Notes in Computer Science, Multimedia Content Representation, Classification and Security, 2006.
[16] Guillaume Chanel. Emotion assessment for affective-computing based on brain and peripheral signals. PhD thesis, University of Geneva, Switzerland, 2009.
[17] Danny Oude Bos. EEG-based emotion recognition. 2007.
[18] Yuan-Pin Lin, Chi-Hong Wang, Tien-Lin Wu, Shyh-Kang Jeng, and Jyh-Horng Chen. Multilayer perceptron for EEG signal classification during listening to emotional music. In TENCON 2007 - IEEE Region 10 Conference, 2007.
[19] Yuan-Pin Lin, Chi-Hong Wang, Tien-Lin Wu, Shyh-Kang Jeng, and Jyh-Horng Chen. Support vector machine for EEG signal classification during listening to emotional music. In IEEE 10th Workshop on Multimedia Signal Processing, 2008.
[20] Yuan-Pin Lin, Chi-Hong Wang, Tien-Lin Wu, Shyh-Kang Jeng, and Jyh-Horng Chen. EEG-based emotion recognition in music listening: A comparison of schemes for multiclass support vector machine. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'09), 2009.
[21] Elias Vyzas. Recognition of emotional and cognitive states using physiological
data. Master’s thesis, MIT, 1999.
[22] Elias Vyzas and Rosalind W. Picard. Offline and online recognition of emotion expression from physiological data. Workshop on Emotion-Based Agent
Architectures, Third International Conference on Autonomous Agents, 1999.
[23] Jennifer Healey. Wearable and Automotive Systems for the Recognition of
Affect from Physiology. PhD thesis, MIT, 2000.
[24] Rosalind W. Picard, Elias Vyzas, and Jennifer Healey. Toward machine emotional intelligence: analysis of affective physiological state. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10):1175–1191, 2001.
[25] Jonghwa Kim. From physiological signals to emotions: Implementing and comparing selected methods for feature extraction and classification. In IEEE International Conference on Multimedia & Expo (ICME 2005), pages 940–943, 2005.
[26] Jonghwa Kim and E. André. Emotion-specific dichotomous classification and feature-level fusion of multichannel biosignals for automatic emotion recognition. In IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI 2008), 2008.
[27] Jonghwa Kim and Elisabeth André. Emotion recognition based on physiological changes in music listening. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30:2067–2083, 2008.
[28] Jonghwa Kim and Elisabeth André. Biomedical Engineering Systems and Technologies – Four-Channel Biosignal Analysis and Feature Extraction for Automatic Emotion Recognition. Springer Berlin Heidelberg, 2009.
[29] Jonghwa Kim and Elisabeth André. Multisensor Fusion and Integration for Intelligent Systems – Fusion of Multichannel Biosignals Towards Automatic Emotion Recognition. Springer Berlin Heidelberg, 2009.
[30] C. Maaoui, A. Pruski, and F. Abdat. Emotion recognition for human-machine
communication. IEEE/RSJ International Conference on Intelligent Robots
and Systems, 2008. IROS 2008., pages 1210–1215, 2008.
[31] Robert Horlings, Dragos Datcu, and Leon J. M. Rothkrantz. Emotion recognition using brain activity. CompSysTech’08 ACM International Conference
Proceedings Series, 2008.
[32] Human nervous system. http://en.wikipedia.org/wiki/Nervous_system.
[33] M. Lyons and C. Bartneck. Hci and the face. Proceedings of the Conference on
Human Factors in Computing Systems (CHI2006), Extended Abstracts, 2006.
[34] John J. B. Allen. Frontal EEG asymmetry, emotion, and psychopathology: the first, and the next 25 years. Biological Psychology, 67:1–5, 2004.
[35] John J. B. Allen, James A. Coan, and Maria Nazarian. Issues and assumptions on the road from raw signals to metrics of frontal EEG asymmetry in emotion. Biological Psychology, 67:183–218, 2004.
[36] James A. Coan and John J. B. Allen. Frontal EEG asymmetry as a moderator and mediator of emotion. Biological Psychology, 67:7–49, 2004.
[37] David N. Towers and John J. B. Allen. A better estimate of the internal consistency reliability of frontal EEG asymmetry scores. Psychophysiology, 46:132–142, 2009.
[38] http://en.wikipedia.org/wiki/10-20_system_(EEG).
[39] John T. Cacioppo, Louis G. Tassinary, and Gary Berntson, editors. Handbook
of Psychophysiology. Cambridge University Press, March 2007.
[40] J. Allen. Issues and assumptions on the road from raw signals to metrics of frontal EEG asymmetry in emotion. Biological Psychology, 67(1-2):183–218, October 2004.
[41] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines, 2001.
[42] Yuan-Pin Lin, Chi-Hong Wang, Tien-Lin Wu, Shyh-Kang Jeng, and Jyh-Horng Chen. Support vector machine for EEG signal classification during listening to emotional music. Pages 127–130, October 2008.
[43] http://code.google.com/p/tempo.
[44] Leepeng P. Tan. The effects of background music on quality of sleep in elementary school children. Journal of Music Therapy, pages 128–150, 2004.
[45] László Harmat, Johanna Takács, and Róbert Bódizs. Music improves sleep quality in students. Journal of Advanced Nursing, 62:327–335, 2008.
[46] Hui-Ling Lai and Marion Good. Music improves sleep quality in older adults. Journal of Advanced Nursing, 53(1):134, 2006.
[47] Conrad Iber et al. The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications. American Academy of Sleep Medicine, 2007.
[48] http://en.wikipedia.org/wiki/Sleep, retrieved on 29 April 2010.
[49] A. Rechtschaffen and A. Kales. A manual of standardized terminology, techniques and scoring system for sleep stages of human subjects. US Government
Printing Office, US Public Health Service, Washington DC, 1968.
[50] Michael H. Silber. Staging sleep. Sleep Medicine Clinics, 4(3):343–352, September 2009.
[51] Kelly A. Carden. Recording sleep: The electrodes, 10/20 recording system,
and sleep system specifications. Sleep Medicine Clinics, 4(3):333–341, September 2009.
[52] http://www.clevemed.com/crystalmonitor/overview.shtml, retrieved on July 20, 2010.
[53] http://en.wikipedia.org/wiki/Polysomnography, retrieved on July 19, 2010.
[54] M. Hirshkowitz. Commentary - standing on the shoulders of giants: the
standardized sleep manual after 30 years. Sleep Medicine Reviews, 4(2):169–
179, April 2000.
[55] S. Himanen. Response to ”standing on the shoulders of giants: The standardized sleep manual after 30 years”. Sleep Medicine Reviews, 4(2):181–182,
April 2000.
[56] S. Himanen. Limitations of Rechtschaffen and Kales. Sleep Medicine Reviews, 4(2):149–167, April 2000.
[57] Patrick Sorenson. Generating a signal: Biopotentials, amplifiers, and filters.
Sleep Medicine Clinics, 4(3):323–331, September 2009.
[58] Elise Maher and Lawrence J. Epstein. Artifacts and troubleshooting. Sleep
Medicine Clinics, 4(3):421–434, September 2009.
[59] T. Penzel. Computer based sleep recording and analysis. Sleep Medicine
Reviews, 4(2):131–148, April 2000.
[60] T. Penzel, M. Hirshkowitz, J. Harsh, R. D. Chervin, N. Butkov, M. Kryger,
B. Malow, M. V. Vitiello, M. H. Silber, C. A. Kushida, and A. L. Chesson.
Digital analysis and technical specifications. Journal of clinical sleep medicine,
3(2):109–120, March 2007.
[61] Peter Anderer, Georg Gruber, Silvia Parapatics, Michael Woertz, Tatiana Miazhynskaia, Gerhard Klosch, Bernd Saletu, Josef Zeitlhofer, Manuel J. Barbanoj, Heidi Danker-Hopfe, Sari-Leena L. Himanen, Bob Kemp, Thomas Penzel, Michael Grozinger, Dieter Kunz, Peter Rappelsberger, Alois Schlogl, and Georg Dorffner. An e-health solution for automatic sleep classification according to Rechtschaffen and Kales: validation study of the Somnolyzer 24 x 7 utilizing the SIESTA database. Neuropsychobiology, 51(3):115–133, 2005.
[62] Peter Anderer, Georg Gruber, Silvia Parapatics, and Georg Dorffner. Automatic sleep classification according to Rechtschaffen and Kales. In Annual International Conference of the IEEE Engineering in Medicine and Biology Society, volume 2007, pages 3994–3997. IEEE, August 2007.
[63] N. McGrogan, E. Braithwaite, and L. Tarassenko. Biosleep: a comprehensive
sleep analysis system. In Engineering in Medicine and Biology Society, Annual
International Conference of the IEEE, November 2002.
[64] Jennifer Caffarel, G. John Gibson, J. Phil Harrison, Clive J. Griffiths, and Michael J. Drinnan. Comparison of manual sleep staging with automated neural network-based analysis in clinical practice. Medical & Biological Engineering & Computing, pages 105–110, 2006.
[65] D. Alvarez-Estevez, J. M. Fernandez-Pastoriza, and V. Moret-Bonillo. A continuous evaluation of the awake sleep state using fuzzy reasoning. In Engineering in Medicine and Biology Society, Annual International Conference of
the IEEE, 2009.
[66] U. R. Abeyratne, S. Vinayak, C. Hukins, and B. Duce. A new measure to quantify sleepiness using higher order statistical analysis of EEG. In Engineering in Medicine and Biology Society, Annual International Conference of the IEEE, 2009.
[67] Arthur Flexer et al. An automatic, continuous and probabilistic sleep stager based on a hidden Markov model. In Applied Artificial Intelligence, pages 199–207, 2002.
[68] Maryann Deak and Lawrence J. Epstein. The history of polysomnography.
Sleep Medicine Clinics, 4(3):313–321, September 2009.
[69] John C. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin
Classifiers, pages 61–74, 1999.
[70] Ting F. Wu, Chih J. Lin, and Ruby C. Weng. Probability estimates for multiclass classification by pairwise coupling. J. Mach. Learn. Res., 5:975–1005,
2004.
[71] Marilisa G. Gibellato. Stochastic modeling of the sleep process. PhD thesis,
Ohio State University, 2005.
[72] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. Ch. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, and H. E. Stanley. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220, June 2000.
[73] Beth Logan. Mel frequency cepstral coefficients for music modeling. In International Symposium/Conference on Music Information Retrieval - ISMIR,
2000.
[74] George Tzanetakis and Perry Cook. Marsyas: a framework for audio analysis.
Org. Sound, 4(3):169–175, December 1999.
[75] M. Slaney, K. Weinberger, and W. White. Learning a metric for music similarity. In ISMIR, pages 313–318, 2008.
[76] Beth Logan and Ariel Salomon. A music similarity function based on signal
analysis. In International Conference on Multimedia and Expo, volume 0,
pages 190+, Los Alamitos, CA, USA, 2001. IEEE Computer Society.
[77] Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. The earth mover’s
distance as a metric for image retrieval. International Journal of Computer
Vision, 40, 2000.
[78] Jean-Julien Aucouturier and Francois Pachet. Music similarity measures:
What’s the use? In Ircam, editor, Proceedings of the 3rd International Symposium on Music Information Retrieval, pages 157–163, Paris, France, October
2002.
[79] Jean-Julien Aucouturier and Francois Pachet. Improving timbre similarity: How high's the sky? Journal of Negative Results in Speech and Audio Sciences, 1, 2004.
[80] E. Pampalk. A matlab toolbox to compute music similarity from audio. In
Proceedings of 5th International Conference on Music Information Retrieval,
Barcelona, Spain, 2004.
[81] Elias Pampalk, Arthur Flexer, and Gerhard Widmer. Improvements of audio-based music similarity and genre classification. In Proc. of Int. Symposium on Music Information Retrieval, 2005.
[82] Kris West and Paul Lamere. A model-based approach to constructing music
similarity functions. EURASIP J. Appl. Signal Process., 2007(1):149, January
2007.
[83] Bingjun Zhang, Jialie Shen, Qiaoliang Xiang, and Ye Wang. CompositeMap: a novel framework for music similarity measure. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages 403–410, New York, NY, USA, 2009. ACM.