IEEE Transactions on Consumer Electronics, Vol. 53, No. 1, February 2007, p. 218. Manuscript received January 15, 2007. 0098 3063/07/$20.00 © 2007 IEEE.
Person Identification System for Future Digital TV with Intelligence

Min-Cheol Hwang, Le Thanh Ha, Nam-Hyeong Kim, Chun-Su Park, and Sung-Jea Ko, Senior Member, IEEE

M.-C. Hwang, L. T. Ha, N.-H. Kim, C.-S. Park, and S.-J. Ko are with the Department of Electronic Engineering, Korea University, Seoul, 136-701, Korea (e-mail: sjko@dali.korea.ac.kr).

Abstract: Intelligent digital TV (iDTV) is a future digital TV with intelligence that can automatically provide user-personalized services for each audience member. To provide these services, the iDTV should recognize audiences in real time. In this paper, we define a novel structure of the iDTV and propose a real-time person identification system for the iDTV that analyzes captured images and recognizes audiences. The proposed system consists of three processing units: preprocessing to reduce the computational cost of the system, face detection using a statistical approach, and face recognition using Support Vector Machines (SVMs). Experimental results show that the proposed system achieves a recognition accuracy of 90% or higher at a speed of 15~20 fps, which is suitable for the iDTV.

Index Terms: Intelligent digital TV, personalized services, audience identification, face detection, face recognition.

I. INTRODUCTION

Recently, with the development of compression technology and the digitization of TV programs, digital TV (DTV) provides not only high-quality audio-visual effects, as in high definition TV (HDTV), but also datacasting and interactive multimedia services. As the number of services provided by the DTV increases, conventional methods of channel selection such as browsing become impractical [1]. The electronic program guide (EPG) can help viewers check future programs in advance. However, the multi-channel DTV service delivers more programs than viewers can handle, which results in information overload for users [2]. The DTV is therefore required to provide user-personalized services to each individual automatically. To provide such personalized services, a future DTV with intelligence (iDTV) needs to identify its audiences.

Considering the above requirements and problems, we define a novel structure of the iDTV as illustrated in Fig. 1. The proposed iDTV consists of two main components: the intelligent environment sensor manager (iESM) and the intelligent EPG manager (iEPGM). The iESM, equipped with various sensors, investigates the indoor environment including audiences, light, illumination, and noise. The iESM performs person identification by using the information gathered through the sensors. The iEPGM manages the list of programs appropriate for the audiences and provides the user-personalized service to each person by using the information received from the iESM. For example, for those who like to watch sports, the iEPGM turns on the sports channel automatically.

Fig. 1. The novel structure of the iDTV. (Diagram labels: iDTV, iESM, iEPGM, Audience, Sensor (Camera), Input images, Moving object tracking, Person Identifier, Person ID, DB, Personalized Service Provider, Personalized services.)

Various identification technologies have been widely used for commercial purposes. The most common personal verification and identification methods are password/PIN (personal identification number) and token systems [3]. Because those systems are vulnerable to forgery, theft, and lapses in users’ memory, biometric identification systems using pattern recognition techniques are attracting considerable interest. For example, fingerprint, retina, and iris are used for identification. These biometric technologies achieve high identification accuracy for a large group of authorized members. However, they are not appropriate for the iDTV due to their inherent drawbacks. Compared with identification methods such as fingerprint and iris recognition, face recognition has several characteristics well suited to consumer applications, such as non-intrusive and user-friendly interfaces, low-cost sensors, easy setup, and active identification [4].

Generally, a system using human faces for person identification is separated into two parts: face detection and face recognition. Much effort has been devoted to improving the performance of face detection [5]-[7] and face recognition [3], [4], [8]-[14]. Although considerable successes have been reported, designing an automatic face recognition system that runs in real time is still a difficult task.

In this research, we focus only on the real-time person identification module in the iESM, and the other modules are left for future research. The proposed system consists of three phases: preprocessing to reduce the computational cost of the system, face detection using a statistical approach, and face recognition using Support Vector Machines (SVMs).

The paper is organized as follows. In Section II, the proposed person identification system is explained. Section III presents the experimental results and the performance analysis of the overall system. Finally, the paper concludes in Section IV.

II. PROPOSED PERSON IDENTIFICATION SYSTEM FOR THE IDTV

Fig. 2 portrays the processing flow of the proposed person identification system for the iESM. The proposed system consists of three processing units. The first unit is preprocessing, which reduces the computational cost of the next unit. In the preprocessing, moving objects are detected and tracked. The second is the face detection unit, which uses a statistical approach with Haar-like features. In this unit, each moving-object region is checked independently to determine whether it contains a face. Haar-like features detect faces efficiently at low cost, especially frontal faces [7]. They are suitable for the proposed system because most captured faces of audiences are frontal. The last is the face recognition unit using SVMs [11]. Compared with other learning methods such as eigenfaces [12], hidden Markov models (HMM) [13], and neural networks (NN) [14], the training time of SVMs is short enough to be applied to the real-time iDTV. Our proposed system can add a set of face images with a new identity to the database and retrain in real time. The details of each phase are described in the following subsections.

Fig. 2. The processing flow of the proposed person identification system.

A. Preprocessing

The detector is applied to every location in the input image in order to find faces regardless of their position. To detect faces larger or smaller than the window size, the input image is scaled and the detector is applied to all the scaled images. As a result, the computational complexity increases drastically. In the preprocessing, we propose methods that reduce the search range and the number of image scaling iterations for the face detection.

Generally, people are considered as moving objects. We are able to reduce the search range of face detection by considering only the area related to the moving objects. Background subtraction [15], which has been used to extract moving objects in many applications, is applied. In our system, the background is easily estimated because the camera is attached to the iDTV in the house. Background subtraction and thresholding are performed to produce difference images. The moving object is extracted from the difference image using a morphological opening (erosion followed by dilation) to remove small clusters. The computational cost of this step is much lower than that of the face detection method described in the next subsection. Background subtraction is not suitable where many objects are moving at the same time. However, our iDTV considers a small family composed of only a few members, so its performance is efficient enough to be adopted into our proposed system.

Using the stereo image recognition system, we can obtain the distance of each object from the cameras. Face size is estimated by using the relationship between the distance from the cameras to an object and the size of the object. Therefore, the face detector deals with only one scaled input image. The detailed algorithm used for the distance detection is presented in [16] and [17].

B. Face Detection

We use a statistical approach for face detection, originally developed by Viola and Jones [7] and then analyzed and extended by Lienhart and Maydt [18]. This method uses a cascade of boosted tree classifiers as a statistical model and Haar-like features, which are computed similarly to the coefficients in Haar wavelet transforms. A highly accurate, or strong, classifier can be produced through the linear combination of many inaccurate, or weak, classifiers [19]. Every weak classifier is trained by using a single feature and checks whether an object region at a certain location looks like a face or not. Each feature is described by its template, its coordinates relative to the search window origin, and its size. In [18], 14 different templates are used, as shown in Fig. 3.

Fig. 3. Extended set of Haar-like features: (a) edge features, (b) line features, (c) center-surround features.

Each feature consists of two or three joined “black” and “white” rectangles, either upright or rotated by 45°. The value of each feature is calculated as a weighted sum of two components: the pixel sum over the black rectangle and the pixel sum over the whole feature area, which covers all black and white rectangles. The weights of these two components have opposite signs and, for normalization, their absolute values are inversely proportional to the areas: for example, the black features in Fig. 3(c) have weight_black = -9 × weight_whole. There exist hundreds of features in real classifiers. Computing pixel sums over multiple small rectangles is too slow to detect faces in real time. Viola and Jones [7] introduced an elegant method to compute the sums very fast. First, an integral image, or summed area table (SAT), is computed over the whole image I, where SAT ( X , Y ) = ∑ x < X , y