Author's Accepted Manuscript

A Method for Image-based Shadow Interaction with Virtual Objects
Hyunwoo Ha, Kwanghee Ko

PII: S2288-4300(14)00004-9
DOI: http://dx.doi.org/10.1016/j.jcde.2014.11.003
Reference: JCDE3
To appear in: Journal of Computational Design and Engineering
Received date: 22 August 2014
Revised date: 17 September 2014
Accepted date: 18 September 2014

Cite this article as: Hyunwoo Ha, Kwanghee Ko, A Method for Image-based Shadow Interaction with Virtual Objects, Journal of Computational Design and Engineering, http://dx.doi.org/10.1016/j.jcde.2014.11.003

A Method for Image-based Shadow Interaction with Virtual Objects

Hyunwoo Ha and Kwanghee Ko*

School of Mechatronics, Gwangju Institute of Science and Technology, 123 Cheomdangwagiro, Bukgu, Gwangju, 500-712, Republic of Korea
Korea Culture Technology Institute, Gwangju Institute of Science and Technology, 123 Cheomdangwagiro, Bukgu, Gwangju, 500-712, Republic of Korea

Abstract

Many researchers have been investigating interactive portable projection systems such as mini-projectors. In addition, in exhibition halls and museums, there is a trend toward using interactive projection systems to make viewing more exciting and impressive. They can also be applied in the field of art, for example, in creating shadow plays. The key idea of the interactive
portable projection systems is to recognize the user's gesture in real time. In this paper, a vision-based shadow gesture recognition method is proposed for interactive projection systems. The gesture recognition method is based on the screen image obtained by a single web camera. The method separates only the shadow area by combining the binary image with an input image using a learning algorithm that isolates the background from the input image. The region of interest is recognized by labeling the separated shadow regions, and then hand shadows are isolated using the defects, convex hull, and moments of each region. To distinguish hand gestures, Hu's invariant moment method is used. An optical flow algorithm is used for tracking the fingertip. Using this method, a few interactive applications are developed, which are presented in this paper.

Keywords: shadow interaction; Hu moment; gesture recognition; interactive UI; image processing

1. Introduction

There have been increasing demands for a more active and interesting viewing experience, and interactive projection technology has been considered as a solution to this issue. For example, if a presenter can flip pages with a gesture or write a sentence without any manual tools, the presentation can be more immersive and attractive to the audience. An interactive projection system also helps people produce more attractive artistic exhibits, such as interactive walls and floors. Lately, many attempts have been made to use human-computer interaction in plays and musical performances. Namely, if appropriate events occur when an actor performs on stage, a better reaction can be obtained from the audience because such events are well synchronized with the actor's performance. Using this concept, new applications with interesting interactions are possible, such as a magic drawing board or virtual combat simulation. From a technical standpoint, research on gesture recognition is a topic of
interest in the field of computer vision. In particular, recognizing gestures in real time is of paramount importance. Most research groups use the Kinect camera to recognize gestures precisely because the Kinect can discern both depth and color information. On the other hand, the Kinect cannot obtain depth and color information for shadows generated by light from behind the screen. The detection range of the Kinect is also limited when applied to a large screen because the distance from the sensor to the screen is considerably large. Another method for gesture recognition is to recognize gestures from images. The image-based approach is less expensive than the Kinect-based method because it uses less hardware for gesture acquisition. In this work, a vision-based interactive projection system is proposed, which recognizes shadow gestures with proper precision. The process consists of detection and recognition modules for shadow gestures in real time, which are the core parts of the proposed system. Next, several novel applications based on the proposed system are presented to demonstrate the potential of the proposed method for use in various applications.

* Corresponding author. Tel.: +82-62-715-3225, Fax: +82-62-715-2384. E-mail address: khko@gist.ac.kr
© Society of CAD/CAM Engineers & Techno-Press

Numerous studies have been conducted regarding interactive projection systems. In particular, researchers are interested in generating events using hand gestures because hand gestures can represent diverse shapes appropriate for recognition. Mistry et al. [1] proposed a portable interactive projection system, SixthSense, based on natural hand gestures. It provides a wearable gestural interface that allows the user to interact with digital information augmented around the user. The system consists of a portable projector, a camera, and a mobile
wearable device, which shows digital information on physical objects in real time. Grønbæk et al. [2] introduced an interactive floor support system using a vision-based tracking method. The system consists of a 12 m² glass surface with a projector that projects upward onto the glass. Limbs of users (children) are tracked and recognized for various interactions, which provide learning environments for children. Wilson of Microsoft Research [3] reported the prototype of an interactive tabletop projection-vision system, called PlayAnywhere, which allows the user to interact with virtual objects projected on a flat surface. For this interaction, shadow-based finger recognition, tracking, and various other image processing techniques are incorporated to provide a convenient but flexible tabletop projection-vision system. It consists of off-the-shelf commodities such as a camera, a projector, and a screen that do not require any detailed configuration or calibration. Bérard [4] developed the "Magic Table" for meetings. It has a whiteboard on the surface. It was developed to overcome the limitations of the current whiteboard by providing various operations such as copy, paste, translation, and rotation of the drawn contents. It consists of a projector, two cameras, and a whiteboard. The pen strokes and the contents on the board are captured by the cameras. The captured images are then processed to extract the positions and the contents using various image processing techniques. Practically, this system allows the user to interactively create and control the contents. In addition to the abovementioned systems, various other projection systems have been developed worldwide [5]-[7].

Figure 1. Overview of the system

This paper is structured as follows: Section 2 presents the overall process of the proposed algorithm. Section 3 explains the process of detection and separation of image data. In Section
4, the recognition process for distinguishing hand gestures is presented. In Section 5, the tracking process of shadows is presented. Section 6 shows the experimental results of the proposed algorithm. Finally, the conclusion of the paper is presented with future work in Section 7.

2. Overall Process

Figure 1 shows the entire system consisting of a beam projector, a screen, a web camera, and a computer. If a user creates a gesture, the shadow is cast on the screen, which is captured by the camera. The computer then performs calculations in order to recognize the gesture through image processing. Next, the computer controls the beam projector to create an event at the proper place in real time.

The overall workflow of the proposed system is illustrated in Figure 2. First, the computer receives an input image from the web camera. The image is processed to produce a binary image. Then, an AND operation is performed on the background and the binary image in order to remove the background. Shadows that are distinct from the background are detected using a labeling algorithm. The area of the hand can be obtained through the curvature, a convex hull, and defects in each labeled area. The center of the hand can be recognized using the moment value. Invariant moments are used for gesture recognition. After the gesture is recognized, the shadow hand is traced by an optical flow algorithm. Finally, events corresponding to the gesture are generated and given to the user. In the subsequent sections, each module in the overall process is explained in detail.

3. Separation and Detection Process

In this section, the technical approaches for separation and detection are explained. Given an image, the shadow part is extracted using the background separation and shadow detection methods.

Figure 2. Shadow gesture recognition
process

3.1 Background Separation Process

The background separation step segments the image into the background and objects. For this operation, an improved averaging background algorithm is employed.

3.1.1 Averaging background algorithm

The averaging background algorithm is used in order to distinguish between the background and the objects in an image. The algorithm is designed to generate a background model using the mean and variance of each pixel, and to delete the background based on the model. When the current frame lies between the upper and lower thresholds obtained from the background model, we consider it as background. Otherwise, it is recognized as an object. First, to obtain the background model, the images of each frame are accumulated for some period. The formula is expressed as

dst1(x, y) ← dst1(x, y) + frame(x, y)    (1)

Here, dst1(x, y) is the pixel at position (x, y) in the image dst1, and frame(x, y) indicates the pixel value at (x, y) of an image in a frame. Next, the variance is needed to generate the background model. The absolute value of the difference between the previous frame and the current frame is accumulated as

dst2(x, y) ← dst2(x, y) + |Pframe(x, y) − frame(x, y)|    (2)

Here, dst2(x, y) is the pixel at position (x, y) in the image dst2, and Pframe(x, y) is the pixel value at (x, y) of the image in the previous frame. We obtain the average values of dst1 and dst2 by dividing by the total number of frames as follows:

dst1(x, y) ← dst1(x, y) / total,    dst2(x, y) ← dst2(x, y) / total    (3)

Figure 4. Problem with the averaging background algorithm

The upper and lower threshold values are determined by adding and subtracting these values. If a user wants to adjust the range where the background is
recognized, he/she can do this by adjusting the threshold. In this paper, the upper and lower thresholds are calculated through the following formulae:

upper(x, y) ← dst1(x, y) + dst2(x, y),    lower(x, y) ← dst1(x, y) − dst2(x, y)    (4)

To apply this method in real time, we introduce the ratio α, which weights the accumulated values against the current frame values. The formula is expressed as follows:

dst(x, y) ← (1 − α)·dst(x, y) + α·frame(x, y)    (5)

Figure 3 represents the result of the averaging background algorithm. It indicates that the shadow can be detected when the user's shadow appears after the background is recognized by accumulating the first 30 frames. The separated portions are divided with certainty by using a reverse binarization method provided through the OpenCV library [8]. In addition, stationary objects are recognized as belonging to the background, and only moving objects are detected because of the real-time updates.

Figure 3. Separation between background and shadow

3.1.2 Problems with the averaging background algorithm

It was determined that some problems occur because the background is updated in real time. Shadows are recognized as background when they stay in the same place for longer than a certain period of time. Then, if the shadow moves, its former location is still regarded as a shadow. The right image in Figure 4 shows this problem: the previous shadow is still detected as a shadow in the current frame although it no longer exists.

In previous studies, depth and color information was used to solve this problem; however, we have only the shadow's 2D image information. To solve the problem here, a current binary image should be employed. The image of the averaging background algorithm and the current binary image are recalculated by the AND operation. This method
contributes to the improved recognition of the shadow. Figure 5 shows the process of solving the problem.

Figure 5. AND operation: (a) image of the averaging background algorithm; (b) current binary image; (c) image after the AND operation

3.2 Shadow Detection Process

Once the background is separated, the shadows are processed for recognition. The isolated shadows are labeled for efficient access using the labeling algorithm.

3.2.1 Labeling algorithm

The principle of the labeling algorithm [9] is as follows. A binary image has only the values zero (0) and one (1). The algorithm begins at a pixel (the top-left pixel). If the value of one is not present in any direction, the algorithm continues searching. When the first pixel that has the value of one is detected, it is marked as the starting point. The endpoint is where the last value of one is detected. Figure 6 depicts the operation of the labeling algorithm.

Figure 6. Operation of the labeling algorithm

3.2.2 Region of interest (ROI)

To distinguish the hand and to increase the processing speed, we need to set a region of interest (ROI) within the previously labeled areas. Image processing can be made faster by using an ROI image instead of using the entire image. Examples of ROIs are shown in Figure 7.

Figure 7. Examples of regions of interest

4. Recognition Process

This paper is focused on the recognition of hand gestures, which may find many applications in diverse areas. Detecting the hand region represented in shadows, however, can be limited in that the shadows do not have depth or color information. In order to overcome this limitation, this paper proposes a method of extracting the hand area using only convex hull and defect information.

4.1 Recognition of the Hand Region

Given the ROIs in the image, the hand region is
detected for gesture recognition.

4.1.1 Convex hull & defect detection

This method consists of three steps as follows.

Step 1: In order to extract the convex hull and defects, we need to determine the contours of the regions. The contour information is obtained using the Canny edge detection algorithm [10]. Figure 8 shows the detected contours of the shadow regions.

Figure 8. Detected contours

Step 2: Many researchers have studied and developed algorithms to search for convex hulls, such as gift wrapping [11] and Quickhull [12]. The convex hull is the shortest closed path including all points of a given set of points. An example of a convex hull for points is given in Figure 9.

Figure 9. Convex hull principle

In this paper, the Graham scan algorithm [13][14] was chosen for convex hull computation. The algorithm relies upon the principle that a point cannot be part of the convex hull when a triangle consisting of three other points includes that point. That is, if there are points S, A, B, C, D as in Figure 10 (a), S is selected as the point with the smallest y-coordinate (if several points have the same value, the one with the largest x-coordinate is selected). Then, the angles of all four points (A, B, C, D) from point S are obtained and sorted in order of size. As a result, S, A, and B are put on the stack, and the scan algorithm begins for all remaining points. We can determine whether a point is on the convex hull by checking the direction of the cross vector at each step of the scan. If the cross vector of the three stacked points is negative, it means that the point is located inside the triangle. In that case, the point is removed from the stack, and the next point is stacked. A convex hull can be obtained as shown in Figure 10 (d) once all points are scanned. We can obtain convex hulls
separately for each ROI because the searching algorithm operates independently in each ROI.

Figure 10. Convex hull searching algorithm

Step 3: To distinguish hands from other objects in the shadow, a new method that uses only the shape of the shadow is necessary because there is no depth or color information that can be utilized. Thus, we propose a method for extracting the hand area. To classify the shadow of the hand, the best approach is to locate the position of the wrist. In this step, the defects of the shadow are used for this purpose. Defects are defined as the farthest points from the line segments made by two points of the convex hull. In other words, they are the points on contour lines that have the longest distance between the shadow and the convex hull line. We can draw the line perpendicular from the convex hull line to the shadow using a straight-line equation. The shadow edge point that has the longest straight line is then identified as the defect point. Figure 11 shows the defect points.

Figure 11. Defect points and the experimental result

4.1.2 Resetting the ROI

For faster image processing, smaller shadow regions are considered. Original ROIs often change because the aspect ratio of the ROI image size is modified according to the length of the arm, which makes it difficult to identify a hand gesture because we can only recognize gestures through moment values. However, if we crop the image at the wrist position, our ROI includes only the hand regardless of the length of the arm. The moment values are not changed because the aspect ratio of the new ROI is fixed. Therefore, resetting the ROI contributes to recognizing hand gestures. We can reset the new ROI using the first and last defects. Figure 12 indicates the new ROI using defect
points.

Figure 12. Resetting an ROI

4.2 Recognition of Hand Gestures

In this section, we describe the process used to distinguish the various hand gestures. The recognition algorithms are executed by calculating moments of the ROIs. According to the uniqueness theorem [15], if the density distribution function f(x, y) is piecewise continuous, and therefore a bounded function, and has nonzero values only in a finite part of the xy plane, then moments of all orders exist. The moment sequence {m_ij} is uniquely determined by f(x, y), and conversely, f(x, y) is uniquely determined by {m_ij}. It should be noted that this restriction assumption is important; otherwise, the abovementioned uniqueness theorem may not hold.

4.2.1 Hu invariant moments

In digital images, the two-dimensional (i + j)th-order moments of f(x, y) are defined as

m_ij = Σ_{x,y} f(x, y) x^i y^j,  (i, j = 0, 1, ...)    (6)

The centroid is determined by the moment values, and it is utilized as a reference point when an event occurs. The centroid (x̄, ȳ) is expressed as

x̄ = m10 / m00,  ȳ = m01 / m00    (7)

Figure 13. Center of the hand

Figure 13 indicates the centroid of the hand and the convex hull point located farthest from the center. This point is very useful when users want to select a benchmark, such as a mouse pointer. The central moments do not change under translation [16]. The central moments shifted by the centroid are defined as

μ_ij = Σ_{x,y} f(x, y) (x − x̄)^i (y − ȳ)^j    (8)

Here, i and j in Eq. (8) correspond to the horizontal x-axis and vertical y-axis, respectively. The central moments satisfy the following relationships:

μ00 = m00    (9)
μ10 = μ01 = 0    (10)
μ20 = m20 − m00·x̄²    (11)
μ11 = m11 − m00·x̄·ȳ    (12)
μ02 = m02 − m00·ȳ²    (13)
μ03 = m03 − 3·m02·ȳ + 2·m00·ȳ³    (14)
μ30 = m30 − 3·m20·x̄ + 2·m00·x̄³    (15)
μ21 = m21 − m20·ȳ − 2·m11·x̄ + 2·m00·x̄²·ȳ    (16)
μ12 = m12 − m02·x̄ − 2·m11·ȳ + 2·m00·x̄·ȳ²    (17)

The mathematical interpretation of the moments is as follows:

μ20: the dispersion along the horizontal axis
μ02: the dispersion along the vertical axis
μ11: the covariance of the horizontal and vertical axes
μ12: the degree of dispersion of the left side compared to the right side along the horizontal axis
μ21: the degree of dispersion of the lower part compared to the upper part along the vertical axis
μ30: the degree of asymmetry (skew) along the horizontal axis
μ03: the degree of asymmetry (skew) along the vertical axis

The normalized moments are obtained by dividing the central moments by a power of μ00 determined by the moment order, which makes them invariant to scale [17]. The normalized moments are defined as

η_ij = μ_ij / μ00^γ,  γ = (i + j)/2 + 1    (18)

In this work, we extract the Hu invariant moments [18] through Eqs. (18)-(25) and use them for the gesture recognition algorithm. The Hu invariant moments consist of 2nd- and 3rd-order central moments, and are as follows:

I1 = η20 + η02    (19)
I2 = (η20 − η02)² + 4η11²    (20)
I3 = (η30 − 3η12)² + (3η21 − η03)²    (21)
I4 = (η30 + η12)² + (η21 + η03)²    (22)
I5 = (η30 − 3η12)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²] + (3η21 − η03)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]    (23)
I6 = (η20 − η02)[(η30 + η12)² − (η21 + η03)²] + 4η11(η30 + η12)(η21 + η03)    (24)
I7 = (3η21 − η03)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²] − (η30 − 3η12)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]    (25)

Our analysis of the Hu invariant moments defined in the above equations is as follows:

I1: the sum of the dispersions in the horizontal and vertical directions. The more the values are spread out along the horizontal and vertical directions, the greater the
value is.
I2: the covariance of the horizontal and vertical directions (whether the dispersions in the horizontal and vertical directions are similar).
I3: the emphasizing value of the dispersion in the horizontal and vertical directions.
I4: the countervailing value of the dispersion in the horizontal and vertical directions.
I5, I6: size-, rotation-, and translation-invariant moments.
I7: the skew-orthogonal invariant. This skew invariant is useful in distinguishing mirror images.

The Hu invariant moments are influenced by the brightness of the image. Even when the user makes the same gesture, the moments have different values. The most frequent variation of the invariant moments occurs in one particular corner. Therefore, we propose a method to improve the recognition rate: a separate set of Hu invariant moments is measured for each part by dividing the image into four parts, because the brightness of the shadows changes according to the position in the image. This method contributes to an increase in the recognition rate. Figure 14 represents how the image is divided.

Figure 14. Subdivision of the recognition area

When the Hu invariant moments are compared with the pre-computed ones in the database, valid ranges of the moment values are considered for handling the variations of the computed Hu invariant values in order to improve the robustness of recognition. The ranges need to be selected considering the environment of the system. In this work, 2%-3% of each Hu moment value was used for its range.

4.2.2 Multiscale retinex algorithm

Shadows are not detected correctly when the ambient light is strong. Figure 15 displays one such case of strong light. The hand's shadow is observed in the grayscale image, but only part of a finger, or nothing at all, is detected in the binary image.

Figure 15.
Error in strong illumination. The left is a grayscale image; the right is a binary image.

In order to solve this problem, we use the multiscale retinex algorithm, which is widely used for improving image quality [19]. The algorithm assumes that a scene in an image consists of two components, illumination and reflectance. The clarity of the image can be improved when the illumination component is removed from the scene. The equation is applied only to the value channel of the input image represented using the HSV color model, as follows:

R_MSRi = Σ_{n=1}^{N} ω_n { log S_i(x, y) − log[ G_n(x, y) * S_i(x, y) ] }    (26)

Here, R_MSRi is the resulting image, S represents the scene intensity, G indicates the Gaussian filter, and ω refers to the weight coefficient. Figure 16 shows the result of applying the multiscale retinex algorithm. The formerly invisible hand is now clearly visible.

Figure 16. Result of the multiscale retinex algorithm

4.3 Discussion on Hand Gesture Recognition

Extracting the hand region is critical for hand gesture recognition because the features of each gesture are computed based on the shadow of the hands. The proposed method extracts the hand region of a labeled sub-image using an ROI, and then resets the ROI using the first and last defects, which produces a more compact region for the hand. This procedure can mostly extract the hand region from the shadows of various objects such as arms and the head, as shown in the preceding figures. However, the robustness of the shadow region for hands can be compromised because the defects may not be computed consistently. Therefore, in this work, the user is asked to make a gesture such that the shadow regions of the hand are consistently obtained. Robust extraction of the hand region is necessary for more diverse gesture recognition. In order to solve this problem, the color information as
well as data from other sensors needs to be utilized. This extension is not discussed in this work, but it is recommended for future work.

5. Tracking Process

In this section, we present a tracking algorithm to improve the recognition rate. Most tracking algorithms, such as SIFT [20] or SURF [21], use feature points extracted from the image. However, there are no feature points in shadows because shadows are only binary images. In this paper, we use the optical flow algorithm proposed by Lucas and Kanade [22] to solve this problem. Optical flow indicates the relative movement as viewed by an observer. Optical flow in an image means that the speeds of the position changes of the pixels between frames are represented as two-dimensional vectors. There are various methods of calculating optical flow. This paper uses the Lucas–Kanade optical flow algorithm, which is spatiotemporal gradient-based. To derive the optical flow, if the brightness at (x, y) is constant from time t to t + δt, we can get the image constraint equation as follows:

I(x, y, t) = I(x + δx, y + δy, t + δt)    (27)

Applying a Taylor series expansion to Eq. (27) and neglecting the high-order terms, the following expression is derived:

(∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) + ∂I/∂t = 0    (28)

Here, v = [dx/dt, dy/dt]^T is the optical flow at a pixel I(x, y), ∇I = [∂I/∂x, ∂I/∂y]^T represents the gradient in space, and I_t = ∂I/∂t indicates the variation of brightness with time. Assuming that the optical flow v is constant in a small search area, it can be estimated using the least squares method as

v = (A^T A)^{-1} A^T (−b)    (29)

Here,

A^T A = [ Σ I_x²    Σ I_x·I_y
          Σ I_x·I_y  Σ I_y²  ],    b = [I_t1, ..., I_tn]^T

The tracking of gestures is performed using the Lucas–Kanade optical flow algorithm. Therefore, we can generate the desired event at the required position.
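The least-squares estimate of Eq. (29) can be written down directly. The following is a minimal sketch, not the authors' implementation: the function name, the window size, the use of frame differencing for I_t and central differences for the spatial gradients, and the synthetic test frames are all illustrative assumptions.

```python
import numpy as np

def lucas_kanade_flow(prev, curr, x, y, win=7):
    """Estimate the optical flow v = (A^T A)^(-1) A^T (-b) of Eq. (29)
    in a win x win window centred on pixel (x, y).

    prev, curr: consecutive grayscale frames as float arrays indexed [y, x].
    The flow is assumed constant inside the window (Lucas-Kanade assumption).
    """
    # Spatial gradients of the previous frame (np.gradient returns d/dy, d/dx)
    Iy, Ix = np.gradient(prev)
    # Temporal derivative I_t approximated by frame differencing
    It = curr - prev

    r = win // 2
    window = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))
    A = np.stack([Ix[window].ravel(), Iy[window].ravel()], axis=1)  # n x 2
    b = It[window].ravel()                                          # n

    # Least-squares solution of A v = -b, i.e. v = (A^T A)^(-1) A^T (-b)
    v, *_ = np.linalg.lstsq(A, -b, rcond=None)
    return v  # (vx, vy) in pixels per frame

# Synthetic check: a Gaussian blob shifted by one pixel along x
xx, yy = np.meshgrid(np.arange(64, dtype=float), np.arange(64, dtype=float))
prev = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / 50.0)
curr = np.exp(-((xx - 33) ** 2 + (yy - 32) ** 2) / 50.0)
vx, vy = lucas_kanade_flow(prev, curr, 30, 32)  # vx close to 1, vy close to 0
```

In the paper's pipeline the same estimate would be evaluated around the tracked fingertip in consecutive shadow frames; here a smooth synthetic blob stands in for the shadow so the recovered flow can be checked against the known one-pixel shift.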
Tracking does not always involve using optical flow. Optical flow is used as a supplement when the Hu invariant moments do not recognize the gesture.

6. Experimental Results

In this study, the recognition of hand gestures is tested using the values of the seven Hu invariant moments, and four interactive applications are presented using the hand gesture recognition.

6.1 Hu Invariant Moments Measurement

Four gestures are considered for interaction, which are the movements of the palm, fist, one finger, and two fingers, as shown in Table 1. In theory, the same gesture should have the same Hu invariant moments, but the measured values differ for each gesture. This phenomenon is due to the inconsistent brightness of the projector, the surrounding illumination, and the camera performance. Four different environmental conditions were applied for the Hu invariant moment computation. For each condition, different Hu invariant moments are computed. Table 1 summarizes the values of the Hu invariant moments for the four different gestures in the four different environmental conditions. Some values are similar, and some of the values are significantly different. In particular, the 7th Hu invariant moment values are unstable, and, therefore, we do not use them in the demonstration. These values are relative, not absolute. In other words, these values are applicable in our environment only. They are one example, and if the experiment is performed in a different environment, it is recommended to measure these values again.

Table 1. Values of Hu invariant moments according to gesture and environmental conditions

In this experiment, we discovered that the recognition rate of the Hu invariant moments is improved by using the subdivision method. Table 2 indicates the numbers of recognized frames using the Hu invariant moments. There are 1800 frames in total. In the case of the palm, 1592 frames were
recognized before the subdivision method was applied; however, 1638 frames were recognized after the subdivision method was applied. The recognition rates of the other gestures were also improved. Figure 17 shows the graphs of the enhanced recognition rates. In terms of the average value, the total recognition rate was increased from 1475 frames (81.9%) to 1584 frames (88%). This result shows that the recognition rate can be elevated if more divisions are applied.

Table 2. Number of recognized frames

Figure 17. Recognition rate chart

6.2 Demonstration of Gesture Recognition

We conducted a simple demonstration using three types of gestures. If the user made a palm gesture, the board was initialized. A curve was drawn on the board when the user made a single-finger gesture. Finally, the board was ready to extract the color of pixels in response to a two-finger gesture. Figure 18 shows a drawing depicting the sea created using the developed drawing board.

Figure 18. Example of the drawing board

We also experimented with the proposed recognition method using different gestures, such as finger-puppet imitations of guns, hearts, dogs, and cats. The effect of these gestures in a shadow play can create a great deal of interest and entertainment. The values of the Hu invariant moments are summarized in Table 3. When the user made a gun gesture, a gunshot was heard. A heart image appeared when the user made a heart gesture. If a dog image appears, a bark is generated.

Table 3. Values of Hu invariant moments

6.3 Application Experiment

A shadow alone can be used to make an interesting performance. Figure 19 indicates an example showing such possibilities. Figure 19 (a) is the shadow afterimage effect, which is meant to
simulate a dreamlike sensation. Figure 19 (b) shows that when the shadow moves, the ball tracks the shadow in real time. In Figure 19 (c), the logo is bound to the shadows. Using this technique, people can become part of the exhibition. These examples do not use gesture recognition; however, if gesture recognition is added to this technique, interesting content can be created.

Figure 19: Application experiments. (a) Shadow afterimage effect. (b) Tracking shadow. (c) Simple bounce game.

Conclusion

In this paper, we presented a shadow gesture recognition method for use with an interactive projection system. In the field of interactive projection systems, there has been no research on shadow gesture recognition since 2000. In that respect, this paper provides a foundation for various novel applications based on the shadow gesture recognition concept and opens a new research topic related to the interface between a human and virtual content.

The shadow gesture recognition presented in this work may not be directly applicable for engineering purposes. However, the various algorithms developed in this work can be used for engineering applications such as vision-based monitoring and management at a manufacturing site. For example, images of the current fabrication process can be processed into binary images, from which various features and shapes can be analyzed to check the current status. Recognition using Hu invariant moments can likewise be used to recognize various products in the manufacturing process.

There are some limitations to the proposed system. First, 3D gesture recognition is impossible because 2D-based Hu invariant moments are used for gesture recognition. This is an inherent limitation of the proposed method, which restricts its scope of application. We are considering combining 3D scanners with our method for more
sophisticated gesture recognition. Second, the robustness of the image processing is not always assured; under some conditions the gesture recognition fails and unexpected results can occur. These two problems, together with the use of additional sensors for improved recognition, are left for future work.

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2011-00100999).

References

[1] Mistry P, Maes P. SixthSense: a wearable gestural interface. In: ACM SIGGRAPH ASIA 2009 Sketches; 2009.
[2] Grønbæk K, Iversen OS, Kortbek KJ, Nielsen KR, Aagaard L. Interactive floor support for kinesthetic interaction in children learning environments. In: Human-Computer Interaction – INTERACT 2007. Springer Berlin Heidelberg; 2007. p. 361-375.
[3] Wilson AD. PlayAnywhere: a compact interactive tabletop projection-vision system. In: Proceedings of the 18th Annual ACM Symposium on User Interface Software and Technology. ACM; 2005.
[4] Bérard F. The magic table: computer-vision based augmentation of a whiteboard for creative meetings. In: IEEE International Conference on Computer Vision, Workshop on Projector-Camera Systems (PROCAMS'03); 2003.
[5] Kjeldsen R, Pinhanez C, Pingali G, Hartman J, Levas T, Podlaseck M. Interacting with steerable projected displays. In: Proceedings of Automatic Face and Gesture Recognition; 2002; Washington, DC, USA. p. 402-407.
[6] Pinhanez C, Kjeldsen R, Levas A, Pingali G, Podlaseck M, Sukaviriya N. Applications of steerable projector-camera systems. In: Proceedings of the ICCV Workshop on Projector-Camera Systems (PROCAMS'03); 2003; Nice, France.
[7] Holman D, Vertegaal R, Altosaar M, Troje N, Johns D. PaperWindows: interaction techniques for digital paper. In: CHI '05 Proceedings of the SIGCHI
Conference on Human Factors in Computing Systems; 2005; Portland, Oregon, USA. p. 591-599.
[8] OpenCV dev team. OpenCV documentation. http://docs.opencv.org/; 2013.
[9] Chang F, Chen C-J, Lu C-J. A linear-time component-labeling algorithm using contour tracing technique. Computer Vision and Image Understanding 2004; 93(2):206-220.
[10] Suzuki S. Topological structural analysis of digitized binary images by border following. Computer Vision, Graphics, and Image Processing 1985; 30(1):32-46.
[11] Barber CB, Dobkin DP, Huhdanpaa H. The quickhull algorithm for convex hulls. ACM Transactions on Mathematical Software (TOMS) 1996; 22(4):469-483.
[12] Sugihara K. Robust gift wrapping for the three-dimensional convex hull. Journal of Computer and System Sciences 1994; 49(2):391-407.
[13] Sklansky J. Finding the convex hull of a simple polygon. Pattern Recognition Letters 1982; 1(2):79-83.
[14] Graham RL, Yao FF. Finding the convex hull of a simple polygon. Journal of Algorithms 1983; 4(4):324-331.
[15] Papoulis A, Pillai SU. Probability, random variables, and stochastic processes. Tata McGraw-Hill Education; 2002.
[16] Nadler M, Smith EP. Pattern recognition engineering. New York: Wiley; 1993.
[17] Flusser J, Suk T, Zitová B. Moment invariants to translation, rotation and scaling. In: Moments and Moment Invariants in Pattern Recognition; 2009.
[18] Hu M-K. Visual pattern recognition by moment invariants. IRE Transactions on Information Theory 1962; 8(2):179-187.
[19] Jobson DJ, Rahman Z, Woodell GA. A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Transactions on Image Processing 1997; 6(7):965-976.
[20] Lowe DG. Object recognition from local scale-invariant features. In: Proceedings of the Seventh IEEE International Conference on Computer Vision; 1999; Kerkyra, Greece.
[21] Bay H, Tuytelaars T, Van Gool L. SURF: speeded up robust features. In: Computer Vision – ECCV 2006. Springer Berlin Heidelberg; 2006.
[22] Lucas BD, Kanade T. An iterative
image registration technique with an application to stereo vision. In: IJCAI'81 Proceedings of the 7th International Joint Conference on Artificial Intelligence; 1981. p. 674-679.
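The Hu-moment matching step described in Section 6 can be made concrete with a short sketch. The following Python/NumPy code is illustrative only and is not the implementation used in this work: the function names (`hu_moments`, `classify`), the log-scaling of the moments, and the nearest-neighbour matching over the first six moments (the 7th is dropped, following the observation in Section 6.1) are our assumptions; in a real system the moments would be computed on the segmented shadow mask (e.g., with OpenCV's `cv2.HuMoments`), and the template vectors should be re-measured in the target environment, as recommended above.

```python
import numpy as np

def hu_moments(img):
    """Seven Hu invariant moments of a binary image (Hu, 1962).
    Pure-NumPy sketch of the standard formulas."""
    ys, xs = np.nonzero(img)
    m00 = float(len(xs))                     # area (zeroth moment)
    x, y = xs - xs.mean(), ys - ys.mean()    # centered coordinates

    def eta(p, q):                           # scale-normalized central moment
        return (x**p * y**q).sum() / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    return np.array([
        n20 + n02,
        (n20 - n02)**2 + 4 * n11**2,
        (n30 - 3*n12)**2 + (3*n21 - n03)**2,
        (n30 + n12)**2 + (n21 + n03)**2,
        (n30 - 3*n12)*(n30 + n12)*((n30 + n12)**2 - 3*(n21 + n03)**2)
          + (3*n21 - n03)*(n21 + n03)*(3*(n30 + n12)**2 - (n21 + n03)**2),
        (n20 - n02)*((n30 + n12)**2 - (n21 + n03)**2)
          + 4*n11*(n30 + n12)*(n21 + n03),
        (3*n21 - n03)*(n30 + n12)*((n30 + n12)**2 - 3*(n21 + n03)**2)
          - (n30 - 3*n12)*(n21 + n03)*(3*(n30 + n12)**2 - (n21 + n03)**2),
    ])

def classify(sample_hu, templates):
    """Nearest-neighbour gesture label over log-scaled Hu moments.
    Only the first six moments are compared; the 7th is unstable.
    `templates` is a hypothetical dict mapping label -> Hu vector."""
    def log_scale(h):
        h = np.asarray(h, dtype=float)[:6]
        return np.sign(h) * np.log10(np.abs(h) + 1e-30)  # compress dynamic range
    s = log_scale(sample_hu)
    return min(templates, key=lambda k: np.linalg.norm(s - log_scale(templates[k])))
```

On a binary shadow mask, the returned vector is unchanged when the shadow shifts or rotates on the screen and is approximately unchanged when it is rescaled, which is why the same gesture can be matched across projector and user positions.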