VIETNAM NATIONAL UNIVERSITY, HANOI
COLLEGE OF TECHNOLOGY

ANH DUC NGUYEN

IMPROVING THE 3D TALKING HEAD FOR USING IN AN AVATAR OF VIRTUAL MEETING ROOM

Branch: Information Technology
Code: 1.01.10

MASTER THESIS

Supervisor: Dr The Duy Bui

Hanoi, November 2006

Contents

List of Figures
Chapter 1 - Introduction
1.1 The avatar in the virtual meeting room
1.2 Structure of this thesis
Chapter 2 - The 3D animated talking head
2.1 A muscle based 3D face model
2.2 Combination of facial movements on a 3D talking head
2.3 From emotions to emotional facial expressions
2.4 Conclusion
Chapter 3 - OpenGL and JOGL overview
3.1 OpenGL overview
3.1.1 Immediate Mode and Retained Mode (Scene Graphs)
3.1.2 OpenGL history
3.1.3 How does OpenGL work?
3.1.4 OpenGL as a state machine
3.1.5 Drawing geometry
3.2 JOGL overview
3.2.1 Introduction
3.2.2 Developing with JOGL
3.2.3 Using JOGL
3.3 Conclusion
Chapter 4 - Improving lip-sync ability
4.1 Introduction
4.2 Previous work
4.3 FreeTTS and Mbrola
4.3.1 FreeTTS
4.3.2 Mbrola
4.4 The improved lip model
4.5 Conclusion
Chapter 5 - Adding the hair and eyelashes models
5.1 Introduction
5.2 The Hair model
5.2.1 Introduction to VRML
5.2.2 Our hair model
5.3 The Eyelashes model
5.4 Conclusion
Chapter 6 - Implementation and illustrations
6.1 Implementing the face model
6.1.1 Structure of the system
6.1.2 Some improvements
6.2 Face model illustrations
Chapter 7 - Conclusion
Future research
References

List of Figures

2.1: The original 3D face model: (a): The face mesh with muscles; (b): The face after rendering
2.2: System overview
2.3: Combination of two movements in the same channel
2.4: The activity of Zygomatic Major and Orbicularis Oris before (top) and after (bottom) applying combination algorithm
2.5: The emotion-to-expression system
2.6: Membership functions for emotion intensity (a) and muscle contraction level (b)
2.7: Basic emotions: Neutral, Sadness, Happiness, Anger, Fear, Disgust, Surprise (from left to right)
3.1: Software implementation of OpenGL
3.2: Hardware implementation of OpenGL
3.3: A simplified version of OpenGL pipeline
3.4: The structure of an application using JOGL
4.1: FreeTTS Architecture
5.1: Dividing a polygon (a) to triangles (b)
5.2: Importing the hair model: (a): the original head; (b): the head with the imported hair model; (c): the head with the imported and fine tuned hair model
5.3: Some other imported and fine tuned hair models
5.4: The open (a) and closed eyes (b) without and with eyelashes
5.5: The face without (a) and with (b), (c) the hair and eyelashes models
6.1: The main interface of our program
6.2: The face model displays Happiness emotion with maximum intensity
6.3: The face model displays Surprise emotion with maximum intensity
6.4: The combination of two emotions: Happiness and Surprise
6.5: The effect of left Zygomatic Major muscle's contraction at maximum level on the face model
6.6: The face model from different view points
6.7: Increasing surprise
6.8: The hair model after being imported
6.9: The hair model after being fine tuned
6.10: Some other hair models
6.11: Closing the eyes
6.12: The face model attached to the body
6.13: Our face model embedded into another project

Chapter 1 Introduction

1.1 The avatar in the virtual meeting room

Virtual Meeting Rooms (VMRs) are 3D virtual simulations of meeting rooms in which modalities such as speech, gaze, distance, gestures and facial expressions can be controlled (a VMR project in Twente). The rapid development of computer graphics and embodied conversational agents allows the creation of VMRs and makes them useful for various purposes. These purposes can be divided into the following three categories [24]. First, they can be used as a virtual environment for teleconferencing, a real-time
communication means for remote participants of a meeting [18]. Using VMRs helps to reduce the amount of data that needs to be sent to, and displayed on the screens of, the remote clients. In addition, VMRs overcome some features that are problematic in real meetings or in traditional video-based conferences. For example, participants can adapt the virtual environment to their own preferences without disturbing other people, or they can choose a view from any seat in the VMR and feel comfortable during the meeting [17]. Second, VMRs are used to simulate the content of a recorded meeting in different ways or to present multimedia information about it. Information can be recorded directly from participants' behavior in real meetings (e.g. tracking of head or body movements, voice). These presentations can be used as a 3D summary of the real meetings or for evaluating the annotations and results obtained by machine learning methods. Third, because virtual environments allow various independent factors (voice, gaze, distance, gestures, and facial expressions) to be controlled, these factors can be used to study their influence on features of social interaction and social behavior. Conversely, the effect of social interaction on these factors can be studied adequately in virtual environments as well.

In the VMR environment, each participant is represented by an avatar. An avatar is an embodied conversational agent that simulates all behaviors and movements of the participant. The avatar typically contains a talking head, which is able to speak and to display lip movements during speech, emotional facial expressions and conversational signals, and a body, which is able to display the gestures of the participant. Most importantly, the avatar of each participant must be believable to the other participants. The avatar is believable if it can simulate the appearance and express the characteristics of the participant, and if its actions and reactions
can be as true to life as those of the person it is representing.

The talking head model plays an important role in the creation of a believable avatar. It is used not only to display facial movements and expressions but also to distinguish the avatar from other avatars and to express the personality of the participant. To create a talking head model that is suitable for avatars in VMRs, several problems must be addressed. First, the talking head must be simple enough to keep the animation real-time but still produce realistic, high-quality facial expressions. Second, the talking head must not only be able to create facial movements such as conversational signals, emotional expressions, etc., but must also combine them and resolve the conflicts between them. Third, the talking head must look like a real head, which means that other models such as hair, tongue and eyelashes models must be attached to it.

In this thesis, we choose the talking head model from [3], improve it and then use it for avatars in VMRs. We study the model carefully to discover its advantages as well as its disadvantages. The advantages are inherited, while missing functions are supplied and disadvantages are remedied. We replace the rendering method of the head with a new one to improve the animation speed. The synchronization between audible and visible speech is also improved. We supply hair and eyelashes models to make the head look more realistic. The improved model can be used for avatars in a VMR environment and can also be embedded into other projects.

1.2 Structure of this thesis

In Chapter 2, we introduce the 3D animated talking head [3] on which our work is based. This head is able to produce realistic facial expressions and real-time animation on a personal computer. It can display several types of facial movement, such as eye blinking, head rotation and lip movement, at once and, most importantly, it can generate
emotional facial expressions from emotions. We briefly introduce the way this muscle based 3D face model is created, the techniques it uses for producing animation, the combination of facial movements and how emotional facial expressions are generated from emotions.

In Chapter 3, we present an overview of OpenGL and JOGL (Java bindings for OpenGL). OpenGL is the industry standard and premier environment for developing 2D and 3D graphics applications. Its capabilities allow developers to display compelling graphics and to produce applications that require maximum performance (OpenGL project). JOGL is a new OpenGL interface for the Java platform. It is open source, and it is the cleanest and most minimalist API of all the bindings available.

In Chapter 4, we give an overview of FreeTTS and Mbrola. FreeTTS is a robust text-to-speech system that we use to obtain phonemes and timing information from a text. This phoneme string is used to generate lip movements during speech. FreeTTS supports Mbrola, a speech synthesizer based on the concatenation of diphones. We use Mbrola as an output thread of FreeTTS to produce the synthetic audible speech. We also present the method used to improve the lip-sync capability. The original head can speak, but under some conditions the speech from the speaker is not synchronized with the movements of the lips on the screen. In addition, we may want the head to express various emotions depending on the sentence currently being spoken, so we need to know exactly when the sentence is spoken in order to generate suitable emotions.

The original head does not have a hair model or eyelashes. We supply these parts to make it look like a real head and to make it more attractive. In Chapter 5, we present the method for applying a hair model to the head and the way we draw eyelashes for the eyes. Available hair models are attached to the head model without much human intervention during the process. The eyelashes are a small part of the face, but without them the eyes may not look
real. The eyelashes also help to improve the emotional expressiveness of the eyes when the eyes flutter. We describe some problems in creating the eyelashes and how to fix them to the eyelid so that they move with the eyelid when the eyes close or open.

In Chapter 6, we introduce the implementation of the face using Java and JOGL. We also introduce our improvement to the rendering method of the talking head, which uses the new methods and mechanisms introduced in OpenGL 1.5. This method increases the animation speed significantly. Some illustrations of our 3D talking head model are also given in this chapter.

Chapter 2 The 3D animated talking head

2.1 A muscle based 3D face model

The face model consists of a polygonal face mesh and a B-spline surface for the lips. The face mesh data was first obtained from a 3D scanner and was then processed to improve the animation performance while keeping the high quality of the model. The process contains two phases. In the first phase, the number of vertices and polygons was reduced in the non-expressive parts but maintained in the expressive parts, which are the areas around the eyes, the nose, the mouth and the forehead. At the end of this phase, the face mesh contains 2,468 vertices and 4,746 polygons. This is small enough to allow real-time animation but still preserves the high level of detail in the expressive parts of the face. In the second phase, the face model was divided into eleven regions. The five regions on the left side are the left lower face, left middle face, left lower eyelid, left upper eyelid and left upper face. There are five corresponding regions on the right side, and the last region is at the back of the head. This division not only helps to prevent unwanted artifacts caused by the displacement of vertices in regions that should not be affected by muscle contractions, but also increases the animation speed.

The lip model is a B-spline surface with a 24 x control point grid. The lip is deformed by
moving the control points, and the B-spline surface is polygonalized to connect with the face mesh for rendering. The B-spline surface has the advantage of producing a smooth surface, but it cannot produce wrinkles and needs to be polygonalized before rendering. If the number of control points is too large, it requires heavy computation. Given these advantages and disadvantages, a B-spline surface is suitable for modeling a small part of the face such as the lips.

Of the 19 muscles used on the face to generate animation, all are vector muscles except the Orbicularis Oris, which drives the mouth, and the Orbicularis Oculi, which drives the eye. The vector muscle of the face is an improved version of the vector muscle model from [28]. In addition, a mechanism to generate wrinkles and bulges is added to increase the realism of the facial expressions, and a technique to reduce the computation is introduced to enhance the animation performance. The Orbicularis Oris muscle is parameterization-based and is adopted from [12]. The Orbicularis Oculi has two parts: the Pars Palpebralis, which opens and closes the eyelid and is adopted from [22], and the Pars Orbitalis, which squeezes the eye and is adopted from [28]. The jaw and eyeball rotation algorithms are improved versions of the ones proposed in [22]. The mouth now has a natural oval look, and the eyes can track a target. Eye movement is independent of facial muscle movements, and the eyes cannot rotate to impossible positions.

All muscles have an intensity range from 0 to 1, and the step value between two adjacent muscle contraction levels is 0.2. This step value was determined by trial-and-error experiments. It is small enough to ensure that the facial animations are smooth and large enough to reduce the computation time. Figure 2.1 shows the original face from [3].

Figure 2.1: The original 3D face model (a): The face mesh with muscles; (b): The face after rendering

2.2 Combination of facial movements on a 3D
talking head

The system takes marked-up text as input, in which each facial movement (except lip movement while talking) is defined as a group of muscle contractions that share the same function, start time, onset, offset and duration. Lip movement is generated separately inside the system, based on the phonemic representation of the input text.

Chapter 3 OpenGL and JOGL overview

3.1 OpenGL overview

3.1.1 Immediate Mode and Retained Mode (Scene Graphs)

There are two different types of API for programming real-time 3D applications [32]. The first type is called retained mode. In retained mode, a description of the objects and the scene is provided to the API, and the graphics package then creates the image on the screen. All the programmer needs to do is give commands to change the position and viewing orientation of the user (also called the camera) or of other objects in the scene. The structure that is built in this way is called a scene graph. The scene graph is a data structure that includes all the objects in the scene and their relationships to one another. Many high-level toolkits or "game engines" use this approach. The programmer does not need to understand how the scene is rendered, because the graphics library takes care of rendering the model or database that is handed over to it. Java3D is one example of a scene graph API.

The second approach to 3D rendering is called immediate mode. Most retained mode APIs or scene graphs use an immediate mode API internally to actually perform the rendering. For example, Java3D uses OpenGL or Direct3D to render the geometry created by the user. In immediate mode, the programmer does not describe the models and environment at as high a level as in retained mode. Instead, commands are issued directly to the graphics processor. Each command has an immediate effect that depends on the current state settings, and new commands have no effect on rendering commands that have already been executed. This allows everything to be controlled at a low level.

3.1.2 OpenGL history

OpenGL is an industry-standard, cross-platform Application Programming Interface (API). The specification for this API was finalized in 1992, and the first implementations appeared in 1993. The forerunner of OpenGL is Iris GL (Graphics Library), the API that was designed and supported by Silicon Graphics, Inc. To establish an industry standard, Silicon Graphics collaborated with various graphics hardware companies to create an open standard, which was named "OpenGL". To date, seven revisions have been introduced to add new functionality to the API. The newest version of the OpenGL specification is 2.1. All newer versions are upward compatible with earlier versions [4].

- Version 1.1 was finished in 1997 and added support for two important capabilities: vertex arrays and texture objects.
- The specification for OpenGL 1.2 was released in 1998 and added support for 3D textures and an optional set of imaging functionality.
- The OpenGL 1.3 specification was completed in 2001 and added support for cube map textures, compressed textures, multi-textures, etc.
- OpenGL 1.4 was completed in 2002 and added automatic mipmap generation, additional blending functions, internal texture formats for storing depth values for use in shadow computations, support for drawing multiple vertex arrays with a single command, more control over point rasterization, control over stencil wrapping behavior, and various additions to texturing capabilities.
- The OpenGL 1.5 specification was published in October 2003. It added support for vertex buffer objects, shadow comparison functions and occlusion queries.
- OpenGL 2.0, finalized in September 2004, opened up the processing pipeline for user control by providing programmability for both vertex processing and fragment processing. Other features added in 2.0 include support for multiple render targets, non-power-of-2 textures, point sprites, and separate stencil functionality for front- and back-facing surfaces.
- Version 2.1, released in August 2006, added support for revision 1.20 of the OpenGL shading language, non-square matrices, pixel buffer objects and sRGB textures.

3.1.3 How does OpenGL work?

An OpenGL implementation can be a software implementation or a hardware implementation. Windows applications can call a Windows API called the Graphics Device Interface (GDI) to create output on screen, and graphics card vendors usually supply a driver for GDI to interface with. A software implementation of OpenGL takes graphics requests from an application and constructs (rasterizes) a color image of the 3D graphics. This image is then supplied to the GDI for display on the monitor. Microsoft has its own OpenGL software implementation, and almost all modern operating system products from Microsoft include support for OpenGL. However, SGI and MESA have also released software implementations of OpenGL for Windows that greatly outperformed Microsoft's implementation.

Figure 3.1: Software implementation of OpenGL

An OpenGL hardware implementation usually takes the form of a graphics card driver. OpenGL API calls from applications are passed to a hardware driver. This driver does not pass its output to the Windows GDI for display; instead, it interfaces directly with the graphics display hardware. The more components of OpenGL are implemented in hardware, the faster the implementation processes the calls from applications and displays images on screen.

Figure 3.2: Hardware implementation of OpenGL

When an application calls OpenGL API functions, the commands are placed in a command buffer. Vertex data, texture data, etc. are also contained in this buffer. When the buffer is flushed, the commands and data are passed to the "Transformation and Lighting" step. In this step, the points used to describe an object's geometry are recalculated to determine the given object's location and orientation.
Lighting calculations are performed as well to determine the brightness of the colors at each vertex. When this stage has finished, the data is passed to the "Rasterization" step of the pipeline. The rasterizer actually creates the color image from the geometric, color, and texture data and places the image into the frame buffer. The frame buffer is the memory area of the graphics display device, which means that the image is displayed on the screen. Figure 3.3 shows a simple view of the OpenGL pipeline. At a lower level, there are many boxes inside each box of the diagram.

Figure 3.3: A simplified version of OpenGL pipeline

3.1.4 OpenGL as a state machine

OpenGL is designed as a state machine [21]. If we put it into specific states (or modes), these states remain in effect until we change them. For example, the current color is a state variable. We can set the current color to black, white, red, or any other color, and all objects will be drawn with that color until we set the current color to something else. The current color is only one of many state variables that OpenGL maintains. Other state variables include the current viewing and projection transformations, line and polygon stipple patterns, polygon drawing modes, pixel-packing conventions, positions and characteristics of lights, and material properties of the objects being drawn.

The execution model for OpenGL can be described as client-server. An application (the client) issues OpenGL commands that are interpreted and processed by an OpenGL implementation (the server). Many server-side variables have only two states, on or off, and are enabled or disabled with the commands glEnable() or glDisable(). Client-side state is enabled with glEnableClientState() and disabled with glDisableClientState(). Each state variable or mode has a default value, and we can query the system for each variable's current value at any time. In addition, we can save a
collection of server-side state variables on an attribute stack with glPushAttrib(), and client-side state can be pushed onto a second stack with glPushClientAttrib(). We can temporarily modify the states and restore the values later with glPopAttrib() or glPopClientAttrib() for server-side or client-side states, respectively. When we only need to change the state temporarily, using these commands is likely to be more efficient than issuing the query commands.

3.1.5 Drawing geometry

All graphic objects in OpenGL are constructed from geometric drawing primitives. OpenGL supports only the following geometric primitives: points, lines, line strips, line loops, polygons, triangles, triangle strips, triangle fans, quadrilaterals, and quadrilateral strips. There are three main ways to send geometry data to OpenGL for rendering [25].

The first is the vertex-at-a-time method. The command glBegin() is called to start a primitive and glEnd() to end it. Between these two commands are commands that specify vertex attributes such as vertex position, color, normal and texture coordinates. These commands are glVertex*(), glColor*(), glNormal*(), glTexCoord*(), etc. When the vertex-at-a-time method is used, the call to glVertex*() signals the end of the data definition for a single vertex, and it may also complete a primitive. After glBegin() has been called with a primitive type, a graphics primitive is completed by calling glVertex*() enough times to completely specify a primitive of the indicated type. For example, a triangle is completed every third time glVertex*() is called.

The second method of drawing primitives is to use vertex arrays. With this method, vertex attributes are stored in user-defined arrays; the application then sets up pointers to the arrays (individually, or all at once with glInterleavedArrays()) and uses glDrawArrays(), glMultiDrawArrays(), etc. to draw a large number of primitives at once. Because this method can efficiently pass large amounts of geometry data to OpenGL, it is usually used for portions of code that are extremely performance critical. With glBegin() and glEnd(), application developers have to specify each attribute of each vertex, so the number of function calls can become significant when objects with thousands of vertices are drawn. In contrast, with the vertex array method we can draw a large number of primitives with a single function call once the vertex data has been organized into arrays. This method can also be faster than the vertex-at-a-time method because it is often more efficient for the OpenGL implementation to deal with data organized into arrays. OpenGL supports several types of array, including color arrays, vertex position arrays and normal vector arrays. The current arrays are specified with glColorPointer(), glVertexPointer() and glNormalPointer(), respectively. We have to indicate which types of array will be used before calling glDrawArrays() or glMultiDrawArrays(). The function glInterleavedArrays() can specify and enable several interleaved arrays simultaneously (e.g., each vertex might be defined with three floating-point values representing a normal followed by three floating-point values representing a vertex position).
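The interleaved layout mentioned above (a normal followed by a position for each vertex) can be illustrated with a minimal sketch that uses only the Java standard library. It is not code from this thesis: the class name InterleavedArraySketch and the method interleave are hypothetical, and no OpenGL call is made. It only shows how such data could be packed into a direct java.nio buffer, which is the kind of buffer a Java binding would typically pass to the array-pointer commands; in OpenGL this particular layout corresponds to the GL_N3F_V3F format of glInterleavedArrays().

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper: packs normals and positions as nx ny nz x y z per vertex.
class InterleavedArraySketch {
    static final int FLOATS_PER_VERTEX = 6;                           // 3 normal + 3 position
    static final int STRIDE_BYTES = FLOATS_PER_VERTEX * Float.BYTES;  // 24 bytes per vertex

    static FloatBuffer interleave(float[] normals, float[] positions) {
        if (normals.length != positions.length || positions.length % 3 != 0) {
            throw new IllegalArgumentException("expected 3 floats per vertex in both arrays");
        }
        int vertexCount = positions.length / 3;
        // Direct buffer in native byte order (assumed here to be what a native binding expects).
        FloatBuffer buf = ByteBuffer
                .allocateDirect(vertexCount * STRIDE_BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (int v = 0; v < vertexCount; v++) {
            buf.put(normals, 3 * v, 3);    // normal of vertex v
            buf.put(positions, 3 * v, 3);  // position of vertex v
        }
        buf.flip();                        // make the buffer ready for reading
        return buf;
    }

    public static void main(String[] args) {
        float[] normals   = {0, 0, 1,  0, 0, 1,  0, 0, 1};
        float[] positions = {0, 0, 0,  1, 0, 0,  0, 1, 0};
        FloatBuffer buf = interleave(normals, positions);
        System.out.println("floats: " + buf.remaining() + ", stride bytes: " + STRIDE_BYTES);
    }
}
```

Keeping all attributes of one vertex adjacent in memory means that a single pointer and a stride describe the whole data set, which is why one call can set up several arrays at once.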
The two former methods are called immediate mode because primitives are rendered right after they have been specified. In the third method, all function calls are stored in a display list and are pre-processed before execution. A display list is an OpenGL-managed data structure that stores commands for later execution. Both commands that set state and commands that draw geometry can be included in a display list, and they are stored on the server side. A display list can be processed later with glCallList() or glCallLists(). The display list is started with glNewList() and completed with glEndList(). All the commands issued between those two calls become part of the display list. However, certain OpenGL commands are not allowed within display lists. In general, display list mode can provide better performance than immediate mode. The OpenGL implementation can optimize the commands in the display list for the underlying hardware and store the commands in a memory area that allows better drawing performance, such as the memory of the graphics accelerator. These optimizations require some extra computation or data movement, so applications only see a performance benefit if the display list is called more than once.

From OpenGL version 1.5 onward, there is a mechanism that permits vertex array data to be stored in server-side memory. This mechanism typically provides the highest rendering performance because the data can be stored in memory on the graphics accelerator and need not be transferred over the I/O bus each time it is rendered. The glBindBuffer() command binds a buffer object (creating it on first use), and the glBufferData() and glBufferSubData() commands are used to specify the data values for that buffer, which can reside in the memory of the graphics accelerator. The API also supports efficient streaming of data from client to server: glMapBuffer() can map a buffer object into the client's address space and obtain a pointer to
this memory so that we can specify data values directly. Before using other rendering commands that access the buffer, we need to call glUnmapBuffer() to release the current pointer to that buffer object.

3.2 JOGL overview

3.2.1 Introduction

OpenGL is for making graphics, and it is fast. On almost all modern graphics cards, it is hardware accelerated. We can use OpenGL to create almost anything visual that we want. Unfortunately, OpenGL is written for the C language. In addition, we need to put graphics from OpenGL into a window to display them, but OpenGL itself has no commands for creating windows. This makes OpenGL hard to learn for beginners and for programmers who want to use a true Object Oriented Programming (OOP) language such as Java. Java is possibly the most popular true OOP language. There have been many attempts to combine OpenGL with Java and provide access to OpenGL through a friendly Java API, such as Java 3D, OpenGL for Java Technology (gl4java) and the Lightweight Java Game Library (LWJGL), but the most robust, simple and easy-to-use API is JOGL. The reason is that JOGL is supported by both Sun (the creators of Java) and SGI (the creators of OpenGL).

JOGL is a Java programming language binding for the OpenGL 3D graphics API. It supports integration with the Java platform's AWT and Swing widget sets while providing a minimal and easy-to-use API that handles many of the issues associated with building multithreaded OpenGL applications. JOGL provides access to the latest OpenGL routines (OpenGL 2.0 with vendor extensions) as well as platform-independent access to hardware-accelerated off-screen rendering. JOGL also provides some of the most popular features introduced by other Java bindings for OpenGL such as GL4Java, LWJGL and Magician, including a composable pipeline model which can provide faster debugging for Java-based OpenGL applications than for the analogous C program. JOGL differs from these libraries in that it merely exposes the procedural OpenGL API via methods on
a few classes, rather than attempting to map OpenGL functionality onto the OOP paradigm [9].

The JOGL binding is itself written almost completely in the Java programming language. Indeed, the majority of the JOGL code is auto-generated from the OpenGL C header files by a conversion tool named GlueGen, which was written specifically to facilitate the creation of JOGL. GlueGen parses the C header files and then automatically generates the Java and JNI code needed to connect to the native libraries. This design decision has both advantages and disadvantages. The procedural, state machine nature of OpenGL is inconsistent with the typical style of programming in Java, which bothers many programmers. However, the straightforward mapping of the OpenGL C API to Java methods makes conversion of existing C applications and example code much simpler. The thin layer of abstraction provided by JOGL makes runtime execution quite efficient. Because most of the code is auto-generated, updates to OpenGL can be added quickly to JOGL [30].

3.2.2 Developing with JOGL

JOGL was designed for the most recent versions of the Java platform, and for this reason it supports only J2SE 1.4 and later. It also supports only true color (15 bits per pixel and higher) rendering and does not support color-indexed modes. It was designed with New I/O (NIO) in mind and uses NIO internally in its implementation.

To develop an application using JOGL, we need both jogl.jar and the appropriate native library jar file (for example, jogl-natives-win32.jar). The jogl.jar file needs to be on the CLASSPATH for compiling and running code, while the native library file or files need to be on the java.library.path at run time. We can ship the files with our code and point to them directly with the -classpath and -Djava.library.path arguments. This approach helps end users who may not want, or may not be able, to add files to these
directories.

The recommended distribution vehicle for applications using JOGL is Java Web Start. JOGL-based applications do not even need to be signed; all that is necessary is to reference the JOGL extension JNLP file. Because the JOGL jar files are signed, an unsigned application can reference the signed JOGL library and continue to run inside the sandbox. Users only need to launch Java Web Start and download the client application the first time; the application is then cached on the client machine and can afterwards be launched offline.

JOGL also supports applets. The JOGLAppletInstaller is distributed inside jogl.jar as a utility class in com.sun.opengl.util. This installer uses some clever tricks to allow deployment of unsigned applets which use JOGL into existing web browsers and JREs as far back as 1.4.2, which is the earliest version of Java supported by JOGL. It requires that the developer host a local, signed copy of jogl.jar and all of the jogl-natives jar files; the certificates must be the same on all of these jars. Because all of these jar files are signed by Sun Microsystems in the release builds of JOGL, the developer can deploy applets without needing any certificates.

3.2.3 Using JOGL

JOGL provides two basic widgets into which OpenGL rendering can be performed. The GLCanvas is a heavyweight AWT widget which supports hardware acceleration and which is intended to be the primary widget used by applications. The GLJPanel is a fully Swing-compatible lightweight widget which supports hardware acceleration, but it is not as fast as the GLCanvas because it typically reads back the frame buffer in order to draw it using Java2D. The GLJPanel is intended to provide 100% correct Swing integration in circumstances where a GLCanvas cannot be used. Both the GLCanvas and the GLJPanel implement a common interface called GLAutoDrawable, so applications can
switch between them with minimal code changes. The GLAutoDrawable interface provides:
- access to the GL object for calling OpenGL routines
- a callback mechanism (GLEventListener) for performing OpenGL rendering
- a display method for forcing OpenGL rendering to be performed synchronously
- AWT- and Swing-independent abstractions for getting and setting the size of the widget and adding and removing event listeners

Applications implement the GLEventListener interface to perform OpenGL drawing via callbacks. When the methods of the GLEventListener are called, the underlying OpenGL context associated with the drawable is already current. The listener fetches the GL object out of the GLAutoDrawable and begins to perform rendering.

The init() method is called when a new OpenGL context is created for the given GLAutoDrawable. Any display lists or textures used during the application's normal rendering loop can be safely initialized in init(). The display() method is called to perform per-frame rendering. The reshape() method is called when the drawable has been resized. The default implementation automatically resizes the OpenGL viewport, so often it is not necessary to do any work in this method. The displayChanged() method is designed to allow applications to support on-the-fly screen mode switching, but it is not yet implemented, so the body of this method should remain empty.

[Figure: class diagram relating JFrame, SimpleJoglApp, GLCanvas, SimpleGLEventListener, and the interfaces GLDrawable and GLEventListener]
Figure 3.4: The structure of an application using JOGL

3.3 Conclusion

In this chapter, we introduced OpenGL and JOGL. Using JOGL, we can create OpenGL applications in the Java programming language. Because JOGL exposes almost all OpenGL API commands in a few classes and is generated automatically from the OpenGL C header files, all programmers who
are familiar with OpenGL can use JOGL without difficulty.

Chapter 4 - Improving lip-sync ability

4.1 Introduction

The lips are a very expressive part of the face, and they participate in the articulation of speech. For lip animation during speech to be convincing, two things must be considered. First, tight synchronization between audible and visible speech is required. Humans can detect very slight misalignments, so any asynchrony between voice and lip movements may reduce the users' confidence in the system. Second, the effects of co-articulation need to be addressed. Co-articulation is the blending effect that surrounding phonemes have on the current phoneme.

There are several ways to synchronize lip movements with speech; the classification mainly depends on the type of speech data. First, the text-driven approach takes text as input. The phonemic representation of this text is used to generate both synthetic audible speech and visible speech. Second, the speech-driven approach takes pre-recorded speech as input. The phonemes and timing information are extracted by analyzing the audio data and are used to create lip movements; the audio file is then played synchronously with the facial animation. If both text and speech audio are available, the hybrid text-and-speech-driven approach can be applied: the phonemes and timing information obtained from the text are adjusted to synchronize with the audio file before being used by the animation components [1].

The phonemic representation of a text is required to create lip movements while speaking that text. There are two main ways to get the phonemes and timing information from a text. First, we can use a phoneme database and look up the phonemic representation of each word in the input text; the timing information is then obtained by separate algorithms. Second, we can use rules to analyze the text and obtain both the phonemic representation and the timing information. The latter method is fast, but it is not as accurate
as the former method. Moreover, the algorithms used to process the text in the first method have improved, and PCs are now fast enough, so the second method is rarely used today.

4.2 Previous work

The original head uses the text-driven approach to get phonemes and timing information. The phonemic representation of the given text, including the timing information, is used to generate lip movements. Phonemes in the phoneme string are used as keys to look up the corresponding visemes. Each viseme belongs to a viseme segment, which has a set of dominance-function parameters. These dominance functions participate in the articulation of the speech segment (lip movement). Finally, the viseme segments and the timing information of the corresponding phonemes are used to generate the key frames of the lip movement. The other frames of the lip movement are generated by interpolating between the key frames of the current movement and those of adjacent movements.

The original head model uses the dominance model from (Cohen and Massaro, 1993) to create the co-articulation effect of the lip movements. Each viseme has a dominance over the vocal articulators that increases and then decreases over time during articulation. This dominance function determines how close the lips come to reaching the target value of the viseme. Each movement has a set of dominance functions, one for each parameter. These dominance functions are based on (DeCarlo et al., 2002).

The co-articulation part of the lip animation process works well, but the synchronization of audible and visible speech has some disadvantages. First, there is no mechanism to ensure that audible and visible speech start at exactly the same time. In addition, generating lip movements in one pass from the beginning to the end of the phoneme string may cause problems: in some situations it leads to misalignment between the lip movements and the sound from the speaker and, as mentioned before, humans can detect a very slight
misalignment, even one on the order of milliseconds.

Second, because the head can combine and display several facial movements at once, it can show emotions while talking. The original head takes as input marked-up text that specifies the text to speak and the groups of emotions to display while talking. Each emotion group has a start time, a duration, onset and offset values, and intensity values for the six basic emotions. The intensity of each emotion in a group is set so that it does not conflict with the others. The head speaks the given text while displaying the emotion groups in order, each with its intensity values, starting at the group's start time and lasting for the group's duration. In practice, however, we usually want to tie emotions to specific sentences in the given text rather than to estimated times. The original marked-up text looks like this:

[listing not preserved]

4.3 FreeTTS and Mbrola

4.3.1 FreeTTS

FreeTTS is based upon two speech synthesizers: the Festival Speech Synthesis System and Flite. Festival is funded by Sun Microsystems through collaborative research. It is an extremely flexible speech synthesis research environment written in Scheme and C++, but its disadvantage is that no particular attention was paid to performance. Flite (Festival Lite) was written at Carnegie Mellon University (CMU) and is based on Festival. Flite is written entirely in C, with great attention paid to size and performance on embedded platforms. These size and performance requirements, however, drastically reduce its flexibility. To get the advantages of both synthesizers, FreeTTS bases its algorithms on Flite and its architecture on Festival.

To synthesize speech, FreeTTS breaks the input text into sets of phonemes and then converts those phonemes into audible speech. It does this by performing successive operations on the input text. FreeTTS stores the cumulative results of each operation in an
utterance structure that holds the complete analysis of the text.

Figure 4.1 shows the overall architecture of FreeTTS. The core of FreeTTS is an engine that contains a voice and an output thread. The voice consists of a set of utterance processors that create, process, and annotate an utterance structure. Associated with the voice is a data set that is used by each of the utterance processors. The output thread is responsible for two actions: synthesizing an utterance into audio data and then directing this data to the appropriate audio player device.

[Figure 4.1 diagram; recoverable labels: Applications, Text, JSML, JSAPI, Speakable, Voice Data, Utterance Pro...]
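The idea of a voice as a chain of utterance processors that successively annotate one shared utterance structure can be illustrated with a small, self-contained Java sketch. This is not FreeTTS code and does not use the FreeTTS API: the names `Utterance`, `UtteranceProcessor`, `TOKENIZER`, `PHONEMIZER`, and `analyze`, as well as the toy one-symbol-per-letter "phonemizer", are hypothetical and chosen only for illustration. The output thread and audio synthesis are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UtterancePipelineSketch {

    /** Hypothetical stand-in for an utterance structure: named annotation layers. */
    static final class Utterance {
        final String text;
        final Map<String, List<String>> layers = new LinkedHashMap<>();

        Utterance(String text) {
            this.text = text;
        }
    }

    /** Hypothetical processor interface: each stage reads earlier layers and adds its own. */
    interface UtteranceProcessor {
        void process(Utterance u);
    }

    /** Stage 1: split the input text into lower-case word tokens. */
    static final UtteranceProcessor TOKENIZER = u -> {
        List<String> words = new ArrayList<>();
        for (String w : u.text.toLowerCase().split("[^a-z]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        u.layers.put("words", words);
    };

    /** Stage 2: toy "phonemizer" emitting one symbol per letter (an assumption, not real phonetics). */
    static final UtteranceProcessor PHONEMIZER = u -> {
        List<String> phones = new ArrayList<>();
        for (String w : u.layers.get("words")) {
            for (char c : w.toCharArray()) {
                phones.add(String.valueOf(c));
            }
            phones.add("pau"); // pause marker after each word
        }
        u.layers.put("phones", phones);
    };

    /** Runs the stages in order, accumulating every result in the same utterance. */
    static Utterance analyze(String text) {
        Utterance u = new Utterance(text);
        for (UtteranceProcessor p : Arrays.asList(TOKENIZER, PHONEMIZER)) {
            p.process(u);
        }
        return u;
    }

    public static void main(String[] args) {
        Utterance u = analyze("Hi there");
        System.out.println("words:  " + u.layers.get("words"));
        System.out.println("phones: " + u.layers.get("phones"));
    }
}
```

Because every stage writes into the same structure, a later consumer such as a lip-sync module could read the phoneme layer without re-running the earlier analysis, which is the property the text attributes to the utterance structure.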