Character Animation with Direct3D (Part 13)
[...] fairly educated guess as to which phoneme is being spoken. Figure 10.3 shows the waveform from the previous speech sample together with the spectrograph of the same sample (a spectrograph shows the frequency and amplitude of a signal). As you can see in Figure 10.3, distinct patterns can be seen in the spectrograph as the phonemes are spoken. As a side note, speech-to-text applications take this analysis one step further and use Hidden Markov Models (HMM) to figure out which exact word is being spoken. Luckily, we don't need to dive that deep in order to create reasonable lip-syncing.

If you are interested in analyzing speech and making your own phoneme extractor, you'll need to run the speech data through a Fourier Transform. This will give you the data in the frequency domain, which in turn will help you build the spectrogram and classify the phonemes. Check out www.fftw.org for a Fast Fourier Transform library in C.

Analyzing speech and extracting phonemes is a rather CPU-intensive process and is therefore pre-processed in all major game engines. However, some games in the past have used a real-time lip-syncing system based simply on the current amplitude of the speech [Simpson04]. With this approach the voice lines are evaluated just a little ahead of the playback position to determine which mouth shape to use. In the coming sections I will look at a similar system and get you started on analyzing raw speech data.

FIGURE 10.3 Waveform and spectrograph of a voice sample.

SOUND DATA

Before you can start to analyze voice data, I'll need to go off on a tangent and cover how to actually load some raw sound data. No matter which sound format you use, once the sound data has been decompressed and decoded, the raw sound data will be the same across all formats. In this chapter I'll just use standard uncompressed WAVE files for storing sound data. However, for projects requiring large amounts of voice lines, using uncompressed sound is of course out of the question. Two open-source compression schemes available for free are OGG and SPEEX, which you can find online:

http://www.vorbis.com/
http://www.speex.org/

OGG is aimed mainly at music compression and streaming, but it is easy enough to get up and running. SPEEX, on the other hand, focuses only on speech compression.

THE WAVE FORMAT

There are several good tutorials on the Web explaining how to load and interpret WAVE (.wav) files, so I won't dig too deep into it here. The WAVE format builds on the Resource Interchange File Format (RIFF). RIFF files store data in chunks, where the start of each chunk is marked with a 4-byte ID describing what type of chunk it is, followed by 4 bytes containing the size of the chunk (a long). The WAVE file contains all information about the sound: number of channels, sampling rate, number of bits per sample, and much more. Figure 10.4 shows how a WAVE file is organized.

FIGURE 10.4 WAVE file format.

There are many other types of chunks that can be stored in a WAVE file; only the Format and Data chunks are mandatory. Table 10.2 shows the different fields of the Format chunk and their possible values.

TABLE 10.2 THE WAVE FORMAT CHUNK

Field          Type    Description
Audio Format   Short   Type of audio data. A value of 1 indicates PCM data;
                       other values mean that there is some form of compression.
Num Channels   Short   Number of channels: 1 = mono, 2 = stereo.
Sample Rate    Long    Number of samples per second. For example, CD quality
                       uses 44,100 samples per second (Hz).
Byte Rate      Long    Number of bytes used per second.
Block Align    Short   Number of bytes per sample (including multiple channels).
Bits/Sample    Short   8 = 8 bits per sample, 16 = 16 bits per sample.
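To make the layout in Table 10.2 concrete, here is how the Format chunk could be mapped to a C++ struct. This is an illustrative sketch, not code from the book; it assumes the 32-bit convention where a long is 4 bytes, with struct packing disabled so the fields line up with the file bytes:

#pragma pack(push, 1)   //Match the on-disk layout exactly

//Illustrative sketch of the "fmt " chunk from Table 10.2 (not book code).
//Assumes a 4-byte long, as on 32-bit Windows.
struct FormatChunk
{
    char  chunkID[4];     //Always "fmt " (note the trailing space)
    long  chunkSize;      //Size of the rest of the chunk; 16 for PCM
    short audioFormat;    //1 = PCM, other values indicate compression
    short numChannels;    //1 = mono, 2 = stereo
    long  sampleRate;     //Samples per second, e.g. 44100
    long  byteRate;       //Bytes per second
    short blockAlign;     //Bytes per sample, all channels included
    short bitsPerSample;  //8 or 16
};

#pragma pack(pop)

With a packed struct like this you could read the whole chunk in a single call, though reading field by field, as the Load() function later in the chapter does, makes it easier to validate each value as you go.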
For a full description of the WAVE file format and the different chunks available, check out http://www.sonicspot.com/guide/wavefiles.html.

I'll assume that the data stored in the "data" chunk is uncompressed sound data in the Pulse-Code Modulation (PCM) format. This basically means that the data is stored as a long array of values, where each value is the amplitude of the sound at a specific point in time. The quickest and dirtiest way to access the data is to simply open a stream from the sound file and start reading from byte 44 (where the data field starts). Although this will work if you know the sound specifications, it isn't really recommended. The WaveFile class I'll present here does minimal error checking before reading and storing the actual sound data:

class WaveFile
{
public:
    WaveFile();
    ~WaveFile();

    void Load(string filename);
    short GetMaxAmplitude();
    short GetAverageAmplitude(float startTime, float endTime);
    float GetLength();

public:
    long m_numSamples;
    long m_sampleRate;
    short m_bitsPerSample;
    short m_numChannels;
    short *m_pData;
};

The Load() function of the WaveFile class loads the sound data and performs some minimal error checking. For example, I assume that only uncompressed, 16-bit WAVE files will be used. You can easily expand this class yourself if you need to load 8-bit files, etc. If a WaveFile object is created successfully and a WAVE file is loaded, the raw data can be accessed through the m_pData pointer.
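Before looking at the implementation, here is a minimal usage sketch of the class as declared above; the filename and surrounding code are assumptions for illustration:

//Minimal usage sketch (not from the book) of the WaveFile class.
//"someVoiceLine.wav" is a placeholder filename.
WaveFile wave;
wave.Load("someVoiceLine.wav");

float length = wave.GetLength();                     //Length in seconds
short peak   = wave.GetMaxAmplitude();               //Loudest sample value
short avg    = wave.GetAverageAmplitude(0.0f, 0.1f); //First 100 ms

//The raw 16-bit samples can also be read directly:
short firstSample = wave.m_pData[0];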
The following code shows the Load() function of the WaveFile class:

void WaveFile::Load(string filename)
{
    ifstream in(filename.c_str(), ios::binary);

    //RIFF ID
    char ID[4];
    in.read(ID, 4);
    if(ID[0] != 'R' || ID[1] != 'I' || ID[2] != 'F' || ID[3] != 'F')
    {
        //Error: First 4 bytes should say 'RIFF'
    }

    //RIFF Chunk Size
    long fileSize = 0;
    in.read((char*)&fileSize, sizeof(long));

    //The actual size of the file is 8 bytes larger
    fileSize += 8;

    //WAVE ID
    in.read(ID, 4);
    if(ID[0] != 'W' || ID[1] != 'A' || ID[2] != 'V' || ID[3] != 'E')
    {
        //Error: ID should be 'WAVE'
    }

    //Format Chunk ID
    in.read(ID, 4);
    if(ID[0] != 'f' || ID[1] != 'm' || ID[2] != 't' || ID[3] != ' ')
    {
        //Error: ID should be 'fmt '
    }

    //Format Chunk Size
    long formatSize = 0;
    in.read((char*)&formatSize, sizeof(long));

    //Audio Format
    short audioFormat = 0;
    in.read((char*)&audioFormat, sizeof(short));
    if(audioFormat != 1)
    {
        //Error: Not uncompressed data!
    }

    //Num Channels
    in.read((char*)&m_numChannels, sizeof(short));

    //Sample Rate
    in.read((char*)&m_sampleRate, sizeof(long));

    //Byte Rate
    long byteRate = 0;
    in.read((char*)&byteRate, sizeof(long));

    //Block Align
    short blockAlign = 0;
    in.read((char*)&blockAlign, sizeof(short));

    //Bits Per Sample
    in.read((char*)&m_bitsPerSample, sizeof(short));
    if(m_bitsPerSample != 16)
    {
        //Error: This class only supports 16-bit sound data
    }

    //Data Chunk ID
    in.read(ID, 4);
    if(ID[0] != 'd' || ID[1] != 'a' || ID[2] != 't' || ID[3] != 'a')
    {
        //Error: ID should be 'data'
    }

    //Data Chunk Size
    long dataSize;
    in.read((char*)&dataSize, sizeof(long));
    m_numSamples = dataSize / 2;    //Divide by 2 (a short is 2 bytes)

    //Read the raw data
    m_pData = new short[m_numSamples];
    in.read((char*)m_pData, dataSize);

    in.close();
}

At the end of this function, the raw sound data is stored at the m_pData pointer as a long array of short values. The value of a single sample ranges from -32768 to 32767, where a value of 0 marks silence. The other functions of this class I will cover later, as we build the amplitude-based lip-syncing system.

AUTOMATIC LIP-SYNCING

In the previous section you learned how to load a simple WAVE file and how to access the raw PCM data. In this section I will create a simplified lip-syncing system by analyzing the amplitude of a voice sample [Simpson04]. The main point of this approach is not to create perfect lip-syncing, but rather to make the lips move in a synchronized fashion as the voice line plays. So, for instance, when the voice sample is silent, the mouth should be closed. The following function returns the average amplitude of a voice sample between two points in time:

short WaveFile::GetAverageAmplitude(float startTime, float endTime)
{
    if(m_pData == NULL)
        return 0;

    //Calculate start & end sample
    int startSample = (int)(m_sampleRate * startTime) * m_numChannels;
    int endSample = (int)(m_sampleRate * endTime) * m_numChannels;

    if(startSample >= endSample)
        return 0;

    //Calculate the average amplitude between start and end sample
    float c = 1.0f / (float)(endSample - startSample);
    float avg = 0.0f;

    for(int i = startSample; i < endSample && i < m_numSamples; i++)
    {
        avg += abs(m_pData[i]) * c;
    }

    avg = min(avg, (float)(SHRT_MAX - 1));
    avg = max(avg, (float)(SHRT_MIN + 1));

    return (short)avg;
}

With this function you can easily create an array of visemes by matching a certain amplitude range to a certain viseme. This is done in the FaceController::Speak() function:

void FaceController::Speak(WaveFile &wave)
{
    m_visemes.clear();

    //Calculate which visemes to use from the WAVE file data
    float soundLength = wave.GetLength();

    //Since the wave data oscillates around zero,
    //bring the max amplitude down to 30% for better results
    float maxAmp = wave.GetMaxAmplitude() * 0.3f;

    for(float i = 0.0f; i < soundLength; i += 0.1f)
    {
        short amp = wave.GetAverageAmplitude(i, i + 0.1f);
        float p = min(amp / maxAmp, 1.0f);

        if(p < 0.2f)
        {
            m_visemes.push_back(VISEME(0, 0.0f, i));
        }
        else if(p < 0.4f)
        {
            float prc = max((p - 0.2f) / 0.2f, 0.3f);
            m_visemes.push_back(VISEME(3, prc, i));
        }
        else if(p < 0.7f)
        {
            float prc = max((p - 0.4f) / 0.3f, 0.3f);
            m_visemes.push_back(VISEME(1, prc, i));
        }
        else
        {
            float prc = max((p - 0.7f) / 0.3f, 0.3f);
            m_visemes.push_back(VISEME(4, prc, i));
        }
    }

    m_visemeIndex = 1;
    m_speechTime = 0.0f;
}
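The VISEME structure itself is not shown in this excerpt. Judging from the constructor calls above, it holds a viseme index, a blend weight, and a timestamp; a minimal sketch with assumed member names could look like this:

//Assumed sketch of the VISEME structure; the book's actual definition
//is not part of this excerpt. The constructor arguments match the
//calls in Speak(): (viseme index, blend weight, time in seconds).
struct VISEME
{
    VISEME(int index, float weight, float time)
        : m_index(index), m_weight(weight), m_time(time) {}

    int   m_index;    //Which viseme (mouth shape) to display
    float m_weight;   //Blend weight, 0.0 to 1.0
    float m_time;     //Time in the voice sample where this viseme starts
};

Presumably, m_visemeIndex and m_speechTime are then used by the face controller's update code to step through this list and blend mouth shapes as the sound plays; that part of the system lies outside this excerpt.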
Here I create a viseme for every 100 milliseconds, but you can try out a different number of visemes per second. Of course, the result will be a bit worse compared to the previous method, where the visemes were created manually, but the major upside of this approach is that you can quickly get "decent"-looking lip-syncing with very little effort and no pre-processing.

EXAMPLE 10.2

This example shows a simple lip-syncing system based on the amplitude of a voice sample. Play around with what visemes are assigned to which amplitude range, the number of visemes per second, and perhaps the blending amounts. See if you can improve on this example and make the lip-syncing look better.

CONCLUSIONS

This chapter covered the basics of lip-syncing and how to make a character "speak" a voice line. This is still a hot research topic that is constantly being improved upon. However, for games using thousands of voice lines, the focus is almost always on making the process as cheap and pain-free as possible, as long as the results are "good enough." In this chapter I showed one way of doing the lip-syncing automatically using only the amplitude of a voice sample. Granted, this wouldn't be considered high enough quality to work in a next-generation project, but at least it serves as a starting point for you to get started with analyzing voice samples. If you want to improve this system, I suggest you look into analyzing the voice data with Fourier Transforms and try to classify the different phonemes.

FURTHER READING

[Lander00] Lander, Jeff, "Read My Lips: Facial Animation Techniques." Available online at http://www.gamasutra.com/view/feature/3179/read_my_lips_facial_animation_.php, 2000.

[Simpson04] Simpson, Jake, "A Simple Real-Time Lip-Synching System." Game Programming Gems 4, Charles River Media, 2004.

[Lander00b] Lander, Jeff, "Flex Your Facial Muscles." Available online at http://www.gamasutra.com/features/20000414/lander_pfv.htm, 2000.

CHAPTER 11 INVERSE KINEMATICS

[...] your game character. IK can be used for many different things in games, such as placing hands on items in the game world, matching the feet of a character to the terrain, and much more.

So why should you bother implementing inverse kinematics? Well, without it your character animations will look detached from the world, or "canned." IK can be used together with your keyframed animations. An example of this is a character opening a door. You can use IK to "tweak" the door-opening animation so that the hand of the character always connects with the door handle, even though the handle may be placed at different heights on different doors. This chapter [...]

[...] of that bone (something with which you should already be familiar after implementing a skinned character using bone hierarchies). Forward kinematics come in very handy when trying to link something to a certain bone of a character. For example, imagine a medieval first-person shooter (FPS) in which you're firing a bow. A cool effect would be to have the arrows "stick" to the enemy character if you have gotten a clean hit. The first problem you would need to solve is to determine which bone of the character was hit. A simple way to do this is to check which polygon of the character mesh was pierced and then see which bone(s) govern the three vertices of this polygon. After this you would need to calculate the position (and orientation) [...]
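The fragment above describes the bone lookup only in prose. As a rough illustration (the structures and names below are assumptions, not the book's actual code), picking the hit bone from the pierced triangle could look like this:

#include <map>

//Illustrative sketch of the bone lookup described above.
//Assumes each skinned vertex stores up to four bone indices
//and blend weights, a common skinned-mesh layout.
struct SkinnedVertex
{
    //...position, normal, texture coordinates...
    int   boneIndex[4];
    float boneWeight[4];
};

//Given the three vertices of the pierced polygon, return the
//bone with the largest total blend weight over the triangle.
int GetHitBone(const SkinnedVertex &v0,
               const SkinnedVertex &v1,
               const SkinnedVertex &v2)
{
    std::map<int, float> weightPerBone;
    const SkinnedVertex* verts[3] = { &v0, &v1, &v2 };

    //Accumulate the weight each bone contributes to the triangle
    for(int v = 0; v < 3; v++)
    {
        for(int i = 0; i < 4; i++)
        {
            weightPerBone[verts[v]->boneIndex[i]] += verts[v]->boneWeight[i];
        }
    }

    //Pick the bone with the highest accumulated weight
    int bestBone = -1;
    float bestWeight = 0.0f;

    std::map<int, float>::const_iterator it;
    for(it = weightPerBone.begin(); it != weightPerBone.end(); ++it)
    {
        if(it->second > bestWeight)
        {
            bestBone = it->first;
            bestWeight = it->second;
        }
    }

    return bestBone;
}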
[...] problem, you can go from this near-impossible problem to a quite manageable one. This chapter will cover some approaches to solving the problem of inverse kinematics for characters.

SOLVING THE IK PROBLEM

Solutions to IK problems come in two flavors: analytical and numerical. With analytical solutions, you have an equation [...]

LOOK-AT INVERSE KINEMATICS

To start you off with IK calculations, I'll start with the simplest example: having only one bone orientation to calculate. Figure 11.3 shows an example of Look-At IK.

FIGURE 11.3 Look-At Inverse Kinematics.

In Figure 11.3 the character is facing the target (black ball) [...] compared to the character. Since the target might be moving dynamically in the game, there is no way to make a keyframed animation to cover all possible "view angles." In this case, the IK calculation is done on the head bone and can easily be blended together with normal keyframed animations. One more thing you need to consider is, of course, what should happen when the Look-At target is behind the character or outside the character's field of view (FoV). The easiest solution is just to cap the head rotation to a certain view cone. A more advanced approach would be to play an animation that turns the character around to face the target and then use the Look-At IK to face the target. In either case you need to define the character's field of view. Figure 11.4 shows an example [...]

FIGURE 11.4 Limiting the field of view (FoV).

So what I'll try to achieve in the next example is a character that can look at a certain target in its field of view (i.e., turn the character's head bone dynamically). To do this I'll use the InverseKinematics class. This class encapsulates all the IK [...]

FIGURE 11.5 The forward vector of the head bone.

The forward vector of the head bone is calculated when the character is in the reference pose and the character is facing in the negative Z direction. You'll need this vector later on when you update the Look-At IK. Next is the ApplyLookAtIK() [...]
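The excerpt cuts off before the ApplyLookAtIK() implementation. As a rough sketch of the kind of calculation described above (this is not the book's InverseKinematics code; the function name, parameters, and view-cone handling are assumptions), a capped Look-At rotation for the head bone could be computed like this:

#include <d3dx9.h>
#include <math.h>

//Illustrative sketch of a capped Look-At rotation (not the book's code).
//headForward:  the head bone's forward vector in the reference pose.
//headPosition: the head bone's world position.
//maxAngle:     the view-cone limit in radians.
D3DXMATRIX CalcLookAtRotation(const D3DXVECTOR3 &headForward,
                              const D3DXVECTOR3 &headPosition,
                              const D3DXVECTOR3 &target,
                              float maxAngle)
{
    //Direction from the head bone to the target
    D3DXVECTOR3 toTarget = target - headPosition;
    D3DXVec3Normalize(&toTarget, &toTarget);

    //Angle between the forward vector and the target direction
    float d = D3DXVec3Dot(&headForward, &toTarget);
    d = max(-1.0f, min(1.0f, d));    //Guard acosf against rounding errors
    float angle = acosf(d);

    //Nearly facing the target already: no rotation needed
    //(a real implementation would also handle the degenerate
    //case where the target is directly behind the head)
    if(angle < 0.001f)
    {
        D3DXMATRIX identity;
        D3DXMatrixIdentity(&identity);
        return identity;
    }

    //Cap the rotation to the view cone so the head
    //doesn't turn toward targets outside the FoV
    if(angle > maxAngle)
        angle = maxAngle;

    //Rotate around the axis perpendicular to both vectors
    D3DXVECTOR3 axis;
    D3DXVec3Cross(&axis, &headForward, &toTarget);
    D3DXVec3Normalize(&axis, &axis);

    D3DXMATRIX rotation;
    D3DXMatrixRotationAxis(&rotation, &axis, angle);
    return rotation;
}

The resulting matrix could then be blended with the head bone's keyframed transform, which matches the chapter's point that Look-At IK is easy to combine with normal keyframed animation.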
