Face and 2-D Mesh Animation in MPEG-4

Abstract

This paper presents an overview of some of the synthetic visual objects supported by MPEG-4 version-1, namely animated faces and animated arbitrary 2D uniform and Delaunay meshes. We discuss both the specification and the compression of face animation and 2D-mesh animation in MPEG-4. Face animation makes it possible to animate a proprietary face model or a face model downloaded to the decoder. We also address integration of the face animation tool with the text-to-speech interface (TTSI), so that face animation can be driven by text input.

Keywords: MPEG-4; Face animation; Computer graphics; Deformation; VRML; Speech synthesizer; Electronic commerce

1. Introduction

MPEG-4 is an object-based multimedia compression standard that allows the different audiovisual objects (AVO) in a scene to be encoded independently. The visual objects may have natural or synthetic content, including arbitrary-shape video objects, special synthetic objects such as the human face and body, and generic 2D/3D objects composed of primitives like rectangles, spheres, or indexed face sets, which define an object surface by means of vertices and surface patches. The synthetic visual objects are animated by transforms and by special-purpose animation techniques such as face/body animation and 2D-mesh animation. MPEG-4 also provides synthetic audio tools, such as structured audio tools and a text-to-speech interface (TTSI). This paper presents a detailed overview of the synthetic visual objects supported by MPEG-4 version-1, namely animated faces and animated arbitrary 2D uniform and Delaunay meshes. We also address integration of the face animation tool with the TTSI, so that face animation can be driven by text input. Body animation and 3D-mesh compression and animation will be supported in MPEG-4 version-2, and hence are not covered in this article.

The representation of synthetic visual objects in MPEG-4 is based on the prior VRML standard [11,12,13], using nodes such as Transform, which defines the rotation, scale, or translation of an object, and IndexedFaceSet, which describes the 3D shape of an object by an indexed face set. However, MPEG-4 is the first international standard that specifies a compressed binary representation of animated synthetic audio-visual objects. It is important to note that MPEG-4 only specifies the decoding of compliant bit streams in an MPEG-4 terminal; encoders enjoy a large degree of freedom in how they generate MPEG-4 compliant bit streams. Decoded audio-visual objects can be composed into 2D and 3D scenes using the binary format for scenes (BIFS) [13], which also allows objects and their properties to be animated using the BIFS-Anim node. We refer readers to an accompanying article on BIFS for the details of BIFS-Anim. Compression of still textures (images) for mapping onto 2D or 3D meshes is also covered in another accompanying article. In the following, we cover the specification and compression of face animation and 2D-mesh animation in Sections 2 and 3, respectively.

2. Face animation

MPEG-4 foresees that talking heads will serve an important role in future customer service applications. For example, a customized agent model can be defined for games or web-based customer service applications. To this effect, MPEG-4 enables integration of face animation with multimedia communications and presentations, and allows face animation over low bit-rate communication channels, for point-to-point as well as multi-point connections with low delay. With AT&T's implementation of an MPEG-4 face animation system, we can animate a face model with a data rate of 300-2000 bits/s. In many applications, such as electronic commerce, the integration of face animation with a text-to-speech synthesizer is of special interest. MPEG-4 defines an application program interface for a TTS synthesizer. Using this interface, the synthesizer can be used to provide phonemes and related timing information to the face model. The phonemes are converted into corresponding mouth shapes, enabling simple talking-head applications. Adding facial expressions to the talking head is achieved using bookmarks in the text.
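To make the text-driven approach concrete, the following minimal Python sketch separates expression bookmarks from the text to be spoken. The angle-bracket bookmark syntax, the helper name, and the example string are all invented for illustration; MPEG-4 defines its own normative bookmark format for the TTS stream.

```python
import re

def split_bookmarks(text):
    """Separate spoken text from embedded expression bookmarks.

    The <...> bookmark syntax here is illustrative only; the MPEG-4
    TTS stream uses its own bookmark format.
    """
    bookmarks = re.findall(r"<([^>]+)>", text)
    spoken = re.sub(r"\s*<[^>]+>\s*", " ", text).strip()
    return spoken, bookmarks

# The spoken text goes to the synthesizer, which returns phonemes and
# timing for the mouth shapes; each bookmark triggers an expression
# (FAP 2) at its position in the text.
spoken, marks = split_bookmarks("Thank you <expression joy 40> for shopping!")
print(spoken)  # Thank you for shopping!
print(marks)   # ['expression joy 40']
```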

This integration allows for animated talking heads driven by just one text stream at a data rate of less than 200 bits/s [23]. Subjective tests reported in [25] show that an electronic commerce web site with talking faces gets higher ratings than the same web site without talking faces. In an amendment to the standard foreseen in 2000, MPEG-4 will add body animation to its tool set, thus allowing the standardized animation of complete human bodies.

In the following sections, we describe how to specify and animate 3D face models, compress facial animation parameters, and integrate face animation with TTS in MPEG-4. The MPEG-4 standard allows using proprietary 3D face models that are resident at the decoder, as well as transmitting face models such that the encoder can predict the quality of the presentation at the decoder. In Section 2.1, we explain how MPEG-4 specifies a 3D face model and its animation using face definition parameters (FDP) and facial animation parameters (FAP), respectively. Section 2.2 provides details on how to efficiently encode FAPs. The integration of face animation into an MPEG-4 terminal with text-to-speech capabilities is shown in Section 2.3. In Section 2.4, we briefly describe the integration of face animation with MPEG-4 systems. MPEG-4 profiles with respect to face animation are explained in Section 2.5.

2.1. Specification and animation of faces

MPEG-4 specifies a face model in its neutral state, a number of feature points on this neutral face as reference points, and a set of FAPs, each corresponding to a particular facial action deforming the face model from its neutral state. Deforming a neutral face model according to the specified FAP values at each time instant generates a facial animation sequence. The value of a particular FAP indicates the magnitude of the corresponding action, e.g., a big versus a small smile, or a large versus a small deformation of a mouth corner. For an MPEG-4 terminal to interpret FAP values using its face model, it has to have predefined model-specific animation rules that produce the facial action corresponding to each FAP. The terminal can either use its own animation rules or download a face model and the associated face animation tables (FAT) to obtain customized animation behavior. Since the FAPs are required to animate faces of different sizes and proportions, FAP values are defined in face animation parameter units (FAPU). The FAPUs are computed from spatial distances between major facial features on the model in its neutral state.

In the following, we first describe what MPEG-4 considers to be a generic face model in its neutral state and the associated feature points. Then, we explain the facial animation parameters for this generic model. Finally, we show how to define MPEG-4 compliant face models that can be transmitted from the encoder to the decoder for animation.

2.1.1. MPEG-4 face model in neutral state

As the first step, MPEG-4 defines a generic face model in its neutral state by the following properties (see Fig. 1):

- gaze is in the direction of the Z-axis;
- all face muscles are relaxed;
- eyelids are tangent to the iris;
- the pupil is one third of the diameter of the iris;
- lips are in contact; the line of the lips is horizontal and at the same height as the lip corners;
- the mouth is closed and the upper teeth touch the lower ones;
- the tongue is flat and horizontal, with the tip of the tongue touching the boundary between the upper and lower teeth.

The FAPUs and the feature points used to derive them are defined next with respect to the face in its neutral state.

2.1.1.1. Face animation parameter units

In order to define face animation parameters for arbitrary face models, MPEG-4 defines FAPUs that serve to scale facial animation parameters for any face model. FAPUs are defined as fractions of distances between key facial features (see Fig. 1). These features, such as eye separation, are measured on the face model in its neutral state. The FAPUs allow the FAPs to be interpreted on any facial model in a consistent way, producing reasonable results in terms of expression and speech pronunciation. The measurement units are shown in Table 1.
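As a rough sketch of how a decoder might derive and apply FAPUs, assume the /1024 normalization of distance-based units and the 10^-5 rad angle unit from Table 1; the concrete distances and the FAP value below are made-up examples, and the model's coordinate units are arbitrary.

```python
def fapu_from_neutral(eye_sep, eye_nose, mouth_nose, mouth_width, iris_d):
    """Compute FAPUs from distances measured on the neutral face.

    Distance-based FAPUs are a facial distance divided by 1024; the
    angle unit is 1e-5 rad. Distances are in model coordinates.
    """
    return {
        "ES0":    eye_sep / 1024.0,      # eye separation
        "ENS0":   eye_nose / 1024.0,     # eye-nose separation
        "MNS0":   mouth_nose / 1024.0,   # mouth-nose separation
        "MW0":    mouth_width / 1024.0,  # mouth width
        "IRISD0": iris_d / 1024.0,       # iris diameter
        "AU":     1e-5,                  # angle unit, in radians
    }

# A transmitted FAP value is dimensionless; multiplying it by its FAPU
# yields a displacement in model coordinates, so the same bitstream
# animates faces of different sizes consistently.
fapu = fapu_from_neutral(60.0, 50.0, 30.0, 55.0, 12.0)  # made-up model
displacement = 200 * fapu["MNS0"]  # e.g., a jaw FAP scaled by MNS units
```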

2.1.1.2. Feature points

MPEG-4 specifies 84 feature points on the neutral face (see Fig. 2). The main purpose of these feature points is to provide spatial references for defining FAPs. Some feature points, such as those along the hairline, are not affected by FAPs; however, they are required for defining the shape of a proprietary face model using feature points (Section 2.1.3). Feature points are arranged in groups such as cheeks, eyes, and mouth. The location of these feature points has to be known for any MPEG-4 compliant face model. The feature points on the model should be located according to Fig. 2 and the hints given in Table 6.

2.1.2. Face animation parameters

The FAPs are based on the study of minimal perceptible actions and are closely related to muscle actions [16,26,31,36]. The 68 parameters are categorized into 10 groups related to parts of the face (Table 2). FAPs represent a complete set of basic facial actions, including head motion and tongue, eye, and mouth control, and they allow the representation of natural facial expressions (see Table 7). For each FAP, the standard defines the appropriate FAPU, the FAP group, the direction of positive motion, and whether the motion of the feature point is unidirectional (see FAP 3, open jaw) or bidirectional (see FAP 48, head pitch). FAPs can also be used to define facial action units [8]. Exaggerated amplitudes permit the definition of actions that are normally not possible for humans but are desirable for cartoon-like characters.
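The per-FAP attributes just listed can be pictured as a small descriptor record, as in the sketch below. The two FAP numbers and names match the examples in the text; the group, FAPU, and direction fields paraphrase the standard's FAP table and should be checked against it rather than taken as normative.

```python
from dataclasses import dataclass

@dataclass
class FAPDescriptor:
    """Illustrative subset of the attributes the standard defines per FAP."""
    number: int           # 1..68
    name: str
    fapu: str             # which FAPU scales this FAP's value
    group: int            # 1..10, see Table 2
    bidirectional: bool   # True if both motion directions are meaningful
    positive_motion: str  # direction of positive parameter values

# FAP 3 moves only one way from neutral; FAP 48 moves both ways.
OPEN_JAW   = FAPDescriptor(3, "open_jaw", "MNS", 2, False, "down")
HEAD_PITCH = FAPDescriptor(48, "head_pitch", "AU", 7, True, "down")
```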

The FAP set contains two high-level parameters, visemes and expressions (FAP group 1). A viseme (FAP 1) is the visual correlate of a phoneme. Only 14 static visemes that are clearly distinguishable are included in the standard set (Table 3). Owing to the coarticulation of speech and mouth movement [6], the shape of the mouth of a speaking human is influenced not only by the current phoneme, but also by the previous and the next phoneme. To allow for this, MPEG-4 defines transitions from one viseme to the next by blending only two visemes with a weighting factor. So far, it is not clear whether this is sufficient for high-quality visual speech animation.
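A minimal sketch of this two-viseme transition, assuming each viseme is realized as a model-specific set of vertex displacements (how a viseme deforms a given model is up to its designer) and treating the weighting factor as a float in [0, 1]:

```python
def blend_visemes(shape1, shape2, blend):
    """Linearly blend two viseme mouth targets.

    shape1, shape2: per-vertex mouth displacements for the two visemes
    (model-specific); blend: weighting factor in [0, 1] toward shape2.
    """
    return [(1.0 - blend) * a + blend * b for a, b in zip(shape1, shape2)]

# Halfway through a transition between two mouth targets:
mouth = blend_visemes([0.0, 1.0, 0.4], [0.2, 0.6, 0.0], 0.5)
print(mouth)  # [0.1, 0.8, 0.2]
```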

The expression parameter FAP 2 defines the six primary facial expressions (Table 4, Fig. 3). In contrast to visemes, facial expressions are animated by a value defining the excitation of the expression. Two facial expressions can be animated simultaneously, with an amplitude in the range of 0-63 defined for each expression. The facial expression parameter values are defined by textual descriptions. The expression parameter provides an efficient means of animating faces: expressions are high-level animation parameters that a face model designer creates for each face model, and since each is designed as a complete expression, they allow animating unknown models with high subjective quality [1,23].

Using FAP 1 and FAP 2 together with low-level FAPs 3-68 that affect the same areas as FAPs 1 and 2 may result in unexpected visual representations of the face. Generally, the low-level FAPs have priority over deformations caused by FAP 1 or 2. When specifying an expression with FAP 2, the encoder may send an init_face bit that deforms the neutral face of the model with the expression prior to superimposing FAPs 3-68. This deformation is applied subject to the neutral-face constraints on mouth closure, eye opening, gaze direction, and head orientation. Since the encoder does not know how FAPs 1 and 2 are implemented, we recommend using only those low-level FAPs that will not interfere with FAPs 1 and 2.
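Because the implementation of FAP 2 is model-dependent, the following is only a plausible sketch, not the standard's prescription: it assumes each expression is realized as a designer-supplied per-vertex displacement field and superimposes two such fields, scaled by their 0-63 amplitudes, on the neutral geometry.

```python
def apply_expressions(neutral, field1, amp1, field2, amp2):
    """Superimpose up to two expression displacement fields on the
    neutral face; amplitudes are excitation values in 0..63."""
    assert 0 <= amp1 <= 63 and 0 <= amp2 <= 63
    return [n + (amp1 / 63.0) * d1 + (amp2 / 63.0) * d2
            for n, d1, d2 in zip(neutral, field1, field2)]

# Made-up one-component "fields" for brevity; real fields would hold a
# displacement per vertex.
neutral  = [0.0, 0.0, 0.0]
joy      = [0.10, -0.20, 0.00]
surprise = [0.00, 0.30, 0.10]
face = apply_expressions(neutral, joy, 63, surprise, 10)
```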

2.1.3. Face model specification

Every MPEG-4 terminal that is able to decode FAP streams has to provide an MPEG-4 compliant face model that it animates (Section 2.1.3.1). Usually, this is a model proprietary to the decoder, and the encoder does not know about the look of the face model. Using a face definition parameter (FDP) node, MPEG-4 allows the encoder to completely specify the face model to be animated. This involves defining the static geometry of the face model in its neutral state using a scene graph (Section 2.1.3.3), defining the surface properties, and defining the animation rules using face animation tables (FAT), which specify how the model gets deformed by the facial animation parameters (Section 2.1.3.4). Alternatively, the FDP node can be used to calibrate the proprietary face model of the decoder (Section 2.1.3.2). However, MPEG-4 does not specify how to calibrate or adapt a proprietary face model.

2.1.3.1. Proprietary face model

In order for a face model to be MPEG-4 compliant, it has to be able to execute all FAPs according to Sections 2.1.1 and 2.1.2. Therefore, the face model has to have at least as many vertices as there are feature points that can be animated. Thus, an MPEG-4 compliant face model may have as few as 50 vertices; such a model, however, would not create a pleasing impression. We expect at least 500 vertices to be required for pleasant and reasonable face models (Fig. 3).

A proprietary face model can be built in four steps:

1. We build the shape of the face model and define the location of the feature points on the face model according to Section 2.1.1 and Fig. 2.

2. For each FAP, we define how the feature point has to move. For most feature points, MPEG-4 defines only the motion in one dimension. As an example, consider FAP 54, which displaces the outer right lip corner horizontally. Human faces usually move the right corner of the lip backward as they move it to the right; it is left to the face model designer to define a subjectively appealing face deformation for each FAP.

3. After the motion of the feature points is defined for each FAP, we define how the motion of a feature point affects its neighboring vertices. This mapping of feature-point motion onto vertex motion can be done using lookup tables like the FAT (Section 2.1.3.4) [24], muscle-based deformation [16,31,36], or distance transforms [17]; see the sketch after this list.

4. For expressions, MPEG-4 provides only qualitative hints on how they should be designed (Table 4). Similarly, visemes are defined by giving sounds that correspond to the required lip shapes (Table 3). FAPs 1 and 2 should be designed with care, since they will mostly be used for visually appealing animations.
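The lookup-table option in step 3 can be sketched as follows: a table ties one FAP to a handful of vertex indices with per-axis weights. The vertex indices and all concrete numbers are invented, the backward z-component echoes the FAP 54 discussion in step 2, and the real FAT syntax is richer (it supports piecewise-linear behavior over intervals of the FAP value).

```python
# Illustrative table for FAP 54 (outer right lip corner, horizontal),
# scaled in mouth-width units.
FAT_FAP54 = {
    "fapu": "MW0",
    "vertices": {          # vertex index -> per-axis direction weights
        101: (1.00, 0.00, -0.30),  # the feature point itself, with the
                                   # designer's backward (z) component
        102: (0.60, 0.05, -0.15),  # neighbors follow with smaller weights
        103: (0.25, 0.00, -0.05),
    },
}

def apply_fap(mesh, table, fap_value, fapu):
    """Displace mesh vertices for one FAP via its lookup table."""
    step = fap_value * fapu[table["fapu"]]
    for idx, (wx, wy, wz) in table["vertices"].items():
        x, y, z = mesh[idx]
        mesh[idx] = (x + step * wx, y + step * wy, z + step * wz)

fapu = {"MW0": 55.0 / 1024.0}  # mouth width of a made-up model
mesh = {101: (1.0, 0.0, 0.0), 102: (0.9, 0.1, 0.0), 103: (0.8, 0.2, 0.0)}
apply_fap(mesh, FAT_FAP54, 100, fapu)  # pull the right lip corner outward
```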

Following the above steps, our face model is ready to be animated with MPEG-4 FAPs. Whenever a face model is animated, gender information is provided to the terminal. MPEG-4 does not require using a different face model for the male and female gender. We recommend that the decoder reads the gender
