Content-based Multimedia Retrieval


Abstract

This paper introduces the concepts and characteristics of content-based multimedia retrieval and proposes a content-based analysis method that works on MPEG audio signals directly in the compressed domain, so that multimedia streams can be analyzed and retrieved in real time. The algorithm has three steps: first, compressed-domain features are used to segment the audio signal; next, a hierarchical method classifies the resulting audio clips into three basic categories: music, speech, and other sounds; finally, because the speaker's identity is an important retrieval clue in speech signals, text-independent speaker identification is performed with hidden Markov models, and the identified speaker is used to annotate the speech signal and its corresponding video.

Keywords: audio retrieval; multimedia; content-based retrieval; compressed domain; hidden Markov model; speaker identification; multimedia retrieval

Introduction

With the development of computer application technology and the growth of Internet bandwidth, multimedia information such as text, audio, and video is now within every user's reach. The main problem facing computer users has therefore shifted from the early scarcity of information to retrieving what is needed, quickly and accurately, from massive amounts of data. For this reason, content-based image and video retrieval became one of the hot topics in the multimedia field from the early 1990s [1][2]. In content-based image and video retrieval, visual features such as color, texture, shape, and motion are extracted to characterize the semantics contained in the image or video, enabling query and management of image and video data.

The principle and characteristics of content-based multimedia retrieval

Multimedia retrieval here means content-based retrieval (CBR). Content-based retrieval queries media objects by the semantics carried in their content and context: the color, texture, and shape of an image; the camera motion, scenes, and shots of a video; the pitch, loudness, and timbre of audio; and so on. It breaks through the limitation of traditional text-based retrieval by analyzing image, video, and audio content directly, extracting features and semantics, and using them for indexing and retrieval. The retrieval process draws on image processing, pattern recognition, computer vision, and image understanding as parts of its basic technology; it is a synthesis of multiple techniques.
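As a toy illustration of this principle (not taken from the paper), an image query can be answered by reducing each image to a coarse color histogram and ranking the database by histogram similarity instead of demanding an exact match; all names and parameter values below are illustrative:

```python
# Toy illustration of content-based similarity retrieval: images become
# coarse color histograms, and a query is answered by approximate matching,
# i.e. ranking the database by histogram similarity.

def color_histogram(pixels, bins=4):
    """Coarse RGB histogram: each channel quantized into `bins` levels."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        q = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[q] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]  # normalize to a distribution

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def retrieve(query_pixels, database):
    """Rank database items (name -> pixel list) by similarity to the query."""
    q = color_histogram(query_pixels)
    scored = [(histogram_intersection(q, color_histogram(p)), name)
              for name, p in database.items()]
    return sorted(scored, reverse=True)

# A red query image matches the mostly-red database item best.
red = [(250, 10, 10)] * 100
mostly_red = [(240, 20, 20)] * 90 + [(10, 10, 240)] * 10
blue = [(10, 10, 250)] * 100
ranking = retrieve(red, {"mostly_red": mostly_red, "blue": blue})
print([name for _, name in ranking])  # mostly_red ranked first
```

The ranking is an ordered list rather than a yes/no match, which is exactly the approximate-matching behavior described below.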

Compared with traditional information retrieval, CBR has the following characteristics:

(1) Similarity retrieval. CBR uses approximate (or partial) matching together with incremental refinement of query results, abandoning traditional exact matching and thereby avoiding the problems that uncertainty causes for traditional retrieval methods.

(2) Clues extracted directly from content. CBR analyzes text, images, video, and audio directly, extracts their content features, and uses these features for indexing and retrieval.

(3) Multi-level retrieval. A CBR system is usually composed of a media library, a feature library, and a knowledge base. The media library holds multimedia data such as text, images, audio, and video; the feature library holds user-supplied features and automatically extracted content features; the knowledge base holds domain knowledge and general knowledge, and its knowledge representation can be adapted to the requirements of different application fields.

(4) Fast retrieval over large databases. CBR typically works with large and heterogeneous multimedia databases, yet can still retrieve multimedia information quickly.

Content-based analysis method

Video and audio are organized in chronological order. The traditional way to locate a particular fragment is to browse sequentially with fast-forward or fast search, which not only demands the user's full attention but also wastes a great deal of time.

Because video and audio are complex and carry abundant information, their retrieval has become a difficult problem in practical applications, and content-based analysis is the main trend in the development of video and audio retrieval.

The central question is how to describe the content of multimedia information, and the main answer is content-based analysis for video processing and retrieval, an approach that has emerged with the development of multimedia data processing technology in recent years. Content analysis tries to understand multimedia information from another perspective, moving from early retrieval by basic colors to richer media features such as color, texture, shape, scene, shot, and frame. The technology has now reached the practical stage: the multimedia content description interface MPEG-7 is a widely accepted international standard whose core is content-based multimedia analysis.

The MPEG series is currently the most widely used family of audio-visual media standards. MPEG-1, MPEG-2, and MPEG-4 are the ones in common use, and all are international standards for compressing and coding digital images and audio; MPEG-4 codes visual and audio objects according to their relations in time and space. MPEG-7, developed on the basis of MPEG-4, focuses on describing and defining the content of audio-visual information and is independent of how the multimedia information is encoded and stored.

Since audio also carries a large amount of semantic information, content-based audio retrieval [3] has attracted more and more attention in recent years; its main idea is to describe audio content by features extracted from the audio stream in the time and frequency domains.

By nature, multimedia is an interactive integration of text, video, audio, and other media, and these media are more or less semantically related: one medium can express the same semantics as another, and media can index each other; for example, audio classification is used in [4] to index video data.

However, neither content-based image and video retrieval nor content-based audio retrieval goes beyond comparing the similarity of visual or auditory perceptual features, whereas our description of multimedia content is based on semantic information. Classifying multimedia data streams into predefined semantic models is therefore a challenge for multimedia retrieval. Semantic concept models fall into three levels. The first is high-level semantics, the result of highly abstract conceptualization over several multimedia events distributed in time and space, such as the formation of the El Niño climate phenomenon; modeling it requires exploring the thinking mechanisms of the human brain. The second is intermediate semantics, which concerns the people or events described within high-level semantics but does not span several events, such as a particular anchor's news report or a particular football match. The third is low-level semantics, which uses visual or auditory information to categorize multimedia data, for example as music, speech, or a beach scene. Semantic annotation turns unstructured multimedia data into structured data, which makes multimedia data streams easy to organize and convenient to retrieve.

In addition, with the spread of network technology, real-time analysis of multimedia data (especially audio data) has also become a necessity [6]. Feature extraction in traditional multimedia retrieval basically operates on uncompressed data; meanwhile, with the development of multimedia technology, MPEG has become the standard for multimedia data compression because of its advantages for storage and transmission [7]. To annotate an MPEG data stream semantically with an uncompressed-domain method, the stream must first be decoded before features can be extracted and analyzed, so much of the computation is wasted and real-time performance cannot be guaranteed.

At the same time, MPEG audio coding incorporates psychoacoustics: the coder takes the characteristics of human auditory perception into account. Extracting features directly in the MPEG compressed domain therefore loses no perceptual information and supports a correct understanding of the audio content.

In an audio stream the speaker is very important semantic information; for example, different program hosts present different kinds of news (sports, weather forecasts, current affairs, and so on).

By analyzing a speaker's voice and automatically identifying who is speaking, the identity can be used for intermediate-level semantic annotation of the audio, and it can also classify the corresponding video stream, so that one medium indexes another.

On this basis, this paper proposes a method that analyzes multimedia directly in the compressed domain: first, the MPEG data stream is demultiplexed into its video and audio parts; then the compressed audio stream is segmented and coarsely classified, and speaker identification is performed on the speech clips; finally, the speech audio and the corresponding video are annotated with the identified speaker (see Fig. 1).

Fig. 1 Compressed-domain multimedia retrieval and classification process

Compressed-domain audio feature extraction

An audio feature is data that represents the original audio information.

According to the feature space, audio features fall into three categories: time-domain, frequency-domain, and time-frequency features. Time-domain features include short-time energy, zero-crossing rate, and linear prediction coefficients; frequency-domain features include linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC); time-frequency features include the short-time Fourier transform and wavelet coefficients. In recent years, models of human auditory perception have been built to reflect the fact that the original audio is processed first by the cochlea and only then assembled into an auditory scene in the brain [10][11], and features can be extracted from such models. Audio features can thus also be divided into two categories, physical and perceptual, according to whether a perceptual model is used. Physical features, such as short-time energy, zero-crossing rate, and fundamental frequency, come from the audio signal itself; perceptual features, such as pitch and loudness, depend on a model of human hearing. Note that some time-frequency features are also perceptual: each level of a wavelet decomposition, for example, is equivalent to a constant-Q filter, which matches the auditory perception characteristics of the human ear.
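Two of the time-domain features named above, short-time energy and zero-crossing rate, can be sketched as follows (the framing parameters are illustrative defaults, not values prescribed by the paper):

```python
# Short-time energy and zero-crossing rate, computed per frame on a raw
# sample sequence. Frame size and hop are illustrative defaults.

def frames(samples, size=576, hop=576):
    """Split a sample sequence into fixed-size, non-overlapping frames."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)
```

High energy with a low zero-crossing rate is typical of voiced speech, while unvoiced fricatives show the opposite pattern; this contrast is what makes the two features useful for coarse audio classification.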

38、perception characteristics of human ears.MPEG audio compression using the psychoacoustics model (psychoacoustics model), extract the features in the MPEG compressed domain directly, can keep the awareness, more like the human auditory perception system, realize the understanding of audio semantic co

39、ntent.First, the MPEG data flow is broken down into video and audio parts. The audio stream data is mpeg-2 Layer III and the sampling frequency is 22050Hz. The audio data is split into a frame sequence of about 20 milliseconds (576 samples per frame), according to the traditional voice processing re

40、quirement for signal processing into short-term frames.For each frame, the mean square root of each subband vector is first evaluated, which is a 32-dimensional subband vector and a 32-dimensional vector. Represent the characteristics of the frame, which can get the following specific features: (1)

(1) Centroid: the balance point of the subband vector, which reflects the fundamental frequency band of the audio signal in the compressed domain.

(2) Rolloff: the cut-off frequency at which the energy of the audio signal has attenuated by 3 dB. Because the human ear is very sensitive to changes in signal strength, the rolloff frequency acts as an adaptive hearing threshold and reflects the auditory masking characteristics studied in psychoacoustics.

(3) Spectral flux: the L2 norm of the difference between the normalized subband vectors of two adjacent frames, which reflects the dynamics of the audio spectrum.

(4) Root mean square (RMS): a measure of the loudness of the frame. Audio scene changes are usually accompanied by volume changes, so RMS is an important indicator for segmentation.

Because audio signals are non-stationary, statistics of these four features are extracted as additional features in order to characterize audio sequences better. In the experiments, a window of 40 frames (about 1 second) is used. For each frame, the mean and variance of the centroid, rolloff, and spectral flux over all frames of the preceding window are computed, together with the proportion of frames whose RMS falls below a given threshold: seven statistical features in all. Each frame is thus described by 11 features (the four instantaneous features plus the seven statistics); for the first 40 frames of an audio stream, which lack a full preceding window, the statistics are the averages of the corresponding statistics over the whole stream. These 11 features capture both the static and the dynamic characteristics of the audio and conform to the psychoacoustic model; together they form a compressed-domain description operator of the audio signal, used for segmentation, coarse classification, and speaker identification.
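A hedged sketch of these frame features and window statistics follows. Here `subbands` is a frame's 32-dimensional vector of per-subband RMS values; the rolloff uses a cumulative-energy fraction as a common stand-in for the paper's 3 dB criterion, and the low-RMS threshold and function names are illustrative assumptions, not the paper's values.

```python
import math

def centroid(subbands):
    """Balance point of the subband vector (index-weighted mean)."""
    total = sum(subbands)
    return sum(i * v for i, v in enumerate(subbands)) / total if total else 0.0

def rolloff(subbands, fraction=0.85):
    """Lowest subband index by which `fraction` of the energy is reached
    (a cumulative-energy stand-in for the paper's 3 dB criterion)."""
    energy = [v * v for v in subbands]
    target = fraction * sum(energy)
    acc = 0.0
    for i, e in enumerate(energy):
        acc += e
        if acc >= target:
            return i
    return len(subbands) - 1

def spectral_flux(prev, cur):
    """L2 norm of the difference between two L2-normalized subband vectors."""
    def unit(v):
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(unit(prev), unit(cur))))

def rms(subbands):
    """Overall loudness of the frame."""
    return math.sqrt(sum(v * v for v in subbands) / len(subbands))

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def eleven_features(window, low_rms=0.01):
    """11-dim vector for the last frame of a 40-frame window of subband
    vectors: 4 instantaneous features, plus mean/variance of centroid,
    rolloff and flux over the window, plus the low-RMS fraction (7 stats)."""
    cs = [centroid(f) for f in window]
    rs = [rolloff(f) for f in window]
    fl = [spectral_flux(a, b) for a, b in zip(window, window[1:])] or [0.0]
    ms = [rms(f) for f in window]
    return [cs[-1], rs[-1], fl[-1], ms[-1],
            mean(cs), variance(cs), mean(rs), variance(rs),
            mean(fl), variance(fl),
            sum(m < low_rms for m in ms) / len(ms)]
```

Feeding 40 consecutive subband vectors to `eleven_features` yields one 11-dimensional description per frame, the representation used by the segmentation algorithm in the next section.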

Audio signal segmentation and coarse classification

Research has shown that although the features of an audio signal change dramatically over time, within one audio class the features vary within a roughly bounded range, and choosing a suitable window length makes this regularity observable [12][13]. Using the 11 features extracted above, the experiments implement the following segmentation algorithm: (1) read the MPEG audio stream and compute the 11-dimensional feature vector for each frame, indexed by time (frame number); (2) compute the logarithm of the Euclidean distance between each pair of adjacent feature vectors; (3) for the resulting distance sequence, compute the difference between the mean values over the window before and the window after each frame (this step is called windowing); (4) if the difference exceeds a given threshold, a segment boundary is placed at that frame.
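The four steps above can be sketched as follows, assuming per-frame 11-dimensional feature vectors are already available; the window length, threshold, and the small epsilon guarding log(0) are illustrative choices, not values from the paper:

```python
import math

def log_distance(u, v):
    """Step (2): log of the Euclidean distance between adjacent vectors."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return math.log(d + 1e-9)  # small epsilon avoids log(0)

def segment_boundaries(features, win=40, threshold=1.0):
    """Steps (3)-(4): flag frames where the mean of the following window of
    distances differs from the mean of the preceding window by more than
    `threshold`."""
    d = [log_distance(features[i], features[i + 1])
         for i in range(len(features) - 1)]
    boundaries = []
    for t in range(win, len(d) - win):
        before = sum(d[t - win:t]) / win
        after = sum(d[t:t + win]) / win
        if abs(after - before) > threshold:
            boundaries.append(t)
    return boundaries

# An abrupt change in the feature stream produces boundaries near frame 60.
feats = [[0.0] * 11] * 60 + [[5.0] * 11] * 60
print(segment_boundaries(feats, win=10, threshold=1.0))
```

In practice consecutive flagged frames cluster around each true change point, so a post-processing step would typically keep only the local maximum of each cluster.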
