
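Audio Signal Processing (Basics)

References
1) Academic developments in this field
2) Technical developments in this field

0 Aperitif

References: the Internet

Which qualities (abilities) matter?
* The R&D process of a project: what exists, what it is, why, how to do it
* English, mathematics, tools, "physical" conceptual thinking

1 Getting started: raw material for experiments

WAV files. Example: keep friends with.wav

Offset  Bytes  Type  Content
00H     4      char  "RIFF" marker
04H     4      long  File length minus 8, i.e. data length + 0x24 (file length = data length + 0x2C)
08H     4      char  "WAVE" marker
0CH     4      char  "fmt " marker
10H     4      long  Size of the fmt chunk (10H = 16 for PCM)
14H     2      int   Format tag (0001H for PCM audio data)
16H     2      int   Number of channels: 1 for mono, 2 for stereo
18H     4      long  Sampling rate (samples per second)
1CH     4      long  Average data rate in bytes per second: channels × samples per second × bits per sample / 8. Playback software can use this value to estimate the buffer size.
20H     2      int   Block alignment in bytes: channels × bits per sample / 8. Playback software processes data in multiples of this many bytes, so the value is used for buffer alignment.
22H     2      int   Bits per sample in each channel. With several channels, the sample size is the same for every channel.
24H     4      char  "data" marker
28H     4      long  Length of the audio data in bytes

    #include <stdint.h>

    /* Canonical 44-byte WAV header. Fixed-width types keep the layout
       correct on compilers where long is not 32 bits. */
    typedef struct {
        char     Riff[4];          /* "RIFF" */
        uint32_t sizeOfFile;       /* file length - 8 */
        char     WAVEfmt[8];       /* "WAVEfmt " */
        uint32_t sizeOfFmt;        /* 16 for PCM */
        int16_t  wFormatTag;       /* 1 for PCM */
        int16_t  nChannels;
        uint32_t nSamplesPerSec;
        uint32_t navgBytesPerSec;
        int16_t  nBlockAlign;
        uint16_t nBitPerSample;
        char     Cdata[4];         /* "data" */
        uint32_t sizeOfData;
    } HeadOfWave;

A few notes:
* File length versus data length
* Key quantities: sampling rate / number of channels / quantization mode / quantization bits
* How navgBytesPerSec and nBlockAlign are calculated
* Program example and explanation

2 Basic concepts
* Sampling rate
* Quantization bits

2.1 Sampling rate

Common rates: 48 / 44.1 / 32 / 22.05 / 16 / 11.025 / 8 kHz
Two families: 44.1 / 22.05 / 11.025 kHz and 32 / 16 / 8 kHz
Why these values?
Representative frequency: 32 is 22kHz

2.2 Bandwidth of audio signals

[Figure: file keep_friend_with.wav (sampling rate 44.1 kHz), with frequency marks at 4 kHz, 7 kHz and 22 kHz]
[Figure: file keep_friend_with_8k.wav (sampling rate 8 kHz), with a frequency mark at 4 kHz]

These files are unusual: the recording environment was very good.

Commonly accepted figures:
* Speech: 300 to 3400 Hz, sampling rate 8 kHz
* Wide-band speech: bandwidth 7 kHz (50 Hz to 7 kHz), sampling rate 16 kHz
* Audio: bandwidth 20 kHz (20 Hz to 20 kHz), sampling rate 44.1 kHz or 48 kHz

Why are the sampling rates those values? Nyquist Sampling Theorem.

Why 44.1 kHz?
20 kHz -> (Nyquist) 40 kHz -> (roll-off from passband to stopband) 44 kHz -> 44.1 kHz?
At the time the choice was made, the only recorders capable of storing such high rates were VCRs.
NTSC: 490 lines/frame, 3 samples/line, 30 frames/s = 44100 samples/s
PAL: 588 lines/frame, 3 samples/line, 25 frames/s = 44100 samples/s
Source: Prof. Brian L. Evans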
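
The WAV header layout from section 1 can be turned into a small parser. What follows is only a minimal sketch in C, assuming the canonical 44-byte layout (a 16-byte fmt chunk followed directly by the data chunk); real files may carry extra chunks that it does not handle. The names `WaveInfo`, `read_le16`, `read_le32` and `parse_wave_header` are hypothetical and introduced here for illustration. The sketch decodes the little-endian fields byte by byte instead of casting the buffer to a struct, and it checks the navgBytesPerSec and nBlockAlign relations.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for the decoded header fields (illustration only). */
typedef struct {
    uint32_t riff_size;         /* offset 04H: file length - 8 */
    uint16_t format_tag;        /* offset 14H: 1 = PCM */
    uint16_t channels;          /* offset 16H */
    uint32_t sample_rate;       /* offset 18H */
    uint32_t avg_bytes_per_sec; /* offset 1CH */
    uint16_t block_align;       /* offset 20H */
    uint16_t bits_per_sample;   /* offset 22H */
    uint32_t data_size;         /* offset 28H */
} WaveInfo;

/* WAV stores integers little-endian; decode independently of host byte order. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Assumes the canonical 44-byte header. Returns 0 on success, -1 if the
   buffer is too short or a marker is missing, -2 if the derived fields
   disagree with channels, sampling rate and bits per sample. */
int parse_wave_header(const unsigned char *buf, size_t len, WaveInfo *out)
{
    if (len < 44)
        return -1;
    if (memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 0x08, "WAVEfmt ", 8) != 0 ||
        memcmp(buf + 0x24, "data", 4) != 0)
        return -1;

    out->riff_size         = read_le32(buf + 0x04);
    out->format_tag        = read_le16(buf + 0x14);
    out->channels          = read_le16(buf + 0x16);
    out->sample_rate       = read_le32(buf + 0x18);
    out->avg_bytes_per_sec = read_le32(buf + 0x1C);
    out->block_align       = read_le16(buf + 0x20);
    out->bits_per_sample   = read_le16(buf + 0x22);
    out->data_size         = read_le32(buf + 0x28);

    /* nBlockAlign = channels * bits per sample / 8 */
    uint32_t bytes_per_frame =
        (uint32_t)out->channels * out->bits_per_sample / 8;
    if (out->block_align != bytes_per_frame)
        return -2;
    /* navgBytesPerSec = sampling rate * nBlockAlign */
    if (out->avg_bytes_per_sec != out->sample_rate * bytes_per_frame)
        return -2;
    return 0;
}
```

In use, one would open the file in binary mode ("rb"), read its first 44 bytes with fread, and pass them to `parse_wave_header`. For a file with this layout, `riff_size` should equal `data_size + 0x24`.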

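The arithmetic behind the sampling rates discussed above can be checked in a few lines. This is a sketch in C; `nyquist_rate_hz` and `video_sample_rate` are hypothetical helper names introduced here. The first applies the Nyquist criterion (the sampling rate must be at least twice the highest frequency), the second multiplies out the lines/frame × samples/line × frames/s figures quoted for NTSC and PAL.

```c
/* Minimum sampling rate in Hz for a signal band-limited to max_freq_hz. */
unsigned long nyquist_rate_hz(unsigned long max_freq_hz)
{
    return 2UL * max_freq_hz;
}

/* Samples per second that fit on a video recorder storing
   samples_per_line audio samples on each video line. */
unsigned long video_sample_rate(unsigned long lines_per_frame,
                                unsigned long samples_per_line,
                                unsigned long frames_per_sec)
{
    return lines_per_frame * samples_per_line * frames_per_sec;
}
```

With these helpers, a 3400 Hz speech band needs at least 6800 Hz (8 kHz leaves a margin), a 7 kHz band needs 14 kHz (hence 16 kHz), and a 20 kHz band needs 40 kHz. Both `video_sample_rate(490, 3, 30)` and `video_sample_rate(588, 3, 25)` evaluate to 44100.
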
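(Prof. Evans: Dept. of Electrical and Computer Engineering, The University of Texas at Austin)

Listen to the sounds:
* keep_friends_with(44k_mono).wav
* keep_friends_with(22k_mono).wav
* keep_friends_with(16k_mono).wav
* keep_friends_with(11k_mono).wav
* keep_friends_with(8k_mono).wav

For speech signals, sampling at 8 kHz or 11 kHz gives one level of quality, and sampling at 16 kHz or above gives another. So for speech it is enough to distinguish two classes: voice and wideband speech.

2.3 Quantization bits

Linear quantization / non-linear quantization
Quantization signal-to-noise ratio for b bits: about 6b dB; more precisely, 6.02b + 1.76 dB.
Language-repeater specification: when audio is re-recorded from tape onto the chip and then played back from the chip over headphones, the standard specifies a 34 dB amplitude difference between the useful signal and the noise.

Listen to the sounds:
* keep_friends_with(16k_mono).wav
* keep_friends_with(16k_mono)_8b.wav
The file with 8-bit linear quantization carries clearly audible background noise.
From experience, how many quantization bits are acceptable?

Getting started: raw material for experiments
* Speech files sampled at 16 kHz or 8 kHz
* 16-bit or 14-bit linear quantization
* Music files sampled at 44.1 kHz

3 Audio processing tools I commonly use
* VC6.0, using C
* Matlab
* Cool Edit

Matlab (MathWorks)
Mathematical environment. Signal Processing Toolbox: filter design, spectral analysis, waveform generation, linear prediction. Also voicebox.
pros: open, powerful, scripting, excellent plotting
cons: poor speech community, standards, not designed for big files

Other speech analysis tools?
* GoldWave (audio editor)
* ESPS/Xwaves (routines + visualization)
* Praat (speech analysis)
* WaveSurfer (speech editor)
* Transcriber (annotation tool)
* OGI speech tools (routines + application development)
* winpitch, pitchworks, phonedit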
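
Returning to the quantization figures of the section on quantization bits, the numbers can be tried out directly. This is a minimal sketch in C with hypothetical function names. It assumes the 6.02b + 1.76 dB rule, which holds for a full-scale sine wave and a uniform (linear) quantizer. `requantize16` shows one possible way to reduce a 16-bit sample to fewer bits while keeping the 16-bit scale; it is an assumption about how a file like the 8-bit example could be produced, not necessarily how it was made.

```c
#include <stdint.h>

/* Theoretical SNR in dB of a b-bit uniform quantizer (full-scale sine). */
double quantization_snr_db(int bits)
{
    return 6.02 * bits + 1.76;
}

/* Smallest number of bits whose theoretical SNR reaches target_db. */
int min_bits_for_snr(double target_db)
{
    int bits = 1;
    while (bits < 32 && quantization_snr_db(bits) < target_db)
        bits++;
    return bits;
}

/* Reduce a 16-bit sample to `bits` bits of resolution (1..16), rounding
   toward minus infinity and keeping the 16-bit amplitude scale. */
int16_t requantize16(int16_t sample, int bits)
{
    int32_t step = (int32_t)1 << (16 - bits);
    int32_t s = sample;
    int32_t q = (s >= 0) ? s / step : -((-s + step - 1) / step);
    return (int16_t)(q * step);
}
```

Under these assumptions, 16 bits give 98.08 dB and 8 bits give 49.92 dB, and `min_bits_for_snr(34.0)` returns 6. Real speech rarely stays at full scale, so the SNR obtained in practice is lower than these theoretical values.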

GoldWave
pros: editing (good memory management for big files), many effects, noise reduction, real-time spectrum and VU meters, various formats, batch conversion, chained effects, easy interface
cons: nothing for speech (pitch, formants), Windows only, no scripting
Good for file editing, not for speech.

ESPS/Waves
Developed by Entropic + AT&T; now public. The comp.speech FAQ says:
* ESPS: a comprehensive set of speech analysis/processing tools
* Waves: a graphical front-end for speech processing (waveforms, spectrograms, pitch); includes a signal labeling utility
pros: powerful, designed for big files
cons: UNIX only (free BSD), non-standard formats, requires programming skills, development has stopped

Praat
Developed by P. Boersma and D. Weenink at the Institute of Phonetic Sciences, University of Amsterdam.
General-purpose speech tool: editing, segmentation and labeling, prosodic manipulation
pros: designed for speech analysis (not only sound editing or spectrogram visualization), nice GUI, scripting, active development and community, prosodic manipulation
cons: limited scripting language, native format of transcription and pitch files

WaveSurfer
Open-source tool for sound visualization and manipulation
* speech/sound analysis and sound annotation/transcription
* platform for more advanced/specialized applications: extending WaveSurfer with new custom plug-ins or embedding WaveSurfer visualization components in other applications
Requires the Snack toolkit

Transcriber
Authors: C. Barras, E. Geoffrois
* Relies on Snack (Tcl/Tk)
* Good for annotation
* Nice, simple GUI
* No speech analysis

OGI speech tools / CSLU Toolkit
Development started in 1992 in C on Unix, at the Center for Spoken Language Understanding (CSLU) at OGI. Includes:
* an X Window display tool (LYRE): display and edit speech signals, spectrograms, phoneme labels, and other information
* a set of C library routines (LIBNSPEECH)
* utilities for converting file formats, filtering, neural network training, vector quantization
* a database utility to automate speech-database-related enquiries
* a set of Perl scripts, used mainly to automate the use of the OGI Speech Tools
* man pages
RAD (rapid application development); points of entry: package (C), script (Tcl), GUI (Tk) levels
Free for research use

Summary
[Tool comparison table; surviving legend entry: "= yes but requires some dev."]
