音频信号处理基础篇.ppt (Audio Signal Processing: Fundamentals)

Uploaded by: 本田雅阁 | Document ID: 2680803 | Upload date: 2019-05-05 | Format: PPT | Pages: 44 | Size: 1.01 MB

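Audio Signal Processing (Fundamentals)

References
1) Academic developments in this field
2) Technical developments in this field

0 Aperitif

References; the Web
Which qualities (abilities) matter?
The R&D process of a project
What is there, what is it, why, and how to do it
English, mathematics, tools, "physical" concepts, way of thinking

1 Getting started: raw materials for experiments

Wav file
Example: keep friends with.wav

Offset  Bytes  Type  Content
00H     4      char  "RIFF" tag
04H     4      long  File length - 8, i.e. data length + 0x24
                     (file length = data length + 0x2C)
08H     4      char  "WAVE" tag
0CH     4      char  "fmt " tag
10H     4      long  Size of the fmt chunk (called "transition bytes, variable"
                     on the slide); 10H for plain PCM
14H     2      int   Format tag (0001H = PCM sound data)
16H     2      int   Number of channels: 1 = mono, 2 = stereo
18H     4      long  Sampling rate (samples per second)
1CH     4      long  Average data transfer rate in bytes per second:
                     channels x samples per second x bits per sample / 8.
                     Playback software can use this value to estimate the
                     buffer size.
20H     2      int   Block alignment in bytes: channels x bits per sample / 8.
                     Playback software processes data in multiples of this
                     size and uses it to align its buffers.
22H     2      int   Bits per sample, i.e. the number of data bits of each
                     sample in each channel. With several channels, the
                     sample size is the same for every channel.
24H     4      char  "data" tag
28H     4      long  Length of the audio data in bytes

    /* 44-byte PCM WAV header. unsigned long is assumed to be 4 bytes and
       short int 2 bytes, as in VC6.0. */
    typedef struct {
        char           Riff[4];          /* "RIFF" */
        unsigned long  sizeOfFile;       /* file length - 8 */
        char           WAVEfmt[8];       /* "WAVEfmt " */
        unsigned long  sizeOfFmt;        /* size of the fmt chunk */
        short int      wFormatTag;       /* format tag */
        short int      nChannels;        /* number of channels */
        unsigned long  nSamplesPerSec;   /* sampling rate */
        unsigned long  navgBytesPerSec;  /* bytes per second */
        short int      nBlockAlign;      /* bytes per sample frame */
        unsigned short nBitPerSample;    /* bits per sample */
        char           Cdata[4];         /* "data" */
        unsigned long  sizeOfData;       /* length of the audio data */
    } HeadOfWave;

A few remarks:
* File length and data length
* Key quantities: sampling rate / number of channels / quantization mode / quantization bits
* Computing navgBytesPerSec and nBlockAlign
* Program example and explanation

2 Basic concepts

Sampling rate, quantization bits

2.1 Sampling rate

48k/44k/32k/22k/16k/11k/8 kHz
Two families: 44k/22k/11k and 32k/16k/8k
Why these values?
Representative frequency: 32 is 22 kHz

2.2 Bandwidth of audio signals

File keep_friend_with.wav (sampling rate 44 kHz) (figure labels: 7 kHz, 22 kHz, 4 kHz)
File keep_friend_with_8k.wav (sampling rate 8 kHz) (figure label: 4 kHz)
These files are unusual: the recording environment was very good.

Generally accepted:
* Speech: 300-3400 Hz, sampling rate 8 kHz
* Wide-band speech: bandwidth 7 kHz (50 Hz-7 kHz), sampling rate 16 kHz
* Audio: bandwidth 20 kHz (20 Hz-20 kHz), sampling rate 44.1 kHz or 48 kHz

Why do sampling rates take those values?
Nyquist Sampling Theorem
Why 44.1 kHz?
20 kHz -> (Nyquist) 40 kHz -> (roll-off from passband to stopband) 44 kHz -> 44.1 kHz?

At the time the choice was made, the only recorders capable of storing such high rates were VCRs.
NTSC: 490 lines/frame, 3 samples/line, 30 frames/s = 44100 samples/s
PAL: 588 lines/frame, 3 samples/line, 25 frames/s = 44100 samples/s

Prof. Brian L. Evans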

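Dept. of Electrical and Computer Engineering, The University of Texas at Austin

Listen to the sounds
keep_friends_with(44k_mono).wav
keep_friends_with(22k_mono).wav
keep_friends_with(16k_mono).wav
keep_friends_with(11k_mono).wav
keep_friends_with(8k_mono).wav

For speech, sampling at 8 kHz and 11 kHz sounds alike, and sampling at 16 kHz and above sounds alike.
So for speech it is enough to distinguish voice from wideband speech.

2.3 Quantization bits

Linear quantization / nonlinear quantization
Quantization SNR: about 6b dB; more precisely 6.02b + 1.76 dB (b bits, full-scale sine input).
Repeater (language-learning recorder) specification: the level difference between the wanted signal and the noise, measured when sound is re-recorded from tape onto the chip and then played back from the chip through headphones. The standard requires 34 dB.

Listen to the sounds
keep_friends_with(16k_mono).wav
keep_friends_with(16k_mono)_8b.wav
The file with 8-bit linear quantization carries clearly audible background noise.
From experience, what is an acceptable number of quantization bits?

Getting started: raw materials for experiments
Speech files sampled at 16 kHz or 8 kHz;
16-bit or 14-bit linear quantization;
music files sampled at 44.1 kHz.

3 Audio processing tools I commonly use

VC6.0, using C; Matlab; cooledit

Matlab (Mathworks)
Math environment
Signal Processing Toolbox: filter design, spectral analysis, waveform generation, linear prediction
voicebox
pros: open, powerful, scripting, excellent plotting
cons: poor speech community, standards, not designed for big files

Other speech analysis tools?
Goldwave (audio editor)
Esps Xwaves (routines + visualization)
Praat (speech analysis)
Wavesurfer (speech editor)
Transcriber (annotation tool)
OGI speech tools (routines + app. dev.)
winpitch, pitchworks, phonedit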
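Looking back at the quantization rule of thumb in the section on quantization bits (6.02b + 1.76 dB), the arithmetic can be sketched in C as below. The function names quant_snr_db and min_bits_for_snr are hypothetical, chosen for this illustration only, and the formula is the theoretical value for a full-scale sine into a uniform quantizer, so real speech recorded below full scale will do worse.

```c
#include <assert.h>

/* Theoretical SNR in dB of a uniform quantizer with `bits` bits,
   assuming a full-scale sine input: 6.02 * bits + 1.76. */
double quant_snr_db(int bits)
{
    assert(bits > 0);
    return 6.02 * bits + 1.76;
}

/* Smallest number of bits whose theoretical SNR reaches target_db.
   Hypothetical helper; it simply steps through bit depths. */
int min_bits_for_snr(double target_db)
{
    int bits = 1;
    while (quant_snr_db(bits) < target_db) {
        bits++;
    }
    return bits;
}
```

For example, quant_snr_db(8) is about 49.9 dB and quant_snr_db(16) about 98.1 dB, and min_bits_for_snr(34.0) gives 6 for the 34 dB repeater requirement. That figure is a theoretical lower bound only; the audible noise in the 8-bit file suggests practical speech material needs more headroom than the formula alone implies.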

Goldwave
pros: editing (good memory management for big files), many FX, noise reduction, real-time spectrum and VU meters, various formats, batch conversion, chained effects, easy interface
cons: nothing for speech (pitch, formants), Windows only, no scripting
Good for file editing, not for speech

Esps - Waves
Developed by Entropic + AT&T. Now public.
Comp.speech FAQ says:
* Esps: comprehensive set of speech analysis/processing tools
* Waves: a graphical front-end for speech processing (waveforms, spectrograms, pitch); includes a signal labeling utility
pros: powerful, designed for big files
cons: UNIX only (free BSD), non-standard formats, requires programming skills, development has stopped

Praat
Developed by P. Boersma and D. Weenink at the Institute of Phonetic Sciences, University of Amsterdam
General-purpose speech tool: editing, segmentation and labeling, prosodic manipulation
pros: designed for speech analysis (not only sound editing or spectrogram visualization), nice GUI, scripting, active development and community, prosodic manipulation
cons: limited scripting language, native format of transcription and pitch files

WaveSurfer
Open-source tool for sound visualization and manipulation
Speech/sound analysis and sound annotation/transcription
Platform for more advanced/specialized applications: extending WaveSurfer with new custom plug-ins, or embedding WaveSurfer visualization components in other applications
Requires the Snack Toolkit

Transcriber
Authors: C. Barras, E. Geoffrois
Relies on Snack (Tcl/Tk)
Good for annotation
Nice, simple GUI
No speech analysis

OGI speech tools / CSLU Toolkit
Development started in 1992 in C on Unix, at the Center for Spoken Language Understanding (CSLU) at OGI
Includes:
* An X Windows display tool (LYRE): display and edit speech signals, spectrograms, phoneme labels, and other information
* A set of C library routines (LIBNSPEECH)
* Utilities for converting file formats, filtering, neural network training, vector quantization
* A database utility to automate speech-database enquiries
* A set of Perl scripts, used mainly to automate the use of the OGI Speech Tools
* Man pages
RAD: rapid application development
Points of entry: package (C), script (Tcl), GUI (Tk) levels
Free for research use

Summary
(comparison table not recovered; legend: "= yes but requires some dev.")
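Returning to the WAV header from section 1, the remark on computing navgBytesPerSec and nBlockAlign can be illustrated with a small C sketch. Everything named here (WavInfo, wav_build_header, wav_parse_header and the rd/wr helpers) is hypothetical and not from the slides. The sketch assumes the plain 44-byte PCM layout given in the offset table; real files may carry extra chunks or a longer fmt chunk, which this code deliberately rejects or ignores. Fields are read byte by byte as little-endian values so the result does not depend on the size of unsigned long or on struct padding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the fields of the 44-byte PCM header. */
typedef struct {
    uint16_t format_tag;        /* offset 14H */
    uint16_t channels;          /* offset 16H */
    uint32_t sample_rate;       /* offset 18H */
    uint32_t avg_bytes_per_sec; /* offset 1CH */
    uint16_t block_align;       /* offset 20H */
    uint16_t bits_per_sample;   /* offset 22H */
    uint32_t data_bytes;        /* offset 28H */
} WavInfo;

/* Little-endian readers and writers. */
static uint16_t rd16(const unsigned char *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static void wr16(unsigned char *p, uint16_t v) { p[0] = v & 0xFF; p[1] = (v >> 8) & 0xFF; }
static void wr32(unsigned char *p, uint32_t v)
{
    p[0] = v & 0xFF; p[1] = (v >> 8) & 0xFF;
    p[2] = (v >> 16) & 0xFF; p[3] = (v >> 24) & 0xFF;
}

/* nBlockAlign = channels * bits per sample / 8 */
uint16_t wav_block_align(uint16_t channels, uint16_t bits)
{
    assert(bits % 8 == 0); /* sketch handles whole-byte samples only */
    return (uint16_t)(channels * bits / 8);
}

/* navgBytesPerSec = sampling rate * nBlockAlign */
uint32_t wav_avg_bytes_per_sec(uint16_t channels, uint32_t rate, uint16_t bits)
{
    return rate * wav_block_align(channels, bits);
}

/* Fill a 44-byte header for linear PCM data of data_bytes bytes. */
void wav_build_header(unsigned char hdr[44], uint16_t channels,
                      uint32_t rate, uint16_t bits, uint32_t data_bytes)
{
    memcpy(hdr, "RIFF", 4);
    wr32(hdr + 0x04, data_bytes + 0x24);  /* file length - 8 */
    memcpy(hdr + 0x08, "WAVEfmt ", 8);
    wr32(hdr + 0x10, 16);                 /* fmt chunk size (10H) */
    wr16(hdr + 0x14, 1);                  /* format tag 1 = PCM */
    wr16(hdr + 0x16, channels);
    wr32(hdr + 0x18, rate);
    wr32(hdr + 0x1C, wav_avg_bytes_per_sec(channels, rate, bits));
    wr16(hdr + 0x20, wav_block_align(channels, bits));
    wr16(hdr + 0x22, bits);
    memcpy(hdr + 0x24, "data", 4);
    wr32(hdr + 0x28, data_bytes);
}

/* Returns 0 on success, -1 if the tags do not match the 44-byte layout. */
int wav_parse_header(const unsigned char hdr[44], WavInfo *out)
{
    if (memcmp(hdr, "RIFF", 4) != 0 ||
        memcmp(hdr + 0x08, "WAVEfmt ", 8) != 0 ||
        memcmp(hdr + 0x24, "data", 4) != 0) {
        return -1;
    }
    out->format_tag        = rd16(hdr + 0x14);
    out->channels          = rd16(hdr + 0x16);
    out->sample_rate       = rd32(hdr + 0x18);
    out->avg_bytes_per_sec = rd32(hdr + 0x1C);
    out->block_align       = rd16(hdr + 0x20);
    out->bits_per_sample   = rd16(hdr + 0x22);
    out->data_bytes        = rd32(hdr + 0x28);
    return 0;
}
```

A typical use would be to read the first 44 bytes of a file with fread, pass them to wav_parse_header, and check that avg_bytes_per_sec equals sample_rate times block_align before trusting the header. For mono 16-bit speech at 16 kHz, for instance, the block alignment is 2 bytes and the byte rate is 32000 bytes per second.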
