Multimedia Technology End-of-Chapter Exercises - Reference Material.ppt

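Resource description

"Multimedia Technology End-of-Chapter Exercises - Reference Material.ppt" was shared by a member and can be read online. For more on "Multimedia Technology End-of-Chapter Exercises - Reference Material.ppt (102 pages)", search on 三一文库.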

Ch3 Graphics and Image Data Representations

1. Briefly explain why we need to be able to have less than 24-bit color and why this makes for a problem. Generally, what do we need to do to adaptively transform 24-bit color values to 8-bit ones?

Answer: We may not be able to handle such large file sizes, or may not have 24-bit displays. The colors will be somewhat wrong, however.

We need to cluster color pixels so as to best use the bits available to be as accurate as possible for the colors in an image. In more detail: variance minimization quantization (vmquant.m). Minimum variance quantization allocates more of the available colormap entries to colors that appear frequently in the input image and allocates fewer entries to colors that appear infrequently. Therefore if there are, for example, many reds, as in a red apple, there will be more resolution in the red part of the color cube. An excellent implementation of this idea is Wu's Color Quantizer (see Graphics Gems vol. II, pp. 126-133).

2. Suppose we decide to quantize an 8-bit grayscale image down to just 2 bits of accuracy. What is the simplest way to do so? What ranges of byte values in the original image are mapped to what quantized values?

Answer:

    Range of byte values    Reconstruction value
    0 to 63                 32
    64 to 127               96
    128 to 191              160
    192 to 255              224

The reconstruction values should be taken as the middle of these ranges.

3. Suppose we have a 5-bit grayscale image. What size of ordered dither matrix do we need to display the image on a 1-bit printer?

Answer: $2^5 = 32$ levels, and a dither matrix $D(n)$ provides $n^2 + 1$ levels; $n = 6$ is the smallest size with $n^2 + 1 \geq 32$, therefore we need $D(6)$.

4. Suppose we have available 24 bits per pixel for a color image. However, we notice that humans are more sensitive to R and G than to B; in fact, 1.5 times more sensitive to R or G than to B. How could we best make use of the bits available?

Answer: The ratio is 3:3:2, so use bits 9:9:6 for R:G:B.

5. At your job, you have decided to impress the boss by using up more disk space for the company's grayscale images. Instead of using 8 bits per pixel, you'd like to use 48 bits per pixel in RGB. How could you store the original grayscale images so that in the new format they would appear the same as they used to, visually?

Answer: 48 bits RGB means 16 bits per channel, so re-store the old ints, which were $< 2^8$, as new ints $< 2^{16}$. But then the new values have to be created by multiplying the old values by $2^8$, so that e.g. a mid-gray is still a mid-gray. As well, we have to duplicate the old gray into all three of R, G, B.

6. For the color LUT problem, try out the median-cut algorithm on a sample image. Explain briefly why it is that this algorithm, carried out on an image of red apples, puts more color gradation in the resulting 24-bit color image where it is needed, among the reds.

7. Write down an algorithm (pseudocode) for calculating a color histogram for RGB data.

Answer:

    int hist[256][256][256];   // all entries initialized to 0
    // image is an appropriate struct with int fields red, green, blue
    for i = 0..(MAX_Y-1)
      for j = 0..(MAX_X-1)
      {
        R = image[i][j].red;
        G = image[i][j].green;
        B = image[i][j].blue;
        hist[R][G][B]++;
      }

Ch4 Color in Image and Video

Exercise 3

1. Consider the following set of color-related terms: (a) wavelength (b) color level (c) brightness (d) whiteness. How would you match each of the following (more vaguely stated) characteristics to each of the above terms?

    (a) luminance    => brightness
    (b) hue          => wavelength
    (c) saturation   => whiteness
    (d) chrominance  => color level

2. What color is outdoor light? For example, around what wavelength would you guess the peak power is for a red sunset? For blue sky light?

Answer: About 650 nm for a red sunset, and about 450 nm for blue sky light.

3. (a) Suppose images are not gamma corrected by a camcorder. Generally, how would they appear on a screen?

Answer: Too dark at the low-intensity end.

(b) What happens if we artificially increase the output gamma for stored image pixels? (We can do this in Photoshop.) What is the effect on the image?

Answer: We increase the number of bright pixels: we increase the number of pixels that map to the upper half of the output range. This creates a lighter image. Incidentally, we also decrease highlight contrast and increase contrast in the shadows.

Ch5 Fundamental Concepts in Video

1. NTSC video has 525 lines per frame and 63.6 μsec per line, with 20 lines per field of vertical retrace and 10.9 μsec horizontal retrace.

(a) Where does the 63.6 μsec come from?

Answer:

(b) Which takes more time, horizontal retrace or vertical retrace? How much more time?

Answer:

2. Which do you think has less detectable flicker, PAL in Europe or NTSC in North America? Justify your conclusion.

Answer: PAL could be better since it has more lines, but it is worse because of fewer frames/sec.

3. Sometimes the signals for television are combined into fewer than all the parts required for TV transmission.

(a) Altogether, how many and what are the signals used for studio broadcast TV?

Answer: 5: R, G, B, audio, sync; can say "blanking" instead, too.

(b) How many and what signals are used in S-Video? What does S-Video stand for?

Answer: Luminance + chrominance = 2, plus audio + sync, = 4. S-Video stands for Separated video.

(c) How many signals are actually broadcast for standard analog TV reception? What kind of video is that called?

Answer: 1. Composite.

4. One sometimes hears that the old Betamax format for videotape, which competed with VHS and lost, was actually a better format. How would such a statement be justified?

Answer: Betamax has more samples per line: 500, as opposed to 240.

5. We don't see flicker on a workstation screen when displaying video at NTSC frame rate. Why do you think this might be?

Answer: NTSC video is displayed at 30 frames per sec, so flicker is possibly present. Nonetheless, when video is displayed on a workstation screen, the video buffer is read and then rendered on the screen at a much higher rate, typically the refresh rate of 60 to 90 Hz, so no flicker is perceived. (And in fact most display systems have double buffers, completely removing flicker: since main memory is much faster than video memory, we keep a copy of the screen in main memory, and when the update of this buffer is complete, the whole buffer is copied to the video buffer.)

6. Digital video uses chroma subsampling. What is the purpose of this? Why is it feasible?

Answer: Human vision has less acuity in color vision than it has in black and white: one can distinguish close black lines more easily than colored lines, which soon are perceived as just a mass without texture as the lines move close to each other. Therefore, it is acceptable perceptually to remove a good deal of color information. In analog, this is accomplished in broadcast TV by simply assigning a smaller frequency bandwidth to color than to black and white information. In digital, we "decimate" the color signal by subsampling (typically, averaging nearby pixels). The purpose is to have less information to transmit or store.

7. What are the most salient differences between ordinary TV and HDTV?

Answer: More pixels, and an aspect ratio of 16/9 rather than 4/3.

What was the main impetus for the development of HDTV?

Answer: Immersion, "being there". Good for interactive systems and applications such as virtual reality.

8. What is the advantage of interlaced video? What are some of its problems?

Answer: Positive: reduces flicker. Negative: introduces serrated edges on moving objects and flicker along horizontal edges.

9. One solution that removes the problems of interlaced video is to de-interlace it. Why can we not just overlay the two fields to obtain a de-interlaced image? Suggest some simple de-interlacing algorithms that retain information from both fields.

Answer: The second field is captured at a later time than the first, creating a temporal shift between the odd and even lines of the image.

The methods used to overcome this are basically two: non-motion compensated and motion compensated de-interlacing algorithms. The simplest non-motion compensated algorithm is called "Weave"; it performs linear interpolation between the fields to fill in a full, "progressive", frame. A defect with this method is that moving edges show up with significant serrated lines near them.

A better algorithm is called "Bob": in this algorithm, one field is discarded and a full frame is interpolated from a single field. This method generates no motion artifacts (but of course detail is reduced in the resulting progressive image).

In a vertical-temporal (VT) de-interlacer, vertical detail is reduced for higher temporal frequencies. Other, non-linear, techniques are also used. Motion compensated de-interlacing performs inter-field motion compensation and then combines fields so as to maximize the vertical resolution of the image.

Ch6 Basics of Digital Audio

Exercise 1

1. My old Soundblaster card is an 8-bit card. (a) What is it 8 bits of? (b) What is the best SQNR (Signal to Quantization Noise Ratio) it can achieve?

Answer:

2. If a set of ear protectors reduces the noise level by 30 dB, how much do they reduce the intensity (the power)?

Answer: A reduction in intensity by a factor of 1000.

3. A loss of audio output at both ends of the audible frequency range is inevitable, due to the frequency response function of an audio amplifier and the medium (e.g., tape).

(a) If the output was 1 volt for frequencies at midrange, what is the output voltage after a loss of 3 dB at 18 kHz?

(b) To compensate for the loss, a listener can adjust the gain (and hence the output) on an equalizer at different frequencies. If the loss remains 3 dB and a gain through the equalizer is 6 dB at 18 kHz, what is the output voltage now?

Hint: Assume $\log_{10} 2 = 0.3$.
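The decibel arithmetic in exercises 2 and 3 can be checked with a short sketch. It assumes the usual conventions: a power (intensity) ratio is $10^{\mathrm{dB}/10}$ and a voltage ratio is $10^{\mathrm{dB}/20}$. With the hint $\log_{10} 2 = 0.3$, a 3 dB change corresponds to $\log_{10} r = 3/20 = 0.15 = 0.3/2$, i.e. $r = \sqrt{2}$, so the voltages come out near 0.71 V after the loss and near 1.41 V after the equalizer. The function and variable names are illustrative and do not come from the slides.

```python
def power_ratio_from_db(db: float) -> float:
    """Power (intensity) ratio for a level change given in decibels."""
    return 10 ** (db / 10)


def voltage_ratio_from_db(db: float) -> float:
    """Voltage (amplitude) ratio for a level change given in decibels."""
    return 10 ** (db / 20)


# Exercise 2: a 30 dB reduction in level, expressed as an intensity factor.
intensity_reduction = power_ratio_from_db(30)

# Exercise 3: midrange output is 1 V (assumed reference level).
v_mid = 1.0
# (a) after a 3 dB loss at 18 kHz
v_after_loss = v_mid * voltage_ratio_from_db(-3)
# (b) the 3 dB loss remains, plus 6 dB of equalizer gain: net +3 dB
v_after_eq = v_mid * voltage_ratio_from_db(-3 + 6)

print(intensity_reduction, round(v_after_loss, 3), round(v_after_eq, 3))
```

The small difference between these values and the exact $1/\sqrt{2}$ and $\sqrt{2}$ comes from the hint's rounding of $\log_{10} 2$ to 0.3.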

4. Suppose the sampling frequency is 1.5 times the true frequency. What is the alias frequency?

Answer: 0.5 times the true frequency.

5. In a crowded room, we can still pick out and understand a nearby speaker's voice, notwithstanding the fact that general noise levels may be high. This is known as the cocktail-party effect. The way it operates is that our hearing can localize a sound source by taking advantage of the difference in phase between the two signals entering our left and right ears (binaural auditory perception). In mono, we could not hear our neighbor's conversation well if the noise level were at all high. State how you think a karaoke machine works.

Hint: The mix for commercial music recordings is such that the "pan" parameter is different going to the left and right channels for each instrument. That is, for an instrument, either the left or right channel is emphasized. How would the singer's track timing have to be recorded to make it easy to subtract the sound of the singer (which is typically done)?

Answer: For the singer, left and right are always mixed with the exact same pan. This information can be used to subtract out the sound of the singer. To do so, replace the left channel by the difference between the left and the right, and boost the maximum amplitude; and similarly for the right channel.

6. The dynamic range of a signal V is the ratio of the maximum to the minimum absolute value, expressed in decibels. The dynamic range expected in a signal is to some extent an expression of the signal quality. It also dictates the number of bits per sample needed to reduce the quantization noise to an acceptable level. For example, we may want to reduce the noise to at least an order of magnitude below $V_{min}$. Suppose the dynamic range for a signal is 60 dB. Can we use 10 bits for this signal? Can we use 16 bits?

7. Suppose the dynamic range of speech in telephony implies a ratio $V_{max} / V_{min}$ of about 256. Using uniform quantization, how many bits should we use to encode speech to make the quantization noise at least an order of magnitude less than the smallest detectable telephonic sound?

Answer: $V_{min} = V_{max} / 256$. The quantization noise is $V_{max} / 2^n$ if we use $n$ bits. Therefore, to get quantization noise about a factor of 16 below the minimum sound, we need 12 bits.

8. Perceptual nonuniformity is a general term for describing the nonlinearity of human perception. That is, when a certain parameter of an audio signal varies, humans do not necessarily perceive the difference in proportion to the amount of change.

(a) Briefly describe at least two types of perceptual nonuniformities in human auditory perception.

(b) Which one of them does A-law (or μ-law) attempt to approximate? Why could it improve quantization?

Answer: (a) (1) Logarithmic response to magnitude; (2) different sensitivity to different frequencies. (b) A-law (or μ-law) approximates the non-linear response to magnitude. It makes better use of the limited number of bits available for each quantized sample.

9. Suppose a signal contains tones at 1, 10, and 21 kHz and is sampled at the rate 12 kHz (and then processed with an antialiasing filter limiting output to 6 kHz). What tones are included in the output? Hint: Most of the output consists of aliasing.

Answer: 1 kHz, 12 - 10 = 2 kHz, and 2 × 12 - 21 = 3 kHz tones are present.

10. (a) Can a single MIDI message produce more than one note sounding?

Answer: No.

(b) Is it possible for more than one note to sound at once on a particular instrument? If so, how is it done in MIDI?

Answer: Yes: use two NoteOn messages for one channel before the NoteOff message is sent.

(c) Is the Program Change MIDI message a Channel Message? What does this message accomplish? Based on the Program Change message, how many different instruments are there in General MIDI? Why?

Answer: Yes. It replaces the patch for a channel. 128, since it has one data byte, which must be in 0..127.

(d) In general, what are the two main kinds of MIDI messages? In terms of data, what is the main difference between the two types of messages? Within those two categories, list the different subtypes.

Answer: Channel Messages and System Messages. The subtypes are Channel voice messages, Channel mode messages, System real-time messages, System common messages, and System exclusive messages. Channel messages have a status byte with the leading most-significant bit set, and 4 bits of channel information; System messages have the 4 MSBs set.

11. (a) Give an example (in English, not hex) of a MIDI voice message.

Answer: NoteOn.

(b) Describe the parts of the "assembler" statement for the message.

Answer: opcode = Note on; data = note, or key, number; data = "velocity" = loudness.

(c) What does a Program Change message do? Suppose Program change is hex “ [...]

[...] PB = 0.4; PC = 0.1. For simplicity, let's also assume that both encoder and decoder know that the length of the messages is always 3, so there is no need for a terminator.

i. How many bits are needed to encode the message BBB by Huffman coding?

Answer: 6 bits. Huffman Code: A - 0, B - 10, C - 11; or A - 1, B - 00, C - 01.

ii. How many bits are needed to encode the message BBB by arithmetic coding?

4. (a) What are the advantages of Adaptive Huffman Coding compared to the original Huffman Coding algorithm?

(b) Assume that the Adaptive Huffman Coding is used to code an info
