Human Perception of Media Synchronization

Ralf Steinmetz and Clemens Engler, IBM European Networking Center, Heidelberg, Germany

Content Summary

The paper describes a number of experiments designed to deliver hard data on how much synchronization delay between related streams is acceptable to human subjects. The required bounds range from very tight (microseconds) for stereo audio, through moderate (milliseconds) for video/audio lip sync, to loose (nearly a second) for audio/telepointer. The video experiments were done using PAL, whose frame rate (25 frames/s) differs from that of NTSC (about 30 frames/s), so it is not clear whether the results should be read in absolute time or in frames.
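
To make the time-versus-frames question concrete, the following minimal sketch converts a skew between milliseconds and frames at the nominal PAL and NTSC rates. The 100 ms skew is a hypothetical round number chosen for illustration, not a figure from the paper, and the helper names are likewise invented here.

```python
from fractions import Fraction

# Nominal broadcast frame rates.
PAL_FPS = Fraction(25)
NTSC_FPS = Fraction(30000, 1001)  # about 29.97 frames/s


def skew_in_frames(skew_ms, fps):
    """Convert a skew in milliseconds to a (possibly fractional) frame count."""
    return float(Fraction(skew_ms) / 1000 * fps)


def frames_in_ms(frames, fps):
    """Convert a frame count to its duration in milliseconds."""
    return float(Fraction(frames) / fps * 1000)


# Hypothetical skew, for illustration only (not a value from the paper).
EXAMPLE_SKEW_MS = 100

pal_frames = skew_in_frames(EXAMPLE_SKEW_MS, PAL_FPS)
ntsc_frames = skew_in_frames(EXAMPLE_SKEW_MS, NTSC_FPS)

if __name__ == "__main__":
    print(f"{EXAMPLE_SKEW_MS} ms = {pal_frames:.3f} PAL frames")
    print(f"{EXAMPLE_SKEW_MS} ms = {ntsc_frames:.3f} NTSC frames")
    print(f"2 PAL frames  = {frames_in_ms(2, PAL_FPS):.1f} ms")
    print(f"2 NTSC frames = {frames_in_ms(2, NTSC_FPS):.1f} ms")
```

The same skew in time spans a different number of frames on the two systems, and the same number of frames spans a different time, so a bound stated one way does not transfer directly to the other.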

One interesting result is that audio leading video seems awkward, because we are used to compensating internally for sound traveling more slowly than light.

Relevance to Multimedia

The paper provides hard numbers that system designers can use to tune multimedia delivery systems, unlike survey papers (e.g. Roufs). However, as with the Roufs survey, such experiments involve many variables, and the methodology and sample size are not made explicit.


4 out of 5: even though the methodology is sketchy, the results are useful to system designers, not just perceptual psychologists.
