CHI '95 Proceedings: Short Papers

Ear Tracking: Visualizing Auditory Localization Strategies

William Joseph King & Suzanne J. Weghorst

Human Interface Technology Laboratory
Fluke Hall; FJ-15
University of Washington
Seattle, Washington 98195
{jking,weghorst}@hitl.washington.edu

© ACM


Abstract

Auditory displays are an ongoing topic of human-computer interaction research and have been shown to be beneficial in human interfaces. Further, binaural spatial acoustic displays are a topic of increasingly active research. As these virtual acoustic displays become more prevalent, new methods for measuring users' perceptions and the displays' effectiveness become necessary. A novel method for examining virtual acoustic displays, specifically localization strategies within these displays, is presented. This method is analogous to eye tracking in visual displays. Such a method may be useful in the evaluation of virtual acoustic displays and in the design of adaptive acoustic displays.

Keywords:

binaural audio, virtual acoustic displays, auditory perception, position tracking, adaptive interfaces

Introduction

Non-speech auditory displays are slowly gaining acceptance in human interface design. These displays have been shown to be highly effective in a variety of applications [1]. They allow the presentation of information without further cluttering the visual channel. Auditory displays may be used to reinforce the information which is being presented visually, or they may present other information. In either case, they are intended to reduce the cognitive load of the user.

The display itself may increase cognitive load if it differs from the perceptual situations that humans normally encounter. This concern has stimulated the development of three-dimensional auditory displays that remain stable with respect to the user's head position and are therefore more natural. These displays use head position data and head-related transfer functions (HRTFs) to simulate sounds (moving or fixed in position) that appear to originate from actual positions within the space around the user [6]. These display systems allow for the creation of complete auditory virtual environments.

Researchers and designers are now being tasked with using these virtual acoustic environments effectively. Unfortunately, no method beyond performance measures exists for examining the user's perception of these virtual environments. Performance measures are not entirely satisfactory because they are coarse. Users perceiving these environments exhibit elevation and front/back reversals, perhaps more commonly than in physical environments [6]. These errors, along with subtle effects such as those of non-individualized HRTFs and individual differences in hearing, may not be illuminated through the use of performance measures alone.

METHOD

Unlike some mammals (e.g., cats), humans do not move their outer ears while localizing sounds. However, they do actively move their heads and bodies to aid in listening and localization [5]. This movement is based on the orienting reflex [4], an adaptive response to the presentation of a stimulus in which the animal turns to face its source.

The authors have developed a measure, which they call "Ear Tracking," based on this natural head movement. The movement of the head is dynamically plotted (see Figure 1) in relation to the spatial sound sources presented to the user. This plot produces data somewhat analogous to eye tracking data [7]. Specifically, the position of a spatial tracker is sampled, and a vector representing the head orientation between the user's ears is calculated. This vector points to locations on the inner surface of a virtual sphere; the point source spatial sounds (represented as numbers in boxes) are also located on the surface of this virtual sphere. These locations are dynamically projected onto a two-dimensional surface, which is displayed on the computer's screen (see Figures 1 through 3). The authors are exploring other projections, such as sinusoidal, gnomonic, and polar projections [2], to eliminate specific distortions.
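The mapping from a head direction on the virtual sphere to a screen location can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the axis convention, the use of azimuth and elevation in degrees, the equirectangular baseline projection, the screen size, and all function names are hypothetical. The sinusoidal mapping is included because it is one of the alternative projections named above.

```python
import math

def direction_from_angles(azimuth_deg, elevation_deg):
    """Unit vector for a head direction given as azimuth/elevation in degrees.

    Assumed convention: x forward, y to the listener's left, z up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def angles_from_direction(v):
    """Azimuth and elevation (degrees) of a nonzero direction vector."""
    x, y, z = v
    norm = math.sqrt(x * x + y * y + z * z)
    # Clamp to guard against rounding slightly outside [-1, 1].
    sin_el = max(-1.0, min(1.0, z / norm))
    return math.degrees(math.atan2(y, x)), math.degrees(math.asin(sin_el))

def equirectangular(az_deg, el_deg):
    """Plot azimuth and elevation directly as x and y (assumed baseline)."""
    return az_deg, el_deg

def sinusoidal(az_deg, el_deg):
    """Sinusoidal projection: shrink x by cos(elevation) to preserve area."""
    return az_deg * math.cos(math.radians(el_deg)), el_deg

def to_screen(xy, width=640, height=480):
    """Map projected degrees (x in [-180, 180], y in [-90, 90]) to pixels."""
    x, y = xy
    return (x + 180.0) / 360.0 * width, (90.0 - y) / 180.0 * height
```

Both the head direction and each point source sound would pass through the same projection, so their relative positions on the screen stay comparable.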

Figure 1: A typical "cross" pattern. Note the color changes as a factor of time (each color represents one second).

The sampling rate for these traces was arbitrarily set at 10 Hz; the only hard upper limit is the maximum sampling rate of the tracker. The passage of time is represented by changes in the color of the trace: each second is drawn in a different color. Our ear tracker was implemented on a Macintosh II running the Gehring Research spatial sound system [3], and the sounds were presented to the blindfolded user via headphones. Position data were provided by a Polhemus 3Space tracker mounted on the headphones.
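The time coding of the trace can be illustrated with a short sketch. It assumes the 10 Hz sampling rate stated above; the palette, the choice to cycle colors once the palette is exhausted, and the function names are hypothetical and are not taken from the original system.

```python
SAMPLE_RATE_HZ = 10  # sampling rate stated in the text
# Hypothetical palette; the colors actually used are not specified.
PALETTE = ["red", "orange", "yellow", "green", "blue", "purple"]

def color_for_sample(index, rate_hz=SAMPLE_RATE_HZ, palette=PALETTE):
    """Color of the sample at `index`: one palette entry per elapsed second."""
    elapsed_seconds = index // rate_hz
    # Assumption: colors repeat once the palette runs out.
    return palette[elapsed_seconds % len(palette)]

def colored_segments(points, rate_hz=SAMPLE_RATE_HZ, palette=PALETTE):
    """Pair each consecutive segment of the trace with the color of the
    second in which the segment starts."""
    return [(points[i], points[i + 1], color_for_sample(i, rate_hz, palette))
            for i in range(len(points) - 1)]
```

With this scheme, counting the color changes along a trace gives the elapsed time in seconds.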

PILOT RESULTS

Six subjects have been presented with a variety of spatial sounds, including pure tones, noise, naturalistic sounds, and speech-like sounds (e.g., speech in foreign languages unfamiliar to all of the subjects). Due to the small number of subjects and trials and to the varied stimuli, no rigorous conclusions can be drawn at this point. However, a number of interesting patterns have emerged; a few of these are illustrated here.

Figure 2: A "sweep" pattern is shown for a subject localizing a sonically complex stimulus presented as a point source sound.

The "cross" pattern (see Figure 1) has emerged across all types of stimuli, but appears to be used only by certain subjects. It appears to be a strategy for quickly scanning a region of acoustic space. The "sweep" pattern (see Figure 2) appears to occur predominantly during the localization of more complicated sounds (e.g., naturalistic and speech-like). Variations and combinations of these two patterns were observed. Finally, an expert listener, one having a great deal of experience listening to virtual acoustic displays, used a quick, direct movement (see Figure 3).

Figure 3: This pattern was generated by an expert listener; note that the subject localized the sound within three seconds.
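A trace such as the one in Figure 3 could, in principle, be summarized numerically by the time at which the head first points near the source. The following sketch is an illustration only and is not a measure reported in this paper: the 10 degree tolerance, the representation of the trace as (azimuth, elevation) pairs in degrees, and the function names are all assumptions.

```python
import math

def angular_separation_deg(az1, el1, az2, el2):
    """Great-circle angle in degrees between two directions on the sphere."""
    a1, e1, a2, e2 = map(math.radians, (az1, el1, az2, el2))
    cos_angle = (math.sin(e1) * math.sin(e2)
                 + math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def time_to_localize(trace, source, rate_hz=10, tolerance_deg=10.0):
    """Seconds until the head first points within `tolerance_deg` of the
    source, or None if it never does. `trace` is a list of
    (azimuth, elevation) samples; `source` is one such pair."""
    for i, (az, el) in enumerate(trace):
        if angular_separation_deg(az, el, source[0], source[1]) <= tolerance_deg:
            return i / rate_hz
    return None
```

A single number of this kind discards the shape of the trace, so it would complement rather than replace the plotted patterns.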

DISCUSSION

These limited data provide interesting insights into auditory attention and localization strategies within virtual acoustic environments. The similarities to eye tracking seem striking and will likely drive further research. The ear tracking technique should serve as a method for analyzing these sonic environments, as well as a vehicle for the development of adaptive virtual acoustic displays.

ACKNOWLEDGEMENTS

The authors wish to thank Brian Karr for his contributions to this research, specifically programming support.

References

1. Buxton, W., Gaver, W., & Bly, S. The Use of Non-speech Audio at the Interface. (Tutorial Number 10). Presented at CHI '89, ACM Conference on Human Factors in Computing Systems. ACM Press, New York, NY, 1989.

2. Chamberlin, W. The Round Earth on Flat Paper: Map Projections Used by Cartographers. National Geographic Society, Washington, DC, 1947.

3. Gehring, B. Focal Point 3D Sound User's Manual. Gehring Research Corporation, Toronto, Ontario, Canada, 1990.

4. Sokolov, E.N. Neuronal Models and the Orienting Reflex. In M.A. Brazier (Ed.). The Central Nervous System and Behavior. Josiah Macy, Jr. Foundation, New York, NY, 1960.

5. Thurlow, W.R., Mangels, J.W., & Runge, P.S. Head Movements during Sound Localization. Journal of the Acoustical Society of America, 42 (1967), 489-493.

6. Wenzel, E.M. Localization in Virtual Acoustic Displays. Presence, 1, 1 (Winter 1992), 80-107.

7. Yarbus, A. Eye Movements and Vision. Plenum Press, New York, NY, 1967.