CHI '95 Proceedings: Short Papers

Looking and Lingering as Conversational Cues in Video-Mediated Communication

Herbert L. Colston & Diane J. Schiano

Interval Research Corporation,
1801 Page Mill Road,
Palo Alto, CA 94304
(415) 424-0722
colston@interval.com, schiano@interval.com

© ACM

Abstract

A study is described in which observers rated the difficulty people had in solving problems, based either solely on how long the person looked at each problem, or also on how long his or her gaze lingered on it after being instructed to move on. Initial results show a linear relationship between gaze duration and rated difficulty, with lingering as an added significant factor. These findings are discussed in terms of the role(s) gaze cues play in tracking understanding in conversations, with implications for the design of video-mediated communication (VMC) systems.

GAZE AND VIDEO-MEDIATED COMMUNICATION

Video-mediated communication (VMC) between people in remote locations is rapidly becoming a practical reality. Yet the psychological phenomena underlying video communication are not well understood. Researchers have long predicted that adding video's high bandwidth to an existing audio channel would provide more social cues, and give a greater sense of social presence (6, 7), than is possible with audio alone. However, such effects tend to be highly situation-specific (9), and have been notoriously difficult to validate with task performance or productivity measures (1).

Another approach takes a psycholinguistic perspective on VMC, testing specific hypotheses regarding visual contributions to mediated conversations. For example, Isaacs and Tang (3, 8) have argued that adding video--even at fairly low frame rates--changes the nature of remote audio interactions by providing visual feedback, helping to manage pauses, supporting iconic gestures, and allowing gaze co-ordination. O'Conaill and colleagues (5) report related research in which they conclude that directional gaze information appears necessary to promote seamless conversational turn-taking.

In this paper, we focus on video gaze cues and discuss some experimental work-in-progress on the cognitive implications of observed gaze duration and timing. With Ishii and colleagues (4), we believe that the ability to easily track where one's partner is looking can strongly influence the course of collaborative interactions. Studies of gaze patterns in face-to-face conversations (6) are consistent with the view that gaze cues are used--at least in part--to track comprehension, and thus to assess confirmation of conversational contributions. Previous research by Clark and colleagues has demonstrated the necessity of such cognitive tracking processes (2). It is not enough for a speaker to merely utter a verbal message; he or she must also ensure that the listener has understood what was said as it was intended. While gaze cues alone can support this function with great efficiency and precision, non-visual media such as the telephone require more cumbersome conventions or explicit modes of speech whenever comprehension is in doubt (3, 5, 6). Despite such precautionary measures, the risk of major misunderstanding, especially of complex ideas, can often remain quite large.

The present study is part of a larger research endeavour to systematically investigate the role(s) temporal gaze cues play in VMC. We examine inferences observers make simply from the length of time someone spends looking at an unknown problem. We predict that--in lieu of other information--the total time spent looking at the problem will be used to infer the degree of difficulty that person is having in trying to solve it. Moreover, we expect that subtle timing cues, such as a tendency to let one's gaze linger on the problem even after being instructed to move on, may add to perceived difficulty. Such effects could have important implications for the design of VMC systems, especially in choosing video sampling and compression techniques to best support natural conversational processes.

EXPERIMENT

Subjects

Twenty males and twenty females from Stanford University, aged 17-45, participated in this research for pay.

Materials

A videotape was made of 8 actor confederates, each of whom turned over and looked (for varying lengths of time) at each of 3 cards displayed on a table before him or her. Subjects viewing this videotape were told that each card contained a word translation problem that the actor was asked to solve. Each actor was recorded in two clips, one in a "look only" condition and one in a "look + linger" condition, in alternation. In "look only" clips, the actor simply looked at each card for 1, 3 or 5 seconds. In "look + linger" clips, total gaze duration was always 5 seconds. However, while the actor was still viewing the card, the experimenter's voice was heard instructing the actor either to move on to the "next" card, or to "stop". Gaze linger was defined as the time the actor spent looking at the card after being instructed to proceed. Linger durations were 1, 2.5 or 4 seconds (after initial views of 4, 2.5 and 1 seconds, respectively). The order of look and linger durations was counterbalanced across subjects.
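The timing structure of the two clip types can be written out explicitly. The following Python sketch only restates the design described above; the names `CardView`, `look_only_views` and `look_linger_views` are hypothetical and do not come from the study materials.

```python
from dataclasses import dataclass

TOTAL_GAZE_S = 5.0  # total gaze duration for every "look + linger" card view


@dataclass(frozen=True)
class CardView:
    """One card view by an actor (hypothetical structure, for illustration)."""
    condition: str         # "look only" or "look + linger"
    initial_look_s: float  # gaze time before any instruction to proceed
    linger_s: float        # gaze time after the instruction (0 if none was given)

    @property
    def total_gaze_s(self) -> float:
        return self.initial_look_s + self.linger_s


def look_only_views():
    # Looks of 1, 3 or 5 seconds; the whole look counts as the initial look.
    return [CardView("look only", d, 0.0) for d in (1.0, 3.0, 5.0)]


def look_linger_views():
    # Lingers of 1, 2.5 or 4 seconds; the initial view fills the
    # remainder of the fixed 5-second total.
    return [
        CardView("look + linger", TOTAL_GAZE_S - linger, linger)
        for linger in (1.0, 2.5, 4.0)
    ]


if __name__ == "__main__":
    for view in look_only_views() + look_linger_views():
        print(view.condition, view.initial_look_s, view.linger_s, view.total_gaze_s)
```

Deriving the initial view from the fixed total makes the trade-off in the "look + linger" clips visible: a longer linger necessarily means a shorter initial look.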

Procedure

Subjects were seated at a video display unit and given task instructions. They were told that the videotape showed participants in a previous study solving word translation problems. Subjects viewed the videotape and rated the degree of difficulty each actor had in solving the word problem on each card, on a 7-point scale (from (1) "no difficulty at all" to (7) "a lot of difficulty"). After initial practice trials, a total of 48 ratings (8 actors x 3 card problems x 2 conditions, "look only" or "look + linger") were obtained. All subjects were debriefed upon completing the task.
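The count of ratings per subject follows directly from crossing the three design factors. A minimal sketch of that arithmetic, in which the identifiers `ACTORS`, `CARDS`, `CONDITIONS` and `rating_trials` are assumed for illustration:

```python
from itertools import product

ACTORS = range(1, 9)                         # 8 actor confederates
CARDS = range(1, 4)                          # 3 cards viewed in each clip
CONDITIONS = ("look only", "look + linger")  # 2 clips per actor


def rating_trials():
    """Enumerate every (actor, condition, card) combination a subject rates."""
    return list(product(ACTORS, CONDITIONS, CARDS))


if __name__ == "__main__":
    print(len(rating_trials()))  # 8 x 2 x 3 = 48
```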

RESULTS

All reported effects were obtained from analyses of variance (ANOVAs) and are significant at least at the .05 level (most at the .001 level). Results are given in Figure 1.


Figure 1. Effect of Gaze Duration on Rated Difficulty

Look Duration:

The results for the "look only" condition are straightforward. In observing someone trying to solve an unknown problem, the longer the total look duration, the greater the perceived difficulty. Five-second looks were rated as indicating more difficulty (mean rating = 4.2) than 3-second looks (3.1), which in turn were rated as indicating more difficulty than 1-second looks (1.8). All effects were highly significant. An intriguing interaction between look duration and gender was also found.
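As a rough check on how close to linear the three reported means are, one can fit a straight line through them. This Python sketch is not the ANOVA reported here: it uses only the three condition means quoted above, ignores variability across subjects, and the function name `least_squares_line` is assumed for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


look_durations_s = [1.0, 3.0, 5.0]  # "look only" durations, in seconds
mean_ratings = [1.8, 3.1, 4.2]      # mean difficulty ratings quoted in the text

slope, intercept = least_squares_line(look_durations_s, mean_ratings)
residuals = [
    y - (slope * x + intercept)
    for x, y in zip(look_durations_s, mean_ratings)
]

if __name__ == "__main__":
    print(f"slope = {slope:.2f} rating points per second")
    print(f"intercept = {intercept:.2f}")
    print("residuals:", [round(r, 3) for r in residuals])
```

On these three means the fitted slope is about 0.6 rating points per second of looking, and no mean departs from the line by more than about 0.07 points.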

Linger Duration: Initial Findings

Our initial findings for the "look + linger" condition are quite suggestive. Overall, the combination of looking and lingering (4.4) was rated as significantly more difficult than only looking (4.2) for the equivalent length of time. Further analyses suggest that short lingers (with long initial looks) and long lingers (with short initial looks) are rated as more difficult than moderate lingers (with moderate initial looks). Since looking and lingering were deliberately confounded in this condition to permit comparisons between linger durations at a constant (5 sec) total gaze duration, these findings must be viewed with caution until the data from further comparison conditions are assessed.

DISCUSSION

These results suggest some important yet subtle ways in which gaze cues can influence VMC. The amount of time spent looking at an unknown problem is taken to indicate the level of difficulty involved in solving that problem, and this inference is highly sensitive to timing parameters. Lingering one's gaze on a problem beyond an instruction to move on is interpreted as an indication of added difficulty. Differences as small as 1.5 seconds between a verbal instruction to proceed to the next problem and a visual indication of compliance with that instruction can significantly affect perceived difficulty.

Conversation is a collaborative activity in which subtle visual timing and gestural cues must be interpreted in conjunction with speech acts; cues involving the face and eyes are especially salient (6). In designing VMC systems, decisions are frequently made to compress or otherwise manipulate the video stream, often to "keep up with the audio" (3, 5). Our findings suggest that these decisions may influence the nature and course of the mediated interaction, since subtle gaze cues that suggest cognitive difficulty may be introduced or eliminated unwittingly. Given limited resources, decisions regarding video lag, sampling frame rate and the choice of compression algorithm must be approached with caution to avoid introducing timing biases with cognitive and communicative implications. The leverage gained by choosing to accurately represent visual dynamics may well be worth some loss in resolution or completeness of the video image.
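One simple way to see how frame rate and video lag could introduce such timing biases is to ask when a viewer would first see a gaze shift. The Python sketch below is a deliberately simplified model, not one proposed or tested in this study. It assumes the audio instruction is heard at time 0 without delay, that frames are captured at regular intervals starting at time 0, and that the gaze shift becomes visible only in the first frame captured at or after it occurs. The function name `apparent_linger` and its parameters are hypothetical.

```python
import math


def apparent_linger(true_linger_s, fps, video_lag_s=0.0):
    """Linger a viewer would perceive under the simplified model.

    true_linger_s: time from the spoken instruction to the actual gaze shift
    fps:           video sampling rate in frames per second (assumed constant)
    video_lag_s:   extra delay of the video relative to the audio (assumed fixed)
    """
    # Index of the first frame captured at or after the gaze shift; the small
    # tolerance guards against floating-point error at exact frame boundaries.
    frame_index = math.ceil(true_linger_s * fps - 1e-9)
    return frame_index / fps + video_lag_s


if __name__ == "__main__":
    for fps in (30, 10, 4, 2):
        print(fps, apparent_linger(1.1, fps))
```

Under these assumptions, coarse sampling can only lengthen the apparent linger, by up to one frame interval, and any lag of the video behind the audio adds to it directly.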

References

1. Chapanis, A., Ochsman, R., Parrish, R. & Weeks, G. (1972). Studies in interactive communication: I. The effects of four communication modes on the behavior of teams during co-operative problem solving. Human Factors, 14, 487-509.
2. Clark, H.H. & Schaefer, E.F. (1989). Contributing to discourse. Cognitive Science, 13, 259-294.
3. Isaacs, E. & Tang, J. (1993). What video can and can't do for collaboration: A case study. In Proceedings of the ACM Multimedia 93 Conference. Anaheim, CA.
4. Ishii, H. & Kobayashi, M. (1993). ClearBoard: A seamless medium for shared drawing and conversation with eye contact. In R. Baecker (Ed.), Groupware and Computer-Supported Cooperative Work (pp. 829-836). San Mateo, CA: Morgan Kaufmann.
5. O'Conaill, B., Whittaker, S. & Wilbur, S. (1993). Conversations over video conferences: An evaluation of the spoken aspects of video-mediated communication. Human-Computer Interaction, 8, 389-428.
6. Rutter, D.R. (1987). Communicating by Telephone. Oxford and Elmsford, NY: Pergamon Press.
7. Short, J., Williams, E. & Christie, B. (1976). The social psychology of telecommunications. London: Wiley.
8. Tang, J. & Isaacs, E. (1993). Why do users like video: Studies of multimedia-supported collaboration. Computer Supported Cooperative Work, 1, 163-196.
9. Whittaker, S., Geelhoed, E. & Robinson, E. (1993). Shared workspaces: How do they work and when are they useful? International Journal of Man-Machine Studies, 39, 813-842.