Looking and Lingering as Conversational Cues in Video-Mediated Communication

Herbert L. Colston & Diane J. Schiano

Interval Research Corporation,
1801 Page Mill Road,
Palo Alto, CA 94304
(415) 424-0722
colston@interval.com, schiano@interval.com

Abstract

A study is described in which observers rated the difficulty people had in solving problems, based either upon simply how long the person looked at each problem, or also how long his or her gaze lingered on it after being instructed to move on. Initial results show a linear relationship between gaze duration and rated difficulty, with lingering as an added significant factor. These findings are discussed in terms of the role(s) gaze cues play in tracking understanding in conversations, with implications for the design of video-mediated communication (VMC) systems.

GAZE AND VIDEO-MEDIATED COMMUNICATION

Video-mediated communication (VMC) between people in remote locations is rapidly becoming a practical reality. Yet the psychological phenomena underlying video communication are not well understood. Researchers have long predicted that adding video's high bandwidth to an existing audio channel would provide more social cues, and give a greater sense of social presence (6, 7), than is possible with audio alone. However, such effects tend to be highly situation-specific (9), and have been notoriously difficult to validate with task performance or productivity measures (1).

Another approach takes a psycholinguistic perspective on VMC, testing specific hypotheses regarding visual contributions to mediated conversations. For example, Isaacs and Tang (3, 8) have argued that adding video--even at fairly low frame rates--changes the nature of remote audio interactions by providing visual feedback, helping to manage pauses, supporting iconic gestures, and allowing gaze co-ordination. O'Conaill and colleagues (5) report related research in which they conclude that directional gaze information appears necessary to promote seamless conversational turn-taking.

In this paper, we focus on video gaze cues and discuss some experimental work-in-progress on the cognitive implications of observed gaze duration and timing. With Ishii and colleagues (4), we believe that the ability to easily track where one's partner is looking can strongly influence the course of collaborative interactions. Studies of gaze patterns in face-to-face conversations (6) are consistent with the view that gaze cues are used--at least in part--to track comprehension, and thus to assess confirmation of conversational contributions. Previous research by Clark and colleagues has demonstrated the necessity of such cognitive tracking processes (2). It is not enough for a speaker to merely utter a verbal message; he or she must also ensure that the listener has understood what was said as it was intended. While gaze cues alone can support this function with great efficiency and precision, non-visual media such as the telephone require more cumbersome conventions or explicit modes of speech whenever comprehension is in doubt (3, 5, 6). Despite such precautionary measures, the risk of major misunderstanding, especially of complex ideas, can often remain quite large.

The present study is part of a larger research endeavour to systematically investigate the role(s) temporal gaze cues play in VMC. We examine inferences observers make simply from the length of time someone spends looking at an unknown problem. We predict that--in the absence of other information--the total time spent looking at the problem will be used to infer the degree of difficulty that person is having in trying to solve it. Moreover, we expect that subtle timing cues, such as a tendency to let one's gaze linger on the problem even after being instructed to move on, may add to perceived difficulty. Such effects could have important implications for the design of VMC systems, especially in choosing video sampling and compression techniques to best support natural conversational processes.

EXPERIMENT

RESULTS

All reported effects were tested with analyses of variance (ANOVAs) and were significant at the .05 level or better (most at the .001 level). Results are given in Figure 1.

Figure 1. Effect of Gaze Duration on Rated Difficulty

Look Duration:

The results for the "look only" condition are straightforward. In observing someone trying to solve an unknown problem, the longer the total look duration, the greater the perceived difficulty. Five-second looks were rated as more difficult (mean rating=4.2) than 3-second looks (3.1), which were rated as more difficult than 1-second ones (1.8). All effects were highly significant. An intriguing interaction between look duration and gender was also found.
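The linear trend in these three condition means can be checked with a small worked example. The sketch below fits an ordinary least-squares line to the means quoted above; it is an illustration only, not the ANOVA reported here, and the helper name `least_squares_line` is hypothetical. A line through three means says nothing about variability across observers.

```python
# Condition means quoted in the text for the "look only" condition.
look_seconds = [1.0, 3.0, 5.0]
mean_rating = [1.8, 3.1, 4.2]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line.

    Hypothetical helper for illustration; not part of the original study.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(look_seconds, mean_rating)

# Residuals show how far each condition mean falls from the fitted line.
residuals = [
    y - (intercept + slope * x) for x, y in zip(look_seconds, mean_rating)
]

print(f"slope = {slope:.3f} rating points per second")
print(f"intercept = {intercept:.3f}")
print("residuals =", [round(r, 3) for r in residuals])
```

On these means the fitted slope is about 0.6 rating points per additional second of looking, and every mean lies within 0.1 rating points of the line, which is consistent with the linear relationship described in the abstract.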

Linger Duration: Initial Findings

Our initial findings for the "look + linger" condition are quite suggestive. Overall, the combination of looking and lingering (4.4) was rated as significantly more difficult than only looking (4.2) for the equivalent length of time. Further analyses suggest that short lingers (with long initial looks) and long lingers (with short initial looks) are rated as more difficult than moderate lingers (with moderate initial looks). Since looking and lingering were deliberately confounded in this condition to permit comparisons between linger durations at a constant (5 sec) total gaze duration, these findings must be viewed with caution until the data from further comparison conditions are assessed.

DISCUSSION

These results suggest some important yet subtle ways in which gaze cues can influence VMC. The amount of time spent looking at an unknown problem is taken to indicate the level of difficulty involved in solving that problem, and this inference is highly sensitive to timing parameters. Lingering one's gaze on a problem beyond an instruction to move on is interpreted as an indication of added difficulty. Differences as small as 1.5 seconds between a verbal instruction to proceed to the next problem and a visual indication of compliance with that instruction can significantly affect perceived difficulty.

Conversation is a collaborative activity in which subtle visual timing and gestural cues must be interpreted in conjunction with speech acts; cues involving the face and eyes are especially salient (6). In designing VMC systems, decisions are frequently made to compress or otherwise manipulate the video stream, often to "keep up with the audio" (3, 5). Our findings suggest that these decisions may influence the nature and course of the mediated interaction, since subtle gaze cues that suggest cognitive difficulty may be introduced or eliminated unwittingly. Given limited resources, decisions regarding video lag, sampling frame rate and the choice of compression algorithm must be approached with caution to avoid introducing timing biases with cognitive and communicative implications. The leverage gained by choosing to accurately represent visual dynamics may well be worth some loss in resolution or completeness of the video image.
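To make the design concern concrete, the following sketch estimates how long a gaze appears to linger to a remote viewer once frame sampling and audio/video skew are taken into account. The model is an assumption made for illustration, not a measurement from this study: frames are taken to be captured at a fixed rate starting at the moment the instruction ends, a gaze shift becomes visible in the first frame captured at or after it occurs, and video arrives a fixed interval later than audio. The function and parameter names are hypothetical.

```python
import math


def apparent_linger(true_linger_s, frame_rate_fps, video_minus_audio_lag_s):
    """Apparent linger (seconds) as seen by a remote viewer.

    Assumed model, for illustration only:
      * frames are sampled every 1/frame_rate_fps seconds, starting when
        the spoken instruction ends;
      * the gaze shift is first visible in the first frame captured at or
        after the moment it actually happens;
      * video is delayed relative to audio by video_minus_audio_lag_s.
    """
    # Index of the first sampled frame that can show the gaze shift.
    frame_index = math.ceil(true_linger_s * frame_rate_fps)
    first_visible_s = frame_index / frame_rate_fps
    return first_visible_s + video_minus_audio_lag_s


# A quick 0.2 s gaze shift, sent at 2 frames/s with video 1.0 s behind audio.
print(apparent_linger(0.2, 2, 1.0))

# The same shift at the same frame rate with audio and video in step.
print(apparent_linger(0.2, 2, 0.0))
```

Under these assumptions, a true linger of 0.2 seconds shown at 2 frames per second with video trailing audio by 1.0 second appears as a 1.5-second linger, the same order of magnitude as the timing differences discussed above.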

References

1. Chapanis, A., Ochsman, R., Parrish, R. & Weeks, G. (1972). Studies in interactive communication: I. The effects of four communication modes on the behavior of teams during co-operative problem solving. Human Factors, 14, 487-509.
2. Clark, H.H. & Schaefer, E.F. (1989). Contributing to discourse. Cognitive Science, 13, 259-294.
3. Isaacs, E. & Tang, J. (1993). What video can and can't do for collaboration: A case study. In Proceedings of the ACM Multimedia 93 Conference. Anaheim, CA.
4. Ishii, H. & Kobayashi, M. (1993). ClearBoard: A seamless medium for shared drawing and conversation with eye contact. In R. Baecker (Ed.), Groupware and Computer-Supported Cooperative Work (pp. 829-836). San Mateo, CA: Morgan Kaufmann.
5. O'Conaill, B., Whittaker, S. & Wilbur, S. (1993). Conversations over video conferences: An evaluation of the spoken aspects of video-mediated communication. Human-Computer Interaction, 8, 389-428.
6. Rutter, D.R. (1987). Communicating by Telephone. Oxford: Pergamon Press, Elmsford, NY.
7. Short, J., Williams, E. & Christie, B. (1976). The social psychology of telecommunications. London: Wiley.
8. Tang, J. & Isaacs, E. (1993). Why do users like video: Studies of multimedia-supported collaboration. Computer Supported Cooperative Work, 1, 163-196.
9. Whittaker, S., Geelhoed, E. & Robinson, E. (1993). Shared workspaces: How do they work and when are they useful? International Journal of Man-Machine Studies, 39, 813-842.