Abstract
A study is described in which observers rated the difficulty
people had in solving problems, based either solely on how
long the person looked at each problem, or also on how long
his or her gaze lingered on it after being instructed to
move on. Initial results show a linear relationship between
gaze duration and rated difficulty, with lingering as an
added significant factor. These findings are discussed in
terms of the role(s) gaze cues play in tracking understanding
in conversations, with implications for the design of video-
mediated communication (VMC) systems.
GAZE AND VIDEO-MEDIATED COMMUNICATION
Video-mediated communication (VMC) between people in
remote locations is rapidly becoming a practical reality. Yet
the psychological phenomena underlying video
communication are not well understood. Researchers have
long predicted that adding video's high bandwidth to an
existing audio channel would provide more social cues, and
give a greater sense of social presence (6, 7), than is
possible with audio alone. However, such effects tend to be
highly situation-specific (9), and have been notoriously
difficult to validate with task performance or productivity
measures (1).
Another approach takes a psycholinguistic perspective on
VMC, testing specific hypotheses regarding visual
contributions to mediated conversations. For example,
Isaacs and Tang (3, 8) have argued that adding video--even
at fairly low frame rates--changes the nature of remote
audio interactions by providing visual feedback, helping to
manage pauses, supporting iconic gestures, and allowing
gaze co-ordination. O'Conaill and colleagues (5) report
related research in which they conclude that directional gaze
information appears necessary to promote seamless
conversational turn-taking.
In this paper, we focus on video gaze cues and discuss some
experimental work-in-progress on the cognitive
implications of observed gaze duration and timing. With
Ishii and colleagues (4), we believe that the ability to easily
track where one's partner is looking can strongly influence
the course of collaborative interactions. Studies of gaze
patterns in face-to-face conversations (6) are consistent with
the view that gaze cues are used--at least in part--to track
comprehension, and thus to assess confirmation of
conversational contributions. Previous research by Clark
and colleagues has demonstrated the necessity of such
cognitive tracking processes (2). It is not enough for a
speaker to merely utter a verbal message; he or she must
also ensure that the listener has understood what was said as
it was intended. While gaze cues alone can support this
function with great efficiency and precision, non-visual
media such as the telephone require more cumbersome
conventions or explicit modes of speech whenever
comprehension is in doubt (3, 5, 6). Despite such
precautionary measures, the risk of major misunderstanding,
especially of complex ideas, can often remain quite large.
The present study is part of a larger research endeavour to
systematically investigate the role(s) temporal gaze cues
play in VMC. We examine inferences observers make
simply from the length of time someone spends looking at
an unknown problem. We predict that--in the absence of other
information--the total time spent looking at the problem will
be used to infer the degree of difficulty that person is having
in trying to solve it. Moreover, we expect that subtle timing
cues, such as a tendency to let one's gaze linger on the
problem even after being instructed to move on, may add to
perceived difficulty. Such effects could have important
implications for the design of VMC systems, especially in
choosing video sampling and compression techniques to
best support natural conversational processes.
EXPERIMENT
Subjects
Twenty males and 20 females from Stanford University, aged
17-45, participated in this research for pay.
Materials
A videotape was made of 8 actor confederates, each of whom
turned over and looked (for varying lengths of time) at each
of 3 cards displayed on a table before him or her.
Subjects viewing this videotape were told that each card
contained a word translation problem that the actor was
asked to solve. Each actor was recorded in two clips, one in
a "look only" condition, and one in a "look + linger"
condition, in alternation. In "look only" clips, the actor
simply looked at each card for 1, 3 or 5 seconds. In the
"look + linger" clips, total gaze duration was always
5 seconds. However, while the actor was still viewing
the card, the experimenter's voice was heard instructing the
actor either to move on to the "next" card, or to "stop".
Gaze linger was defined as the time the actor spent looking
at the card after being instructed to proceed.
Linger durations were 1, 2.5 or 4 seconds (after initial views
of 4, 2.5 and 1 sec, respectively). The order of look and
linger durations was counterbalanced across subjects.
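The stimulus design described above can be laid out as a short sketch. This is only an illustration of the structure stated in the text; the variable names are hypothetical, and the assignment of particular durations to particular cards (which was counterbalanced) is not modelled.

```python
from itertools import product

# Values taken from the Materials description; names are illustrative only.
ACTORS = range(1, 9)                 # 8 actor confederates
CARDS = range(1, 4)                  # 3 cards per clip
CONDITIONS = ("look only", "look + linger")

LOOK_ONLY_SECONDS = [1, 3, 5]        # total gaze duration per card
# (initial look, linger) pairs in seconds for the "look + linger" clips
LOOK_LINGER_SECONDS = [(4, 1), (2.5, 2.5), (1, 4)]

# Every look + linger pair should add up to the constant 5-second total gaze.
linger_totals = [initial + linger for initial, linger in LOOK_LINGER_SECONDS]

# One card viewing per actor x card x condition.
viewings = list(product(ACTORS, CARDS, CONDITIONS))

print(linger_totals)
print(len(viewings))
```

Holding total gaze at 5 seconds in the "look + linger" clips means that a longer linger always comes with a shorter initial look, which is why the two are not independent in this condition.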
Procedure
Subjects were seated at a video display unit and given task
instructions. They were told that the videotape showed
participants in a previous study solving word translation
problems. Subjects viewed the videotape and rated the
degree of difficulty each actor had in solving the word
problem on each card, on a 7-point scale (from (1) "no
difficulty at all" to (7) "a lot of difficulty"). After initial
practice trials, a total of 48 ratings (8 actors x 3 card
problems x 2 conditions, "look only" or "look + linger")
were obtained. All subjects were debriefed upon
completing the task.
RESULTS
All reported effects were tested with analyses of variance
(ANOVAs) and were significant at least at the .05 level (most
at the .001 level). Results are given in Figure 1.
Figure 1. Effect of Gaze Duration on Rated Difficulty
Look Duration:
The results for the "look only" condition are
straightforward. In observing someone trying to solve an
unknown problem, the longer the total look duration, the
greater the perceived difficulty. Five-second looks were
rated as indicating more difficulty (mean rating=4.2) than
3-second looks (3.1), which in turn were rated higher than
1-second ones (1.8). All effects were highly significant. An
intriguing interaction between look duration and gender was
also found.
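As a rough check on how close to linear the three reported means are, a least-squares line can be fitted through them. This is a minimal sketch using only the mean ratings quoted above; it is not the ANOVA reported here, and the function name fit_line is an assumption introduced for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

look_seconds = [1, 3, 5]          # "look only" durations
mean_ratings = [1.8, 3.1, 4.2]    # mean difficulty ratings from the text

slope, intercept = fit_line(look_seconds, mean_ratings)
# Residuals show how far each reported mean falls from the fitted line.
residuals = [y - (slope * x + intercept)
             for x, y in zip(look_seconds, mean_ratings)]

print(round(slope, 2), round(intercept, 2))
print([round(r, 2) for r in residuals])
```

On these three means the fitted slope is about 0.6 rating points per second of looking, and each mean lies within about 0.1 rating points of the line. With only three points this is a description of the means, not a test of linearity.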
Linger Duration: Initial Findings
Our initial findings for the "look + linger" condition are
quite suggestive. Overall, the combination of looking and
lingering (4.4) was rated as significantly more difficult than
only looking (4.2) for the equivalent length of time. Further
analyses suggest that short lingers (with long initial looks)
and long lingers (with short initial looks) are rated as more
difficult than moderate lingers (with moderate initial looks).
Since looking and lingering were deliberately confounded
in this condition to permit comparisons between linger
durations at a constant (5 sec) total gaze duration, these
findings must be viewed with caution until the data from
further comparison conditions are assessed.
DISCUSSION
These results suggest some important yet subtle ways in
which gaze cues can influence VMC. The amount of time
spent looking at an unknown problem is taken to indicate
the level of difficulty involved in solving that problem, and
this inference is highly sensitive to timing parameters.
Lingering one's gaze on a problem beyond an instruction to
move on is interpreted as an indication of added difficulty.
Differences as small as 1.5 seconds between a verbal
instruction to proceed to the next problem and a visual
indication of compliance with that command can significantly
affect perceived difficulty.
Conversation is a collaborative activity in which subtle
visual timing and gestural cues must be interpreted in
conjunction with speech acts; cues involving the face and
eyes are especially salient (6). In designing VMC systems,
decisions are frequently made to compress or otherwise
manipulate the video stream, often to "keep up with the
audio" (3, 5). Our findings suggest that these decisions may
influence the nature and course of the mediated interaction,
since subtle gaze cues that suggest cognitive difficulty may
be introduced or eliminated unwittingly. Given limited
resources, decisions regarding video lag, sampling frame
rate and the choice of compression algorithm must be
approached with caution to avoid introducing timing biases
with cognitive and communicative implications. The
leverage gained by choosing to accurately represent visual
dynamics may well be worth some loss in resolution or
completeness of the video image.
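To make the concern about timing biases concrete, the sketch below estimates how long a linger would appear to a remote viewer when video is sampled at a low frame rate and lags the audio. The model is a simplifying assumption, not a description of any particular VMC system: the instruction is heard at time zero, the gaze shift becomes visible only on the first displayed frame at or after it occurs, and frames are displayed at multiples of the frame period. The function name and parameters are hypothetical.

```python
import math

def displayed_linger(true_linger_s, fps, video_lag_s=0.0):
    """Apparent linger (seconds) seen by a viewer, under the assumed model.

    true_linger_s: actual time between instruction and gaze shift
    fps:           displayed video frame rate (frames per second)
    video_lag_s:   delay of video relative to audio
    """
    event_time = true_linger_s + video_lag_s
    # First frame at or after the gaze shift; the small epsilon guards
    # against floating-point round-off when the event lands on a frame.
    frame_index = math.ceil(event_time * fps - 1e-9)
    return frame_index / fps

# A 1.2 s linger shown at 2 frames per second with no lag
print(displayed_linger(1.2, fps=2))
# A 1.0 s linger shown at 2 frames per second with video 0.5 s behind audio
print(displayed_linger(1.0, fps=2, video_lag_s=0.5))
```

Under these assumptions the apparent linger is never shorter than the true one, and it can be longer by up to the video lag plus one frame period. At low frame rates that is a sizeable fraction of the linger differences studied here.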
References
1. Chapanis, A., Ochsman, R., Parrish, R. & Weeks, G.
(1972). Studies in interactive communication: I. The effects
of four communication modes on the behavior of teams
during co-operative problem solving. Human
Factors, 14, 487-509.
2. Clark, H.H. & Schaefer, E.F. (1989). Contributing to
discourse. Cognitive Science, 13, 259-294.
3. Isaacs, E. & Tang, J. (1993). What video can and can't do
for collaboration: A case study. In Proceedings of the ACM
Multimedia 93 Conference. Anaheim, CA.
4. Ishii, H. & Kobayashi, M. (1993). ClearBoard:
A Seamless Medium for Shared Drawing and Conversation
with Eye Contact. In R. Baecker (Ed.), Groupware and
Computer-Supported Cooperative Work (pp. 829-836). San
Mateo, CA: Morgan Kaufmann.
5. O'Conaill, B., Whittaker, S. & Wilbur, S. (1993).
Conversations over video conferences: An evaluation of the
spoken aspects of video-mediated communication.
Human-Computer Interaction, 8, 389-428.
6. Rutter, D.R. (1987). Communicating by Telephone.
Oxford: Pergamon Press.
7. Short, J., Williams, E. & Christie, B. (1976). The social
psychology of telecommunications. London: Wiley.
8. Tang, J. & Isaacs, E. (1993). Why do users like video?
Studies of multimedia-supported collaboration.
Computer Supported Cooperative Work, 1, 163-196.
9. Whittaker, S., Geelhoed, E. & Robinson, E. (1993).
Shared workspaces: How do they work and when are they
useful? International Journal of Man-Machine Studies,
39, 813-842.