



Scribbler: A Tool for Searching Digital Ink
Alex Poon, Karon Weber, and Todd Cass
Xerox Palo Alto Research Center
3333 Coyote Hill Road, Palo Alto, California 94304
poon@parc.xerox.com (415) 812-4725
© ACM
Abstract
Scribbler is a tool that enables users to search untranslated
digital ink for target patterns such as words, symbols and
simple sketches. By matching the raw stroke data instead of
performing traditional handwriting recognition, Scribbler
allows users to write quickly and naturally without being
constrained to a particular writing style or a limited set of
dictionary terms. This paper gives a brief description of the
current implementation of Scribbler and discusses the
results of a controlled experiment run to evaluate the
matching engine's effectiveness.
Keywords: pen-based input, digital ink, information retrieval,
handwriting recognition, handwriting matching.
Introduction
Pen-based computer systems are emerging as a new class of
user interfaces for small, portable computers without
keyboards. An impressive array of tools is available to
support pen-based data input; however, searching through a
digital ink corpus has required the translation of users'
handwriting into ASCII text [1], a task commonly referred
to as handwriting recognition. We refer to digital ink as the
digital representation of the path of the pen across the
writing surface. Because handwriting recognition is
inaccurate [4], some systems attempt to improve
recognition by requiring users to write in small gridded
areas, to use only printed characters, to use an unfamiliar
alphabet [3], or to restrict input to a pre-defined dictionary
of words [2]. All such constraints compromise the writer's
freedom, speed, and sense of naturalness.
We have developed a search tool called Scribbler that does
not require the digital ink to be translated into ASCII text,
but operates instead on the raw stroke data. By analyzing
the untranslated digital ink signal itself, the application can
support searching cursive and printed handwriting, symbols,
foreign characters, and even simple sketches. Allowing
users to search their handwritten notes enables such tasks
as search and replace, document summarization, and
automatic keyword application [5] on handwritten
documents - tasks more commonly associated with text-
based documents. Unlike most pen-based systems,
Scribbler allows users to write naturally and rapidly without
being burdened by the unreliability and slowness of
handwriting recognition. In
this paper, we describe the current implementation of
Scribbler, discuss the trial run we conducted to show proof
of concept, present the results of the experiment, and point
to future work.
SYSTEM DESCRIPTION
The current implementation of Scribbler runs on an Apple
Macintosh computer equipped with a Wacom integrated
tablet. While it is possible to use the algorithm on any pen-
based system that stores handwritten digital ink as a data
form, we have embedded the tool into Marquee [5], a note-
taking system for real-time video logging.
FIGURE 1.
A search for the word "like" in a Marquee log.
Users can take notes consisting of words, symbols or
sketches on the scrolling window located on the right hand
side of the screen. To conduct a search, users first specify a
target pattern to match against material contained within the
document. Selection is accomplished by either 1) circling
existing digital ink in the corpus, or 2) drawing new ink in
the left hand column and circling it there. For example, in
FIGURE 1, the user has circled the word "like" in the
corpus, selecting it as the target. The results of the search
are displayed by showing boxes around the matched
patterns in the note-taking area. Notice that the word "here"
in the document was also boxed, constituting a false
positive. Users can reduce the number of erroneous matches
by adjusting the threshold control panel at the bottom left
hand corner. For example, by decreasing the threshold
setting, the user can eliminate the match on the word "here."
However, this might also remove some of the correct
matches.
MATCHING ALGORITHM
We developed and tested multiple matching algorithms
differing in the tradeoffs they make regarding their
sensitivities to writing speed, spacing, scale, and rotation.
For each algorithm, Scribbler represents the ink as a
sequence of strokes, where each stroke consists of an ordered
sequence of (x,y) coordinates from pen down to pen up.
The process of matching is divided into three distinct
stages: pre-processing, grouping and matching.
Our most accurate algorithm currently works by first pre-
processing each stroke, resampling the data such that it is
composed of equally spaced (x,y) coordinates, thereby
discarding velocity data. It then divides the corpus into
stroke groups, where large breaks between
sequential strokes define group boundaries. Different
combinations of stroke groups are then presented to the
matcher as potential matches for testing.
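
The pre-processing and grouping stages described above can be sketched as follows. This is an illustrative Python sketch, not Scribbler's actual code: the function names resample and group_strokes, the spacing and gap parameters, and the choice to measure a "break" as the distance from the end of one stroke to the start of the next are all assumptions made for the example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def resample(stroke: Stroke, spacing: float) -> Stroke:
    """Return points spaced `spacing` apart along the stroke's path.

    Equal spacing along the path removes any dependence on how fast
    the pen moved, i.e. it discards velocity information.
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    if not stroke:
        return []
    out = [stroke[0]]
    prev = stroke[0]
    needed = spacing  # path length still to cover before the next output point
    for cur in stroke[1:]:
        seg = math.dist(prev, cur)  # remaining length of the current segment
        while seg >= needed:
            t = needed / seg
            prev = (prev[0] + t * (cur[0] - prev[0]),
                    prev[1] + t * (cur[1] - prev[1]))
            out.append(prev)
            seg -= needed
            needed = spacing
        needed -= seg
        prev = cur
    return out


def group_strokes(strokes: List[Stroke], gap: float) -> List[List[Stroke]]:
    """Split a stroke sequence into groups at large breaks.

    Assumption: a break is "large" when the distance from the end of one
    stroke to the start of the next exceeds `gap`.
    """
    groups: List[List[Stroke]] = []
    for s in strokes:
        if not s:
            continue  # ignore empty strokes
        if groups and math.dist(groups[-1][-1][-1], s[0]) <= gap:
            groups[-1].append(s)
        else:
            groups.append([s])
    return groups


# Unevenly sampled input (fast and slow pen movement) becomes evenly spaced.
points = resample([(0, 0), (1, 0), (2, 0), (10, 0)], spacing=2.5)
groups = group_strokes(
    [[(0, 0), (1, 0)], [(1.5, 0), (2, 0)], [(10, 0), (11, 0)]], gap=2.0
)
```

In this sketch the first two strokes fall into one group and the third starts a new one. Combinations of adjacent groups could then be offered to the matcher as candidates.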
Finally, each potential match is compared to the target
pattern using dynamic time warping techniques to compute
a measure of difference between the two patterns. If the
difference is less than the set threshold, then that potential
match is labeled as a match.
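
The matching stage can be illustrated with a textbook dynamic time warping (DTW) computation. The sketch below is a generic formulation, not necessarily the variant used in Scribbler: it assumes each pattern has already been flattened into a single sequence of resampled (x,y) points, uses Euclidean distance as the local cost, and the names dtw_distance and is_match are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def dtw_distance(a: List[Point], b: List[Point]) -> float:
    """Classic DTW cost between two point sequences (lower = more similar)."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    m = len(b)
    # prev[j] holds the best cost of aligning the processed prefix of `a`
    # with the first j points of `b`.
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, len(a) + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: step in a, step in b, or step in both.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def is_match(candidate: List[Point], target: List[Point], threshold: float) -> bool:
    """Label the candidate a match when its difference is below the threshold."""
    return dtw_distance(candidate, target) < threshold


target = [(0, 0), (1, 0), (2, 0)]
stretched = [(0, 0), (1, 0), (1, 0), (2, 0)]  # same shape, one repeated sample
shifted = [(0, 1), (1, 1), (2, 1)]            # same shape, moved up by 1

print(dtw_distance(target, stretched))  # 0.0: warping absorbs the repeat
print(dtw_distance(target, shifted))    # 3.0: three aligned points, each 1 apart
```

Because the raw cost grows with pattern length, a practical system would probably normalize it (for example by the alignment length) before comparing it with a user-adjustable threshold. That normalization is a design choice not specified in the text.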
DATA COLLECTION
To test the accuracy of Scribbler's algorithm, we collected
handwritten data from six members of our research staff
who each completed three tasks. The tasks were structured
to gather a wide variety of writing samples, including short
words, long words, and symbols (FIGURE 2). The first
task had users copy text out of a Dr. Seuss children's book
in order to test Scribbler's ability to distinguish between
similar words like "house" and "mouse." Participants were
then given weather reports and asked to draw weather
symbols on sets of maps as a way for us to gain insights into
matching illustrations. Finally, the subjects copied text
from newspaper and magazine articles to generate standard
prose. Each user spent about forty minutes completing the
tasks.
FIGURE 2. Two of the tasks we used for data collection - Dr.
Seuss on the left and weather map symbols on the right.
RESULTS AND OBSERVATIONS
We determined matching accuracy by generating a measure
of difference (score) between each word or symbol in the
document and each potential match created by the matcher,
where a score beneath a specific threshold signified a
match. We calculated the overall matching accuracy, A,
using A = (TP - 0.5 FP) / P, where TP is the number of true
positive matches, FP is the number of false positives, and P
is the total number of correct matches in the document.
Because different threshold settings result in different values
of A, we chose A to be the maximum of the above formula over
all possible thresholds. A threshold strict enough to return no
matches yields A = 0, so this maximum is never negative.
Therefore, A ranges from 0 to 100%, with 100% being the
accuracy of a perfect matcher using an optimal threshold
setting.
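
As a worked illustration of this accuracy measure, the following Python sketch sweeps every distinct threshold over a list of scored candidates and keeps the best value of the formula. The function name optimal_accuracy, the (score, is_correct) input format, and the sample scores are invented for the example and are not data from the experiment.

```python
from typing import List, Tuple


def optimal_accuracy(scored: List[Tuple[float, bool]], total_correct: int) -> float:
    """Maximum of (TP - 0.5 * FP) / P over all possible thresholds.

    `scored` holds (difference score, is_correct) for every candidate;
    a candidate counts as a match when its score is below the threshold.
    """
    if total_correct <= 0:
        raise ValueError("total_correct (P) must be positive")
    best = 0.0  # a threshold below every score returns no matches: A = 0
    tp = fp = 0
    ordered = sorted(scored, key=lambda item: item[0])
    for i, (score, correct) in enumerate(ordered):
        if correct:
            tp += 1
        else:
            fp += 1
        # Candidates with equal scores are admitted together, so evaluate
        # only after the last candidate sharing this score.
        if i + 1 == len(ordered) or ordered[i + 1][0] != score:
            best = max(best, (tp - 0.5 * fp) / total_correct)
    return best


# Hypothetical scores: three correct candidates, two incorrect ones, and
# P = 4 because one true occurrence was never proposed as a candidate.
candidates = [(0.1, True), (0.2, True), (0.3, False), (0.4, True), (0.9, False)]
print(optimal_accuracy(candidates, total_correct=4))  # 0.625
```

Here the best threshold admits the first four candidates (TP = 3, FP = 1), giving (3 - 0.5) / 4 = 62.5%. Raising the threshold to also admit the last false positive lowers the value to 50%.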
Using this optimized accuracy measure, we found an overall
matching accuracy of 75.2% and a search time of about
thirty seconds for a 200-word document using our non-
optimized C++ version of Scribbler running on a 25 MHz
68040. We also noticed a large variance in accuracy figures
across different users and tasks, and plan to investigate
these results further.
FIGURE 3. Matching accuracy across different tasks and users.
In FIGURE 3, the accuracy figures represent situations
where thresholds are chosen optimally. If threshold settings
are chosen sub-optimally, accuracy will most likely suffer.
As optimal thresholds are likely to be inconsistent between
different users and different documents, we found that it is
important to have a threshold control panel that allows users
to adjust thresholds interactively after each search. Also, it
may be possible to perform dynamic threshold selection by
using features of the search target.
FUTURE WORK
Our initial investigation into using digital ink as the
predominant data form for pen-based systems has
demonstrated that it is possible to support users searching
handwritten data without relying on handwriting
recognition or imposing restrictions on users' writing styles.
Our next steps include continuing our experimentation with
Scribbler's matching accuracy and speed. We also plan to
examine the feasibility of supporting searches across
documents created by different users. Finally, we are
looking to embed Scribbler into other pen-based systems to
understand its application in a variety of settings.
References
1. Aha Software Corporation. InkWriter Quick-Reference
Guide, 1993.
2. Apple Computer, Inc. Newton MessagePad Handbook,
1993.
3. Goldberg, D., and Richardson, C. Touch-typing with a
Stylus, in Proc. INTERCHI '93 Human Factors in
Computing Systems, (Amsterdam, April 24-29, 1993),
ACM Press, pp. 482-487.
4. Tappert, C., Suen, C., and Wakahara, T. The State of the Art
in On-line Handwriting Recognition. IEEE Transactions on
Pattern Analysis and Machine Intelligence, Vol. 12, No. 8,
August 1990, pp. 787-808.
5. Weber, K., and Poon, A. Marquee: A Tool for Real-
Time Video Logging, in Proc. CHI '94 Human Factors in
Computing Systems, (Boston, Massachusetts, April 24-28, 1994),
ACM Press, pp. 58-64.