



Ken Fishkin, Maureen C. Stone
Xerox PARC, 3333 Coyote Hill Rd., Palo Alto CA 94304
E-mail: {fishkin, stone}@parc.xerox.com
In this paper, we present a new direct manipulation technique for exploring a database
displayed as a two dimensional set of
points. With it, users can incrementally construct full boolean queries by layering
queries encoded as Magic Lens(tm) filters [5,6,17]. Our work builds directly on two
techniques for information visualization that were presented at CHI '94. The first,
the starfield display [2], supports interactive filtering and zooming on scatterplot
displays. The second, the movable filter [17], supports multiple simultaneous visual
transformations and queries on underlying data. In this paper we combine the two
techniques, enhancing the starfield display by augmenting it with the flexibility
and functionality of the movable filter. The advantages include: a direct manipulation
mechanism for creating general boolean queries, multiple simultaneous views, and
a uniform mechanism for providing alternate views of the data. We also demonstrate
how our query mechanism was extended to support general, real-valued queries on databases
containing missing fields.
After describing related work, we describe using magic lens filters to generate boolean
queries over graphically displayed data. We then discuss how they provide a uniform
mechanism for generating multiple views of the data. We then extend our model to
support real-valued queries over incomplete data, and present our conclusions and
plans for future work. Concrete examples taken from an application for exploring
US Census data are used throughout the paper.
Dynamic queries [1] apply direct manipulation techniques to the problem of constructing
database queries. A selector (e.g. a slider) or set of selectors is used to control
the range of values required of a particular attribute or set of attributes. When
combined with a graphical representation of the database such as a scatterplot, users
can rapidly explore different subsets of the data by manipulating the selectors [2].
However, the number of attributes that can be controlled is limited by the number
of selectors that can be easily applied to the data. The effect of combining slider
filters is strictly conjunctive; disjunctive queries may only be performed by performing
each sub-query sequentially. The effects of the selectors are global; there is
no way to limit the scope to only a portion of the data except by zooming in on it.
Finally, the number of selectors, and hence the number of possible queries, is fixed
in advance.
The Aggregate Manipulator [11] as well as XSoft's Visual Recall(tm) [20] allow somewhat
more powerful (but non-interactive) queries, supporting disjunctive and limited compound
composition via textual, hierarchical, menu-driven interfaces. The conceptual prototype
of Egenhofer [10] is a hybrid technique. Query operands are typed in using a database
language, but those operands are visually composed, with modes that support disjunctive,
conjunctive, and subtractive composition.
Magic lens filters [6,17] are a user interface tool that combine an arbitrarily-shaped
region with an operator that changes the view of objects viewed through that region.
The operator can be quite general, accessing the underlying data structures of the
application and reformatting the data to generate a modified view. The filters are
spatially bounded, may be parameterized, and are interactively positioned over on-screen
applications. Filters may overlap, in which case they compose their effects in the
overlap region.
Magic lens filters provide a number of advantages desirable for data visualization.
Since they are spatially bounded, they can perform their operation within a restricted
focus while maintaining global context. Since they can overlap, compositional semantics
can be defined and controlled.
To create boolean queries by composing magic lens filters, we need to provide a way
to specify how the filters are combined. To provide full boolean functionality,
we need to provide a mechanism for the AND, OR, NOT and grouping or parenthesizing
operations.
We define for each lens a filtering function and a composition mode that describes
how the result of the filtering function is combined with the output of lenses underneath.
More formally, a lens L=(F, M), where F is a filter and M is a boolean operator.
The filter, F, describes the output calculation for the filter on some datum. The
mode, M, describes how that output is combined with the output from lower filters.
For example, given L1=(F1, OR) and L2=(F2, AND), the result of positioning L1 over
L2 is (F1 OR F2). Conversely, the effect of positioning L2 over L1 is (F2 AND F1).
We implement the composition mode as a button on the lens, making it easy to change
the mode as needed.
The NOT operation can be encoded as a lens whose filter inverts the sense of the data
coming in. Using the formalism of the previous paragraph, an inverting lens N =
(NULL, NOT). That is, N applies a NOT to the output of lower filters, and has no
intrinsic filter function. For example, consider the query (F1 OR NOT F2), where
F1 and F2 filter for various attributes. To implement this query, the user would
lay down filter F2, then the NOT filter N, then filter F1 with its composition mode
set to OR.
To incorporate grouping, we need a mechanism for encapsulating the expression defined
by a stack of lenses. To do this, we provide an operation that replaces a stack
of lenses with a single compound lens that is semantically equivalent. The user
creates such a compound lens by selecting a point on a stack of lenses through a
click-through button, a partially transparent button on an overlaying lens [5].
All lenses beneath this point are combined to create a compound lens that generates
the same query as the stack. The resulting lens also has a composition mode that
defines how it is combined with other lenses. This new compound lens can be manipulated
by the user just as any other lens, providing a simple way to encapsulate a complex
query into a single conceptual unit.
To create the query (F1 AND F2) OR (F3 AND F4), for example, we create compound lenses
for the values in parentheses: C1=(F1 AND F2) and C2=(F3 AND F4). By giving lens
C1 a composition mode of OR, we can create the desired expression by positioning
C1 over C2. Compound lenses may contain other compound lenses, allowing queries
to grow to arbitrary complexity.
Since we can incorporate AND, OR, NOT, and grouping, we can represent any boolean
query. Complex queries can be incrementally built up by direct manipulation of the
lenses and their modes. Useful queries can be saved as compound lenses. The resulting
lenses can then be used to build up more complex queries in a completely uniform
manner. While we have implemented only three common boolean composition modes, this
model supports any of the boolean operators.
In this implementation, each lens is implemented as an X window [16]. A lens manager
server extends the X window system to support magic lens functionality. Therefore,
the lenses can be manipulated using the regular window manager interface. Lenses
can display their output using the X graphics library, or, upon request, they can
display it in PostScript(R) [3] form. Using this facility, the application can generate
PostScript pictures describing its screen appearance. We used such generated PostScript
for all the black and white figures in this paper. Choosing this form, instead of
a screen snapshot, allows us to customize the presentation to fit the space and color
limitations of the proceedings. The figures in the color plates are screen snapshots
that show the screen appearance of the running application.
Color plate 1 shows a typical lens filter in action. Each city is displayed as a
blue-rimmed white box at a point proportional to its latitude and longitude. The
data attribute associated with the lens filter covering the center of the country,
"1991 crime index," is displayed in the window header. Below that on the left is
a slider used to control the threshold value for the query, and a label showing the
current slider value (12158.5 in this example). The buttons to the right of the
slider control whether the user is screening for data less than ( < ) or greater
than ( > ) the slider value. Cities shown in red, rather than white, pass the filter.
For example, in this case, cities in the center of the country with a crime rate
greater than 12158.5 are shown in red. Buttons along the right edge of the application
are used to spawn new lenses (13 different types are presently implemented), and
for application housekeeping.
Figure 1 shows the effects of applying different composition modes to the composition
of two filters. The composition mode for a lens is defined by a group of four radio
buttons. The buttons labeled AND and OR set the composition mode accordingly. The
SELF button causes the lens to display only the effect of its own filter, ignoring
other lenses, and the NOP button sets the filtering function to NULL, disabling the
effect of the lens. These two modes are useful when interpreting complex queries.
In figure 1(a) we look for cities which have high annual salaries and low taxes.
We can make the query less demanding by using the OR button to change the composition
mode (figure 1(b)), to see cities which have either high salaries or low taxes.
The SELF and NOP buttons can be used to examine these components separately, to
determine
which cities have high salaries and which have low taxes.
An alternative interface for setting the composition mode is to use a click-through
tool. In this interface, a single button on the lens indicates the current mode.
A click-through tool contains buttons that can be used to change the mode by clicking
through them to the lens below. The advantage of this interface is that it supports
a large set of composition modes without visual clutter The disadvantage is that
it requires more steps to change the mode. Our application supports both interfaces;
users can choose the interface they prefer. An example of a lens with a single mode
button is shown in color plate 2.
Figure 1(a) High salaries AND low taxes. Cities with partial data are shown in figure
1(b) but not in 1(a).
Figure 1(b) High salaries OR low taxes. Both conjunctive (AND) and disjunctive (OR)
queries are incorporated in our system.
Figure 2. Semantic filters can be augmented with visual filters. Here, a magnifying
lens and a call-out lens show clumped cities while maintaining context elsewhere.
For example, a scatterplot representation of the data will often have clumps, where
some set of data points map to nearby points on the display or even overlap. Figure
2 shows two examples of this clumping, and uses multiple visual filters to aid in
viewing the data. A small magnification lens is positioned over northern California,
letting us see that there are four cities clumped into the bay area. Over southern
California, a callout lens is placed. This lens displays the cities as a list, making
it easy to separate them. The rest of the map is displayed in the usual manner.
This allows easy identification of cities in the dense region while maintaining
global context. The boxes next to the city names on the callout are active, so filters
placed over them act exactly as if they were placed over the original map. Cities
listed without boxes are missing the data selected by the filter, "Median home price."
In the star-field paradigm, a user manipulates a set of selectors, and observes their
filtering effect on the data. By associating each selector with a single movable
filter, we gain the ability to pose multiple simultaneous queries, each with its
own set of parameters. For example, suppose we wish to determine which cities in
each region of the country have relatively low housing prices. We have data available
for the average housing price per city. However, the range of values for this attribute
is wide, and varies geographically. For example, houses on the west coast are typically
more expensive than houses in the midwest. Therefore, we need to filter on a higher
threshold on the west coast than in the midwest. Figure 3 shows two filters with
two different threshold values positioned over California and over Texas.
Figure 3. To find relatively high housing prices in California and Texas, two different
filters are positioned simultaneously.
For example, in figure 4(a) a boolean filter is screening for crime rate, and both
Dallas and Fort Worth are seen to have high crime rates. When we switch to real-valued
filters (figure 4(b)) we see that Dallas has a lower crime rate than Fort Worth.
The values displayed were computed as follows. As in the boolean case, the filter
slider value defines a threshold. In figure 4, for example, the threshold is 12838.
When screening for data greater than the threshold ( > ), data below the threshold
is assigned a score of 0, as is the case for Arlington in Figure 4. The city in
the database with the greatest value for the data (in the case of crime data, it
happens to be Atlanta, GA, with a crime index of 18953) is assigned a score of 1.0.
Cities with values between the threshold and the maximum are assigned a simple linear
score equal to their value divided by the maximum value. When screening for values
less than the threshold ( < ), values below the threshold are assigned non-zero scores
in a similar manner.
Computations can be performed on the output of real-valued filters. For example,
in figure 5 we have placed a sorting lens over a real-valued crime rate filter.
Only the cities under the sorting lens are sorted. We can see that in Florida, Jacksonville
has the lowest crime rate and Miami has the highest.
Figure 4(a) boolean query on crime rate for three cities in Texas.
Figure 4(b) Real-valued query on crime rate for the same cities. Extending our filters
from boolean-valued to real-valued allows distinctions to be maintained.
Real-valued filters require real-valued composition modes. The real-valued MIN and
MAX composition modes correspond to the boolean AND and OR as in fuzzy logic [21].
That is, MIN and MAX work the same on 0.0 and 1.0 as AND and OR do on 0 (False)
and 1 (True), but can also incorporate values in-between. A real-valued NOT filter
returns 1.0 minus its input value.
Just as boolean filters can be composed with an arbitrary boolean operation, not just
AND and OR, so too can real-valued filters be composed with an arbitrary numerical
operation, not just MIN and MAX. These operations can be statistical transformations
used to change the data distribution (sqrt, arcsin, etc.), fuzzy logic operators
used to tune the query ("very," "somewhat," "more or less," etc.), or mathematical
operators used for general manipulation (difference, log, etc.). Color
Figure 5. A sorting lens sorts cities by crime rate in Florida.
plate 2 shows some of these composition modes in use to visualize the correlation
between poverty and crime rate. The DIFF composition mode is an example of a mathematical
composition operator: it is defined as the absolute value of the difference between
its two operands. The filter above that, the "Very" filter, is an example of an
operator from fuzzy logic: Very(x) is defined as x squared [8]. By overlaying a
real-valued NOT filter above that, we ask where crime rate and poverty rate are NOT
VERY DIFFerent. The higher the score, the redder the city's box, the greater the
correlation. We can see that poverty and crime rate are highly correlated in most
cities.
Figure 6(a). A filter finds only one city (San Francisco) with a high score.
Figure 6(b). A missing data lens shows that attribute values are missing for many
cities. Cities with missing data are marked with an 'X'.
We visualize missing data by use of a special lens. Normally, any city whose value
as a result of a query is missing, or undefined, is simply not shown. The "Show
Missing Data" lens makes these cities visible. For example, figure 6(a) shows a
query, centered on the West Coast, filtering for cities with severe traffic delays.
Only San Francisco is shown to have such delays. By adding a "Show Missing Data"
lens, however, as shown in figure 6(b), we see that most of the other California
communities don't have data for this category-perhaps San Francisco is not as unattractive
as it first appeared.
We incorporated missing data into the semantics of composition by considering missing
data to be analogous to the IEEE concept of a non-signaling Not-A-Number (NaN), a
value that is defined to "afford retrospective diagnostic information inherited from
invalid or unavailable data and results" [12]. In accordance with common practice
[13], we then define:
Abstract
Traditional database query systems allow users to construct complicated database
queries from specialized database language primitives. While powerful and expressive,
such systems are not easy to use, especially for browsing or exploring the data.
Information visualization systems address this problem by providing graphical presentations
of the data and direct manipulation tools for exploring the data. Recent work has
reported the value of dynamic queries coupled with two-dimensional data representations
for progressive refinement of user queries. However, the queries generated by these
systems are limited to conjunctions of global ranges of parameter values. In this
paper, we extend dynamic queries by encoding each operand of the query as a Magic
Lens filter. Compound queries can be constructed by overlapping the lenses. Each
lens includes a slider and a set of buttons to control the value of the filter function
and to define the composition operation generated by overlapping the lenses. We
demonstrate a system that supports multiple, simultaneous, general, real-valued queries
on databases with incomplete data, while maintaining the simple visual interface
of dynamic query systems.
Keywords:
viewing filter, lens, database query, dynamic queries, magic lens, visualization
Introduction
Traditional database query systems require the user to construct a database query
from language primitives [19]. Such systems are powerful and expressive, but not
easy to use, especially when the user is unfamiliar with the database schema. Information
visualization systems [15] provide graphical display of database values and direct
manipulation tools for exploring relationships in the data. These systems have many
advantages over language-based systems, including a visual representation that provides
an intuitive feel for the scope of the data, immediate feedback, and incremental,
reversible interactions. There is a tension in such systems, however, between providing
expressive power and ease of use.
RELATED WORK
Scatterplots or thematic maps are well-established techniques for displaying data
as points in a 2-D field [4,18]. Such displays can encode large amounts of data
and can provide an intuitive way to visualize groups of related data items. Transformations
of the display produce patterns that are easy to interpret, even for high-dimensional
datasets [14].
BOOLEAN QUERIES BY COMPOSITION
Given some mechanism for displaying data in scatterplot format, we can use a set
of lenses to create dynamic queries on the data. Each lens acts as a filter that
screens on some attribute of the data. A slider on the filter controls a threshold
for numeric data. Buttons and other controls on the lens can control other functions.
When the lenses overlap, their operations are combined. This provides a clean model
for building up complex queries. This physical, rather than merely conceptual, composition
of multiple sliders appeals to existing user intuitions that spatial overlaps imply
a composition, as shown in Venn diagrams and color circles.
EXAMPLES
To demonstrate these ideas we created an application to browse a database of US census
data [7]. In this database, each row represents a city and the columns describe
the city along various census metrics: population, crime rate, property tax rate,
and so forth. We chose this database because it is publicly available, lends itself
to fairly interesting queries, and the data elements have an intuitive mapping to
the 2D plane, namely the physical location of the city on a map.
MULTIPLE VIEWS
Simultaneous multiple queries can be performed by positioning different lenses over
different parts of the display. Each lens, and hence each query, can be independently
customized by manipulating the controls on the lens. We can also use lenses to generate
alternate views of the data, such as magnifying lenses, callouts, sorted views, and
other such transformations. The user interface of these visual transformations is
identical to that of the semantic transformations (the filters).
EXTENSIONS
In the previous section, we discussed how movable filters can be used to perform
general boolean queries on scatterplot data. In this section, we discuss further
extending the power of our filters such that they support real-valued queries and
undefined data values.
Real-valued Queries
To support more powerful queries, we extended our system such that filters assign
a real-valued score on the range [0...1] to each datum. Data with a score of 0 fails
the filter entirely, data with a score of 1 succeeds entirely, and data with intermediate
scores partially satisfy the filter. This provides more information about the value
returned by the query. In general, the scoring function could be arbitrary-a filter
might prefer data with extreme values, or assign a gaussian falloff to data with
the highest score going to data of a certain value, or other similar metrics. Our
implementation currently supports only linear and step scoring functions, but this
is not due to any limitation imposed by the technique. We present the score visually
by showing each datum as a partially filled-in square; the higher the score, the
more of the square is filled in.
Missing Data
Databases in general, and the US census database in particular, do not always have
all data fully defined. In the case of the US census database, for example, some
cities may have population figures included but not crime data, or vice versa. A
robust information presentation system must address this problem to give an accurate
view of the data. There are two issues: how such data is visually presented and
how queries based on that data are performed.
Note that a city that is invisible because of missing data may re-appear when another
filter is laid upon it, if that filter's composition mode is MAX or OR.