Enhanced Dynamic Queries via Movable Filters

Ken Fishkin, Maureen C. Stone

Xerox PARC, 3333 Coyote Hill Rd., Palo Alto CA 94304

E-mail: {fishkin, stone}@parc.xerox.com

Abstract

Traditional database query systems allow users to construct complicated database queries from specialized database language primitives. While powerful and expressive, such systems are not easy to use, especially for browsing or exploring the data. Information visualization systems address this problem by providing graphical presentations of the data and direct manipulation tools for exploring the data. Recent work has reported the value of dynamic queries coupled with two-dimensional data representations for progressive refinement of user queries. However, the queries generated by these systems are limited to conjunctions of global ranges of parameter values. In this paper, we extend dynamic queries by encoding each operand of the query as a Magic Lens filter. Compound queries can be constructed by overlapping the lenses. Each lens includes a slider and a set of buttons to control the value of the filter function and to define the composition operation generated by overlapping the lenses. We demonstrate a system that supports multiple, simultaneous, general, real-valued queries on databases with incomplete data, while maintaining the simple visual interface of dynamic query systems.

Keywords:

viewing filter, lens, database query, dynamic queries, magic lens, visualization

Introduction

Traditional database query systems require the user to construct a database query from language primitives [19]. Such systems are powerful and expressive, but not easy to use, especially when the user is unfamiliar with the database schema. Information visualization systems [15] provide graphical display of database values and direct manipulation tools for exploring relationships in the data. These systems have many advantages over language-based systems, including a visual representation that provides an intuitive feel for the scope of the data, immediate feedback, and incremental, reversible interactions. There is a tension in such systems, however, between providing expressive power and ease of use.

In this paper, we present a new direct manipulation technique for exploring a database displayed as a two dimensional set of points. With it, users can incrementally construct full boolean queries by layering queries encoded as Magic Lens(tm) filters [5,6,17]. Our work builds directly on two techniques for information visualization that were presented at CHI '94. The first, the starfield display [2], supports interactive filtering and zooming on scatterplot displays. The second, the movable filter [17], supports multiple simultaneous visual transformations and queries on underlying data. In this paper we combine the two techniques, enhancing the starfield display by augmenting it with the flexibility and functionality of the movable filter. The advantages include: a direct manipulation mechanism for creating general boolean queries, multiple simultaneous views, and a uniform mechanism for providing alternate views of the data. We also demonstrate how our query mechanism was extended to support general, real-valued queries on databases containing missing fields.

After describing related work, we describe using magic lens filters to generate boolean queries over graphically displayed data. We then discuss how they provide a uniform mechanism for generating multiple views of the data. We then extend our model to support real-valued queries over incomplete data, and present our conclusions and plans for future work. Concrete examples taken from an application for exploring US Census data are used throughout the paper.

Dynamic queries [1] apply direct manipulation techniques to the problem of constructing database queries. A selector (e.g. a slider) or set of selectors is used to control the range of values required of a particular attribute or set of attributes. When combined with a graphical representation of the database such as a scatterplot, users can rapidly explore different subsets of the data by manipulating the selectors [2]. However, the number of attributes that can be controlled is limited by the number of selectors that can be easily applied to the data. The effect of combining slider filters is strictly conjunctive; disjunctive queries may only be performed by performing each sub-query sequentially. The effects of the selectors are global; there is no way to limit the scope to only a portion of the data except by zooming in on it. Finally, the number of selectors, and hence the number of possible queries, is fixed in advance.

The Aggregate Manipulator [11] as well as XSoft's Visual Recall(tm) [20] allow somewhat more powerful (but non-interactive) queries, supporting disjunctive and limited compound composition via textual, hierarchical, menu-driven interfaces. The conceptual prototype of Egenhofer [10] is a hybrid technique. Query operands are typed in using a database language, but those operands are visually composed, with modes that support disjunctive, conjunctive, and subtractive composition.

Magic lens filters [6,17] are a user interface tool that combine an arbitrarily-shaped region with an operator that changes the view of objects viewed through that region. The operator can be quite general, accessing the underlying data structures of the application and reformatting the data to generate a modified view. The filters are spatially bounded, may be parameterized, and are interactively positioned over on-screen applications. Filters may overlap, in which case they compose their effects in the overlap region.

Magic lens filters provide a number of advantages desirable for data visualization. Since they are spatially bounded, they can perform their operation within a restricted focus while maintaining global context. Since they can overlap, compositional semantics can be defined and controlled.

BOOLEAN QUERIES BY COMPOSITION

Given some mechanism for displaying data in scatterplot format, we can use a set of lenses to create dynamic queries on the data. Each lens acts as a filter that screens on some attribute of the data. A slider on the filter controls a threshold for numeric data. Buttons and other controls on the lens can control other functions. When the lenses overlap, their operations are combined. This provides a clean model for building up complex queries. This physical, rather than merely conceptual, composition of multiple sliders appeals to existing user intuitions that spatial overlaps imply a composition, as shown in Venn diagrams and color circles.

To create boolean queries by composing magic lens filters, we need to provide a way to specify how the filters are combined. To provide full boolean functionality, we need to provide a mechanism for the AND, OR, NOT and grouping or parenthesizing operations.

We define for each lens a filtering function and a composition mode that describes how the result of the filtering function is combined with the output of lenses underneath. More formally, a lens L=(F, M), where F is a filter and M is a boolean operator. The filter, F, describes the output calculation for the filter on some datum. The mode, M, describes how that output is combined with the output from lower filters. For example, given L1=(F1, OR) and L2=(F2, AND), the result of positioning L1 over L2 is (F1 OR F2). Conversely, the effect of positioning L2 over L1 is (F2 AND F1). We implement the composition mode as a button on the lens, making it easy to change the mode as needed.

The NOT operation can be encoded as a lens whose filter inverts the sense of the data coming in. Using the formalism of the previous paragraph, an inverting lens N = (NULL, NOT). That is, N applies a NOT to the output of lower filters, and has no intrinsic filter function. For example, consider the query (F1 OR NOT F2), where F1 and F2 filter for various attributes. To implement this query, the user would lay down filter F2, then the NOT filter N, then filter F1 with its composition mode set to OR.

To incorporate grouping, we need a mechanism for encapsulating the expression defined by a stack of lenses. To do this, we provide an operation that replaces a stack of lenses with a single compound lens that is semantically equivalent. The user creates such a compound lens by selecting a point on a stack of lenses through a click-through button, a partially transparent button on an overlaying lens [5]. All lenses beneath this point are combined to create a compound lens that generates the same query as the stack. The resulting lens also has a composition mode that defines how it is combined with other lenses. This new compound lens can be manipulated by the user just as any other lens, providing a simple way to encapsulate a complex query into a single conceptual unit.

To create the query (F1 AND F2) OR (F3 AND F4), for example, we create compound lenses for the values in parentheses: C1=(F1 AND F2) and C2=(F3 AND F4). By giving lens C1 a composition mode of OR, we can create the desired expression by positioning C1 over C2. Compound lenses may contain other compound lenses, allowing queries to grow to arbitrary complexity.

Since we can incorporate AND, OR, NOT, and grouping, we can represent any boolean query. Complex queries can be incrementally built up by direct manipulation of the lenses and their modes. Useful queries can be saved as compound lenses. The resulting lenses can then be used to build up more complex queries in a completely uniform manner. While we have implemented only three common boolean composition modes, this model supports any of the boolean operators.

EXAMPLES

To demonstrate these ideas we created an application to browse a database of US census data [7]. In this database, each row represents a city and the columns describe the city along various census metrics: population, crime rate, property tax rate, and so forth. We chose this database because it is publicly available, lends itself to fairly interesting queries, and the data elements have an intuitive mapping to the 2D plane, namely the physical location of the city on a map.

In this implementation, each lens is implemented as an X window [16]. A lens manager server extends the X window system to support magic lens functionality. Therefore, the lenses can be manipulated using the regular window manager interface. Lenses can display their output using the X graphics library, or, upon request, they can display it in PostScript(R) [3] form. Using this facility, the application can generate PostScript pictures describing its screen appearance. We used such generated PostScript for all the black and white figures in this paper. Choosing this form, instead of a screen snapshot, allows us to customize the presentation to fit the space and color limitations of the proceedings. The figures in the color plates are screen snapshots that show the screen appearance of the running application.

Color plate 1 shows a typical lens filter in action. Each city is displayed as a blue-rimmed white box at a point proportional to its latitude and longitude. The data attribute associated with the lens filter covering the center of the country, "1991 crime index," is displayed in the window header. Below that on the left is a slider used to control the threshold value for the query, and a label showing the current slider value (12158.5 in this example). The buttons to the right of the slider control whether the user is screening for data less than ( < ) or greater than ( > ) the slider value. Cities shown in red, rather than white, pass the filter. For example, in this case, cities in the center of the country with a crime rate greater than 12158.5 are shown in red. Buttons along the right edge of the application are used to spawn new lenses (13 different types are presently implemented), and for application housekeeping.

Figure 1 shows the effects of applying different composition modes to the composition of two filters. The composition mode for a lens is defined by a group of four radio buttons. The buttons labeled AND and OR set the composition mode accordingly. The SELF button causes the lens to display only the effect of its own filter, ignoring other lenses, and the NOP button sets the filtering function to NULL, disabling the effect of the lens. These two modes are useful when interpreting complex queries. In figure 1(a) we look for cities which have high annual salaries and low taxes. We can make the query less demanding by using the OR button to change the composition mode (figure 1(b)), to see cities which have either high salaries or low taxes. The SELF and NOP buttons can be used to examine these components separately, to determine which cities have high salaries and which have low taxes.

An alternative interface for setting the composition mode is to use a click-through tool. In this interface, a single button on the lens indicates the current mode. A click-through tool contains buttons that can be used to change the mode by clicking through them to the lens below. The advantage of this interface is that it supports a large set of composition modes without visual clutter The disadvantage is that it requires more steps to change the mode. Our application supports both interfaces; users can choose the interface they prefer. An example of a lens with a single mode button is shown in color plate 2.

Figure 1(a) High salaries AND low taxes. Cities with partial data are shown in figure 1(b) but not in 1(a).

Figure 1(b) High salaries OR low taxes. Both conjunctive (AND) and disjunctive (OR) queries are incorporated in our system.

Figure 2. Semantic filters can be augmented with visual filters. Here, a magnifying lens and a call-out lens show clumped cities while maintaining context elsewhere.

MULTIPLE VIEWS

Simultaneous multiple queries can be performed by positioning different lenses over different parts of the display. Each lens, and hence each query, can be independently customized by manipulating the controls on the lens. We can also use lenses to generate alternate views of the data, such as magnifying lenses, callouts, sorted views, and other such transformations. The user interface of these visual transformations is identical to that of the semantic transformations (the filters).

For example, a scatterplot representation of the data will often have clumps, where some set of data points map to nearby points on the display or even overlap. Figure 2 shows two examples of this clumping, and uses multiple visual filters to aid in viewing the data. A small magnification lens is positioned over northern California, letting us see that there are four cities clumped into the bay area. Over southern California, a callout lens is placed. This lens displays the cities as a list, making it easy to separate them. The rest of the map is displayed in the usual manner. This allows easy identification of cities in the dense region while maintaining global context. The boxes next to the city names on the callout are active, so filters placed over them act exactly as if they were placed over the original map. Cities listed without boxes are missing the data selected by the filter, "Median home price."

In the star-field paradigm, a user manipulates a set of selectors, and observes their filtering effect on the data. By associating each selector with a single movable filter, we gain the ability to pose multiple simultaneous queries, each with its own set of parameters. For example, suppose we wish to determine which cities in each region of the country have relatively low housing prices. We have data available for the average housing price per city. However, the range of values for this attribute is wide, and varies geographically. For example, houses on the west coast are typically more expensive than houses in the midwest. Therefore, we need to filter on a higher threshold on the west coast than in the midwest. Figure 3 shows two filters with two different threshold values positioned over California and over Texas.

Figure 3. To find relatively high housing prices in California and Texas, two different filters are positioned simultaneously.

EXTENSIONS

In the previous section, we discussed how movable filters can be used to perform general boolean queries on scatterplot data. In this section, we discuss further extending the power of our filters such that they support real-valued queries and undefined data values.

Real-valued Queries

To support more powerful queries, we extended our system such that filters assign a real-valued score on the range [0...1] to each datum. Data with a score of 0 fails the filter entirely, data with a score of 1 succeeds entirely, and data with intermediate scores partially satisfy the filter. This provides more information about the value returned by the query. In general, the scoring function could be arbitrary-a filter might prefer data with extreme values, or assign a gaussian falloff to data with the highest score going to data of a certain value, or other similar metrics. Our implementation currently supports only linear and step scoring functions, but this is not due to any limitation imposed by the technique. We present the score visually by showing each datum as a partially filled-in square; the higher the score, the more of the square is filled in.

For example, in figure 4(a) a boolean filter is screening for crime rate, and both Dallas and Fort Worth are seen to have high crime rates. When we switch to real-valued filters (figure 4(b)) we see that Dallas has a lower crime rate than Fort Worth. The values displayed were computed as follows. As in the boolean case, the filter slider value defines a threshold. In figure 4, for example, the threshold is 12838. When screening for data greater than the threshold ( > ), data below the threshold is assigned a score of 0, as is the case for Arlington in Figure 4. The city in the database with the greatest value for the data (in the case of crime data, it happens to be Atlanta, GA, with a crime index of 18953) is assigned a score of 1.0. Cities with values between the threshold and the maximum are assigned a simple linear score equal to their value divided by the maximum value. When screening for values less than the threshold ( < ), values below the threshold are assigned non-zero scores in a similar manner.

Computations can be performed on the output of real-valued filters. For example, in figure 5 we have placed a sorting lens over a real-valued crime rate filter. Only the cities under the sorting lens are sorted. We can see that in Florida, Jacksonville has the lowest crime rate and Miami has the highest.

Figure 4(a) boolean query on crime rate for three cities in Texas.

Figure 4(b) Real-valued query on crime rate for the same cities. Extending our filters from boolean-valued to real-valued allows distinctions to be maintained.

Real-valued filters require real-valued composition modes. The real-valued MIN and MAX composition modes correspond to the boolean AND and OR as in fuzzy logic [21]. That is, MIN and MAX work the same on 0.0 and 1.0 as AND and OR do on 0 (False) and 1 (True), but can also incorporate values in-between. A real-valued NOT filter returns 1.0 minus its input value.

Just as boolean filters can be composed with an arbitrary boolean operation, not just AND and OR, so too can real-valued filters be composed with an arbitrary numerical operation, not just MIN and MAX. These operations can be statistical transformations used to change the data distribution (sqrt, arcsin, etc.), fuzzy logic operators used to tune the query ("very," "somewhat," "more or less," etc.), or mathematical operators used for general manipulation (difference, log, etc.). Color

Figure 5. A sorting lens sorts cities by crime rate in Florida.

plate 2 shows some of these composition modes in use to visualize the correlation between poverty and crime rate. The DIFF composition mode is an example of a mathematical composition operator: it is defined as the absolute value of the difference between its two operands. The filter above that, the "Very" filter, is an example of an operator from fuzzy logic: Very(x) is defined as x squared [8]. By overlaying a real-valued NOT filter above that, we ask where crime rate and poverty rate are NOT VERY DIFFerent. The higher the score, the redder the city's box, the greater the correlation. We can see that poverty and crime rate are highly correlated in most cities.

Missing Data

Databases in general, and the US census database in particular, do not always have all data fully defined. In the case of the US census database, for example, some cities may have population figures included but not crime data, or vice versa. A robust information presentation system must address this problem to give an accurate view of the data. There are two issues: how such data is visually presented and how queries based on that data are performed.

Figure 6(a). A filter finds only one city (San Francisco) with a high score.

Figure 6(b). A missing data lens shows that attribute values are missing for many cities. Cities with missing data are marked with an 'X'.

We visualize missing data by use of a special lens. Normally, any city whose value as a result of a query is missing, or undefined, is simply not shown. The "Show Missing Data" lens makes these cities visible. For example, figure 6(a) shows a query, centered on the West Coast, filtering for cities with severe traffic delays. Only San Francisco is shown to have such delays. By adding a "Show Missing Data" lens, however, as shown in figure 6(b), we see that most of the other California communities don't have data for this category-perhaps San Francisco is not as unattractive as it first appeared.

We incorporated missing data into the semantics of composition by considering missing data to be analogous to the IEEE concept of a non-signaling Not-A-Number (NaN), a value that is defined to "afford retrospective diagnostic information inherited from invalid or unavailable data and results" [12]. In accordance with common practice [13], we then define:

OR(a,NaN) =a, MAX(a,NaN) = a
F(a,NaN) =NaN, for all other composition modes F.

Note that a city that is invisible because of missing data may re-appear when another filter is laid upon it, if that filter's composition mode is MAX or OR.

CONCLUSIONS

Applying magic lens filters to the problem of exploring and visualizing data creates a user model that is expressive yet easy to understand. Placing a lens over the data evokes the physical model of using a lens for filtering or enhancing the view of the data beneath it. Including a slider on the lens allows the user to dynamically vary the query by adjusting the filter threshold.

Overlapping the filters creates a natural metaphor for creating compound queries. By providing explicit composition modes and a mechanism for grouping expressions, the full power of a boolean query language can be expressed graphically. We have also demonstrated how this language can be expanded to include real-valued functions that gracefully handle missing values. Powerful queries can be built up incrementally, then preserved for future exploration. In this way, the lenses provide a mechanism for capturing the result of a data exploration session.

Magic lens filters provide a natural mechanism for visual transformations as well as semantic transformations of the data. We have shown examples of callout, magnification, missing data, and sorting lenses. Other possible functions include alternate representations of score such as color or size, and overlaying geographic information such as city or state boundaries.

In our application, magic lens filters have been combined with click-through tools to implement a wide range of user-interface operations. These can be used to manipulate the data, the filtering operations, or the tools themselves. For example, click-through tools are used to create compound lenses, generate figures, provide additional parameters for the filtering operations, and to change the layout of the buttons and sliders.

In summary, magic lens filters provide a uniform, powerful and extensible mechanism for data visualization and exploration applications.

FUTURE WORK

Within the particular application of analyzing city census data, we are working to include a wider variety of scoring functions, to explore other visual techniques (e.g. blurring and/or translucency as discussed in [9]) for expressing queries and their results, and to make the user interface smoother, faster and more sophisticated. This will enable us to perform user studies to help quantify the value of this technique.

We are also interested in extending this work to apply to other types of data and other forms of information visualization.

Acknowledgments

We thank Jock Mackinlay for his enthusiasm, encouragement, and shared realization that magic lens filters could support disjunctive queries. We thank Eric Bier for his user-interface and illustration design expertise, and Ken Pier for his help in the preparation of the paper. Finally, we thank Xerox PARC for its continuing support.

Trademarks and Patents: Magic Lens and Visual Recall are trademarks of the Xerox Corporation. PostScript is a trademark of Adobe Systems, Inc. Patents related to the concepts discussed in this paper have been applied for by the Xerox Corporation.

References

1. Christopher Ahlberg, Christopher Williamson, and Ben Shneiderman. Dynamic Queries for Information Exploration: an Implementation and Evaluation. Proceedings of CHI '92, 1992. pp. 619-626.

2. Christopher Ahlberg and Ben Shneiderman. Visual Information Seeking: Tight Coupling of Dynamic Query Filters with Starfield Displays. Proceedings of CHI '94, (Boston, MA, April 24-28) ACM, New York, (1994), pp. 313-317.

3. Adobe Systems Incorporated. PostScript(R) Language Reference Manual, second edition. Addison-Wesley, 1990.

4. Jacques Bertin. Semiology of Graphics. University of Wisconsin Press. 1983.

5. Eric A. Bier, Maureen C. Stone, Thomas Baudel, William Buxton, and Ken Fishkin. A Taxonomy of See-Through Tools. Proceedings of CHI '94, (Boston, MA, April 24-28) ACM, New York, (1994), pp. 358-364.

6. Eric A. Bier, Maureen C. Stone, Ken Pier, William Buxton, and Tony D. DeRose. Toolglass and Magic Lenses: The See-Through Interface. Proceedings of Siggraph '93 (Anaheim, CA, August), Computer Graphics Annual Conference Series, ACM, 1993, pp. 73-80.

7. Bureau of the Census. Statistical Abstract of the United States 1993. Washington DC, 1993.

8. C.L. Chang. Interpretation and Execution of Fuzzy Programs. In Fuzzy Sets and Their Applications to Cognitive and Decision Processes. Academic Press. New York. 1975. pp. 191-218.

9. Grace Colby and Laura Scholl. Transparency and Blur as Selective Curs for Complex Visual Information. Proceedings of SPIE '91 (San Jose, Feb). SPIE, 1991, pp. 114-125.

10. Max J. Egenhofer. Manipulating the Graphical Representation of Query Results in Geographic Information Systems. Proceedings of the 1990 IEEE Workshop on Visual Languages. IEEE Computer Society Press, Los Alamitos CA, 1990, pp. 119-124.

11. Jade Goldstein and Steven F. Roth. Using Aggregation and Dynamic Queries for Exploring Large Data Sets. Proceedings of CHI '94, (Boston, MA, April 24-28) ACM, New York, (1994), pp. 23-29.

12. IEEE. The IEEE Standard for Binary Floating-Point Arithmetic. IEEE, New York, 1985.

13. W. Kahan. How should Max and Min be defined? University of California, Berkeley. May 25, 1989.

14. Daniel A. Keim and Hans-Peter Kriegel. VisDB: Database Exploration Using Multidimensional Visualization. IEEE CG&A, 14(5), September 1994. pp. 40-49.

15. George G. Robertson, Stuart K. Card, and Jock D. Mackinlay. Information Visualization Using 3D Interactive Animation. Communications of the ACM, 36(4), April 1993. pp. 57-71.

16. Robert W. Scheifler, James Gettys, and Ron Newman. X Window System. Digital Press, Bedford MA, 1988.

17. Maureen C. Stone, Ken Fishkin, and Eric A. Bier. The Movable Filter as a User Interface Tool. Proceedings of CHI '94, (Boston, MA, April 24-28) ACM, New York, (1994), pp. 306-312.

18. Edward R. Tufte. The Visual Display of Quantitative Information. Graphics Press. 1983.

19. Jeffrey D. Ullman. Principles of Database Systems. Computer Science Press, 1980.

20. XSoft Corporation. Visual Recall for Windows. Xerox Corporation. Palo Alto, CA. 1994.

21. Lotfi Zadeh. Fuzzy Sets. Information Control, vol. 8, pp. 338-353. 1965.