



Sougata Mukherjea, James D. Foley, Scott Hudson
Abstract:
Our work concerns visualizing the information space of hypermedia systems using
multiple hierarchical views. Although overview diagrams are useful for helping the user to
navigate in a hypermedia system, for any real-world system they become too complicated
and large to be really useful. This is because these diagrams represent complex network
structures which are very difficult to visualize and comprehend. On the other hand,
effective visualizations of hierarchies have been developed. Our strategy is to provide
the user with different hierarchies, each giving a different perspective to the underlying
information space to help the user better comprehend the information. We propose an
algorithm based on content and structural analysis to form hierarchies from hypermedia
networks. The algorithm is automatic but can be guided by the user. The multiple
hierarchies can be visualized in various ways. We give examples of the implementation of
the algorithm on two hypermedia systems.
Keywords:
Hypermedia, Overview Diagrams, Information Visualization, Hierarchization.
Introduction
Overview diagrams are one of the best tools for orientation and navigation in hypermedia
documents [17]. By presenting a map of the underlying
information space, they allow the users to see where they are, what other information is
available and how to access the other information. However, for any real-world hypermedia
system with many nodes and links, the overview diagrams represent large complex network
structures. They are generally shown as 2D or 3D graphs and comprehending such large
complex graphs is extremely difficult. The layout of graphs is also a very difficult
problem [1]. Other attempts to visualize networks such as
Semnet [3], have not been very successful.
In [13], Parunak notes that: "The insight for hypermedia is that a hyperbase structured as a set of distinguishable hierarchies will offer navigational and other cognitive benefits that an equally complex system of undifferentiated links does not, even if the union of all the hierarchies is not itself hierarchical." Neuwirth et al. [12] also observed that the ability to view knowledge from different perspectives is important. Thus, if different hierarchies, each of which gives a different perspective to the underlying information can be formed, the user would be able to comprehend the information better. It should be also noted that unlike networks some very effective ways of visualizing hierarchies have been developed. Examples are Treemaps [7] and Cone Trees [15].
This paper proposes an algorithm for forming hierarchies from hypermedia graphs. It uses both structural and content analysis to identify the hierarchies. The structural analysis looks at the structure of the graph while the content analysis looks at the contents of the nodes. (Note that the content analysis assumes a database-oriented hypermedia system where the nodes are described with attributes). Although our algorithm is automatic, forming the "best" possible hierarchy representing the graph, the user can guide the process so that hierarchies giving different perspectives to the underlying information can be formed. These hierarchies can be visualized in different ways.
Section 2 presents our hierarchization process. Section 3 shows the implementation of the algorithm in the Navigational View Builder, a system we are building for visualizing the information space of Hypermedia systems [10], [11]. This section discusses the application of the algorithm on a demo automobile database and a section of the World-Wide Web. Section 4 discusses how the hierarchies can be transformed to other forms of data organizations. Section 5 talks about the related work while section 6 is the conclusion.
FIGURE 1: An example pre-tree. It has a root node which does not have any parent. The descendants of the root node are graphs. However, none of these graphs have any links between them. Our hierarchization algorithm tries to identify the best pre-tree to represent the given graph. The final tree is formed by calling the algorithm recursively for the branches.
For identifying potential pre-trees both content and structural analysis are used.
For content analysis, for each attribute, the nodes of the graph are partitioned into branches based on the attribute values by Content-based Clustering. The clustering algorithm is explained in [11]. If too many or too few branches are formed, the attribute is not suitable for forming a pre-tree. Otherwise a new pre-tree is formed with these branches. The root of the pre-tree is a cluster representing all the nodes of the graph.
A pre-tree is formed for nodes in the graph which can reach all other nodes. These nodes are designated as the roots of the pre-trees. The branches are the branches of the spanning tree formed by doing a breadth-first search from the designated root node. (A detailed analysis is omitted for the purpose of brevity. The algorithm is explained with examples in the next section).
FIGURE 2: An overview diagram of an automobile database. The diagram is very difficult to comprehend.
FIGURE 3: The left hand screen shows the default tree formed for the automobile database. The top-level partitioning is by the attribute Price. The right hand screen shows the tree formed if the top-level partitioning is done by the attribute Country.
The user can form different hierarchies by selecting other pre-trees. For example, if the user wanted to select the pre-tree at the initial level, the dialog box shown in Figure 4 pops up. If the user wants to partition based on the attribute Country, the tree shown in the right hand screen in Figure 3 is formed. In this figure some of the children represents clusters for countries. For example the node labeled Japan represents all the Japanese cars and its children are shown in the left hand screen of Figure 5. Here the partitioning is done by the attribute Manufacturer. For some other countries the nodes in the cluster formed a tree. In these cases the roots of the tree were identified by structural analysis and they became the children of the overall root. Thus for Sweden, Saab-Info is the root of the tree for all nodes related to Swedish cars. Its children are shown in the right hand screen of Figure 5.
FIGURE 4: At each level various pre-trees can be used. A metric ranks these pre-trees. By default the pre-tree with the best metric is selected. However, the user can select others using the above menu.
FIGURE 5: Examples of Content and Structural analysis for forming pre-trees. The left hand screen represents the nodes for Japan. The root is a cluster representing all Japanese cars. The nodes are partitioned by the attribute Manufacturer. The right hand screen is for Swedish cars. These nodes form a tree with the node Saab-Info as the root.
Figure 6 shows a 3D Tree view of this hierarchy. In this view, the colors of the nodes represent various countries and the colors of the links represent link types. Various zooming and filtering operations that are mentioned in [15] are possible for this 3D tree. Moreover, smooth animation is used so that the view changes are not abrupt and allow the user to see the changes easily. (Note that the implementation is done using C++, Motif and Open Inventor [18].)
FIGURE 6: A 3d tree View of a hierarchy of the automobile database. Initial partitioning by the attribute Country. Node colors represent different countries and the link colors different link types.
FIGURE 7: An overview diagram of the World-Wide Web pages about the research activities at GVU. It indicates clearly why traditional overview diagrams are useless for real-world hypermedia systems.
The left hand screen of Figure 8 shows the top level of the hierarchy automatically created for the data by the system. The file research.html which lists the various research activities of the GVU Center is the root. It has branches to the major research area as well as to gvutop.html, a file containing general information about GVU. The right hand side of Figure 8 shows a view of a section of this hierarchy where the nodes are listed as a table of content of a book.
FIGURE 8: The left hand screen shows the top level of the default hierarchy formed for the GVU WWW pages. research.html is the root and the major research areas are shown. The right hand screen shows a book view of a portion of this hierarchy showing research in Software Visualization.
A major drawback of the World-Wide Web is that very few useful semantic attributes are defined for the pages. To create some other meaningful hierarchies, attributes like the topic of the page (whether it is a research page or a personal page, etc.) were inserted manually. (Efforts are underway to incorporate metadata into WWW and hopefully in the near future we can extract all useful information from the WWW automatically.) The left hand screen of Figure 9 represents a treemap view of a hierarchy formed when the initial partitioning is done by the topic of the page. The colors are used to represent the kind of users who created the pages. Green is used to represent Phd students and the color plate indicates that the Phd students are the primary authors of the pages.
FIGURE 9: The left hand scree shows a Treemap view of a hierarchy of the GVU WWW pages. Initial partitioning is by the attribute Topic . Colors represent different types of authors. The selected node is visdebug.html . The corresponding WWW page is shown on the right.
Multiple hierarchies, each giving a different perspective to the underlying information space can be formed. If a user selects a node in one view, its positions in the other views are also highlighted. Thus, these views help the user in comprehending the data. It should be also noted that the user can go directly to the corresponding WWW page for the selected node. Thus in the Treemap view, the node visdebug.html is highlighted. The corresponding WWW page is shown on the right hand screen of Figure 9.
Figure 10 represents a perspective wall [9] view of a linear arrangement of the GVU WWW pages sorted by the last modification times of the files. From the hierarchy whose initial partitioning was by the attribute last-modified-time, the files were divided into partitions based on the time when they were last modified. These partitions were arranged on walls. Only some walls are in the focus at a given time. The user can easily control the walls which are in focus through a scrollbar. Similarly, for the automobile database a Perspective Wall view can be formed where the cars are sorted by the attribute Price.
FIGURE 10: A Perspective Wall view showing a linear arrangement of the files based on the last modification time. The different walls show files which were last modified in different time frames. Only some walls are in the focus at a given time.
Other views can also be generated. For example, a tabular view showing information like average price, mileage, etc. for various car models and also such useful statistics for different manufacturers of the cars can be formed by a depth-first traversal of the hierarchical structure whose partitionings are done by the attributes Manufacturer and Car-Model.
This paper is also related to systems that deal with graphical presentation of information automatically or semi-automatically. Examples include APT [8] and SAGE [16]. However, our information domain is different from these systems - these systems deal with highly structured information. The views that we want to develop are also different. The previous systems generally produced bar diagrams, scatter plots and such graph views.
CONCLUSION
One of the best ways to comprehend a large complicated information structure is to form
multiple simpler structures each highlighting different aspects of the original
structure. Our work tries to use this philosophy to make a complex hypermedia system
understandable to the user. We believe that by forming various effective views of the
underlying space, we would allow the user to better understand the complex information.
We give examples of the hierarchization process from two complicated hypermedia systems to
illustrate our point. These examples show that our algorithm was able to extract
meaningful hierarchies which gave better insights into the complex information spaces.
Future
work is planned along the following directions:
algorithm to identify roots. On the other hand we use
a
algorithm to identify roots (by calling the
breadth-first search for each node). Although in the worst case l =
, on average l =
and our algorithm will perform better. For the WWW
database with about 400 nodes and 800 links our algorithm took about 7 seconds on a SGI
reality engine. Although this is acceptable, we will face problems for larger databases.
We are investigating ways to enhance the performance by improving the efficiency of the
code and using probabilistic algorithms to identify roots. Moreover, even cone trees and
treemaps are not able to visualize larger databases effectively. New visualization techniques are needed.
ACKNOWLEDGEMENT
This work is supported by grants from Digital Equipment Corporation, Bell South
Enterprises, Inc. and Emory University System of Health Care, Atlanta, Georgia as part of
the Hypermedia Interface for Multimedia Databases project. We would also like to thank
the reviewers of this paper for their useful comments.
References