At the NUS-HCI Lab, we are also interested in the research topic of Information Visualization.

 

PCP vs. SCP

Tracing Tuples Across Dimensions: A Comparison of Scatterplots and Parallel Coordinate Plots

Xiaole Kuang, Haimo Zhang, Shengdong Zhao, Michael J. McGuffin

 

Paper and Presentation

Additional Information

Definition

image

We evaluated four techniques in our experiments (PCP, SCP-rotated, SCP-common, and SCP-staircase); see the figure below.

image

Below are the take-away lessons from this study.

  • The first take-away lesson is that both SCP-rotated and SCP-staircase perform poorly on the value retrieval task, so they are not recommended; see the figure below.

image

The second set of major take-away lessons is:

  • PCP and SCP-common performed better and were preferred by participants. However, these two techniques seem suited for different scenarios: PCP is better at low dimensionality and low density, and SCP-common is better when these are higher.
  • The performance of PCP is dependent on dimensionality, while the performance of SCP-common seems roughly independent of dimensionality.
  • Increasing density affects the performance of PCP more than it affects SCP-common.

image

Figure 10 shows the recommended techniques for value retrieval under varied dimensionalities and densities. Cells labeled “PCP” or “SCP-common” indicate that the corresponding technique has significantly better performance. Cells with the “~” symbol indicate that PCP and SCP-common have comparable performance, and cells with a question mark are conditions not covered in this study.
PCP is recommended for cells in the top-left corner, which represent multivariate data with lower dimensionalities and densities. Cells in the bottom-right corner represent data with higher dimensionalities and densities, where SCP-common is preferred. For cells along the diagonal, either technique can be used, leaving the choice to other considerations.

image

Figure 10: The recommendation of techniques based on experiment results. The dotted red rectangle highlights the conditions in experiment 1; the solid green rectangle highlights the conditions in experiment 2.

 

ABSTRACT

One of the fundamental tasks for analytic activity is retrieving (i.e., reading) the value of a particular quantity in an information visualization. However, few previous studies have compared user performance in such value retrieval tasks across different visualizations. We present an experimental comparison of user performance (time and error distance) across four multivariate data visualizations: three variants of scatterplot (SCP) visualizations, namely SCPs with common vertical axes (SCP-common), SCPs with a staircase layout (SCP-staircase), and SCPs with rotated axes between neighboring cells (SCP-rotated), and a baseline parallel coordinate plot (PCP). Results show that the baseline PCP is better than SCP-rotated and SCP-staircase under all conditions, while the difference between SCP-common and PCP depends on the dimensionality and density of the dataset. PCP shows advantages over SCP-common when the dimensionality and density of the dataset are low, but SCP-common eventually outperforms PCP as data dimensionality and density increase. The results suggest guidelines for the use of SCPs and PCPs that can benefit future researchers and practitioners.

INTRODUCTION

Multivariate data is a commonly encountered type of data (e.g., in relational databases), consisting of a list of points or tuples, each corresponding to a row in a table, whose columns are the attributes or variables of the data. Two widely used visualization techniques for multivariate data are parallel coordinate plots (PCP) and scatterplots (SCP) [Weg90,KD09,TGS04,SS05,AR11]. PCPs display each tuple as a polygonal line intersecting parallel axes, each representing one of the variables, thus providing a continuous view of the multidimensional values of the data tuples [Ins85]. SCPs, on the other hand, show only 2 variables per plot, but can be combined to visualize multivariate data with more than 2 dimensions, such as in a scatterplot matrix [Har75].

Despite the call for rigorous evaluation of experimental visualization techniques over a decade ago [WB97], to date, much still remains unknown about the respective advantages of PCPs and SCPs for different user analytic tasks. To our knowledge, there are only two empirical comparisons of these techniques. One [LMvW10] asked users to estimate correlation coefficients using PCPs and SCPs, and another [HvW10] asked users to count clusters; both studies found SCPs to be superior. Given these results, it seems unclear what advantage, if any, PCPs provide. However, the tasks in the two previous studies are just two of many possible tasks. Several other tasks with visualizations have been identified [Shn96,AES05] and have yet to be tested.

We extend previous efforts by comparing SCPs and PCPs for the task of value retrieval, a fundamental task that is the first in Amar et al.’s taxonomy of analytic tasks [AES05] and is said to be a building block of other tasks such as finding extrema or sorting [AES05]. As an initial exploration, our study focuses on differences due to the basic visual designs of SCPs and PCPs in their static form. We believe it is important to understand trade-offs due to their basic visual designs before investigating the effects of visual or interactive enhancements. Therefore, brushing and linking, as well as additional visual enhancements such as gridlines, are not included in this investigation. Furthermore, value retrieval by visual scan is commonly performed in practice since it is an integral component of many higher-level tasks in which explicit clicks would be inappropriate.

We conducted two controlled experiments involving four visualization techniques: three SCP variants (SCP-common, SCP-rotated, and SCP-staircase) and the baseline PCP, on datasets of varied dimensionalities and densities. It was found that SCP-rotated and SCP-staircase are not suitable for value retrieval. PCP and SCP-common yield better performance and are preferred by participants, but each is suited for different scenarios: PCP is better at low dimensionality and low density, while SCP-common is better in the opposite case. Increasing dimensionality seems to only affect performance with PCP, not SCP-common. Increasing density, while affecting both visualizations, has a stronger effect on PCP than SCP-common. Such differences are likely due to the different value retrieval strategies adopted by users and the different visual encodings of data tuples in the two visualization techniques (points versus lines). These results may be used by researchers and practitioners to better understand the differences between PCPs and SCPs, and to promote their appropriate use in the future.

RELATED WORK

Two aspects of previous research are related to our study: variants and hybrids involving scatterplots and parallel coordinate plots, and their comparisons.

A single SCP depicts two variables, and is thus insufficient for multivariate data. The scatterplot matrix (SPLOM) [Har75] shows every possible pairing of variables with multiple SCPs. Other variants with multiple SCPs have been proposed [QCX07, VMCJ10] that show a subset of the SCPs in a SPLOM, arranged with various layouts. Qu et al. [QCX07] showed a row of SCP cells, where consecutive SCPs have an axis in common that is rotated (a technique we call SCP-rotated). These SCPs correspond to cells that are adjacent to the diagonal in a SPLOM. Viau et al. [VMCJ10] considered rows of SCPs taken directly from a SPLOM, in which all the SCPs of the row have the same vertical axis (a technique we call SCP-common), privileging the variable along the shared vertical axis. Viau et al. [VMCJ10] also presented a novel “staircase” arrangement (a technique we call SCP-staircase), where adjacent SCPs share a common axis.

Parallel coordinates [Ins85] lend themselves naturally to multivariate data due to their inherently multidimensional design. Research into PCP variants has examined the use of curves instead of polylines [The00], variations in colors and transparency, and animation for line disambiguation, as surveyed by Holten and van Wijk [HvW10]. Qu et al. [QCX07] have extended PCPs with S-shaped axes to indicate wind direction. Artero et al. [AdOL04] proposed an interactive PCP variant.

Hybrid visualizations that combine SCPs and PCPs have included embedding SCP cells between PCP axes [HvW10], scattering points along curves between PCP axes [YGX09], the parallel scatterplot matrix [VMCJ10], and highly flexible custom visualizations integrating SCPs and PCPs [CvW11].

In contrast to the many variants and hybrids of SCPs and PCPs, and evaluation within PCP variants [HLKW12], comparisons between these two families of visualizations have been rare. Li et al. [LMvW10] found SCPs to be significantly superior to PCPs for judging correlation coefficients. Holten and van Wijk [HvW10] compared cluster identification performance over several PCP variants, and found that the PCP variant with embedded SCPs significantly outperformed other variants, implying that SCPs hold an advantage over PCPs. Our work extends these previous studies by comparing performance in value retrieval with PCPs and three variants of SCPs.

EXPERIMENT DESIGN

We conducted two controlled experiments to compare SCP and PCP visualizations. The next few subsections first describe aspects common to both experiments.

Task

We define a “value retrieval” task [AES05] in the context of multivariate data: given the numerical value of one attribute of a data tuple, find the numerical value of another attribute of the same data tuple. Value retrieval is a common, fundamental user analytic task. For example, if a user wants to find the average mileage for a car with 230 horsepower in a multivariate visualization, s/he may first locate the horsepower axis and find a data tuple corresponding to 230 horsepower, and then trace the tuple to the mileage axis and read its value off that axis. In general, it is possible for some axes to correspond to categorical (such as car brand) or ordinal (such as degree of satisfaction) variables; however, our study focuses on the most general case: quantitative variables.

Independent Variables

Our experiments involved three independent variables: visualization technique, data dimensionality, and data density.

Visualization Technique

PCPs have a single, straightforward layout (Figure 1:(a)). SCPs, in contrast, afford many different layouts. The aforementioned full SPLOM shows all pairings of variables, so its space utilization is quadratic in the number of variables, whereas PCPs have space requirements that are linear in the number of variables. A fair comparison requires that all techniques occupy the same amount of space. Therefore, we evaluated three SCP variants with linear space requirements:

 

pcpvsscp fig.1

Figure 1: The four evaluated techniques: (a) baseline PCP; (b) SCP-common; (c) SCP-rotated; (d) SCP-staircase.

  1. SCP-common: a row of SCPs taken from a standard SPLOM. SCP-common has the advantage of having a common and aligned vertical axis for all its individual cells (Figure 1:(b)).
  2. SCP-rotated: a row of SCPs formed from the SCPs adjacent to the diagonal of a SPLOM (Figure 1:(c)).
  3. SCP-staircase: adjacent SCPs in this layout have a common and aligned axis (Figure 1:(d)).

As shown previously [VMCJ10], PCPs and each of the above SCP variants require O(NL²) space, where L is the length of the axes and N is the number of variables.
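To make the space argument concrete, here is a small illustrative sketch (our own, not from the paper; it assumes cell width and axis spacing scale with the axis length L):

```python
def splom_area(n_vars: int, axis_len: float) -> float:
    """Full scatterplot matrix: an N x N grid of L x L cells -> O(N^2 L^2)."""
    return (n_vars * axis_len) ** 2

def row_layout_area(n_vars: int, axis_len: float) -> float:
    """PCP, SCP-common, SCP-rotated, or SCP-staircase: roughly N - 1 cells of
    size L x L placed side by side -> O(N L^2), linear in the number of variables."""
    return (n_vars - 1) * axis_len ** 2

# With 8 variables and unit-length axes, the SPLOM needs 64 area units
# while a row layout needs 7, illustrating quadratic versus linear growth.
print(splom_area(8, 1.0), row_layout_area(8, 1.0))
```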

Data Dimensionality and Density

Two important characteristics of a multidimensional visualization are the dimensionality of the data, and the density of tuples (number of tuples per unit display area). Since both characteristics may affect the difficulty of value retrieval, they were both varied in our experiments.

Following Li et al. [LMvW10], we conducted a pilot study with five participants (1 female, 4 males) to identify the feasible range of data density for each visualization technique. We fixed the size of cells in the visualizations to be 49 cm2. For each visualization technique, participants tried to finish value retrieval tasks with datasets of increasing densities, starting at 5 tuples per cell with increments of 5 tuples, until they found it too difficult to complete the trials and gave up. We recorded the number of tuples each participant completed just before giving up as the maximum tolerance.

On average, the maximum tolerance for density was 50 tuples for SCP-common, 35 tuples for SCP-rotated, 30 tuples for SCP-staircase, and 45 tuples for PCP, suggesting that users are less frustrated with SCP-common and PCP when they are dealing with dense datasets.

Since reported densities in different studies may use different units, we normalize to standard units (tuples/cm²) to allow comparison. The densities of 10, 20, 30, and 40 tuples in a cell of 49 cm² correspond to 0.20, 0.41, 0.61, and 0.81 tuples/cm², respectively. In previous work [LMvW10], the densities used were 10, 40, and 160 tuples displayed in a 24 cm × 26 cm area, which is equivalent to 0.016, 0.064, and 0.26 tuples/cm², respectively. Compared with Li et al. [LMvW10], our densities are higher in terms of tuples/cm², but lower in terms of total number of tuples. Because our experiment displays multiple plots to participants, we cannot use the same number of tuples as Li et al.’s [LMvW10] experiment without increasing the on-screen density even further, so our chosen values are a compromise. For convenience, we use “tuples” instead of tuples/cm² to refer to density in the rest of the paper.
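The unit conversion is simple arithmetic; a short sketch (ours, for illustration) reproduces the numbers above:

```python
def tuples_per_cm2(n_tuples: int, area_cm2: float) -> float:
    """Normalize a reported density to tuples per square centimetre."""
    return n_tuples / area_cm2

# Our conditions: 10-40 tuples in a 49 cm^2 cell.
for n in (10, 20, 30, 40):
    print(n, f"{tuples_per_cm2(n, 49):.2f}")       # 0.20, 0.41, 0.61, 0.82 (reported as 0.81)

# Li et al. [LMvW10]: 10, 40, and 160 tuples in a 24 cm x 26 cm area.
for n in (10, 40, 160):
    print(n, f"{tuples_per_cm2(n, 24 * 26):.3f}")  # 0.016, 0.064, 0.256
```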

Dependent Variables

Two dependent variables, completion time and error distance, were used to measure user performance.

Completion time, in milliseconds, is measured from the appearance of the task stimuli on the screen to the moment the user hits a key to indicate that s/he has found the answer. Note that the time spent typing in the exact numerical value is not counted in completion time, because we wanted to prevent input time from contaminating the raw measurement of value retrieval.

Error distance is measured as the absolute difference between the actual target value and the participant’s input. For example, if the actual value of a tuple on the target axis is 15 but the participant keys in 10, the error distance is Abs(10 − 15) = 5. The smaller the error distance, the better the accuracy. We chose the continuous scale of error distance instead of a Boolean hit-or-miss category to measure errors because it provides an added level of detail.

Apparatus

Two iMac11,3 computers with 2.7 GHz quad-core Intel Core i5 processors running OS X Lion were used for the experiment. Each computer was equipped with a standard mouse and keyboard. The display size was 27 inches (597.73 mm by 336.22 mm) at 2560 by 1440 pixels, corresponding to a pixel pitch of 0.233 mm. The experiment software was implemented in JavaScript with Protovis and run in the Firefox browser (version 8.0.1) in full-screen mode.

 

pcpvsscp fig.2

Figure 2: The stimuli used in the experiment. 1) Basic experimental information: trial number, time spent on the current trial, and task description; 2) the red × indicating the value for the tuple of interest; 3) the highlighted target axis.

Stimuli

Figure 2 illustrates an example of the stimuli used in the experiment. The top of the screen displays information about the current trial: trial number, time spent, and task description (e.g., for an N-dimensional dataset, it shows “With the highlighted X1 value, what’s the corresponding XN value?”). Just below this is the main experimental area in which the data and the visualization techniques are displayed.

Cell size: To fully utilize the screen real estate while allowing participants to view the maximum number of dimensions simultaneously without scrolling, each plot cell has a fixed length of 70 mm, which translates to 300 pixels in our display configuration. This allows a maximum of 8 dimensions to be comfortably displayed (i.e., 300 × 7 = 2100 pixels for the 7 scatterplots, plus 50 × 6 = 300 pixels for the 6 visible gaps of 50 pixels each between adjacent scatterplots, plus spaces before and after the first and last scatterplot).
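A small sketch of this arithmetic (the 80-pixel side margins are an assumption of ours; the paper only says there is space before and after the first and last scatterplot):

```python
SCREEN_PX = 2560   # horizontal resolution of the 27-inch display
CELL_PX = 300      # 70 mm cell length at a 0.233 mm pixel pitch
GAP_PX = 50        # visible gap between adjacent scatterplots

def scp_row_width(n_dims: int, margin_px: int = 80) -> int:
    """Width of an SCP-common row for n_dims dimensions:
    n_dims - 1 cells, gaps between them, and space on either side."""
    cells = n_dims - 1
    return cells * CELL_PX + (cells - 1) * GAP_PX + 2 * margin_px

for d in (6, 8, 9):
    print(d, scp_row_width(d), scp_row_width(d) <= SCREEN_PX)
# 8 dimensions -> 7 * 300 + 6 * 50 = 2400 px plus margins, which still fits in 2560 px;
# 9 dimensions would not.
```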

Tuple size and color: The data tuples in SCP are visualized using points of 4-pixel radius; for PCP, each data tuple is represented using a line of 1 pixel in width, both rendered with anti-aliasing. Based on our observation, these are the minimum data tuple sizes for participants to comfortably recognize under the current screen resolution. All data tuples are displayed in blue. All axes, numeric labels, and tick marks on the axes are in black. The value for the target data tuple is highlighted in red on the corresponding axis.

Stimuli generation: Data tuples are generated randomly with a uniform distribution along each dimension according to the density requirement. The numeric values of all data tuples are integers between 0 and 50. This range is fixed for all axes across all conditions and techniques so that it serves as a constant. To avoid the possible ambiguity of multiple data tuples having the same value as the highlighted tuple (in which case users would be unable to determine which tuple to trace from), when choosing the target tuple we purposely avoid those with neighbors closer than 8 pixels (equivalently, 1.9 mm) on all dimensions.
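The stimulus generation procedure can be sketched as follows. This is an illustrative Python version (the actual experiment software was written in JavaScript with Protovis); converting the 8-pixel threshold into value units assumes the 300-pixel axis described above.

```python
import random

AXIS_PX = 300                      # pixels per axis in the experiment layout
VALUE_RANGE = 50                   # integer values between 0 and 50 on every axis
PX_PER_UNIT = AXIS_PX / VALUE_RANGE
MIN_SEP_PX = 8                     # neighbours closer than this on all dimensions are ambiguous

def generate_tuples(n_dims: int, n_tuples: int):
    """Uniformly distributed integer tuples, one value per dimension."""
    return [[random.randint(0, VALUE_RANGE) for _ in range(n_dims)]
            for _ in range(n_tuples)]

def pick_target(tuples):
    """Choose a target tuple with no neighbour within MIN_SEP_PX on every
    dimension, so the highlighted value can be traced unambiguously."""
    def too_close(a, b):
        return all(abs(x - y) * PX_PER_UNIT < MIN_SEP_PX for x, y in zip(a, b))
    candidates = [i for i, t in enumerate(tuples)
                  if not any(too_close(t, u) for j, u in enumerate(tuples) if j != i)]
    return random.choice(candidates) if candidates else None

data = generate_tuples(n_dims=6, n_tuples=30)
target_index = pick_target(data)
```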

Procedure

Prior to the experiment, each participant was introduced to the visualization techniques and the value retrieval task. They were also instructed to finish the trials as quickly and accurately as possible while not using any visual aid (mouse cursor, finger, ruler, pen tip, etc.) other than their eyes. They were also informed that there is no ambiguity in the highlighted value.

A training session familiarized the participants with the techniques. They were instructed to continue practicing until they were fully comfortable with the value retrieval tasks with each technique before starting the main experiment.

For each trial in the main experiment, upon determining the numeric value on the target axis, participants were expected to hit the space bar, after which the timer was stopped and the visual stimulus was masked. Participants were then asked to take their time keying in the numeric value in the provided input box. The visual stimulus was masked to prevent participants’ visual residue from affecting their responses, which should not change after hitting the space bar.

Because switching between techniques may require time to readapt and thus lengthen response times, a pop-up window was shown whenever the technique changed between trials, to remind participants and to facilitate mental adjustment between techniques. Upon finishing all the trials in the official session, participants were invited to a brief interview to collect their subjective opinions. Their responses were audio-recorded with their consent.

Result Analysis Method

Both experiments used a within-subject design involving three independent variables: technique, density, and dimensionality. Data were analyzed using factorial repeated-measures ANOVA with a significance level of α = .05. Mauchly’s test was used to verify the assumption of sphericity. Pairwise comparisons for the main effects of the different variables were corrected using Bonferroni adjustments.
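As a rough illustration of this pipeline (the paper does not state which statistics software was used; the sketch below assumes the Python pingouin package and a hypothetical trials.csv with one row per trial), the main effect of technique could be analyzed as:

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format log: participant, technique, dimensionality, density, time, error
df = pd.read_csv("trials.csv")

# Average over repetitions so there is one observation per participant x technique.
agg = df.groupby(["participant", "technique"], as_index=False)["time"].mean()

# Mauchly's test of sphericity for the technique factor.
print(pg.sphericity(agg, dv="time", within="technique", subject="participant"))

# One-way repeated-measures ANOVA (alpha = .05) on the main effect of technique.
print(pg.rm_anova(data=agg, dv="time", within="technique",
                  subject="participant", detailed=True))

# Bonferroni-corrected pairwise comparisons among the techniques.
print(pg.pairwise_ttests(data=agg, dv="time", within="technique",
                         subject="participant", padjust="bonf"))
```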

EXPERIMENT 1

The first experiment provides an overall understanding of the performance differences among the four techniques and identifies the winning techniques.

Participants: 12 participants (5 females and 7 males, aged 20 to 25 years) from the university community volunteered for the experiment. All participants had seen and used 2D SCPs before, but none had experience with PCPs or any of the SCP variants on multivariate data.

Experiment setup: Techniques were counterbalanced using a balanced Latin square. Participants were randomly assigned to four groups of three participants each.

For each technique, participants performed 3 trials at each of three data densities: 10, 20, and 30 tuples.

Within each technique, participants performed trials at three different dimensionalities (2D, 4D, 6D). The presentation order of both dimensionalities and densities was from easy to hard (i.e., 2D, 4D, 6D for dimensionality, and 10, 20, 30 tuples for density) to allow participants to ease gradually into the more difficult conditions. Note that since the main purpose of this experiment is to obtain an overall picture of the performance differences among the four techniques, we only counterbalanced the main factor, technique, in this first experiment.

After training, each participant completed the entire experiment, including breaks and post-questionnaires, in one sitting of approximately one hour. In summary, the design was as follows (excluding training): 12 participants × 4 visualization techniques (PCP, SCP-common, SCP-rotated, SCP-staircase) × 3 levels of data dimensionality (2D, 4D, 6D) × 3 levels of data density (10, 20, 30 tuples) × 3 repetitions of trials = 1296 trials in total.

Results

For experiment 1, we focus on the overall performance of the four techniques. With regard to the main effect of technique, Mauchly’s test verified that the assumption of sphericity was met for both error distance (p = .119) and completion time (p = .057).

Error Distance

Figure 3:(a) shows the average error distance of each technique. Repeated-measures ANOVA tests suggest a significant main effect of technique (F(3,33) = 22.34, p < .001, η² = .672, observed power = 1.0).

For reporting the results of pairwise comparisons among these four techniques, we use “[]” to enclose techniques with comparable performance (p > .05) and “>” to indicate that the technique on the left of the operator is significantly better than the technique on the right (p < .05). The relative accuracy relationship among the four techniques is [PCP (0.98), SCP-common (2.7)] > [SCP-rotated (5.04)] > [SCP-staircase (7.31)].

Completion Time

Figure 3:(b) shows the average completion time with standard errors for the four techniques. Similar to the error distance results, both PCP and SCP-common are better than SCP-rotated and SCP-staircase (p < .05).

pcpvsscp fig.3

 

Figure 3: The average error distance (left) and completion time (right) with standard error bars for the four techniques.

Repeated-measures ANOVA tests suggest that the four techniques differ significantly in completion time for tracing tuples across dimensions (F(3,33) = 27.83, p < .001, η² = .717, observed power = 1.0). Post-hoc tests further indicate the differences and ordering among the four techniques as [PCP (8.99s), SCP-common (12.02s)] > [SCP-rotated (18.58s), SCP-staircase (17.93s)].

Experiment 1 Summary

Comparing error distance and completion time among the four techniques, PCP and SCP-common are clearly the two better techniques. Both SCP-rotated and SCP-staircase are unsuitable for value retrieval tasks, taking significantly longer and being more error-prone.

Furthermore, the subjective feedback on both SCP-rotated and SCP-staircase is consistent with the quantitative results: 6 of the 12 participants ranked SCP-staircase as the least preferred technique, while the other half ranked SCP-rotated as the least preferred. The reported reason for disliking SCP-staircase is the difficulty of tracing tuples across non-horizontal lines. The 45-degree tilted cells require the participant to “tilt the head to see (through imagined projection) the correct value”. This is not only “more tiring”, but also “more difficult to judge whether two points are on the same level”. For many participants, these combined difficulties were so discouraging that they “gave up after a while”.

While fatigue and perceptual difficulties caused by tilting are the main reasons participants disliked SCP-staircase, the difficulty of using SCP-rotated was reported to have a different cause. In SCP-rotated, tracing a tuple from one cell to another requires the following set of actions: find the target data tuple based on the value marked on the first axis, read the value of that tuple on the second axis, remember that value and locate it on the same axis in the adjacent cell, and find the tuple in the adjacent cell with that value. As reported by one participant, “you have to always find and remember the value on the axis to move to the next cell (plot). This is too much work when the number of dimensions increases”.

pcpvsscp fig.4

Figure 4: The average completion time for the four techniques under varied data dimensionalities and densities.

While no significant overall performance difference was found between PCP and SCP-common, a further breakdown of the results (Figure 4) revealed several interesting phenomena.

We observed that in the 2D case, the performance difference between PCP and SCP-common is small. As the number of dimensions increases to 4, PCP seems to have an advantage over SCP-common at all three densities. As the number of dimensions increases to 6, PCP seems to keep an advantage over SCP-common in the 10-tuple density case, but becomes inferior to SCP-common in the 30-tuple density case.

So while PCP has comparable overall performance to SCP-common, a finer-grained investigation revealed differences under different conditions. PCP seems to have an advantage over SCP-common when density and dimensionality are low, but this advantage diminishes as dimensionality and density increase, indicating that the strategy and cost of retrieving values are likely to differ between the two techniques.

EXPERIMENT 2

In experiment 1, we identified PCP and SCP-common as the two winning techniques for value retrieval. In experiment 2, we further investigate the influence of dimensionality and density on these two techniques. While not counterbalancing dimensionality and density was less of a concern in experiment 1, proper counterbalancing of both factors is needed in this experiment because they become the focus of the study. Furthermore, in experiment 1 we learned that both techniques have similar performance in the 2D condition, but that greater performance differences seem to emerge as dimensionality and density increase. This motivated us to use higher dimensionality and density conditions in the second experiment.

Participants: 18 participants (7 females and 11 males, aged 20 to 30 years) from the university community volunteered for the experiment. None had participated in experiment 1. All participants had seen and used 2D SCPs before, but none had experience with PCPs or any of the SCP variants on multivariate data.

Experiment setup: Similar to experiment 1, a within-subject design was used. However, instead of counterbalancing only the technique, all three factors (technique, dimensionality, and density) were counterbalanced. Technique, with only two levels (PCP and SCP-common), was fully counterbalanced. Dimensionality and density, each with three levels (4D, 6D, 8D for dimensionality and 20, 30, 40 tuples for density), were counterbalanced using a Latin square.

Combining the 2 technique orders with 3 order sequences for dimensionality and 3 order sequences for density leads to 18 arrangements of the three factors (2 × 3 × 3 = 18). Participants were randomly assigned to one of the 18 arrangements. For each technique, dimensionality, and density combination, participants were asked to perform 5 randomly generated trials.

The flow of the experiment procedure is exactly the same as in experiment 1. Each experiment session took approximately 1 hour. The design of experiment 2 can be summarized as follows (excluding training):

18 participants × 2 techniques (PCP, SCP-common) × 3 dimensions (4D, 6D, 8D) × 3 densities (20 tuples, 30 tuples, 40 tuples) × 5 trials for each technique, dimension, density combination = 1620 trials in total.

Results

For experiment 2, we counterbalanced all three independent variables (i.e., technique, density, and dimensionality). Mauchly’s tests verified that the assumption of sphericity was met for the main and interaction effects of these variables reported below (p > .05). The observed power for all significant effects was above .80.

Error Distance

Overall, the repeated-measures ANOVA tests revealed no significant difference between techniques (p = .436). However, there were significant main effects of both dimensionality (F(2,34) = 6.124, p < .01, η² = .265) and density (F(2,34) = 10.637, p < .001, η² = .385).

Furthermore, post-hoc comparisons (Bonferroni-corrected) showed the ordering and differences among the dimensionality and density conditions to be [4D (1.44)] > [6D (2.85), 8D (2.48)] and [20 tuples (1.18)] > [30 tuples (2.59), 40 tuples (2.99)], respectively. These results are unsurprising, as the error distance is expected to increase as dimensionality and density increase (the task becomes more difficult).

pcpvsscp fig.5

Figure 5: The interaction effect for technique × density (left) and technique × dimension (right) in terms of error distance.

However, we found two significant interaction effects: a Technique × Density interaction (F(2,34) = 7.05, p < .01, η² = .293) and a Technique × Dimensionality interaction (F(2,34) = 10.81, p < .001, η² = .389). These interaction effects contain key information for revealing the relationship among these factors. Figure 5 shows the interaction effects for Technique × Density (left) and Technique × Dimensionality (right).

We found that for SCP-common, the error distance is relatively stable as dimensionality and density change, but for PCP, the error distance increases dramatically as dimensionality or density increases.

Completion Time

Overall, there are significant main effects of technique (F(1,17) = 9.79, p < .01, η² = .365), dimensionality (F(2,34) = 64.17, p < .001, η² = .791), and density (F(2,34) = 42.98, p < .001, η² = .717).

Post-hoc comparisons (Bonferroni-corrected) on dimensionality and density find the following relationships among the different levels: [4D (13.62s)] > [6D (18.37s), 8D (19.30s)] for dimensionality and [20 tuples (13.50s)] > [30 tuples (17.50s)] > [40 tuples (20.29s)] for density. As with error distance, the significant effects of dimensionality and density are expected, since completion time is likely to increase as dimensionality and density increase.

However, the significant effect of technique is somewhat surprising, as it differs from what we found in experiment 1. In experiment 1, completion time was comparable (p > .05) between the two techniques, with PCP (8.99s) being slightly quicker than SCP-common (12.02s), but experiment 2 tells an almost opposite story: PCP is significantly slower than SCP-common. To understand the reason behind this phenomenon, we further analyze the interaction effects below.

Similar to the results for error distance, we found two significant interaction effects: a Technique × Density interaction (F(2,34) = 8.74, p < .01, η² = .340) and a Technique × Dimensionality interaction (F(2,34) = 73.46, p < .001, η² = .812). Figure 6 shows the interaction effects for Technique × Density (left) and Technique × Dimensionality (right).

pcpvsscp fig.6

Figure 6: The interaction effect for technique × density (left) and technique × dimension (right) in terms of completion time.

pcpvsscp fig.7

Figure 7: The average completion time for SCP-common and PCP under varied dimensions and densities.

We found that increasing dimensionality has almost no effect on SCP-common, but causes significant performance degradation for PCP. The Technique × Density interaction showed that both PCP and SCP-common are affected by increased density; however, the increase in density hurts PCP more than SCP-common (i.e., at a density of 20 tuples, PCP performs almost equally to SCP-common, but when the density increases to 30 or 40 tuples, PCP is much slower than SCP-common, and the performance gap between the techniques grows with the number of dimensions).

This effect is further elaborated in Figure 7, in which the effects of all three factors on completion time are displayed simultaneously. Overall, it shows that PCP has advantages over SCP-common when dimensionality and density are low.

Under each particular (density, dimensionality) condition, we applied pairwise t-tests to compare the two techniques. The results show an advantage of PCP over SCP-common in the two lowest dimensionality and density cases (4D, 20 tuples; 4D, 30 tuples) (both p < .05). A single step increase in either dimensionality or density renders PCP comparable to SCP-common, as shown by the pairwise t-tests in these two cases (6D, 20 tuples; 4D, 40 tuples) (both p > .05). Finally, a further increase in either dimensionality or density makes PCP inferior to SCP-common, as demonstrated by the pairwise t-tests in the remaining five conditions (8D, 20 tuples; 6D, 30 tuples; 8D, 30 tuples; 6D, 40 tuples; 8D, 40 tuples) (all p < .05).
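For any one (dimensionality, density) cell, such a comparison is a paired t-test over the 18 participants' mean completion times. A minimal sketch (with randomly generated placeholder numbers, not the paper's data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Placeholder per-participant mean completion times (seconds) for one cell,
# e.g. (4D, 20 tuples); these are synthetic values for illustration only.
pcp_times = rng.normal(loc=12.0, scale=2.0, size=18)
scp_common_times = rng.normal(loc=14.0, scale=2.0, size=18)

t_stat, p_value = stats.ttest_rel(pcp_times, scp_common_times)  # paired t-test
print(f"t({len(pcp_times) - 1}) = {t_stat:.2f}, p = {p_value:.3f}")
```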

DISCUSSION

Experiments 1 and 2 revealed the following relationships between the four techniques:

  1. SCP-rotated and SCP-staircase yielded poor performance, and users found them difficult to use. This seems to be because tilting the axes 45° makes the task more difficult, and requiring users to remember the value from axis to axis also increases difficulty.
  2. PCP and SCP-common performed better and were preferred by participants. However, these two techniques seem suited for different scenarios: PCP is better at low dimensionality and low density, and SCP-common is better when these are higher.
  3. The performance of PCP is dependent on dimensionality, while the performance of SCP-common seems roughly independent of dimensionality.
  4. Increasing density affects the performance of PCP more than it affects SCP-common.

We now offer theoretical explanations of the observed differences between PCP and SCP, partly to guide future design of visualization techniques.

User strategy for value retrieval

To inspect the values of a data tuple across multiple dimensions, one needs to trace it from one cell to another. There are different hypothetical strategies which users may use (Figure 8):

  • “Remember-value”: The user memorizes the position or numeric value along the axis common to the two cells, which is invariant to the arrangement and alignment of cells.
  • “Count-point”: The user memorizes the ordinal position of the tuple within a local interval, e.g. a tuple can be identified as “the tuple with second largest X2 attribute value, among all those having X2 values between 0 and 10”.
  • “Trace-line”: This strategy can be used with SCP-common: the user imagines the horizontal line passing through all the points of a tuple, and follows this imaginary line to the point above the target horizontal axis. (Note that SCP-staircase also allows tracing along imaginary lines perpendicular to the shared axes, but this must be repeated for each pair of adjacent scatterplots, rather than done once globally as in SCP-common.)

In practice, users’ choice of strategy may be affected by the visualization technique and by the specific instance of the trial.

pcpvsscp fig.8

Figure 8: Possible strategies users may adopt when tracing tuples across dimensions.

With SCP-common, when data points are sparse, users may trace along an imaginary line, because “it is easier than estimating and remembering the value on the axis”, given a sparse neighborhood around the point in question. With increased density, however, tracing along an imaginary line surrounded by many distracters can become difficult. In particular, one participant commented that “without grid lines, the virtual line is quite misleading when there are many neighbors”. In such cases, the user may prefer one of the other two strategies, neither of which should be hindered by an increase in dimensionality. Indeed, for SCP-common, the experimental results show that dimensionality has no significant effect on completion time or error distance for datasets denser than 20 tuples.

For PCP, the only strategy we can think of is to visually follow the polygonal line representing a tuple. Since this operation takes more time with increased dimensionality, we expect dimensionality to affect performance with PCP, and this is indeed what our experimental results show.

Clutter problem in SCP and PCP

The most fundamental difference between SCP and PCP is the tuple’s visual representation: points (SCP) versus a polygonal line (PCP) (or, in some variants of PCP, a smooth curve [HvW10]). The tradeoff between a point and a line may explain why performance with PCP is more sensitive to density than SCP. Points are more space-efficient than lines: adding more points introduces less clutter than adding more polygonal lines.

However, when the screen is not cluttered, a line that intersects the associated axis allows the user to read the numerical value directly, without having to mentally project the data tuple onto the axis. At low density, tracing along visible lines (as in PCP) may be easier than tracing along an imaginary line or memorizing positions or numeric values (as in SCP). Therefore, tracing tuples across dimensions is easier with PCP than with SCP when the screen is not cluttered. Figure 9 shows examples of equal-density datasets with both PCP and SCP; when the number of tuples increases from 10 to 40, the increase in task difficulty for PCP is clearly greater than for SCP.

pcpvsscp fig.9

Figure 9: Examples of PCP and SCP with 10 tuples (top) and 40 tuples (bottom). It is apparent that clutter increases faster in PCP.

Guidelines for users

Based on the experimental results, we provide a table that can guide users in choosing which visualization technique to use for the value retrieval task with respect to dimensionality and density.

Figure 10 shows the recommended techniques for value retrieval under varied dimensionalities and densities. Cells labeled “PCP” or “SCP-common” indicate that the corresponding technique has significantly better performance. Cells with the “∼” symbol indicate that PCP and SCP-common have comparable performance, and cells with a question mark are conditions not covered in this study.

PCP is recommended for cells in the top-left corner, which represent multivariate data with lower dimensionalities and densities. Cells in the bottom-right corner represent data with higher dimensionalities and densities, where SCP-common is preferred. For cells along the diagonal, either technique can be used, leaving the choice to other considerations.
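For reference, the experiment-2 portion of this guideline can be written as a simple lookup (a sketch of ours; it encodes only the per-condition comparisons reported above and returns "?" for combinations the study did not cover):

```python
RECOMMENDATION = {
    (4, 20): "PCP",        (4, 30): "PCP",        (4, 40): "either",
    (6, 20): "either",     (6, 30): "SCP-common", (6, 40): "SCP-common",
    (8, 20): "SCP-common", (8, 30): "SCP-common", (8, 40): "SCP-common",
}

def recommend(dimensionality: int, density_tuples: int) -> str:
    """Suggest a technique for value retrieval, or '?' if the condition is untested."""
    return RECOMMENDATION.get((dimensionality, density_tuples), "?")

print(recommend(4, 20))   # PCP
print(recommend(8, 40))   # SCP-common
print(recommend(10, 60))  # ? (outside the tested range)
```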

LIMITATION AND FUTURE WORK

For practical reasons of experimental design, the testing conditions in our study only involved datasets with relatively low dimensionality and density. In practice, visual analysts often face datasets presented with much higher on-screen density and dimensionality. Future studies may want to further validate our experimental results under such scenarios. In addition, many PCP and SCP variants have been proposed in the literature, of which our study has only investigated a few. Future studies could involve other interesting variants, such as the radar plot [CCKT83], to further our investigation. Lastly, PCP and SCP are only two of the vast number of visualization techniques in the literature, and value retrieval is only one of many visual analytic tasks. The InfoVis research community has a long way to go before fully understanding the design tradeoffs of different visualization techniques on different tasks.

pcpvsscp fig.10

Figure 10: The recommendation of techniques based on experiment results. The dotted red rectangle highlights the conditions in experiment 1; the solid green rectangle highlights the conditions in experiment 2.

CONCLUSION

In this paper, two controlled experiments compared user performance in value retrieval tasks between four visualization techniques: three SCP variants (SCP-common, SCP-rotated, and SCP-staircase) and the baseline PCP, while varying dimensionality and data density. Results indicate that PCP and SCP-common outperform the other two techniques. Furthermore, PCP shows advantages on low-dimensionality, low-density datasets, while SCP-common outperforms PCP on higher-dimensionality, higher-density datasets. We also proposed a guideline for choosing a technique based on the dataset properties. This is the first study we know of that empirically compares PCP and SCP for this task, and also the first to find an advantage for PCP over SCP for any task under any conditions. We believe the experimental results, the analysis and reasoning we formulated on the observed phenomena, and the proposed guideline of usage can be valuable for both researchers and practitioners to better understand and utilize PCP and SCP for more effective information visualization.

ACKNOWLEDGEMENTS

This research is supported by the National University of Singapore Academic Research Fund R-252-000-375-133 and by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office.

REFERENCES

  1. [AdOL04] ARTERO A., DE OLIVEIRA M., LEVKOWITZ H.: Uncovering clusters in crowded parallel coordinates visualizations. In Information Visualization, 2004. INFOVIS 2004. IEEE Symposium on (2004), pp. 81–88.
  2. [AES05] AMAR R., EAGAN J., STASKO J.: Low-level components of analytic activity in information visualization. In Proceedings of IEEE Symposium on Information Visualization (InfoVis) (2005), pp. 111–117.
  3. [AR11] AZHAR S., RISSANEN M.: Evaluation of parallel coordinates for interactive alarm filtering. In Information Visualisation (IV), 2011 15th International Conference on (July 2011), pp. 102–109.
  4. [CCKT83] CHAMBERS J. M., CLEVELAND W. S., KLEINER B., TUKEY P. A.: Graphical Methods for Data Analysis. Wadsworth & Brooks/Cole Publishing Company. New Jersey., 1983.
  5. [CvW11] CLAESSEN J. H. T., VAN WIJK. J. J.: Flexible linked axes for multivariate data visualization. IEEE Transactions on Visualization and Computer Graphics (TVCG) 17, 12 (2011), 2310–2316.
  6. [Har75] HARTIGAN J. A.: Printer graphics for clustering. Journal of Statistical Computation and Simulation 4, 3 (1975), 187–213.
  7. [HLKW12] HEINRICH J., LUO Y., KIRKPATRICK A. E., WEISKOPF D.: Evaluation of a bundling technique for parallel coordinates. In Proceedings of International Conference on Information Visualization Theory and Applications (2012).
  8. [HvW10] HOLTEN D., VAN WIJK J. J.: Evaluation of cluster identification performance for different PCP variants. In Proceedings of Eurographics/IEEE-VGTC Symposium on Visualization (EuroVis) (2010).
  9. [Ins85] INSELBERG A.: The plane with parallel coordinates. Visual Computer 1 (1985), 69–91.
  10. [KD09] KINCAID R., DEJGAARD K.: Massvis: Visual analysis of protein complexes using mass spectrometry. In Visual Analytics Science and Technology, 2009. VAST 2009. IEEE Symposium on (Oct. 2009), pp. 163–170.
  11. [LMvW10] LI J., MARTENS J.-B., VAN WIJK J. J.: Judging correlation from scatterplots and parallel coordinate plots. Information Visualization 9 (2010), 13–30.
  12. [QCX07] QU H., CHAN W.-Y., XU A., CHUNG K.-L., LAU K.-H., GUO P.: Visual analysis of the air pollution problem in Hong Kong. IEEE Transactions on Visualization and Computer Graphics (TVCG) 13, 6 (2007), 1408–1415.
  13. [Shn96] SHNEIDERMAN B.: The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of IEEE Symposium on Visual Languages (VL) (1996), pp. 336–343.
  14. [SS05] SEO J., SHNEIDERMAN B.: A knowledge integration framework for information visualization. In From Integrated Publication and Information Systems to Information and Knowledge Environments, Hemmje M., Niederée C., Risse T., (Eds.), vol. 3379 of Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2005, pp. 207–220.
  15. [TGS04] TYMAN J., GRUETZMACHER G., STASKO J.: Infovisexplorer. In Information Visualization, 2004. INFOVIS 2004. IEEE Symposium on (oct. 2004), p. r7.
  16. [The00] THEISEL H.: Higher order parallel coordinates. In Proc. Vision, Modeling and Visualization (VMV) (Saarbrücken, 2000), Girod B., Greiner G., Niemann H., Seidel H.-P., (Eds.), pp. 119–125.
  17. [VMCJ10] VIAU C., MCGUFFIN M. J., CHIRICOTA Y., JURISICA I.: The FlowVizMenu and parallel scatterplot matrix: Hybrid multidimensional visualizations for network exploration. IEEE Transactions on Visualization and Computer Graphics (TVCG) 16, 6 (2010), 1100–1108.
  18. [WB97] WONG P. C., BERGERON R. D.: 30 years of multidimensional multivariate visualization, 1997. Chapter 1 (pp. 3–33) of Gregory M. Nielson, Hans Hagen, and Heinrich Müller, editors, Scientific Visualization: Overviews, Methodologies, and Techniques, IEEE Computer Society.
  19. [Weg90] WEGMAN E. J.: Hyperdimensional data analysis using parallel coordinates. J. of the American Statistical Association 85, 411 (1990), 664–675.
  20. [YGX09] YUAN X., GUO P., XIAO H., ZHOU H., QU H.: Scattering points in parallel coordinates. IEEE Transactions on Visualization and Computer Graphics (TVCG) 15, 6 (2009), 1001–1008.

Elastic Hierarchies: Combining Treemaps and Node-Link Diagrams

Shengdong Zhao, Michael J. McGuffin, Mark H. Chignell

 

Video

 

Paper and Presentations

 

image

ABSTRACT

We investigate the use of elastic hierarchies for representing trees, where a single graphical depiction uses a hybrid mixture, or “interleaving”, of more basic forms at different nodes of the tree. In particular, we explore combinations of node-link and Treemap forms, to combine the space-efficiency of Treemaps with the structural clarity of node-link diagrams. A taxonomy is developed to characterize the design space of such hybrid combinations. A software prototype is described, which we used to explore various techniques for visualizing, browsing and interacting with elastic hierarchies, such as side-by-side overview and detail views, highlighting and rubber banding across views, visualization of multiple foci, and smooth animations across transitions. The paper concludes with a discussion of the characteristics of elastic hierarchies and suggestions for research on their properties and uses.

CR Categories and Subject Descriptors: I.3.6 [Computer Graphics]: Methodology and Techniques–interaction techniques; E.1 [Data Structures]: trees

Additional Keywords: Elastic Hierarchies, Treemaps, node-link diagrams, hybrids, combinations, overview+detail, multiple views, trees, interaction techniques, interactive visualization

INTRODUCTION

Trees are a fundamental organizing structure, with menu hierarchies and file directories being prominent examples of their use. The data size of a tree typically grows exponentially with its depth, which raises many challenges for visualization. Showing the structure is space consuming, and the exponential growth in the number of nodes from the root to the leaves creates difficulties for laying out the items of large trees effectively in a given space.

Many tree representations have been proposed in the past. Various styles have unique visual and interactive properties that may be useful in different scenarios, but they also have limiting constraints, creating tradeoffs in their use. For example, the classical node-link diagram [15] is probably the most natural way to display nesting structure, but fails to scale to large datasets. In contrast, Treemaps [17] are space efficient, scaling up to thousands of nodes, but at the cost of making the different levels within the tree harder to perceive and distinguish.

Trees are often complex and can have very different local properties across nodes. In addition, trees are often dynamic, making it harder for a single style of representation to adjust to variations over time. In this paper we explore the concept of allowing different styles of representation at different places in a tree. The resulting hybrid may allow designers to combine the best features of different representations, enabling a user to view each part of the data in the most effective way. However, hybrids may carry the disadvantage of being less uniform and less familiar to users, making it all the more important to use good visual design. Our research investigates the properties and affordances of such mixed-representation trees, or elastic hierarchies, as a first step toward determining when and how to use hybrid tree representations. “Elastic” refers to the flexibility allowed by arbitrarily interleaving representations (right image of Figure 1; Figure 4). That is, we allow the representation portraying nesting at each point to be chosen independently of the representation choices made at other points in the tree.

Elastic hierarchies, as described below, are a means of exploring the large design space of 2D hierarchical visualization. As multi-representational views where the representational form of each subtree can be modified on the fly by the user, they allow researchers to explore how many representations can usefully be shown within a tree, and what forms of transition should be used between different types of representation. Studies of how people use elastic hierarchies can help researchers determine what tools and interaction techniques should be provided to assist users in transitioning from one representation to another in viewing a large hierarchy.

In this paper, we describe the design space of elastic hierarchies made up of interleaved Treemaps and node-link diagrams, and discuss ways in which such hybrids may be more suitable than traditional visualizations. We describe a prototype that implements our more promising design ideas, including support for visualizing multiple foci. We also identify a set of research questions concerning elastic hierarchies to be addressed in future studies.

elastichi fig.2

Figure 2: common tree representations, each showing the same tree in a different way. A: node-link. B: nested containment, or nested enclosure. C: use of alignment and adjacency. D: indented outline style. (Note that D is not simply a variation on A. In D, the edges are implied by the positioning of the nodes, which is not generally the case in A.)

BACKGROUND

The most common representational form for a tree is the node-link diagram [15]. While node-link diagrams show nesting structure very clearly, they use screen space inefficiently and do not scale well to large datasets. As a result, many approaches have been proposed to supplement node-link diagrams. Well-known alternatives include Treemaps [17], cone trees [16], and the hyperbolic browser [11].

Since many visualization methods have been proposed, it is useful to organize these into categories. Besides node-link diagrams (Figure 2 A), another major style involves nested containment or nested enclosure (Figure 2 B). Two other forms use alignment and adjacency (Figure 2 C), and the indented outline style (Figure 2 D). Most other proposals can be viewed as extensions or combinations of these. For example, the hyperbolic browser [11] arranges a node-link diagram (A) radially, with a focus point that can be manipulated interactively by the user. Cone Trees [16] extend A into 3D. Treemaps [17] are an example of B where the area of nodes encodes additional information associated with them. Information slices [2] and Sunburst [19] are variations of C using polar coordinates.

Although hybrid tree forms have not been systematically explored in depth, various hybrid techniques have been used to visualize trees and graphs. Fekete et al. [7] have described a graphical representation for graphs that uses both a Treemap, to represent a spanning tree over the graph, and curved links, to represent the remaining edges. In addition, Harel [8] and Sindre et al. [18] have described representations for graphs that are richer than simple node-link diagrams, and that can encode additional information by using various graphical conventions and symbols. In their work, different types of relationships between nodes are shown simultaneously using different conventions, such as connection (Figure 2 A) and containment (Figure 2 B). By contrast, in elastic hierarchies, only one type of relationship needs to be shown, that between parent and child nodes, but we allow multiple conventions to be used for this, e.g., connection and containment, creating a range of choices in the graphical layout, and allowing the user to pick the one desired.

Various hybrid representations of trees have been implemented in 3D. Information pyramids [1] combine nested containment (Figure 2 B) in 2D with adjacency (Figure 2 C) along a 3rd dimension, by stacking nodes in a layered fashion. Balzer and Deussen [3] present a visualization which uses two styles: nested enclosure and linked nodes are shown simultaneously to represent the same tree. In these hybrids, the combination of representations is fixed, and cannot be independently changed at different locations of the tree.

Given all these variations in tree representations, and research that identifies their pros and cons, a single representation style, or even a fixed hybrid form, may not accommodate the needs of complex and dynamic real world problems. Our work investigates improving tree representations using dynamically adjustable hybrids, i.e. elastic hierarchies, and focuses on the case of combining Treemaps and node-link diagrams in a single display.

DESIGN SPACE

In theory, hybrids between any of the forms in Figure 2 might be feasible; however, we note that nested containment (Figure 2 B) differs from the 3 others in that child nodes are inside parent nodes, whereas they are outside in the other forms. Furthermore, node-link diagrams (Figure 2 A) and nested containment are perhaps the most contrasting forms. Node-link diagrams show topological structure well, while Treemaps are space efficient. Treemaps tend to allot more screen space to shallow nodes, making them easier to see, independently of how deep the subtree under a node may be. Treemaps also allow for visual comparison of the relative sizes of nodes. Node-link diagrams, however, are more familiar to users, and perhaps better at showing the different levels and depth of a tree.

Due to their complementary properties, Treemaps and node-link diagrams are especially suited for hybrid combinations that access a continuum of intermediate forms. These two styles can be mixed in a straightforward manner without introducing ambiguity, as will be shown.

elastichi fig.3

Figure 3: a taxonomy of graphical representations of the relationship between a node x and its child y. The neighbourhood of nodes above x, and the neighbourhood of nodes below y, can each be drawn in node-link (NL) or Treemap (TM) style.

elastichi fig.4

Figure 4: here, the same tree is depicted 6 different ways (illustration): in A, with a traditional node-link diagram, in F, with a Treemap, and in B-E, with mixed, hybrid representations.

Figure 3 presents a taxonomy that can be used to generate and explain different methods for graphically combining Treemap and node-link styles of representation within a visualization of a hierarchy. The structure of an elastic hierarchy can be catalogued in terms of the types of transition that occur between different representational styles within the hierarchy.

While child nodes in a node-link diagram are always “outside” the parent node, this is not true for Treemaps, where a child node may be represented as either inside or outside its parent (e.g., Figure 4 C and D show Treemaps where parent nodes are linked to child nodes placed outside the parent, whereas Figure 4 E and F show child nodes nested inside the parent node).

Based on the above description, six possible transitions can occur between node-link and Treemap styles as documented in Figure 3. The two rightmost columns of Figure 3 distinguish conditions where node-link style is used below some node y (left column), from conditions where a Treemap is used below the y node (right column). The three rows in the body of the table refer to the parent node x that is immediately above node y in the tree. The first row shows the case where the portion of the tree above node x is shown in node-link form. Thus, in the left column of the first row the entire subtree of interest is shown in node-link form (with node-link style used above node x and below node y), whereas in the corresponding column on the right a Treemap is shown nested underneath the node-link subtree that is above node x, leading to a mixed representation with the node-link style above a Treemap. The remaining two rows of the table show cases where a Treemap is used above node x. The middle row shows the situation where node y is shown as being outside node x, while the final row shows node y as being nested inside node x.

In both Figure 3 and Figure 4, panels A and F illustrate pure node-link diagram and Treemap styles, respectively. The remaining four transitional forms (B–E) use hybrids, and each of them will now be discussed.

elastichi fig.5

Figure 5: example of Treemap outside of node-link (screenshot). The node-link parent “postgres” has three subtrees (directories): “data”, “include”, and “share”, and each has many children laid out using the Treemap algorithm.

Treemap outside of Node-Link

In this first transitional form (Figure 4 B, Figure 5), a subtree in a node-link diagram is shown as a Treemap. Since node-link diagrams of large hierarchies typically become more crowded at lower levels, the space-saving properties of Treemaps allow more of the hierarchy to be shown within a given space. A branch with hundreds of nodes cannot be shown on the screen using nodes and links if fully expanded, but a Treemap of those same nodes can be much more compact.

If the topology of the tree is of primary interest, this hybrid technique can show as many of the higher levels as convenient in node-link form. When the node-link style becomes too dense, Treemaps may be used to represent the deeper subtrees, making more information visible.

Small Treemaps can act as previews or thumbnails of the subtrees they contain, but with additional useful properties. First, they are not only previews but also overviews, containing information on the entire subtree, rather than just the first few nodes. Second, they are not just views, but also fully functional sub-hierarchies that can be operated on directly using a rich set of interaction techniques.

elastichi fig.6

Figure 6: example of drilling down (screenshot). A combination of the Treemap outside of Treemap, and node-link outside of Treemap techniques is used to show deeper nodes. 

Treemap/Node-Link outside of Treemap

In Figure 4 C and D, subtrees are "pulled out" of the Treemap, and shown as either node-link diagrams or additional Treemaps. This process might become confusing if used with a large number of nodes in a Treemap, but when used with a small number of nodes it may be well suited for focusing on particular subtrees, while retaining surrounding context in a space-efficient manner. This could function as an effective technique for drilling down. For example, in Figure 6, following the links from right to left, the leaf nodes "Lindeman" and "Perth" are node-link children of the Treemap node "Australia", which is itself a child of the Treemap node "timezone". The "timezone" node is a descendant of the "share" directory, and that node is a child of the node-link parent labeled "postgres" (an example of scheme B). Each of the Treemap nodes in the figure has numerous children that cannot fit on the screen when fully expanded. By combining C and D, we are able to display all the relevant subtrees along the path within the screen. Note that because the edges connecting subtrees to the parent Treemap are overlaid on the Treemap, this results in mild occlusion, which may be inconvenient if many subtrees are pulled out.

Having multiple foci in a hierarchy raises challenges for navigation and display. Points of interest could reside at distant locations within the hierarchy (or, in the case of a Treemap, appear very tiny), causing difficulties in showing them simultaneously on the same screen. Using techniques C and D, multiple foci can likely be shown more effectively than if viewed using non-hybrid representations (Figures 8 and 11).

elastichi fig.7

Figure 7: a design sketch showing a node-link diagram inside a Treemap. The subtree within the Treemap on the left is transformed into a node-link diagram on the right. The parent Treemap node adjusts its internal boundaries accordingly.

 

Node-Link inside Treemap

In this fourth kind of transition between styles (Figure 4 E), a node-link diagram is nested inside an enclosing Treemap. The Treemap acts as a kind of overview, and local features are presented using the node-link diagram. Globally, a space-saving representation is used, while a standard node-link diagram is used locally. This can be thought of as a kind of detail-in-context technique, with different representations used for the context and detail.

This scheme can be enhanced with an interaction technique where a portion of the Treemap is enlarged, either interactively or automatically (similar to elastic windows [9]), when the user transforms it into a node-link diagram (Figure 7). However, as different portions of the Treemap are expanded, the sizes of subtrees and nodes within the Treemap would change, distorting substructures within the tree, which may hinder the formation and use of perceptual landmarks. Alternatively, one could limit the amount of distortion/expansion allowed, in which case the amount of space available to show embedded node-link diagrams would also be limited, thereby reducing the scalability of the method. Thus there may be a tradeoff between flexibility of presentation using distortion/expansion, and ease of perceptual orientation and landmark recognition and retention. Individual differences in perceptual and cognitive abilities and preferences may determine how this tradeoff can best be handled for different users.

elastichi fig.8

Figure 8: elastic hierarchy techniques allow users to explore and drill-down multiple branches of large trees and still fit much contextual information within a limited screen space (screenshot). 

 

Combining the different techniques

Finally, these techniques may be used together and combined to create useful visual presentations. Figure 6 (described above) and Figure 8 show combinations of B, C, and D. Likewise, B and E can also be combined. For example, in Figure 7, if the nodes labeled "16723" and "16724" each had many descendants, they could be replaced with two small Treemaps. There are many possibilities for such combinations of the four basic elastic hierarchy hybrid techniques B–E. Empirical research is needed to determine when different combinations of the basic techniques are useful.

 

CHARACTERISTICS OF ELASTIC HIERARCHIES

As already explained, elastic hierarchies are graphical depictions of trees in which multiple representation forms are interleaved at different nodes and can be modified dynamically and interactively.

Ideally, compatible animation and interaction techniques should be used to facilitate exploration and interpretation of the hybrid structure in an elastic hierarchy. Elasticity can be exploited during interaction, as shown in the following examples. First, users may view different portions of a tree with different representations, and may transition between the different representations at will. This enables fine-grained differentiation of nodes; for example, children of the same parent may be presented in different styles, with transitional animations provided to guide or facilitate the transformation. Second, a node (whether in Treemap, node-link, or another style) could be resized to reflect user interest or structural priorities. Third, various other interaction techniques (coloring, brushing, etc.) could be used to support the perceptual and cognitive operations associated with manipulating and using the tree.

The main advantage of elastic hierarchies is how they allow flexible arrangement and display of contents and structures within a large tree. This is particularly useful for visualizing multiple foci. Typically, display of multiple foci in large trees using conventional methods creates challenges for navigation and display. Points of interest may reside at distant locations within the tree (or, in the case of a Treemap, appear very tiny), causing difficulties in showing them simultaneously on the same screen. This difficulty can be addressed using elastic hierarchies, where multiple foci can be highlighted within a single view (Figures 8 and 11).

In other words, elastic hierarchies generalize how nodes in a tree can be “collapsed”. Conventional tree representations often only allow any subtree to be collapsed entirely, to “roll up” deeper nodes and save space. However, in an elastic hierarchy, any connected subgraph of the tree, such as intermediate levels between shallow and deep nodes, can be collapsed into a Treemap. Given a connected subgraph S of a tree, and the shallowest node N in S, we can display the subtree under N as a Treemap, and “pull out” of the Treemap any nodes under N that are not in S. Thus, distant branches under S can be shown pulled out and side-by-side, with the Treemap of S providing a compact overview of the context above the branches.
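As a rough illustration of this generalized collapse, the sketch below marks the subtree under the shallowest node of S as a Treemap and flags the boundary nodes that fall outside S as pulled out. The Node class and its fields are hypothetical names for illustration, not the prototype's data structure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical node type for illustration only; not the prototype's data structure.
class Node {
    enum Style { NODE_LINK, TREEMAP }
    Style style = Style.NODE_LINK;
    boolean pulledOut = false;          // drawn outside its Treemap parent, connected by an edge
    final List<Node> children = new ArrayList<Node>();
}

class GeneralizedCollapse {
    /**
     * Collapse a connected subgraph s into a Treemap: the subtree under its
     * shallowest node n is drawn as a Treemap, and any node under n that is
     * not in s is "pulled out" (as in Figure 4 C and D), keeping its own style.
     */
    static void collapse(Node n, Set<Node> s) {
        n.style = Node.Style.TREEMAP;
        mark(n, s);
    }

    private static void mark(Node node, Set<Node> s) {
        for (Node child : node.children) {
            if (s.contains(child)) {
                child.style = Node.Style.TREEMAP;   // remains inside the collapsed Treemap
                mark(child, s);
            } else {
                child.pulledOut = true;             // shown beside the Treemap, in node-link or Treemap form
            }
        }
    }
}
```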

In order to experiment with elastic hierarchies, and investigate appropriate interaction and animation techniques, we created an interactive prototype that allows rapid transitioning between the different representational forms.

IMPLEMENTATION

There are many ways in which elastic hierarchies can be constructed and used. Our current prototype implements the transitional forms in Figure 4 B, C, and D, and allows them to be mixed and used together. Ideally, we would eventually like to support all of the possible schemes and allow arbitrary mixing of them; however, the current implementation supports the most important possibilities and enables investigation of a large part of the design space.

Figures 5-12 show screenshots of various aspects of the working prototype, except for Figures 7 and 11 which are mocked-up design sketches. The prototype divides the screen vertically into two windows (Figures 9, 11, 12), with an overview on the left and a detail view on the right, to support overview+detail visualization, as discussed in more detail later. The overview shows a Treemap of the entire tree, and the detail view shows an elastic hierarchy of the same tree in which the user may zoom and pan.

Having two side-by-side views of the tree not only enables investigation of overview+detail visualization techniques [10, 12, 13], but also serves a second purpose. Just as a single elastic hierarchy may help a user learn an unfamiliar tree style (e.g., Treemaps) by interactively transitioning between the unfamiliar style and a more familiar one (e.g., node-link), showing two views side by side, i.e., an elastic hierarchy in tandem with a non-elastic view (in this case, the Treemap on the left), may help reinforce a correct mental model of the elastic hierarchy's meaning.

Platform & Code

The prototype was developed in Java 1.4.2 using the Piccolo Toolkit from the University of Maryland. Piccolo was chosen for its built-in support for zooming, panning, and certain types of animations. For laying out the Treemaps, we used a variant of the Strip Treemap algorithm in the open-source Java library written by Ben Bederson and Martin Wattenberg [6]. We chose this algorithm because it preserves the ordering of the nodes, has better readability than the ordered Treemap algorithm, and has a reasonable running speed. The algorithm used for laying out node-link diagrams in 2D is similar to Walker's algorithm [20] and that used in SpaceTree [14]. The prototype can read in a tree described in a file, or can read in the tree structure of a hard drive's file directories (screenshots in this paper mostly show the directory structure of Postgres, a database consisting of almost 1000 files and folders installed on a hard disk).
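The paper does not show the loader itself; as a minimal sketch under assumed names (a hypothetical FileNode class), a file-system hierarchy such as the Postgres install directory could be read into an in-memory tree as follows.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory tree node for illustration; not the prototype's class.
class FileNode {
    final String name;
    final long sizeBytes;               // files carry their size, which can weight Treemap cells
    final List<FileNode> children = new ArrayList<FileNode>();

    FileNode(String name, long sizeBytes) {
        this.name = name;
        this.sizeBytes = sizeBytes;
    }

    // Recursively build the tree for a directory on disk.
    static FileNode fromDirectory(File dir) {
        FileNode node = new FileNode(dir.getName(), dir.isFile() ? dir.length() : 0);
        File[] entries = dir.listFiles();
        if (entries != null) {
            for (File entry : entries) {
                node.children.add(fromDirectory(entry));
            }
        }
        return node;
    }
}
```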

elastichi fig.9

Figure 9: screenshot of the initial state of the prototype after reading in the Postgres dataset. The left window contains a Treemap overview of the tree. The right window shows an elastic hierarchy view of the tree. A search box is at the bottom. 

 

Data Structure & Algorithms

Internally, the tree is stored in a data structure where each node can take on one of various graphical states corresponding to different representations, allowing the Treemap and node-link styles to be intermixed. Although our internal data structure allows for all the hybrids sketched in Figure 4, the supporting code so far only implements the schemes in Figure 4 A, B, C, D, and F, which are the most promising of the transitional forms.
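The following is a minimal sketch of such a data structure. It is illustrative only; the class, enum, and field names are assumptions rather than the prototype's API. Each node carries its own render style, so Treemap and node-link forms can be interleaved anywhere in the tree and toggled at run time.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an internal tree node carrying a per-node graphical state, so that
// Treemap and node-link styles can be intermixed anywhere in the tree.
// Illustrative only; names are assumptions, not the prototype's API.
class ElasticNode {
    enum RenderStyle { NODE_LINK, TREEMAP_NESTED, TREEMAP_PULLED_OUT }

    String label;
    RenderStyle style = RenderStyle.NODE_LINK;
    final List<ElasticNode> children = new ArrayList<ElasticNode>();

    ElasticNode(String label) {
        this.label = label;
    }

    // Switching a node's style is a local change; the layout is then recomputed.
    void setStyle(RenderStyle newStyle) {
        this.style = newStyle;
    }
}
```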

Two complementary approaches are available for choosing representational forms and generating a layout of an elastic hierarchy: (1) automatic methods, which might use heuristics based on local attributes of a tree such as branching factor or depth, and (2) manual interaction, where the user explicitly specifies the representation to use for each node. We tried to strike a balance between these two approaches in the prototype, whereby the software makes a best first guess as to which style to use for each node, and the user may subsequently adjust specific nodes as needed.

When our prototype reads in a tree, it computes a layout that fills the available screen space with a hybrid of the type in Figure 4 B. Our intent is to give the user an initial view of the tree that uses the node-link style as much as possible, without excessive crowding, and to use Treemaps to show large subtrees wherever necessary.

Thus, the primary goal of the algorithm is to maximize the number of higher-level nodes shown in the node-link style. To do this, the algorithm performs a greedy breadth-first traversal of the tree, setting the node type of each node encountered to be node-link, and stopping and/or backtracking when there is no room to continue.
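A simplified sketch of this first pass is given below, building on the illustrative ElasticNode class above; a fixed node budget stands in for the prototype's actual test of whether screen space remains, so it is an assumption rather than the real stopping criterion.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the first layout pass: walk the tree breadth-first, marking nodes
// as node-link while an (assumed, simplified) space budget allows, and leaving
// the remaining subtrees to be drawn as Treemaps.
class InitialLayout {
    static void assignStyles(ElasticNode root, int nodeLinkBudget) {
        Deque<ElasticNode> queue = new ArrayDeque<ElasticNode>();
        queue.add(root);
        int used = 0;
        while (!queue.isEmpty() && used < nodeLinkBudget) {
            ElasticNode n = queue.poll();
            n.setStyle(ElasticNode.RenderStyle.NODE_LINK);
            used++;                              // stand-in for "is there room left on screen?"
            queue.addAll(n.children);
        }
        // Whatever is still queued did not fit as node-link; show those subtrees as Treemaps.
        for (ElasticNode leftover : queue) {
            leftover.setStyle(ElasticNode.RenderStyle.TREEMAP_NESTED);
        }
    }
}
```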

Next, the second goal of the algorithm is to maximize the remaining area allotted to Treemaps used to show lower levels of the tree. To do this, the system first determines the bounding boxes of the Treemaps using a heuristic based on the size and depth of its subtree. The system recalculates the overall space needed, and resizes the Treemaps so that the entire structure is scaled to fit within the screen. In order to guarantee that the detail view has the same visual pattern as the overview, any Treemap appearing in both views is given the same aspect ratio in both, creating visual consistency across the views (Figure 9).

elastichi fig.10

Figure 10: a sequence of user interactions allowed in the prototype (screenshots): (1) initiate resizing; (2) the node is resized and tabs appear beside the labels; (3) and (4), click on the "4>" tab to show the labels of the layers of the Treemap node; (5) layer 1 is selected; (6) double-click to expand a subtree from layer 1.

User Interaction

The prototype elastic hierarchy system is designed for easy switching between Treemap and node-link views at different points in the tree. Users can right-click on the label of a particular node to bring up a popup menu for changing the form (Treemap or node-link) of the node. Users can also resize a Treemap node by dragging the bottom-left corner of the node (Figure 10, 1-2), after which the tree layout is automatically adjusted. This allows users to examine the content of a Treemap in more detail, and to select nodes within it more easily.

Since the internal (i.e., non-leaf) nodes in a Treemap are graphically covered almost entirely by their descendants, they can be difficult to select, even with borders and margins around nodes. Thus, a special selection technique was implemented for unambiguously selecting nodes of Treemaps. A set of tabs is displayed above each Treemap node (Figure 10, 3-4). The user may click on a tab to choose the level at which they wish to select a node. Then, as the user rolls over the nodes of the Treemap, the nodes at the chosen level are highlighted (Figure 10, 5), with descendant nodes partially faded. The user can then double-click to transform a node (Figure 10, 6) into a node-link subtree.
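The level-selection step could be supported by something like the following sketch (again building on the illustrative ElasticNode class; not the prototype's code), which gathers the nodes exactly k levels below a Treemap root so that they can be highlighted on rollover.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the level-selection idea behind the tabs in Figure 10: gather the
// nodes exactly `depth` levels below a Treemap root so they can be highlighted.
class LevelPicker {
    static List<ElasticNode> nodesAtDepth(ElasticNode treemapRoot, int depth) {
        List<ElasticNode> result = new ArrayList<ElasticNode>();
        collect(treemapRoot, depth, result);
        return result;
    }

    private static void collect(ElasticNode node, int remaining, List<ElasticNode> out) {
        if (remaining == 0) {
            out.add(node);          // this node sits at the chosen level
            return;
        }
        for (ElasticNode child : node.children) {
            collect(child, remaining - 1, out);
        }
    }
}
```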

Additional Features

Elastic hierarchies can incorporate Treemaps at any point, allowing hybrid representations to scale better to large numbers of nodes than pure node-link diagrams. Ultimately, however, any incremental improvement in scalability will still fail given a sufficiently large data set. Thus we built our prototype to support an overview+detail view of the elastic hierarchy. A pure Treemap is shown in an overview window, and the hybrid Treemap/node-link diagram is shown in a detail window (Figure 9). A Treemap was chosen for the overview rather than a pure node-link diagram due to its space efficiency. The vertical division between the overview and detail windows in our prototype can be dragged to resize the windows, accommodating different data sets and changes in the user's focus of attention.

Both elastic hierarchies and overview+detail visualization lend themselves naturally to viewing multiple foci. Although not yet implemented in our prototype, Figure 11 shows a sketch of how the user might select multiple foci in the Treemap overview, and in response, the detail view adjusts its layout to show the three foci simultaneously with rubber bands linking between the windows.

elastichi fig.11

Figure 11: a design sketch of multiple foci in the tree. The foci are highlighted in red in both the overview (Treemap representation on the left), and the detail view (hybrid representation on the right). The hybrid representation allows us to provide much contextual information for the three foci.

The hybrid mixing of multiple representations in elastic hierarchies is unfamiliar to users. This, combined with the fact that elastic hierarchies change form on demand, motivates the need for mechanisms that help the user understand the correspondence between nodes during changes of representation, and across the two views involved in our overview+detail scheme.

To investigate this, our prototype implements multiple strategies. First, we have experimented with using smooth animations to display transitions between representation styles, to help the user maintain context, as has been discussed elsewhere [4, 5, 21]. Second, whenever a Treemap is visible in the detail view, its aspect ratio is constrained to match the aspect ratio of the corresponding (sub)Treemap in the overview window, to make it easier for the user to visually scan for corresponding subtrees. This aspect ratio is maintained even during resizing of Treemaps by the user. Third, rubber bands (Figures 9 and 11) are drawn that connect selected nodes in the overview and detail view. Fourth, because the representation in the overview is persistently a Treemap, the user may change a subtree in the detail view from Treemap style to node-link style, and still see the Treemap form of the subtree in the overview.
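The second mechanism (aspect-ratio matching) amounts to a simple geometric constraint. A minimal sketch follows, with assumed parameter names and purely for illustration: a requested resize of a Treemap node in the detail view is snapped back to the aspect ratio of the corresponding Treemap in the overview while roughly preserving the requested area.

```java
import java.awt.geom.Rectangle2D;

// Sketch of the aspect-ratio constraint: when the user resizes a Treemap node
// in the detail view, keep the width/height ratio of the matching overview
// Treemap while approximately preserving the requested area. Illustrative only.
class AspectRatioConstraint {
    /** aspect = width / height of the overview Treemap for the same subtree. */
    static Rectangle2D constrain(double x, double y, double requestedW, double requestedH,
                                 double aspect) {
        double area = requestedW * requestedH;   // area the user asked for
        double h = Math.sqrt(area / aspect);     // solve w*h = area with w/h = aspect
        double w = aspect * h;
        return new Rectangle2D.Double(x, y, w, h);
    }
}
```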

elastichi fig.12

Figure 12: node searching in the prototype (screenshot). A keyword is typed into the search box at the bottom. Matches are shown in orange in both the Treemap overview on the left and the detail view on the right.

Thus, the user has a choice of either (a) looking at different representations simultaneously of the same nodes, each in different windows, or (b) toggling in-place between representations of nodes that are shown within a single, integrated view (i.e. the detail view).

 

DISCUSSION AND CONCLUSION

Elastic hierarchies have many valuable properties. They may be useful for dealing with a wider range of information than either pure representation alone. Since our hybrid tree representations can use compact Treemaps for various portions of the tree, space efficiency can be introduced as needed. Different nesting strategies within elastic hierarchies thus create a continuum of space efficiency, with node-link diagrams at one end and Treemaps at the other. A side benefit of elastic hierarchies is that, by allowing the user to toggle between traditional node-link diagrams and Treemaps at any point in the tree, they may help users better learn and understand Treemaps (which are themselves unfamiliar to most users) by seeing how they relate to familiar node-link diagrams.

Elastic hierarchies can be combined with other visualization strategies, such as the use of multiple views, or focus+context approaches. While this creates a large design space, it is one where there may be a number of useful “sweet spots” that will only be discovered after there is an opportunity to examine the properties of the space with prototyping and user studies. The provision of multiple representations both within a tree (i.e., using an elastic hierarchy) and between multiple views may help users learn and understand the content and structure of hierarchies better. The ability to see the same tree rendered in different ways, and to see the correspondence between elements in the different views, may encourage the development of more accurate mental models of information structure.

There is a large design space of possible elastic hierarchy implementations of which this paper has considered a small portion. Systematic research is needed to reinforce and challenge the design intuitions that underpin this form of hierarchy visualization. One obvious research issue which overlaps with both the present work and earlier research on hierarchy visualization is the relationship between the number of nodes in a hierarchy and which representational forms should be used at different levels of the hierarchy. Other research questions (a selection from many that could be posed) that may be worth pursuing with reference to the design and use of elastic hierarchies include:

  • How and when should users be able to choose which representations to use (vs. having layouts assigned automatically)?
  • What elastic hierarchies using representations other than Treemap and node-link diagrams could be designed?
  • What types of design cues and strategies can be used to facilitate the formation and use of cognitive/perceptual landmarks within large hierarchies?
  • How can smooth animation facilitate the use of multiple views (e.g., overviews, “you-are-here” diagrams, detail views, etc.)?
  • How should elastic hierarchies be personalized for different types of user?

One possible future direction for elastic hierarchy prototypes would be to generalize the roles played by the overview and detail views, and to link multiple views with various types of visualization and animation to highlight correspondences.

In our exploratory prototype, the local file system is chosen as the content for the elastic hierarchies. A next step could be using elastic hierarchies for other kinds of real world data, and further investigating their characteristics under these domains.

In conclusion, the research reported in this paper investigated the use of elastic hierarchy representations for trees. A design space for elastic hierarchies was characterized in which a single graphical depiction uses a mixture of Treemap and node-link views at different levels of the tree. A prototype was created to demonstrate and explore related design features and to illustrate some of the properties of elastic hierarchies. Empirical research is now needed to determine if, when, and how elastic hierarchies can be used.

ACKNOWLEDGMENTS

Thanks to Ben Bederson and Ben Shneiderman for some detailed feedback and pointers to related work.  Thanks to Amy Zhu, Maneesh Agrawala, Joe Laszlo, Jim Chengming Cai, and Noah Lockwood for support and feedback. Thanks to Ravin Balakrishnan and other members of the DGP lab at University of Toronto for feedback during an early stage of this work. Thanks to John Hancock for technical help with making the video. Thanks to University of Maryland’s HCIL for making Piccolo, SpaceTree, and Treemap demos, code, and datasets available to the public. Thanks also to the anonymous reviewers for their constructive feedback and suggestions.

REFERENCES

[1]    Andrews, K., Visual exploration of large hierarchies with information pyramids. Proceedings of Sixth International Conference on Information Visualisation. 2002: IEEE Computer Society Press. 793-798.

[2]    Andrews, K. and Heidegger, H., Information slices: Visualising and exploring large hierarchies using cascading, semi-circular discs. Proceedings of IEEE Symposium on Information Visualization, Late Breaking Hot Topics. 1998. 9-12.

[3]    Balzer, M. and Deussen, O., Hierarchy based 3D visualization of large software structures. Proceedings of the Conference on Visualization Posters Compendium. 2004: IEEE Computer Society. 81-82.

[4]    Bartram, L., Can motion increase user interface bandwidth? Proceedings of IEEE Conference on Systems, Man, and Cybernetics. 1997. 1686-1692.

[5]    Bartram, L., Perceptual and interpretative properties of motion for information visualization. Proceedings of the 1997 Workshop on New Paradigms in Information Visualization and Manipulation. 1997, Las Vegas, Nevada, United States: ACM Press. 3-7.

[6]    Bederson, B. and Wattenberg, M., Treemap Java Algorithms. http://www.cs.umd.edu/hcil/treemap-history/Treemaps-JavaAlgorithms.zip.

[7]    Fekete, J.-D., Wang, D., Dang, N., Aris, A., and Plaisant, C., Overlaying Graph Links on Treemaps. Proceedings of IEEE Information Visualization Symposium Posters Compendium. 2003: IEEE Computer Society.

[8]    Harel, D., On visual formalisms. Communications of the ACM, 1988. 31(5): p. 514-530.

[9]    Kandogan, E. and Shneiderman, B., Elastic Windows: a hierarchical multi-window World-Wide Web browser. Proceedings of the 10th annual ACM symposium on User interface software and technology. 1997, Banff, Alberta, Canada: ACM Press. 169-177.

[10]  Kumar, H.P., Plaisant, C., and Shneiderman, B., Browsing hierarchical data with multi-level dynamic queries and pruning. International Journal of Human-Computer Studies, 1997. 46(1): p. 103-124.

[11]  Lamping, J., Rao, R., and Pirolli, P., A focus+context technique based on hyperbolic geometry for visualizing large hierarchies. Proceedings of ACM Conference on Human Factors in Computing Systems. 1995. 401-408.

[12]  Mukherjea, S., Foley, J.D., and Hudson, S., Visualizing complex hypermedia networks through multiple hierarchical views. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 1995, Denver, Colorado, United States: ACM Press/Addison-Wesley Publishing Co. 331-337.

[13]  North, C., A user interface for coordinating visualizations based on relational schemata: snap-together visualization, Ph.D. Thesis, Computer Science Department, University of Maryland, 2000.

[14]  Plaisant, C., Grosjean, J., and Bederson, B.B., SpaceTree: Supporting exploration in large node link tree, design evolution and empirical evaluation. Proceedings of IEEE Symposium on Information Visualization. 2002. 57-64.

[15]  Reingold, E.M. and Tilford, J.S., Tidier drawings of trees. IEEE Transactions on Software Engineering, 1981. SE-7(2): p. 223-228.

[16]  Robertson, G.G., Mackinlay, J.D., and Card, S.K., Cone trees: Animated 3D visualizations of hierarchical information. Proceedings of ACM Conference on Human Factors in Computing Systems. 1991. 189-194.

[17]  Shneiderman, B., Tree visualization with tree-maps: 2-d space-filling approach. ACM Transactions on Graphics, 1992. 11(1): p. 92-99.

[18]  Sindre, G., Gulla, B., and Jokstad, H.G., Onion graphs: aesthetics and layout. Proceedings of IEEE Symposium on Visual Languages. 1993. 287-291.

[19]  Stasko, J. and Zhang, E., Focus+context display and navigation techniques for enhancing radial, space-filling hierarchy visualizations. Proceedings of IEEE Symposium on Information Visualization. 2000. 57-65.

[20]  Walker, J.Q., A node-positioning algorithm for general trees. Software–Practice and Experience, 1990. 20(7): p. 685-705.

[21]  Woods, D.D., Visual momentum: a concept to improve the cognitive coupling of person and computer. International Journal of Man-Machine Studies, 1984. 21: p. 229-244.