GeneShelf: A Web-based Visual Interface for Spinal Cord Injury Study

Motivation: The widespread use of high-throughput gene expression analysis techniques has enabled the biomedical research community to share a huge body of gene expression datasets in many public databases on the web. However, current gene expression data repositories provide static representations of the data and support limited interactions. This hinders biologists from effectively exploring shared gene expression datasets. Responding to the growing need for better interfaces to improve the utility of the public datasets, we have designed and developed a new web-based visual interface entitled GeneShelf. It builds upon a zoomable grid display to represent two categorical dimensions. It also incorporates an augmented timeline with expandable time points that better shows multiple data values for the focused time point by embedding bar charts. We applied GeneShelf to one of the largest microarray datasets generated to study the progression and recovery process of spinal cord injuries in mice and rats. To account for differences among analysis methods, the entire data set was processed with three probe set algorithms (Plier, GC-RMA, and dChip), leading to nearly 10,000 microarray data files. SpinalCordLink can provide researchers with a good resource to interactively investigate one of the world's largest microarray datasets.

Support: This work was supported by NIH NINDS-01 (NS-1-2339) and by NIH NCMRR/NINDS 5R24 HD 050846 (Integrated Molecular Core for Rehabilitation Medicine). This work was also supported by the Engineering Research Center of Excellence Program of Korea MEST/KOSEF (R11-2008-007-01002-0) and the Brain Korea 21 Project. The ICT at Seoul National University provided research facilities for this study.

Publications:

GOTreePlus: Interactive GO Visualization for Proteomics Projects

Motivation: We developed an interactive gene ontology visualization tool named GOTreePlus that can superimpose annotation information over gene ontology structures. GOTreePlus can facilitate the identification of important GO terms while visualizing them in the gene ontology structure. The interactive pie chart summary for a selected gene ontology term provides users with a succinct overview of their experimental results.

Support: This work was supported by NIH 5R24HD050846-02 Integrated molecular core for rehabilitation medicine, and NIH 1P30HD40677-01 (MRDDRC Genetics Core).

Publications:

ConSet: Visualizing set concordance with permutation matrices and fan diagrams

Motivation: Scientific problem solving often involves concordance (or discordance) analysis among the result sets from different approaches. For example, different scientific analysis methods with the same samples often lead to different or even conflicting conclusions. To reach a more judicious conclusion, it is crucial to consider different perspectives by checking concordance among those result sets by different methods. In this paper, we present an interactive visualization tool called ConSet, where users can effectively examine relationships among multiple sets at once. ConSet provides an overview using an improved permutation matrix to enable users to easily identify relationships among sets with a large number of elements. Not only do we use a standard Venn diagram, we also introduce a new diagram called Fan diagram that allows users to compare two or three sets without any inconsistencies that may exist in Venn diagrams. A qualitative user study was conducted to evaluate how our tool works in comparison with a traditional set visualization tool based on a Venn diagram. We observed that ConSet enabled users to complete more tasks with fewer errors than the traditional interface did and most users preferred ConSet.
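
As a rough illustration of the kind of overview a permutation matrix gives, the sketch below builds an element-by-set membership matrix and reorders its rows so that elements shared by the most sets come first. The function name `membership_matrix`, the row ordering rule, and the sample gene identifiers are assumptions made for this example; they are not taken from ConSet itself.

```python
def membership_matrix(sets):
    """Build an element-by-set 0/1 matrix and order rows by concordance.

    `sets` maps a set name (e.g. an analysis method) to a set of elements.
    Returns (names, rows), where each row is (element, membership tuple).
    Hypothetical helper for illustration only.
    """
    names = sorted(sets)
    elements = sorted(set().union(*sets.values()))
    rows = [(e, tuple(int(e in sets[n]) for n in names)) for e in elements]
    # Elements found in the most sets first; ties are broken by membership
    # pattern and then by element name, so identical patterns end up adjacent.
    rows.sort(key=lambda r: (-sum(r[1]), tuple(-b for b in r[1]), r[0]))
    return names, rows


if __name__ == "__main__":
    # Made-up result sets from three hypothetical analysis methods.
    results = {
        "A": {"g1", "g2", "g3"},
        "B": {"g2", "g3"},
        "C": {"g3", "g4"},
    }
    names, rows = membership_matrix(results)
    print("      " + " ".join(names))
    for element, pattern in rows:
        print(f"{element:<5} " + " ".join(str(b) for b in pattern))
```

Rows with all ones mark elements on which every method agrees, and rows with a single one mark results unique to one method.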

Support: This work was supported by NIH 5R24HD050846-02 Integrated molecular core for rehabilitation medicine, and NIH 1P30HD40677-01 (MRDDRC Genetics Core).

Publications:

Interactive Power Analysis for Microarray Hypothesis Testing and Generation

Motivation: Human clinical projects typically require a priori statistical power analysis. Towards this end, we sought to build a flexible and interactive power analysis tool for microarray studies integrated into our public domain HCE 3.5 software package. We then sought to determine if probe set algorithms or organism type strongly influenced power analysis results.

Availability: HCE 3.5 or later

Support: This work was supported by Department of Defense W81XWH-04-01-0081 and NIH 1P30HD40677-01 (MRDDRC Genetics Core).

Publications:

Interactive Optimization of Signal-to-Noise Ratios for Affymetrix Microarray Projects

Motivation: The most commonly utilized microarrays for mRNA profiling (Affymetrix) include ‘probe sets’ of a series of perfect match and mismatch probes (typically 22 oligonucleotides per probe set). There is an increasing number of reported ‘probe set algorithms’ that differ in their interpretation of a probe set to derive a single normalized ‘signal’ representative of expression of each mRNA. These algorithms are known to differ in accuracy and sensitivity, and optimization has been done using a small set of standardized control microarray data. We hypothesized that different mRNA profiling projects have varying sources and degrees of confounding noise, and that these should alter the choice of a specific probe set algorithm. We also hypothesized that use of the Microarray Suite (MAS) 5.0 probe set detection p-value as a weighting function would improve the performance of all probe set algorithms.

Availability: HCE 3.0 or later

Support: This work was supported by N01 NS-1-2339 from the NIH.

Publications:

Knowledge Discovery in High-Dimensional Data: Case Studies and a User Survey for the Rank-by-Feature Framework

Motivation: Knowledge discovery in high-dimensional data is a challenging enterprise, but new visual analytic tools appear to offer users remarkable powers if they are ready to learn new concepts and interfaces. Our three-year effort to develop versions of the Hierarchical Clustering Explorer (HCE) began with building an interactive tool for exploring clustering results. It expanded, based on user needs, to include other potent analytic and visualization tools for multivariate data, especially the rank-by-feature framework. Our own successes using HCE provided some testimonial evidence of its utility, but we felt it necessary to get beyond our subjective impressions. We present an evaluation of HCE using three case studies and an e-mail user survey (n = 57) to focus on skill acquisition with the novel concepts and interface for the rank-by-feature framework. Knowledgeable and motivated users in diverse fields provided multiple perspectives that refined our understanding of strengths and weaknesses. The user survey confirmed the benefits of HCE, but gave less guidance about improvements. Both evaluations suggested improved training methods.

Availability: HCE 3.0 or later

Support: This work was supported by Department of Defense W81XWH-04-01-0081, NIH 1P30HD40677-01 (MRDDRC Genetics Core) and NSF EIA 0129978.

Publications:

A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data

Interactive exploration of multidimensional data sets is challenging because: (1) it is difficult to comprehend patterns in more than three dimensions, and (2) current systems often are a patchwork of graphical and statistical methods, leaving many researchers uncertain about how to explore their data in an orderly manner. We offer a set of principles and a novel rank-by-feature framework that could enable users to better understand distributions in one (1D) or two dimensions (2D), and then discover relationships, clusters, gaps, outliers, and other features. Users of our framework can view graphical presentations (histograms, boxplots, and scatterplots), and then choose a feature detection criterion to rank 1D or 2D axis-parallel projections. By combining information visualization techniques (overview, coordination, and dynamic query) with summaries and statistical methods, users can systematically examine the most important 1D and 2D axis-parallel projections. We summarize our Graphics, Ranking, and Interaction for Discovery (GRID) principles as: (1) study 1D, study 2D, then find features; and (2) ranking guides insight, statistics confirm. We implemented the rank-by-feature framework in the Hierarchical Clustering Explorer, but the same data exploration principles could enable users to organize their discovery process so as to produce more thorough analyses and extract deeper insights in any multidimensional data application, such as spreadsheets, statistical packages, or information visualization tools.
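
The ranking step can be sketched as follows: score every pair of dimensions with a chosen criterion and sort the 2D axis-parallel projections by that score. This is a minimal sketch, assuming the Pearson correlation coefficient as the ranking criterion; the function names and the sample columns are hypothetical and do not reflect how HCE implements the framework.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_2d_projections(columns, criterion=pearson):
    """Score each pair of dimensions and rank pairs by absolute score.

    `columns` maps a dimension name to its list of values.
    Returns (dim_a, dim_b, score) tuples, strongest relationship first.
    """
    scored = [
        (a, b, criterion(columns[a], columns[b]))
        for a, b in combinations(sorted(columns), 2)
    ]
    return sorted(scored, key=lambda item: abs(item[2]), reverse=True)


if __name__ == "__main__":
    # Made-up data: y is a linear function of x, z alternates in sign.
    data = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [1, -1, 1, -1]}
    for a, b, score in rank_2d_projections(data):
        print(f"{a} vs {b}: {score:+.3f}")
```

Any other criterion with the same two-column signature could be passed as `criterion`, which mirrors the idea of letting the user choose the feature that drives the ranking.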

Publications:

Hierarchical Clustering Explorer: Interactively Exploring Hierarchical Clustering Results

The Hierarchical Clustering Explorer (HCE) is an interactive knowledge discovery tool for multivariate data, especially microarray data sets. Its unique visualization interface and powerful analytic tools, based on more than three years of effort, have led to more than 7000 downloads from more than 60 different countries since April 2002. In addition to our genomic research papers with biologist partners and our information visualization publications, we are encouraged that at least six scientific papers by authors unknown to us have been published since 2004 that describe using HCE in their analysis.

Publications:

Binary Volume Rendering Using the Visible Human Data

We present a new data structure, Slice-based Binary Shell (SBS), for efficient manipulation and rendering of binary volume data. Since SBS stores only surface voxels with selected attributes of the voxels in a slice-based data structure that allows direct access to the voxels, it shows high storage and computational efficiency. This efficiency becomes more prominent when representing multiple binary objects. We also present an efficient rendering algorithm for SBS. The algorithm, based on the shear-warp technique, provides high-speed interactive rendering for binary volumes of many objects on a PC with no specialized hardware.
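
To illustrate the storage idea only, the sketch below keeps just the surface voxels of a binary volume, grouped per slice. It assumes that a surface voxel is an object voxel with at least one of its six face neighbours empty or outside the volume, and it stores only coordinates. The actual SBS layout and the voxel attributes it keeps are not described here, so the function name `build_slice_shell` and these choices are assumptions.

```python
def build_slice_shell(volume):
    """Return, for each slice z, the sorted (y, x) positions of surface voxels.

    `volume[z][y][x]` is 1 for an object voxel and 0 for background.
    Hypothetical helper for illustration; not the published SBS layout.
    """
    depth = len(volume)
    height = len(volume[0])
    width = len(volume[0][0])

    def filled(z, y, x):
        # Positions outside the volume count as background.
        return (
            0 <= z < depth
            and 0 <= y < height
            and 0 <= x < width
            and volume[z][y][x] == 1
        )

    face_neighbours = [
        (1, 0, 0), (-1, 0, 0),
        (0, 1, 0), (0, -1, 0),
        (0, 0, 1), (0, 0, -1),
    ]
    shell = [[] for _ in range(depth)]
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                if volume[z][y][x] != 1:
                    continue
                # Keep the voxel only if at least one face touches background.
                if any(not filled(z + dz, y + dy, x + dx)
                       for dz, dy, dx in face_neighbours):
                    shell[z].append((y, x))
    return shell


if __name__ == "__main__":
    # A solid 3x3x3 cube: only the centre voxel is interior.
    cube = [[[1] * 3 for _ in range(3)] for _ in range(3)]
    shell = build_slice_shell(cube)
    print(sum(len(s) for s in shell), "surface voxels out of 27")
```

Indexing the shell by slice gives direct access to the voxels of any slice, and interior voxels, which never contribute to the rendered surface, take no space.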

Publications: