CINF 66: Interactive exploration of high-volume datasets using HiVol and HiStats
David Baker and Ralph Walden, Tripos, Inc., 1699 South Hanley Road,
St. Louis, MO 63144, dabaker@tripos.com
Abstract
HiVol and HiStats are new software tools for analyzing and visualizing the
large datasets typical of
high-throughput synthesis and screening efforts. Chemical and property data for
over a million compounds can be readily calculated, filtered, sorted, and
graphed. Datasets can be interactively and iteratively partitioned into subsets
based on 2D structure searching, diversity/similarity, registration IDs, and
property ranges. Multiple databases and subsets are simultaneously accessible,
each displayed in a spreadsheet complete with 2D structures and associated
properties. Additional visualization tools include scatter plots, histograms,
and dendrograms. HiStats calculates univariate statistics, performs hierarchical
clustering, and builds regression models that profile the properties of large
datasets in order to guide follow-up experiments.
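
As a rough illustration of two operations described above, partitioning a dataset by property range and computing univariate statistics on the resulting subset, the following is a minimal Python sketch. It does not use HiVol or HiStats and does not reflect their interfaces; the function names, record layout, and property values are all hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical records: (registration ID, molecular weight, logP).
# The values are made up for illustration, not taken from any real dataset.
compounds = [
    ("C001", 180.2, 1.2),
    ("C002", 342.4, 3.1),
    ("C003", 512.6, 5.4),
    ("C004", 276.3, 2.0),
]

def partition_by_range(rows, column, low, high):
    """Split rows into (inside, outside) by whether low <= row[column] <= high."""
    inside = [r for r in rows if low <= r[column] <= high]
    outside = [r for r in rows if not (low <= r[column] <= high)]
    return inside, outside

def univariate_stats(rows, column):
    """Return simple univariate statistics for one numeric column."""
    values = [r[column] for r in rows]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

# Keep compounds whose molecular weight (column 1) lies in [200, 500],
# then profile the molecular weight of that subset.
inside, outside = partition_by_range(compounds, 1, 200.0, 500.0)
stats = univariate_stats(inside, 1)
print([r[0] for r in inside], stats)
```

Because the partition returns both the matching and non-matching subsets, the same step can be applied again to either one, which mirrors the iterative subsetting the abstract describes.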