CINF 66: Interactive exploration of high-volume datasets using HiVol and HiStats
David Baker and Ralph Walden, Tripos, Inc., 1699 South Hanley Road, St. Louis, MO 63144, firstname.lastname@example.org
HiVol and HiStats are new software tools for analyzing and visualizing the large datasets typical of high-throughput synthesis and screening efforts. Chemical and property data for over a million compounds can be readily calculated, filtered, sorted, and graphed. Datasets can be interactively and iteratively partitioned into subsets based on 2D structure searching, diversity/similarity, registration IDs, and property ranges. Multiple databases and subsets are simultaneously accessible, each displayed in a spreadsheet complete with 2D structures and associated properties. Additional visualization tools include scatter plots, histograms, and dendrograms. HiStats calculates univariate statistics, performs hierarchical clustering, and builds regression models that profile the properties of large datasets to guide follow-up experiments.
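As an illustration of the workflow described above, the following minimal Python sketch partitions a small dataset by property range, sorts the resulting subset, and computes univariate statistics for it. It does not use or reproduce the HiVol or HiStats interfaces, which the abstract does not describe; the record layout, the property names (`mol_weight`, `logp`), the sample values, and the helper functions `filter_by_range` and `univariate_stats` are all hypothetical and chosen only for the example.

```python
from statistics import mean, median, stdev

# Hypothetical compound records: a registration ID plus calculated properties.
# The values are made up for illustration and are not real data.
compounds = [
    {"id": "C001", "mol_weight": 180.2, "logp": 1.2},
    {"id": "C002", "mol_weight": 342.4, "logp": 3.1},
    {"id": "C003", "mol_weight": 510.6, "logp": 5.4},
    {"id": "C004", "mol_weight": 275.3, "logp": 2.0},
]


def filter_by_range(records, prop, low, high):
    """Return the records whose `prop` value lies in the closed range [low, high]."""
    return [r for r in records if low <= r[prop] <= high]


def univariate_stats(records, prop):
    """Summarize one property across a subset (requires at least two records)."""
    values = [r[prop] for r in records]
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
    }


# Iterative partitioning: each filter narrows the previous subset.
subset = filter_by_range(compounds, "mol_weight", 150.0, 500.0)
subset = filter_by_range(subset, "logp", 1.0, 3.5)

# Sort the surviving compounds by molecular weight, then summarize them.
subset.sort(key=lambda r: r["mol_weight"])
stats = univariate_stats(subset, "mol_weight")

print([r["id"] for r in subset])
print(stats)
```

An in-memory list of dictionaries is adequate only for a toy example; a dataset of a million compounds would more plausibly sit in a database or a columnar store, with the same filter, sort, and summarize steps applied there.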