CINF
19 Hierarchical k-means clustering using principal components to solve the
unsupervised multi-class classification problem
James F. Rathman (1), Syed B. Mohiddin (1), and Chihae Yang (2).
(1) Department of Chemical and Biomolecular Engineering, The Ohio State
University, Koffolt Laboratories, 140 West 19th Avenue, Columbus, OH
43210-1110, Fax: 614-292-3769, rathman.1@osu.edu, (2) Leadscope, Inc
Current clustering techniques can be grouped as either supervised or unsupervised.
In a supervised method, each observation in the training dataset is
pre-assigned to a class based on prior knowledge, while an unsupervised
method uses no prior knowledge of the class distinction. Numerous supervised
techniques have been demonstrated to work well for binary classification and
a few of these are reasonably good at making supervised multi-class
predictions. However, techniques for unsupervised binary and multi-class
predictions have not been fully developed. In this work, we present an
analysis technique based on hierarchical k-means using differentially
weighted principal component analysis to address unsupervised classification
for both binary and multi-class problems. We demonstrate the methodology on
both biological datasets (the NCI 60 cancer cell lines dataset and an acute
leukemia dataset) and chemical datasets, with the objectives of predicting
class membership and identifying the non-redundant features most responsible
for differentiating the observed classes.
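
As a rough illustration of the general idea, the following Python sketch clusters observations by hierarchical (bisecting) k-means on weighted principal component scores. The abstract does not state how the components are weighted or how clusters are chosen for splitting, so both choices here are assumptions: each component is weighted by its share of explained variance, and the cluster with the largest within-cluster sum of squares is split next. The function names and the synthetic data are hypothetical and are not taken from the authors' work.

```python
import numpy as np

def weighted_pc_scores(X, n_components):
    """Principal component scores, each column scaled by that component's
    share of explained variance (an assumed weighting scheme)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    var = s[:n_components] ** 2
    return scores * (var / var.sum())

def two_means(Z, n_iter=50):
    """Plain 2-means (Lloyd's algorithm), started deterministically from the
    two extreme points along the highest-variance coordinate."""
    axis = int(np.argmax(Z.var(axis=0)))
    centers = Z[[np.argmin(Z[:, axis]), np.argmax(Z[:, axis])]].astype(float)
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = centers.copy()
        for k in range(2):
            if np.any(labels == k):
                updated[k] = Z[labels == k].mean(axis=0)
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

def within_ss(Z):
    """Within-cluster sum of squares; zero for an empty cluster."""
    return 0.0 if len(Z) == 0 else float(((Z - Z.mean(axis=0)) ** 2).sum())

def hierarchical_kmeans(Z, n_clusters):
    """Repeatedly bisect the cluster with the largest within-cluster
    sum of squares until n_clusters labels have been assigned."""
    labels = np.zeros(len(Z), dtype=int)
    for new_label in range(1, n_clusters):
        sse = [within_ss(Z[labels == k]) for k in range(new_label)]
        idx = np.flatnonzero(labels == int(np.argmax(sse)))
        halves = two_means(Z[idx])
        labels[idx[halves == 1]] = new_label
    return labels

# Synthetic stand-in data: three well-separated groups in six dimensions.
rng = np.random.default_rng(0)
direction = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
truth = np.repeat([0, 1, 2], 30)
offsets = np.array([0.0, 8.0, 20.0])
X = offsets[truth][:, None] * direction + rng.normal(size=(90, 6))

Z = weighted_pc_scores(X, n_components=3)
labels = hierarchical_kmeans(Z, n_clusters=3)
print(np.bincount(labels))  # cluster sizes
```

No class labels are used during clustering; the `truth` array exists only so the result can be checked afterwards. Weighting the scores makes the leading components dominate the distance calculation, which is one plausible reading of "differentially weighted", not a confirmed one.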