#223 - Abstracts

ACS National Meeting
April 7-11, 2002
Orlando, FL


Section A

Chemical Descriptors
Convention Center 101 A
B.A. Vickery, Organizer
9:00   Introductory Remarks.
9:10 1 An efficient representation for chemical descriptors
Jeffrey M. Blaney, and J. Kevin Lanctot, Computational Sciences, Bristol-Myers Squibb Pharma Research Labs, Inc, 150 California St, Suite 1100, San Francisco, CA 94111, Fax: 415-732-7170, jblaney@combichem.com

3D pharmacophores have become popular in the last decade as descriptors for chemical similarity searching and combinatorial library design. These pharmacophores are typically encoded in large bitstrings, up to hundreds of millions of bits long. Research on other classes of chemical descriptors suggests that "chemical space" requires on the order of 10-20 dimensions. Many of the millions of dimensions are correlated or irrelevant, and the sheer size of the bitstrings makes them an inefficient representation for computation. We have developed an approach based on classical multidimensional scaling to determine the inherent dimensionality of these large chemical bitstrings (pharmacophore or otherwise), resulting in real-valued Cartesian coordinates. We have also developed a novel approach to calculate much smaller bitstrings (hundreds of bits) that preserve the pairwise similarity of the original large bitstrings. The new smaller space is general enough that it can be used for many common descriptor-based library design approaches: coverage, diversity, D-optimal, or informative design.
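As a rough illustration of classical multidimensional scaling, the sketch below double-centers a squared distance matrix and extracts the leading coordinate by power iteration. It is a minimal pure-Python sketch, not the authors' implementation: the function name `classical_mds_1d`, the choice of a single output dimension, and the toy distance matrix are all assumptions made for illustration. In practice the distances might be derived from bitstring similarities (for example 1 minus Tanimoto), and several eigenvectors would be kept.

```python
import math

def classical_mds_1d(dist, iterations=200):
    """Return one coordinate per item from a symmetric distance matrix.

    Hypothetical helper: keeps only the leading MDS dimension.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the leading eigenvector of B.
    # The start vector is arbitrary; it must not be orthogonal to that eigenvector.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue; coordinates are sqrt(eigenvalue) * v.
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [scale * x for x in v]

# Toy example: three items that lie on a line at positions 0, 1 and 2.
toy_distances = [[0.0, 1.0, 2.0],
                 [1.0, 0.0, 1.0],
                 [2.0, 1.0, 0.0]]
print(classical_mds_1d(toy_distances))
```

The recovered coordinates are centered on zero and defined only up to sign, which is the usual behavior of classical scaling.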

9:40 2 A hierarchy of structure representation.
Johann Gasteiger, Thomas Kleinöder, Jens Sadowski, Markus Wagener, and Markus C. Hemmer, Computer-Chemie-Centrum and Institute of Organic Chemistry, University of Erlangen-Nuremberg, Naegelsbachstr. 25, Erlangen 91052, Germany, Fax: +49-9131-85 26566, Gasteiger@chemie.uni-erlangen.de

Modern drug design generates massive amounts of data that have to be related to chemical structures. Therefore, methods are needed that encode the physicochemical effects of chemical structures responsible for biological activity which are simultaneously applicable to large sets of molecules. We have developed a hierarchy of structure representations that start from the constitution of a molecule, proceed to 3D structures by using CORINA and then calculate molecular surfaces. A host of physicochemical effects such as charge distribution, polarizability, inductive and resonance effects as calculated by the PETRA package can be combined with each level of structure representation. Application of these structure encoding schemes to the definition of diversity, analysis of high-throughput experiments, and quantitative structure-activity relationships will be shown. Thus, these methods have their value in lead finding and optimization.

10:10 3 Use of molecular descriptors based on medicinal chemistry building blocks.
Paul E. Blower Jr.1, Kevin Cross1, Michael Fligner2, and Joseph Verducci2. (1) LeadScope, Inc, 1275 Kinnear Rd, Columbus, OH 43212, Fax: 614-675-3732, pblower@leadscope.com, (2) Ohio State University

LeadScope™ provides a large set of molecular descriptors based on structural features commonly used for experimental design in drug discovery programs, the building blocks of medicinal chemistry. The software performs a systematic substructure analysis using predefined structural features stored in a feature library. The features represent a wide range of structural specificity, from very specific substructures such as benzene, 1-hydroxymethyl, 3-methoxy- to generic features such as pharmacophores, which are pairs of generalized physicochemical atom types. At the present time, the feature library contains over 27,000 structural features. We have also developed a new association coefficient for diversity analysis that overcomes intrinsic biases of the Tanimoto and Hamming coefficients. This paper will describe the content and organization of the LeadScope™ molecular descriptors, give details of the new association coefficient, and contrast the three coefficients for selecting diverse sets of compounds from a large collection of known drugs.
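For readers unfamiliar with the two established coefficients, the following sketch computes the Tanimoto similarity and the Hamming distance on sets of feature identifiers. The feature sets are invented for illustration, and the new association coefficient is not shown because the abstract does not define it. The example hints at one commonly discussed size effect: the Hamming distance grows with the number of features present, so a pair of feature-rich molecules can look farther apart than a pair of small ones even when the Tanimoto coefficient ranks it as more similar.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two collections of feature identifiers."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def hamming_distance(a, b):
    """Number of features present in exactly one of the two collections."""
    return len(set(a) ^ set(b))

# Hypothetical small molecules: two features each, one in common.
small_a = {"benzene", "hydroxyl"}
small_b = {"benzene", "amine"}

# Hypothetical feature-rich molecules: 20 features each, 18 in common.
large_a = set(range(20))
large_b = set(range(2, 22))

print(tanimoto(small_a, small_b), hamming_distance(small_a, small_b))
print(tanimoto(large_a, large_b), hamming_distance(large_a, large_b))
```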

10:40 4 Molecular descriptors as a tool for data mining the Registry file.
Jeffrey M. Wilson, and Roger J. Schenck, Authority Database Operations, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202, Fax: 614-447-3713, jwilson@cas.org - SLIDES

The CAS Registry file is among the world's largest virtual screening libraries. The addition of a variety of calculated physical properties to Registry allows data mining of this database in ways that were never before possible. The searchable nature of these molecular descriptors allows the user not only to refine searches based on types of properties and specific value ranges but also to visualize the relationships of these values for groups of similar structures.

11:10 5 Multiresolution analysis of topological representations of structural and physico-chemical properties of pharmacological molecules.
John Binamé, Laurence Leherte, and Daniel P. Vercauteren, Laboratoire de Physico-Chimie Informatique, Facultés Universitaires Notre-Dame de la Paix, 61, Rue de Bruxelles, Namur B-5000, Belgium, john.biname@fundp.ac.be

The 3D structure of most biological receptors may still be difficult to obtain experimentally. Theoretical methods thus present interesting alternatives for elucidating the corresponding pharmacophore elements. Our goal is to develop new methods to propose pharmacophores based on reduced representations of the electron density of potent ligands.

Such representations are expected to be composed of a few relevant elements (one or two) for each chemical function in the molecule. To determine the position and the properties of such elements, we applied a topological analysis algorithm to calculated electron density maps of the molecules. Pharmacophore elements are thus identified as the critical points of the electron density.

Preliminary results are presented for small sets of molecules at different crystallographic resolutions (2.0 to 5.0 Å) to determine the best representation. Properties of the critical points are then statistically analyzed for broader sets of molecules to evaluate their transferability to various families of molecules.


Section A

Chemical Descriptors
Convention Center 101 A
B.A. Vickery, Organizer
1:40 6 Combinatorial descriptors for virtual screening.
Victor S. Lobanov, Dimitris K. Agrafiotis, and Huafeng Xu, 3-Dimensional Pharmaceuticals, Inc, 665 Stockton Dr., Suite 104, Exton, PA 19341, Fax: 610-458-8249, victor@3dp.com, hxu@3dp.com

The advent of combinatorial chemistry has sparked renewed interest in the use of molecular descriptors for virtual screening. Whether it is based on molecular diversity, molecular similarity or structure-activity correlation, the design of a combinatorial experiment usually involves the enumeration of every product in the virtual library, and the computation of key molecular properties that are thought to be pertinent to the application at hand. Unfortunately, this simplistic approach collapses with large combinatorial libraries, which often defy enumeration. Recently, we presented a machine learning approach that allows the prediction of product descriptors from pertinent features of their respective building blocks, thus limiting the computationally expensive steps of enumeration and descriptor generation to only a small fraction of products. In this paper, we present the application of this technique to several popular sets of descriptors, introduce hybrid schemes that reveal their mutual redundancy, and present several algorithmic enhancements aimed at improving the quality of the predictions.

1:30 7 Prediction of drug solubility: cohesive interactions modeled by Monte Carlo simulations.
Anton Filikov, Syrrx, Inc, 10450 Science Center Drive, San Diego, CA 92121, Fax: (858) 623-0460, anton.filikov@syrrx.com

Monte Carlo (MC) simulations in torsion angle space have been used to model cohesive interactions in the solid phase. Each simulation consists of sampling 3 million conformations of a tetramer of a compound followed by sequential fully flexible MC docking of an additional 16 molecules onto the crystalline nucleus. The following descriptors of the solid phase interactions are calculated: van der Waals and hydrogen bond interaction energies, torsion angle strain energy, the difference between hydrogen bond interaction energies in the solid phase simulation and in solution, number of rotatable bonds and number of rotatable bonds without symmetrical groups. The descriptors for the solution phase include solvation energy calculated via an atomic solvation model, surface tension solvation energy, Poisson electrostatics, polar surface area, clogP, etc. The descriptors calculated for an extensive set of drug molecules have been used to derive several regression equations to predict solubility. The accuracy of this approach will be compared to other methods.

2:00 8 Collection of chemically intuitive molecular descriptors proven as highly effective and fast predictors of ADME properties.
Robert Fraczkiewicz1, Boyd Steere1, and Michael B. Bolger2. (1) Life Sciences Department, Simulations Plus, Inc, 1220 West Avenue J, Lancaster, CA 93534, Fax: (661) 723-5524, (2) Department of Pharmaceutical Sciences, USC School of Pharmacy

The interest of pharmaceutical companies in time- and cost-effective methods of in silico drug lead screening has been growing rapidly. Methods for estimating ADME properties before the respective molecules are actually synthesized play a pivotal role in this process. A majority of these methods use descriptors of molecular structure as input variables to predictive mathematical models. This presentation illustrates how novel algorithms developed at Simulations Plus for rapid computation of an original set of unique molecular descriptors lead to high-performance predictive models of ADME properties.

2:45 9 An efficient bitmap container package for very high-dimensional fingerprints.
Peter Fox1, Lars Naerum2, Henning Thogersen2, Robert Clark1, and Trevor Heritage1. (1) Research Department, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144, (2) MedChem Research, Novo Nordisk A/S

Most existing bitmap containers used to store fingerprints for molecular descriptors are poorly suited for descriptors that can span a very large solution space. We present a fingerprint (bitmap) container that uses a compression scheme to characterize the bitmap, while allowing on-the-fly bitmap operations on the compressed bitmap, with no need to decompress it first. We are using this approach very successfully with pharmacophoric multiplets, where the solution space, and consequently the bitmap size, is extremely large. In particular, compression makes it possible to carry out stochastic sampling of the full conformational space, obviating the need to consider only fixed torsional increments. Examples and detailed performance analyses will be presented.
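The idea of operating on a compressed bitmap without expanding it can be illustrated with the simplest possible scheme: store only the indices of the bits that are set. The sketch below is an assumption-laden stand-in, since the abstract does not describe the actual compression scheme; the class name `SparseFingerprint` and its methods are hypothetical.

```python
class SparseFingerprint:
    """Bitmap stored as the set of on-bit indices (hypothetical sketch).

    The full bit array is never materialized, so n_bits can be very large.
    """

    def __init__(self, n_bits, on_bits=()):
        self.n_bits = n_bits
        self.on = frozenset(on_bits)
        if any(i < 0 or i >= n_bits for i in self.on):
            raise ValueError("bit index out of range")

    def __and__(self, other):
        # Bitwise AND is the intersection of the on-bit sets.
        return SparseFingerprint(self.n_bits, self.on & other.on)

    def __or__(self, other):
        # Bitwise OR is the union of the on-bit sets.
        return SparseFingerprint(self.n_bits, self.on | other.on)

    def count(self):
        """Number of bits that are set."""
        return len(self.on)

    def tanimoto(self, other):
        union = len(self.on | other.on)
        return len(self.on & other.on) / union if union else 1.0

# Two fingerprints in a billion-bit space, each with three bits set.
a = SparseFingerprint(10**9, [3, 500_000_000, 999_999_999])
b = SparseFingerprint(10**9, [3, 42, 999_999_999])
print((a & b).count(), (a | b).count(), a.tanimoto(b))
```

Memory use here scales with the number of set bits rather than with the nominal bitmap length, which is the property that matters when the solution space is extremely large.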

3:15 10 Controlling degeneracy with the extended valence sequence Signature molecular descriptor.
Jean-Loup Faulon1, Carla J Churchwell1, and Donald P Visco2. (1) Computation, Computers and Mathematics, Sandia National Laboratories, P.O. Box 969, MS 9951, Livermore, CA 94551, Fax: 925-924-3020, jfaulon@sandia.gov, (2) Department of Chemical Engineering, Tennessee Tech. University

We present a new molecular descriptor named Signature based on extended valence sequence. The new descriptor can be computed and stored efficiently. We rigorously prove that all topological indices (TIs) based on counts of walks, paths, and distances are computable from signature. The degeneracy of signature and popular TIs is then computed for homogenous series of alkanes, alcohols, fullerene-type structures, and peptides. We believe this study to be the first where degeneracy is systematically probed for homogenous molecular series. Results indicate that signature is the only molecular descriptor that can fully control degeneracy. As a general rule, we find that hydrocarbon structures comprising n non-hydrogen atoms are uniquely characterized by signatures of height n/4, while peptides up to 5,000 amino acids can be singled out with signatures of heights as small as 2 or 3. Aside from signature, the Kier and Hall total topological index exhibits low degeneracy as well.
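To make the notion of height-dependent degeneracy concrete, the sketch below builds a simplified atom signature by expanding each atom's neighborhood to a given height and counting the resulting strings over the molecule. This is a hedged approximation, not the authors' canonical algorithm: hydrogens are suppressed, bond orders are ignored, and the function names are invented. The two C4 skeletons are indistinguishable at height 0 and distinct at height 1.

```python
from collections import Counter

def atom_signature(graph, labels, atom, height, parent=None):
    """Simplified signature of one atom: its label plus sorted child signatures."""
    children = [n for n in graph[atom] if n != parent]
    if height == 0 or not children:
        return labels[atom]
    parts = sorted(atom_signature(graph, labels, n, height - 1, atom)
                   for n in children)
    return labels[atom] + "(" + "".join(parts) + ")"

def molecular_signature(graph, labels, height):
    """Multiset of atom signatures over the whole molecule."""
    return Counter(atom_signature(graph, labels, a, height) for a in graph)

# Hydrogen-suppressed carbon skeletons as adjacency lists.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
carbons = {i: "C" for i in range(4)}

for h in (0, 1):
    same = molecular_signature(n_butane, carbons, h) == \
           molecular_signature(isobutane, carbons, h)
    print("height", h, "degenerate:", same)
```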


Section B

ADME/Tox Informatics
Convention Center 101 B
O. F. Güner, Organizer
Cosponsored with COMP, MEDI, TOX
3:45 11 Qualitative structure-property approach to ADME/TOX using Idiotropic Electrostatic and Steric Field Orientation.
Philippa R. N. Jayatilleke, and Robert D. Clark, Research Department, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144, Fax: 314-647-9241, pjayat@tripos.com

We will present a new strategy to model the complex properties of ADME/TOX by combining three-dimensional molecular interaction fields with soft independent modeling of chemical analogy (SIMCA). It is necessary to consider “bulk” properties that contribute towards solubility and penetration, as well as specific structural features when modeling a compound’s pharmacokinetic profile. Models are constructed from data sets aligned using a variation of the inertial field orientation (IFO) method. The new approach uses principal axes derived from both steric and electrostatic fields. By utilizing both fields, we can more closely relate the CoMFA or CoMSIA field type descriptors to structural properties. We have explored several computational schemes including testing conformational sensitivity and assessing the influence of the molecular field types used. Models will be presented for human intestinal absorption, blood-brain barrier penetration and oral bioavailability, demonstrating a tool for developing leads with enhanced therapeutic potential.

1:30-3:30 12 Getting the ADME properties right through property-based design.
Han van de Waterbeemd, PDM, Department of Drug Metabolism, Pfizer Global Research and Development, IPC351, Sandwich CT13 9NJ, United Kingdom, Fax: 01304-651817, han_waterbeemd@sandwich.pfizer.com - SLIDES

Physicochemical properties, including solubility, permeability, lipophilicity and pKa, have been widely used to optimise biopharmaceutical and pharmacokinetic properties of drug candidates. Therefore various screens have been developed to assess such properties in high throughput. An overview of such technologies will be given. For the evaluation of virtual compounds and libraries computational methods have been developed and others are currently in progress to address the various ADME properties as early as possible. Analyses of the key molecular properties of drugs in collections such as the World Drug Index led to the formulation of simple rules such as the rule-of-5. These rules can now be used in property-based design of hits, leads and drugs. Examples of relationships between molecular properties and ADME endpoints will be presented and challenges discussed.


D.A. Smith, H. van de Waterbeemd and D.K. Walker, Pharmacokinetics and metabolism in drug design, Wiley-VCH (2000).

H. van de Waterbeemd, D.A. Smith, K. Beaumont and D.K. Walker, Property-based design: optimization of drug absorption and pharmacokinetics, J.Med.Chem. 44 (2001) 1313-1333.
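The rule-of-5 mentioned in the abstract above is simple enough to express as a short filter. The sketch assumes the four properties have already been computed by some other tool; the function names and the example property values are hypothetical, and allowing at most one violation is a common convention rather than something stated in the abstract.

```python
def rule_of_five_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Count violations of the four rule-of-5 thresholds."""
    checks = [
        mol_weight > 500,   # molecular weight above 500
        log_p > 5,          # calculated logP above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)

def passes_rule_of_five(mol_weight, log_p, h_donors, h_acceptors,
                        max_violations=1):
    """Assumed convention: tolerate up to max_violations violations."""
    return rule_of_five_violations(
        mol_weight, log_p, h_donors, h_acceptors) <= max_violations

# Hypothetical property values, not taken from any real compound.
print(passes_rule_of_five(350.0, 2.1, 2, 5))
print(passes_rule_of_five(720.0, 6.3, 6, 12))
```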



In Silico Methodologies in Accelerating Drug Development.
Jay T. Goodwin, Philip S. Burton, Thomas J. Vidmar, Pil H. Lee, Hua Gao, and Gerry M. Maggiora, Drug Absorption and Transport, Pharmacia Corporation, 301 Henrietta St., Kalamazoo, MI 49007, Fax: (616) 833-2325, jay.t.goodwin@pharmacia.com

Computationally based predictive models of various biopharmaceutical and ADME properties have been developed by numerous academic, commercial, and industrial groups to address compound availability and conservation, to perform virtual screening and design, and to mitigate limitations of experimental throughput. Such properties include aqueous solubility, cellular and intestinal permeability, protein binding, CNS penetration, and oral bioavailability (although it is not often acknowledged that the latter two properties are complex composite functions of many properties and mechanisms). This provides a context for the hierarchical development of predictive models of physicochemical and biopharmaceutical properties, and ultimately the integration of such models into prediction of in vivo performance. Predictive model development is critically dependent upon several factors: experimental data, statistical methods, and the requirements of the intended application. Examples and application of computational properties models developed within Pharmacia will be presented in the context of these issues.



Prospective Evaluation of Structure-based ADME Predictions: Knowing When the Experiment is the Prudent Course of Action.
Troy Bremer, Jehangir Athwal, Kevin Holme, and Carleton R. Sage, Computational Sciences, Lion Bioscience, 9880 Campus Point Dr, San Diego, CA 92121, Fax: 858-410-6665, carleton.sage@lionbioscience.com

Statistically based computational models that predict a biological output from structural inputs are interpolative systems likely to perform poorly when used for extrapolation. The ADME (Absorption, Distribution, Metabolism, Excretion) properties of a compound are largely unrelated to its pharmacological target; the “space” described by ADME models is largely unconstrained, and data representation within this space is sparse. Therefore, methods to describe when a prediction is an interpolation or an extrapolation should be useful in prospectively determining prediction validity. We have used robust statistical methods to develop predictive models for CYP2D6, CYP3A4, FDP, and Caco-2 effective permeability. To encourage the appropriate use of these predictive tools, we have developed methods that provide a measure of uncertainty associated with each prediction. When used in the evaluation of completely external test sets, these measures of uncertainty have been useful in identifying poor predictions, allowing their prospective use in prediction evaluation.
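One generic way to flag a prediction as an extrapolation is to measure how far the query compound lies from the nearest training compound in descriptor space. The sketch below shows that idea only; the abstract does not say which uncertainty measure the authors use, and the function names, the Euclidean metric, the cutoff, and the two-dimensional toy descriptors are all assumptions.

```python
import math

def nearest_training_distance(query, training):
    """Euclidean distance from the query to its closest training point."""
    return min(math.dist(query, t) for t in training)

def is_interpolation(query, training, cutoff):
    """Treat a prediction as an interpolation if a training point is nearby.

    The cutoff is a hypothetical tuning parameter chosen by the modeler.
    """
    return nearest_training_distance(query, training) <= cutoff

# Toy descriptor vectors for a training set and two query compounds.
training_set = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(is_interpolation((0.5, 0.2), training_set, cutoff=1.0))
print(is_interpolation((5.0, 5.0), training_set, cutoff=1.0))
```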

2:30 15 Exploring the relationships between chemical structure, in vitro profiles, and in vivo behavior.
Julie E. Penzotti1, Boryeu Mao1, Dragos Horvath2, Jacques Migeon1, Cecile Krejsa1, and Dave Porubek1. (1) Cerep, Inc, 15318 NE 95th St., Redmond, WA 98052, Fax: 425-895-8668, j.penzotti@cerep.com, (2) Cerep - SLIDES

The drive to identify development liabilities for compounds earlier in the drug discovery process has made the generation of reliable computational methods for predicting ADME/Tox properties increasingly important. Consistent and reliable data for a large, diverse set of compounds is needed to derive models that can generalize well to new chemical series. We have screened nearly 2000 drugs and related compounds against a panel of ~90 pharmaceutical and pharmacological assays to create a biological profile or BioPrint™ for each compound. We are applying computational chemistry methods and data mining techniques to investigate the inter-relationships between this in vitro data, in vivo behavior, and chemical feature based descriptors, to generate models that correlate molecular features to patterns of in vitro and in vivo biological activity. We will describe our efforts towards developing a series of computational models addressing ADME/Tox properties such as metabolic stability, bioavailability and toxicity effects that can be used to guide the selection of compounds likely to have more favorable ADME/Tox profiles.

3:00 16 CACO-2 permeability modeling: Feature selection via sparse support vector machines.
Curt M. Breneman1, Kristin P. Bennett2, Jinbo Bi2, Mark J. Embrechts3, and Minghu Song1. (1) Department of Chemistry, Rensselaer Polytechnic Institute, 110-8th Street, Cogswell Bldg, Troy, NY 12180, Fax: 518-276-4045, brenec@rpi.edu, (2) Department of Mathematics, Rensselaer Polytechnic Institute, (3) Decision Sciences and Engineering Systems, Rensselaer Polytechnic Institute - SLIDES

We describe a methodology for performing variable selection and ranking using support vector machines (SVM). The basic idea of the method is simple: Construct a series of sparse linear SVMs that exhibit good generalization, then create a subset of variables having nonzero weights in the linear models. This subset of variables is then used in a nonlinear SVM to produce the final regression or classification function. The method exploits the fact that linear SVMs with 1-norm regularization (no kernels) inherently perform variable selection as a side-effect of minimizing capacity in the SVM model. In linear 1-norm SVMs, the optimal weight vector will have relatively few nonzero weights with the degree of sparsity depending on the SVM model parameters. The variables with nonzero weights then become potential features to be used in the nonlinear SVM. In some sense, we trade the variable selection problem for the model parameter selection problem in SVM.

The small number of molecules and descriptor collinearity make the results of the linear 1-norm SVMs somewhat unstable -- small changes in the training and tuning data and/or model parameters may produce very different sets of nonzero weighted attributes. Our final variable selection and ranking methodology exploits this instability. For each training partition, the data is further divided to create a tuning set used in a pattern search algorithm for parameter selection. Multiple linear models are then created based on different tuning set partitions, each producing different variable weights. The final variable subset is chosen from the superset of all nonzero weighted attributes in any of the linear models. The simple strategy of selecting the entire superset works well in practice. The distribution of the linear model weight vectors provides a mechanism for ranking and interpreting the effects of variables. Starplots or stackplots are used to visualize the magnitude and variance of the weights found by the linear models for each attribute.
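The aggregation step described above, taking the superset of nonzero-weighted variables across several sparse linear models and ranking them by their weight distribution, can be sketched without any SVM solver. The weight vectors below are invented stand-ins for the output of 1-norm linear SVMs, the descriptor names are hypothetical, and ranking by absolute mean weight is one plausible choice rather than the authors' stated criterion.

```python
def select_and_rank(weight_vectors, names, tol=1e-9):
    """Keep variables with a nonzero weight in any model and rank them.

    Returns a list of (name, (mean_weight, variance)) sorted by |mean_weight|.
    """
    stats = {}
    for j, name in enumerate(names):
        column = [w[j] for w in weight_vectors]
        if any(abs(x) > tol for x in column):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            stats[name] = (mean, var)
    return sorted(stats.items(), key=lambda kv: abs(kv[1][0]), reverse=True)

# Made-up weights from three sparse linear models over four descriptors.
descriptor_names = ["logP", "MW", "PSA", "HBD"]
model_weights = [
    [0.8, 0.0, 0.0, -0.1],
    [0.6, 0.0, 0.2, 0.0],
    [0.7, 0.0, 0.0, 0.0],
]

for name, (mean, var) in select_and_rank(model_weights, descriptor_names):
    print(name, round(mean, 3), round(var, 4))
```

The mean and variance per variable are the same quantities a starplot or stackplot would display.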


Section A

New Developments in Electronic Publishing
Convention Center 101 A
E. M. Shanbrom, Organizer


Digital Content in its New Context.
Richard T. Kaser, Vice President, Content, Information Today, Inc, 143 Old Marlton Pike, Medford, NJ 08055-8750, Fax: (609) 654-4309, kaser@infotoday.com - SLIDES

Over the last 30 years, we have seen the scholarly literature (and just about every other kind of information) expand from the confines of a book spine into flexible components that can quickly regroup into dynamic arrangements. The document still persists as the preferred form to contain the formal expression of knowledge. But in our networked world, we are increasingly aware that other ways to express and convey information and share knowledge exist as well. Today's services are blending documents, data, visual representations, and other elements with interactive elements to create services honed to the users' needs. All the pieces that publishers, librarians, system operators, aggregators, developers and a host of others have worked for over a generation to perfect are now in place. The underlying technology finally works. And it is a rich array of digital information services that now exist. Our technology supports everything from off-the-shelf, ready-to-wear solutions to do-it-yourself enterprise and institutional portals. The choices are mind boggling. If we could only make full use of everything we've got, we would feel rich. This presentation will focus on developments in electronic publishing at large, with an emphasis on those that support research activities.

9:00 18 The all-inclusive, totally functional, super-connected scientific information machine.
Robert D. Bovenschulte, Director, ACS Publications, American Chemical Society, 1155 16th Street NW, Washington, DC 20036, Fax: (202) 872-6060, rbovenschulte@acs.org - SLIDES
9:30 19 Online publishing a chance for new alliances: a report from Springer-Verlag.
Gertraud Griepke, Journals/LINK Director, Springer-Verlag, Tiergartenstraße 17, D-69121 Heidelberg, Germany, Fax: 49 (0 62 21) 487-288, griepke@springer.de - SLIDES

There are unique challenges involved in setting up and running a successful online service for scientific content. While many of the aspects handled are the same as for any online information service on the internet, the digital production workflow for publishing scientific content needs to be rapid and of high quality, with enhanced functionality that is unique to science. The result of this effort is visible in the alliances between publishing houses, abstracting and indexing services, agencies, libraries and information users. This talk explores the progress Springer has made so far and looks forward to the challenges to come.

10:00 20 A 'sea change' in chemical information.
William G Town, Elsevier Science, Director of Operations, ChemWeb, Inc, 84 Theobalds Road, London WC1X 8RR, United Kingdom, Fax: +44(0)20 7611 4301, bill.town@chemweb.com - SLIDES

In the last five years, a 'sea change' in chemical information has occurred: community websites, publisher websites, content aggregators, preprint servers, e-commerce market places, and scientific search engines have all been launched in this timeframe. The rate of innovation has shown a dramatic increase, which shows no sign of abating. Publishers are completing the transition from print-based to electronic/print-based businesses. New players and new partnerships characterise the scientific information market today. Consolidation of the industry has already begun but is this just the start or the end of the process? What will the next five years bring?

10:30 21 The future of the 'infomediary'
Andrea Keyhani, Chief Operating Officer, Ingenta, Inc, 23-28 Hythe Bridge Street, Oxford OX1 2ET, United Kingdom, Fax: +44(0)1865 799111, akeyhani@ingenta.com - SLIDES

Ingenta is one of the world's largest resources of academic and professional research articles online - recognizing subscriptions and offering document delivery of 26,000 publications and the full-text of 5,400 journals from 180 publishers.

Incorporating UnCover, CatchWord, Dynamic Diagrams and PCG, Ingenta provides publishers' solutions to empower the exchange of research content online - from a database of journal metadata to sophisticated e-communities.

Complementary services to libraries include free access to subscribed-to journals, document delivery with cost accounting, and customized library gateways. This Fall, Ingenta launched the PCG Library Consortia Sales Program, an innovative solution to consortia site licensing of the electronic content from Ingenta publishers.

The Ingenta Institute, a non-profit organization, commissions independent research into the future of scholarly publishing. Using this research as a starting-point, Andrea will predict the future of the 'infomediary' and its role in electronic publishing.

11:00 22 Electronic information and the innovation challenge.
Robert J. Massie, Director, Chemical Abstracts Service, American Chemical Society, 2540 Olentangy River Road, Columbus, OH 43202-1505, Fax: (614) 447-3713, rmassie@cas.org - SLIDES

Many of today's concerns in the information industry resolve into a single broad question: can information providers continue to innovate? An analysis of developments in electronic information reveals that innovation occurs along two axes: delivery platforms and content. Each platform offers its own range of tools and possibilities and we have experienced successive waves of technical advances in recent memory. But the excitement of new technology and media must not obscure the role of content, the sine qua non of value in information products. It is the dynamic interaction of platform and content that gives rise to new products. This thesis will be explored with concrete examples and their implications for giving consumers of scitech information what they really want in the new era of integrated access.

11:30   Lunch Break.


Section B

ADME/Tox Informatics
Convention Center 101 B
O. F. Güner, Organizer
Co-sponsored with COMP, MEDI, TOX
9:00 23 Use of robust classification techniques for the prediction of human cytochrome P450 inhibition.
Roberta G. Susnow, and Steve Dixon, ADMET R&D, Accelrys Inc, Box 5350, Princeton, NJ 08543-5350, rsusnow@accelrys.com

The ability to predict the inhibition of the cytochrome P450’s is important because of their role in the metabolism of xenobiotics and the consequent potential for drug-drug interactions. The human CYP 450’s are responsible for the metabolism of more than 50% of all known drugs. We will present our latest research into the use of robust classification techniques for predicting the ability of molecules to inhibit the P450 isozymes. These techniques are designed to produce models with a low sensitivity to noise and broad applicability across chemical families.

9:30 24 Use of predictive ADME in library profiling and lead optimization.
Osman F. Güner, and Robert D. Brown, Accelrys Inc, 9685 Scranton Road, San Diego, CA 92121, Fax: 858-799-5100, osman@accelrys.com - SLIDES

High-throughput in silico ADME models can be used to select subsets of combinatorial libraries based not only on diversity or similarity, but also on a combination of various ADME properties. The contribution of the ADME property-based constraints can be weighted against the diversity assessment. We present how the drug-like properties of the selected subset of a library can be improved without compromising the diversity and coverage of the library. The process is demonstrated with several examples. Finally, we provide an example of how this process is used in lead optimization, where both potency and pharmacokinetic properties are simultaneously optimized to yield potent candidates with better anticipated ADME characteristics.
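Weighting ADME constraints against diversity can be pictured as a greedy selection in which each candidate is scored by a blend of its distance to the compounds already chosen and its predicted ADME score. The sketch below is one plausible reading, not the authors' procedure: the function name, the linear blend, the `adme_weight` parameter, and the toy data are assumptions, and it presumes that distances and ADME scores are on comparable scales.

```python
import math

def select_subset(candidates, k, adme_weight=0.5):
    """Greedy subset selection balancing diversity against an ADME score.

    candidates maps a name to (descriptor_tuple, adme_score in [0, 1]).
    """
    if k > len(candidates):
        raise ValueError("k exceeds the number of candidates")
    # Seed with the best-scoring compound by ADME alone (an arbitrary choice).
    chosen = [max(candidates, key=lambda n: candidates[n][1])]
    while len(chosen) < k:
        def score(name):
            desc, adme = candidates[name]
            # Diversity term: distance to the closest compound already chosen.
            nearest = min(math.dist(desc, candidates[c][0]) for c in chosen)
            return (1 - adme_weight) * nearest + adme_weight * adme
        remaining = [n for n in candidates if n not in chosen]
        chosen.append(max(remaining, key=score))
    return chosen

# Hypothetical library: 2D descriptors and a predicted ADME score per compound.
library = {
    "A": ((0.0, 0.0), 0.9),
    "B": ((0.1, 0.0), 0.8),
    "C": ((1.0, 0.0), 0.2),
    "D": ((0.9, 0.1), 0.7),
}
for w in (0.0, 0.5, 1.0):
    print(w, select_subset(library, 2, adme_weight=w))
```

With a weight of 0 the second pick is driven purely by diversity, with a weight of 1 purely by the ADME score, and intermediate weights trade one against the other.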

10:00 25 Computational strategies in support of early ADME drug discovery efforts.
Michelle L. Lamb, Jayashree Srinivasan, John E. Eksterowicz, Robert V. Stanton, Kelly M. Jenkins, Robyn A. Rourick, and Peter D. J. Grootenhuis, Bristol-Myers Squibb Pharmaceutical Research Laboratories, 150 California Street, Suite 1100, San Francisco, CA 94111, Fax: 415-732-7170, mlamb@combichem.com

Early identification of liabilities associated with molecular absorption, distribution, metabolism, and excretion (ADME) accelerates the drug discovery process by identifying poor candidates prior to large investment in their development. As the mechanisms involved in ADME are complex, simple filters may only be applied in limited situations. Strategies that incorporate more complex models, such as ensembles of pharmacophores or shape descriptors may be more successful. We will describe the computational filters and classification models that we have developed to guide the design and selection of libraries likely to have more favorable absorption and metabolism profiles and to assist in the prioritization of chemical series.

10:30 26 Conceptual models for structure-nonspecific ADME/Tox.
Stefan Balaz, and Viera Lukacova, Department of Pharmaceutical Sciences, North Dakota State University, College of Pharmacy, Sudro Hall 108, Fargo, ND 58105, Fax: 701-231-7606, stefan.balaz@ndsu.nodak.edu - SLIDES

For most chemicals, ADME/Tox processes except enzymatic metabolism and active transport are governed by their overall properties including lipophilicity, amphiphilicity, acidity, and reactivity. Structure-nonspecific (sn-)ADME/Tox processes can be analyzed using models of subcellular pharmacokinetics, which describe the kinetics of membrane transport, protein binding, hydrolysis, and other reactions with cell constituents in terms of differential equations. Using the time hierarchy of the included processes, the equations can be simplified and solved explicitly. The solutions (called disposition functions) represent conceptual models for sn-ADME/Tox processes in terms of chemical properties and time. The attributes of biological systems are kept invariant during the experiments and are collected in adjustable coefficients of disposition functions. Once calibrated for a given biosystem, the models provide a detailed recipe for structure optimization of chemicals with regard to sn-ADME/Tox. The models have much better predictivity outside the tested property space than empirical models, as demonstrated using the leave-extremes-out cross-validation procedure.

9:30 27 Model for absorption of drug-like compounds based on structural features and interfacial properties.
Chihae Yang1, Ilya Utkin1, James Rathman2, and Paul E. Blower1. (1) LeadScope, Inc, 1245 Kinnear Rd, Columbus, OH 43212, cyang@leadscope.com, iutkin@leadscope.com, (2) Department of Chemical Engineering, The Ohio State University - SLIDES

Predicting ADME properties from structure-based models is still not reliable and is considered to be one of the most difficult problems in the lead optimization process. ADME models are typically based on physical properties and assay data from lipid vesicles or monolayer experiments, where KD values can be experimentally determined. The complex interfacial interactions of a compound with membrane lipids are difficult to extract from the conventional set of molecular properties predicted from a QSAR model. In this study, a set of drug-like compounds is selected and their interfacial properties are predicted from their structural features using available correlation methods. These predicted interfacial properties, in conjunction with structural features selected by informatics methods for their high association with desired physical properties, are employed to build a model for human intestinal absorption. Various informatics methods, including genetic algorithm, K-nearest neighbor, and partial least squares methods, are used to build these models.

11:30 28 Using ADME properties with SciFinder to target new drugs.
Michael McBrien1, Robert DeWitte1, Robin Martin1, Eduard Kolovanov1, and Kurt Zielenbach2. (1) Advanced Chemistry Development, 600-90 Adelaide W, Toronto, ON M5H 3V9, Canada, michael@acdlabs.com, (2) Chemical Abstracts Service

In the last several years, biological chemists have begun to apply physical criteria when selecting compounds for evaluation. By avoiding compounds with extremely high (or low) lipophilicity, and low solubility, for example, chemists hope to focus their investigations on compounds that are more likely to be successfully absorbed by passive processes. Recently, Chemical Abstracts Service and Advanced Chemistry Development have collaborated to make predicted physical properties available for over eight million organic substances in the CAS Registry database. This talk will explain how these predicted properties are computed, and how the user may use them in conjunction with SciFinder to narrow queries to compounds with suitable physical properties.

11:30   Lunch Break.


Section A

New Developments in Electronic Publishing
Convention Center 101 A
E. M. Shanbrom, Organizer
2:00 29 Chain, chain, chain...: Implementing linking technologies at the University of Chicago Library.
Andrea B. Twiss-Brooks, John Crerar Library, University of Chicago, 5730 S. Ellis, Chicago, IL 60637, Fax: 773-702-7429, atbrooks@midway.uchicago.edu - SLIDES

Licensing of electronic databases and journals in major academic research libraries represents a significant investment of resources to activate, manage and maintain. It is in the best interests of both academic libraries and their primary user communities to make the most efficient use of these online resources possible. The University of Chicago Library has been motivated recently to provide links among various electronic resources in order to guide users to the appropriate information using the most direct means available. The linking systems utilized included ChemPort, OvidLinks and SFX. This report describes the implementation and some early evaluation of these efforts.

2:30 30 ACS Journals on the Web: A 5-Year Retrospective.
Lorrin R Garson, David P Martinsen, and Ralph E Youngen, ACS Publications, American Chemical Society, 1155 16th Street NW, Washington, DC 20036, Fax: (202) 872-4389, l_garson@acs.org - SLIDES

The ACS journals have been available to the scientific public on the World Wide Web since September 8, 1997. Prior to 1997, important work on core electronic delivery technology was accomplished which made Web delivery practical. Several features of the ACS Web journals will be discussed. Digitization of the backfile of ACS journals was accomplished in 2001, the earliest issues being from 1879 for the Journal of the American Chemical Society. Creation of the backfile, with emphasis on engineering aspects, will also be discussed.

3:00 31 Building the digital research environment: a report from the construction site.
Harry F Boyle, Manager, Web Content, Chemical Abstracts Service, American Chemical Society, 2540 Olentangy River Road, Columbus, OH 43202-1505, Fax: (614) 447-7149, hboyle@cas.org - SLIDES

Despite the lack of a blueprint, the foundations of the digital research environment of the future are under construction. Many organizations are building its components. Lack of a shared vision or blueprint ensures that the pieces will not fit together optimally. Meanwhile the traditional foundation of scholarly communication - the print journal - is in decline. CAS and the Publications Division of the ACS, along with many other STM publishers and service providers, are working together to build the digital research environment of the future. This presentation will provide examples of these efforts, as seen through the eyes of research scientists and administrators.

3:30   Intermission.
4:30 32 Open Meeting: Committees on Publications and on Chemical Abstracts Service.
Robert J. Massie, Director, Chemical Abstracts Service, American Chemical Society, 2540 Olentangy River Road, Columbus, OH 43202-1505, Fax: (614) 447-3713, rmassie@cas.org, and Robert D. Bovenschulte, Director, ACS Publications, American Chemical Society


Section B

ADME/Tox Informatics
Convention Center 101 B
O. F. Güner, Organizer
Co-sponsored with COMP, MEDI, TOX
1:30 33 Computer predicting metabolic profile for drug-like compounds.
Yulia V. Borodina, Dmitrii A. Filimonov, and Vladimir V. Poroikov, Institute of Biomedical Chemistry of Russian Academy of Medical Science, Pogodinskaya Str., 10, Moscow 119992, Russia, Fax: +007/095/245-0857, borodina@ibmh.msk.su

Metabolic biotransformations of drug-like compounds depend on their structures. Dozens of enzymes metabolise xenobiotics in the human organism, and sometimes toxic metabolic products are generated. Thus, computer-aided prediction of metabolic biotransformations might help to select the most promising compounds at an early stage of R & D. The computer program PASS has been shown to predict with reasonable accuracy more than 700 pharmacological effects, mechanisms of action, carcinogenicity, mutagenicity, teratogenicity and embryotoxicity of a compound on the basis of its structural formula (http://www.ibmh.msk.su/PASS). We applied PASS to the prediction of specificity for metabolism of compounds by different isoforms of cytochrome P450 and to the estimation of the first step in metabolic transformation. The database Metabolite 2001.1 (MDL Information Systems, Inc.) was used as the training set. The average accuracy of prediction in leave-one-out cross-validation was shown to be satisfactory for using this approach in practice. This talk will focus on the possibilities and limitations of metabolism prediction by PASS.
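
The leave-one-out cross-validation mentioned above can be sketched in a few lines. This is a minimal illustration only: the 1-nearest-neighbor predictor is a stand-in chosen for brevity and is not the PASS algorithm, and every name and data point below is hypothetical.

```python
from math import dist
from typing import Callable, List, Sequence

Vector = Sequence[float]


def nearest_neighbor_predict(train_x: List[Vector], train_y: List[str], query: Vector) -> str:
    """Stand-in predictor (NOT PASS): return the label of the closest training point."""
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    return train_y[best]


def leave_one_out_accuracy(
    samples: List[Vector],
    labels: List[str],
    predict: Callable[[List[Vector], List[str], Vector], str],
) -> float:
    """Hold out each sample once, predict it from all the others, report the hit rate."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


# Toy descriptors and made-up class labels, for illustration only.
toy_x = [(0.0,), (0.1,), (1.0,), (1.1,)]
toy_y = ["isoform_A", "isoform_A", "isoform_B", "isoform_B"]
loo_accuracy = leave_one_out_accuracy(toy_x, toy_y, nearest_neighbor_predict)
```

Because every compound is predicted by a model that never saw it, the resulting accuracy is a less optimistic estimate than the fit on the full training set.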

2:00 34 In Silico Models for the Prediction of Hepatotoxicity on Human.
Ailan Cheng, and Steve Dixon, ADMET R&D, Accelrys, CN 5350, Princeton, NJ 08543, Fax: 609-919-6155, acheng@accelrys.com

The liver has been recognized as a target organ for xenobiotic-induced toxicity due to its crucial role in metabolism. Hepatotoxicity has been a dose-limiting factor for many INDs. Many drugs have been withdrawn from clinical trials and even from the market due to hepatotoxicity. “Fail early and fail fast” is the current paradigm of the pharmaceutical industry. Eliminating compounds with a poor ADME/Tox profile at an early stage will lead to tremendous savings. An accurate predictive method can be used to identify and prioritize candidates for development, to assist in designing compounds with a desirable profile, and to prioritize and even reduce experimental studies and animal tests. We will present our latest in silico models for the prediction of hepatotoxicity potential in humans. The model was based on a set of diverse compounds. It is rather fast, which allows it to be used in data mining and profiling of large synthesized or virtual libraries.

2:30 35 Data mining to identify structural alerts for liver toxicity.
Paul E. Blower Jr., Gulsevin Roberts, and Ilya Utkin, LeadScope, Inc, 1245 Kinnear Rd, Columbus, OH 43212, Fax: 614-675-3732, pblower@leadscope.com

Adverse liver findings are frequently responsible for the failure of drug candidates and marketed drugs. We first developed a grading scheme for liver toxicity that encompasses a range of pathology findings and dose effects. Using data from the RTECS files and other sources, we established a database containing structural information and liver gradings. Structural alerts can be identified using data mining approaches for investigating correlations between molecular structure and biological activity. We have developed a new statistical search procedure that quickly identifies specific combinations of structural features corresponding to compound sets with high average activities. This study demonstrates that data mining tools can identify a number of structural alerts for liver toxicity.

3:00 36 The prediction of Water Solubility and of pKa-Values by Physicochemical Descriptors.
Johann Gasteiger, Ai-Xia Yan, and Thomas Kleinöder, Computer-Chemie-Centrum and Institute of Organic Chemistry, University of Erlangen-Nuremberg, Naegelsbachstr. 25, Erlangen 91052, Germany, Fax: +49-9131-85 26566, Gasteiger@chemie.uni-erlangen.de - SLIDES

Water solubility and protonation states are two important properties to be considered in drug development. We have used a variety of physicochemical descriptors such as charge distribution, inductive, resonance and polarizability effects that can rapidly be calculated by empirical methods collected in the program package PETRA. The correlation of these descriptors with water solubility and pKa-values has been investigated with statistical methods and with unsupervised and supervised neural networks.

3:30 37 Prediction of aqueous solubility.
Flemming Steen Jørgensen1, Jørgen Bonefeld Kristensen1, and Inge Thøger Christensen2. (1) Department of Medicinal Chemistry, Royal Danish School of Pharmacy, Universitetsparken 2, DK-2100 Copenhagen, Denmark, Fax: +45 35 30 60 40, fsj@dfh.dk, (2) Novo Nordisk A/S

Two new models for prediction of aqueous solubility will be presented and compared with other known methods. The first model is based on the atom-type weighted water-accessible surface area (ATW WASA) approach. The water-accessible surface area is calculated for each atom and its contribution to the aqueous solubility is weighted by multiplying with a coefficient characteristic for each atom type. In the second method, the group contribution method, the molecules are split into predefined fragments covering important functional groups and substructural units of the compounds. The fragment counts obtained are weighted by coefficients determined by multidimensional least-squares fitting. A set of 1292 structurally diverse compounds was used as a training set. For this set we obtained a correlation between experimental and predicted aqueous solubility of r=0.87 with an average error of 0.82 log units for the ATW WASA model and r=0.93 with an average error of 0.59 log units for the group contribution model.
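
Both models described above reduce to a weighted sum, which the following sketch illustrates. It assumes a simple linear form with an optional intercept; the atom types, fragment names, and coefficient values are hypothetical placeholders, not the fitted parameters from the talk.

```python
from typing import Dict, List, Tuple

# Hypothetical atom-type coefficients (log units per square angstrom).
# Placeholder numbers for illustration; not fitted values.
ATOM_TYPE_COEFF: Dict[str, float] = {
    "C_aliphatic": -0.02,
    "O_hydroxyl": 0.03,
}

# Hypothetical fragment coefficients (log units per occurrence).
FRAGMENT_COEFF: Dict[str, float] = {
    "CH3": -0.25,
    "OH": 0.5,
}


def atw_wasa_log_s(atoms: List[Tuple[str, float]], intercept: float = 0.0) -> float:
    """Sum each atom's water-accessible surface area times its atom-type coefficient."""
    return intercept + sum(ATOM_TYPE_COEFF[atom_type] * area for atom_type, area in atoms)


def group_contribution_log_s(fragment_counts: Dict[str, int], intercept: float = 0.0) -> float:
    """Sum each fragment count times its fragment coefficient."""
    return intercept + sum(FRAGMENT_COEFF[name] * n for name, n in fragment_counts.items())


# Toy molecule: (atom type, accessible area) pairs and fragment counts are made up.
log_s_surface = atw_wasa_log_s([("C_aliphatic", 50.0), ("O_hydroxyl", 20.0)])
log_s_groups = group_contribution_log_s({"CH3": 2, "OH": 1})
```

In practice the coefficients would come from a least-squares fit against measured solubilities, as the abstract states for the group contribution model.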

4:00 38 Accurate Prediction of Aqueous Solubility.
Michael McBrien, Robert S DeWitte, and Eduard Kolovanov, Advanced Chemistry Development, 600-90 Adelaide W, Toronto, ON M5H 3V9, Canada, michael@acdlabs.com

Although prediction of human absorption is confounding, it is clear that aqueous solubility is among the key driving factors. This talk will focus on the physical chemistry of the process of dissolving, and describe the methods used at Advanced Chemistry Development to produce a global predictive method for the accurate prediction of aqueous solubility. Finally, a software product will be described that makes this prediction technology available to every medicinal chemist in a simple and intuitive user interface or through your company's informatics infrastructure.

4:30 39 Prediction of cytogenetic activity of organic compounds from molecular structure.
Jon R. Serra, Chemistry Department, Pennsylvania State University, 152 Davey Lab, University Park, PA 16802, jrs@zeus.chem.psu.edu, and Peter C. Jurs, Department of Chemistry, Pennsylvania State University

Computational classifiers for cytogenetic activity are being developed with a large, diverse set of organic compounds which have been tested with an in vitro chromosomal aberration assay using Chinese hamster cells. Classifiers are being developed to separate active from inactive compounds. Compounds that are common to both a 24-hour and 48-hour exposure assay are included. Each compound is represented by descriptors calculated from its molecular structure that encode topological, geometric, electronic, and polar surface features. Subsets of informative descriptors are identified with simulated annealing or genetic algorithm feature selection. The classifiers are built with k-nearest neighbor, multiple discriminant analysis, radial basis function neural network, or support vector machine classifier algorithms. In one specific investigation, classifiers working with several hundred compounds, each represented by a few topological descriptors, achieve classification rates of approximately 80 percent. The details of the study and the classification results achieved will be described.
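
Of the classifier types listed above, k-nearest neighbor is the simplest to illustrate. The sketch below assumes Euclidean distance and a plain majority vote; the descriptor values and labels are invented for the example and do not come from the study.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Compound = Tuple[Sequence[float], str]  # (descriptor vector, class label)


def knn_classify(train: List[Compound], query: Sequence[float], k: int = 3) -> str:
    """Label a query by majority vote among its k nearest training compounds."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def classification_rate(train: List[Compound], test: List[Compound], k: int = 3) -> float:
    """Fraction of test compounds whose predicted label matches the known label."""
    hits = sum(knn_classify(train, x, k) == y for x, y in test)
    return hits / len(test)


# Made-up two-descriptor data set, for illustration only.
toy_train: List[Compound] = [
    ((0.0, 0.0), "inactive"),
    ((0.0, 1.0), "inactive"),
    ((5.0, 5.0), "active"),
    ((6.0, 5.0), "active"),
    ((5.0, 6.0), "active"),
]
toy_test: List[Compound] = [((5.2, 5.1), "active"), ((0.1, 0.2), "inactive")]
toy_rate = classification_rate(toy_train, toy_test, k=3)
```

A classification rate computed this way, on compounds held out from training, is the kind of figure the abstract reports as approximately 80 percent.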


Sci-Mix Poster Session
Convention Center Hall C
R.W. Snyder, Organizer
8:00pm 40 How to search effectively chemical information in text files using chemical formulas.
Aleksandr Belinskiy, Information Consultant, 503 Roosevelt Blvd., Apt. A-615, Falls Church, VA 22044-3118, abelinskiy@aol.com

Even though chemical formulas per se bear important information, there are few discussions of how to search for chemical information in text files using chemical formulas. Differences in the presentation of chemical formulas in different systems (USPTO, ESPACENET, JAPIO, MICROPATENT, Delphion, STN, DIALOG, etc.) will be discussed, along with tools used to verify the correctness of a chosen search strategy (dictionary files, an expand command, an observation of known examples). Examples of search strategy for piezoelectric ceramics like (PbZrxTiy (Zn1/3Nb2/3)z (Mn1/2W1/2)tO3), where the syntax of the search command depends on spacing between atom fragments, will be presented.


Section A

Living With AIPA: Impact of the American Inventors Protection Act After a Year
Convention Center 101 A
S. M. Kaback, E. S. Simmons, Organizers
9:00   Introductory Remarks.
9:05 41 USPTO records after the AIPA and new challenges resulting from the AIPA.
R. Spar - SLIDES
10:05 42 Is the sky really falling? A review of the first two years of AIPA and its effect on inventors, patent lawyers and businesses.
Jacqueline M. Hutter, and Brian C. Meadows, Needle & Rosenberg, PC, 127 Peachtree Street N.E, 12th Floor, Atlanta, GA 30303, Fax: 404-688-9880, Hutter@needlepatent.com, meadows@needlepatent.com

The provisions of the 1999 American Inventors Protection Act (AIPA) markedly changed long-standing practices and procedures of the U.S. patent laws. These changes have not only affected patent practitioners, but also inventors and the entities that seek to profit from their inventive activity. This presentation will provide an overview of some of the provisions of AIPA that have modified the manner in which patent strategy decisions are made by inventors, businesses and patent practitioners. In particular, this presentation will address AIPA's effect on matters related to patent application filing and patent application prosecution decisions, as well as new post-issuance considerations.

11:50 43 IFI indexing of pre-grant publications: Opportunities and challenges.
Darlene Slaughter, IFI CLAIMS Patent Services, 3202 Kirkwood Highway, Wilmington, DE 19808, Fax: 302-998-0733, darlene.slaughter@aspenpubl.com

Publication of US patent applications by the USPTO presents an opportunity for IFI to offer fast, accurate access to a whole new category of patent documents. IFI has been indexing US chemical applications since they began to publish in March 2001, with the goal of providing index term access to chemical patent documents as quickly as possible. We are adjusting to some associated challenges, including untested and sparsely documented source data, absence of an Official Gazette, larger records requiring more indexing terms, and a significant increase in overall volume of chemical patent publications, both granted and pre-grant, requiring IFI chemical indexing. This presentation focuses on IFI's approach to meeting those challenges, and current status of the indexed database.


Section B

Analysis and Visualization of Chemical Information
Convention Center 101 B
L. O'Korn, Organizer
9:00 44 Information management for research in the chemical industry.
L. David Rothman, Materials Science & Information Research, The Dow Chemical Company, 1776 Building, 2nd Floor, Midland, MI 48674, LDROTHMAN@dow.com - SLIDES

Wherever one looks, people are increasing the rate of data acquisition and seeking to use those data to answer more complex questions. The world of chemistry is no exception. High-throughput experimental and virtual science, growth of the public literature, demands for ever-improved materials and manufacturing processes and the overlap of industrial chemistry with the life sciences all create new challenges and opportunities in the analysis of data to drive decision-making in chemical research. Among the challenges with data are acquisition, management, integration and analysis, with the extra complication of the culture change researchers may undergo as these data challenges are addressed. There is much to learn from other industries, but chemistry information certainly has its unique problems. This talk will discuss these subjects and the needs that arise from them.

9:30 45 Realizing the dream: Analysis and visualization tools for today, problems and issues for tomorrow.
William F. Bartelt III, CAS and Web Content, CAS, 2540 Olentangy River Road, Columbus, OH 43202-1505, wbartelt@cas.org - SLIDES

Ways to deal with the ever-growing tidal wave of chemical information are sorely needed. Software tools for analysis and visualization of data are frequently cited as holding the most promise for solving the problem of information overflow. Different classes of tools are needed to deal with different types of data. How are we doing? This survey focuses on analysis and visualization tools that are commercially available and how well they help stem the tide. In addition, technical and economic issues affecting advances in the state of the art will be explored.

10:00 46 Command and control of the drug discovery factory: Putting researchers in the driver's seat.
Christopher Ahlberg, Spotfire, Inc, 212 Elm Street, Somerville, MA 02115, Fax: 617-702-1700, ahlberg@spotfire.com - SLIDES

The last decade has seen an abundance of novel technologies, methodologies, and research content coming into the domain of drug discovery. High throughput technologies have the possibility of significantly improving the results of pharmaceutical research.

However - the results have not yet been shown. The output of novel products in the market place has decreased rather than increased while these new technologies have been implemented in current processes.

Much of the blame for this has been put on how research organizations have not been ready for dealing with the data explosion from novel technologies. Researchers have had to deal with 100x more data - in terms of number of compounds as well as in number of properties. Novel visualization and analytic technologies have been successful in battling this explosion - allowing researchers who otherwise would be confined to spreadsheets to rapidly browse data searching for trends and outliers.

While these novel visualization and analytic technologies have had a big impact, I will argue that to see real improvements in research productivity we need to see a discontinuous change in how research organizations deal with data and decision-making.

Chemists need to be able to see their results in the context of biology; biologists need to be able to see their results in the context of chemistry, etc. Decisions need to be made cross-functionally - taking every aspect of chemistry and biology into consideration. Every decision needs to be continuously monitored and updated as new data becomes available.

This is easier said than done. As much as such decision-making indeed would be a discontinuous change, a discontinuous change in software infrastructure for decision-making will be needed to enable a change in methodology - and put researchers in the driver's seat.

I will outline a novel architecture for analytical software for the world of drug discovery - building on previous success in data visualization - and show how integrated decision-making can be made possible through improvements at every level from the UI to the database. The presentation will include architecture as well as user interface issues - and discuss impact on pharmaceutical research.

10:30 47 Integrating analysis of chemical information from diverse sources and data types.
Jeffrey D. Saffer, OmniViz, Inc, Two Clock Tower Center, Suite 600, Maynard, MA 01754, jsaffer@omniviz.com - SLIDES

Today's chemist deals with very large collections of information from diverse sources. Textual information from literature and patents, high throughput screening results, structures, descriptors and fingerprints, and ADME/Tox results represent just a sampling of the different types of information used by the chemist. Being able to integrate these varied data into a cohesive understanding can lead to improved decision-making. One of the best instruments for this integration is the human mind, but this tool can only be fully engaged when the diverse information is presented in a context that is easy to assimilate. To this end, we have developed a visualization framework that integrates analysis of experimental and computational data with conceptual analysis of textual information. The application of these approaches to very large (hundreds of millions of data points) chemistry data sets will be discussed in the context of discovery research.

11:00   Lunch Break.


Section A

Living With AIPA: Impact of the American Inventors Protection Act After a Year
Convention Center 101 A
S. M. Kaback, E. S. Simmons, Organizers
co-sponsored with ACS Committee on Patents and Related Matters, PIUG, CHAL
2:00   Introductory Remarks.
2:05 48 Trends and impacts in chemical patent information.
Matthew J. Toussant, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202-1505, Fax: 614-447-3906, mtoussant@cas.org - SLIDES

CAS has observed two significant trends in the chemistry-related patents it monitors: 1) Major patent offices are issuing more applications and granted patents than ever before; and 2) Bioscience patents are becoming more complex and dense with information--and this could affect the currency and completeness of secondary databases. The enactment of the American Inventors Protection Act heightens the need to access patent information as soon as possible. Statistics from five major patent offices will be reviewed and the impact of the USPTO patent application release beginning in March 2001 will be examined. At CAS, these developments change the requirements for chemical patent information, while trends in intellectual property disclosure and protection are affecting chemical and pharmaceutical companies. This presentation will discuss how a secondary information provider adjusts to these changes, in the interest of serving researchers' evolving needs.

2:35 49 One change in law; a myriad of industry effects.
Sarah Hamer, Editorial Manager, Chemistry and Life Sciences, Derwent Information Ltd, 14 Great Queen Street, London, WC2B 5DF, United Kingdom, Fax: 44 20 7344 2900, Sarah.Hamer@derwent.co.uk - SLIDES

Passage of the American Inventors Protection Act (AIPA) was greeted at Derwent Information Ltd. with a mixture of eagerness and apprehension. Given the importance of the USPTO as a patenting authority, we needed from day 1 to provide full coverage and value-added data for each of the published applications in the Derwent World Patents Index® (WPI), Derwent GENESEQ, Derwent Patents Citation Index® and other products. To ensure achievement of this commitment, a major project was established to recruit and train additional staff required to process the data, as well as to secure office space for them. A brief overview of the issues Derwent faced, and steps taken to address them, will be followed by statistics and a trend analysis demonstrating the impact of the law changes on the structure of Derwent WPI patent families. This will include first-to-file issues and the proportion of patent families containing only US applications. The range of technologies covered in US applications published during the first year will also be reviewed.

3:05 50 Living with AIPA: A patent vendor perspective.
David T. Dickens, Questel.Orbit, 8000 Westpark Drive, McLean, VA 22102, ddickens@questel.orbit.com, and Linda Williams, Questel S.A - SLIDES

Questel-Orbit offers traditional online and internet access to a large collection of intellectual property databases. The pre-grant publication (PGP) of US patent applications as of 15 March 2001 has posed quality and design issues for patent searchers, database producers, and patent vendors alike. Difficulties for producers include such design issues as formatting of the new 11-digit patent number for crossfile searching, the merging of PGPs and grants in a single database and/or document, optimal handling of claims from different stages, managing continuations, CIPs, and divisionals, and patent family definition. Other technical issues include the handling of missing data elements and non-standard formatting of priority numbers. This paper discusses the ways that Questel-Orbit, as both vendor and database producer, has implemented four very different databases: Questel-Orbit's USAPPS fulltext database, IFI's CLAIMS IFIPAT, Derwent's World Patents Index, and Questel-Orbit's PlusPat.
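
As one illustration of the number-formatting issue raised above, the sketch below normalizes differently written US pre-grant publication numbers to a single 11-digit key (a four-digit year followed by a seven-digit serial). The accepted input variants and the function name are assumptions made for this example; they do not describe how any of the named databases actually store or index these numbers.

```python
import re
from typing import Optional

# Assumed input variants: optional "US" prefix, optional "/" or "-" between the
# year and the serial, optional kind code such as "A1". Hypothetical pattern.
_PGP_PATTERN = re.compile(
    r"^\s*(?:US)?\s*(\d{4})\s*[/-]?\s*(\d{7})\s*(?:A\d)?\s*$",
    re.IGNORECASE,
)


def normalize_pgp_number(raw: str) -> Optional[str]:
    """Return the 11-digit year+serial key, or None if the text is not a PGP number."""
    match = _PGP_PATTERN.match(raw)
    if match is None:
        return None
    year, serial = match.groups()
    return year + serial


key_slash = normalize_pgp_number("US 2002/0012345 A1")
key_compact = normalize_pgp_number("US20020012345A1")
key_grant = normalize_pgp_number("6,123,456")  # a granted-patent style number
```

Mapping every variant to one key is what allows the same publication to be matched across files that format the number differently; granted-patent numbers fall outside the pattern and are returned as None.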

3:35 51 Comparing US and European early-publication practices.
Stephen R. Adams, Magister Ltd, Crown House, 231 Kings Road, Reading RG1 4LS, United Kingdom, Fax: +44 118 929 9516, stevea@magister.co.uk

The United States is a late adopter of the early publication system, which has been well-established in Western Europe since the early 1960's. Consequently, most European patent information specialists have long been familiar with the characteristics, content and usefulness of such documents and the corresponding search databases. At first sight, the US legislation results in an analogous publication. However, this paper provides a more detailed examination of the operation of US 18-month publication, which reveals a number of significant differences when compared to European practice. These variations can have an impact upon the expectations of the searcher, the nature of their search tools and the results which can be obtained from them.

4:05 52 A patent searcher looks at the American Inventors Protection Act.
Stuart M. Kaback, Information Research & Analysis, Research Support Services, ExxonMobil Research & Engineering Co, 1545 Route 22 East, Annandale, NJ 08801, Fax: 908-730-3230, stuart.m.kaback@exxonmobil.com - SLIDES

Unexamined published patent applications have been with us for a long time. Non-examining countries such as Belgium and South Africa were early-publishing countries even before the Netherlands started, in 1964, to publish all applications 18 months after their priority dates. Most major patenting authorities eventually followed suit, and one might have supposed that when the US began to publish pre-grant applications in March of 2001 things wouldn't have changed that much, beyond the availability of additional documents in the English language. That turns out to have been an oversimplification; the US pre-grants are quite different from earlier published applications, in a number of ways. This presentation will examine a number of changes in patent information availability brought about by the American Inventors Protection Act.

4:35   Panel Discussion.


Section B

Analysis and Visualization of Chemical Information
Convention Center 101 B
L. O'Korn, Organizer
2:00 53 Evolving techniques to analyze and visualize chemical information.
Kim S. Dunwoody1, John L. Macko1, William F. Bartelt2, and Kurt W. Zielenbach3. (1) Research Department, Chemical Abstracts Service, P.O. Box 3012, Columbus, OH 43210, kdunwoody@cas.org, (2) CAS and Web Content Department, Chemical Abstracts Service, (3) Online Services Department, Chemical Abstracts Service - SLIDES

Techniques for analyzing and visualizing chemical structures and text have become instrumental in exploring, problem-solving, and decision-making. The purpose of evolving tools is to organize and elevate information, so that an investigator can gain insights that would otherwise be out of reach. This presentation includes practical tips for using STN Express and SciFinder and for launching other products. The examples include (1) an overview of developments in the field of chemical data analysis and visualization and (2) a case relating activities to a structural class of substances.

2:30 54 Integrating chemical information & visualizations to support scientific decisions.
Mark C. Surles, MDL Information Systems, Inc, 5910 Pacific Center Blvd., Suite 310, San Diego, CA 92121, Fax: 858-658-9463, surles@mdli.com - SLIDES

Discovery projects are under pressure to reduce the time to get compounds to clinical trials while improving their likelihood of success. At their disposal are high throughput data, predictive ADME-T tools, and historical, corporate know-how. Numerous applications exist that separately either integrate data, build predictive models, visualize chemical information, or exchange information among project members. This non-integrated, modular approach results in few users and poor exchange of information, because scientists must be proficient with numerous applications. Alternatively, systems that have integrated too much into a single interface have also had limited success because they either became unwieldy while trying to solve too much, or too simple because they eliminated domain specific features.

Software advances including multi-tier architectures and XML provide the tools to address this problem. This talk discusses some of the challenges of providing tools for scientific members of discovery projects that incorporate multi-disciplinary, disparate data into a usable, collaborative environment. Examples show a hybrid approach that provides a common denominator of visualizations, analyses, and data in a single front end for scientists, while supporting data exchange with other analysis applications and supporting databases. This approach can provide competitive advantage by including more scientists from more disciplines in the creative discovery process, while shortening the time to clinical trial by communicating project advances to all users in real time.

3:00 55 Visualization of chemical patents: Source titles and abstracts vs. enhanced titles and abstracts.
Anthony J. Trippe, Aurigin Systems, 3065 S. Hegry Circle, Cincinnati, OH 45238, Fax: 513-347-6260, atrippe@aurigin.com - SLIDES

In recent years text-mining software has been developed that allows analysts to organize and visualize large collections of documents without having to read and manually place each individual document. In particular, ThemeScape from Aurigin Systems allows large collections of documents to be clustered based on co-occurrence of subject topics or themes. Once similar documents are grouped based on shared content, they are visualized using a topographical map representation. These maps allow for relative document density to be measured based on the height of the content peaks and they allow for secondary relationships to be identified within a set by observing the relative distance between document clusters that are spatially close together.

In previous work, a map created from original patent titles and abstracts was compared to a map created using intellectually assigned, hierarchical classification. These maps were similar to one another, but it appeared that the original titles and abstracts were a better analysis source for clustering documents by their function or use.

The current study will continue to explore this area by comparing a map created from the original titles and abstracts to the identical collection of chemical patent documents using enhanced titles and abstracts produced by Chemical Abstracts Service and Derwent Information. The discussion will revolve around differences and similarities in each approach and will attempt to provide information on which source provides the most valuable insight under different circumstances.

3:30 56 Identification and visualization of chemical series: Finding structural series in HTS data.
Stephan Reiling, Research, Tripos, Inc, 1699 S. Hanley Road, St. Louis, MO 63144, Fax: 314-647-9241, sreiling@tripos.com

The talk will present Structural Unit Analysis, a new method to identify and visualize relevant structural series in chemical data. The method was developed for the analysis of HTS data, but is not limited to this. In addition to identifying the molecules that are members of a certain series, the method also identifies the structural features that the compounds share and that are relevant to explaining the observed activity.

4:00 57 Visualization and data analysis with VIDA.
Joseph J Corkery, OpenEye Scientific Software, 80 Kinnaird Street, #2, Cambridge, MA 02139, jcorkery@eyesopen.com - SLIDES

VIDA is a graphics program designed to visualize, manage, and manipulate large sets of molecular data such as vendor or corporate collections, multiconformer virtual libraries, or the results of computational experiments -- such as docking. Facilities such as visual list management, data filtering, SMARTS matching, spreadsheet and graphing utilities, pharmacological property calculation, and clustering are tightly integrated to 3-D and 2-D visualization. On a PC with 2 GB of RAM, VIDA can read and manipulate 1 million structures. Chemical property calculation features included are log P, 2-D polar surface area, rotatable bond count, number of heavy atoms, molecular weight, and presence/count of SMARTS patterns. Electrostatics calculations are done by an internal Poisson-Boltzmann solver. The spreadsheet supports typical formula creation (including chemical properties), sorting, and graphing. Versions are available for Windows, Linux, and SGI.


Section A

Text-Based Retrieval in Chemistry
Convention Center 101 A
J. Williams, Organizer
9:00   Introductory Remarks.
9:05 58 Overview of 15th Collective Index changes in policy at CAS.
Kathy J. Wolfgram, Ida Copenhaver, Mark E. Prince, and Linda Toler, Editorial Operations, CAS, 2540 Olentangy River Rd, Columbus, OH 43209, Fax: 614-461-7140, kwolfgram@cas.org - SLIDES

At the start of the 15th Collective Index (15CI) Period (2002-2006), CAS introduced several changes to indexing policies and practices to better serve customers. The presentation will summarize these changes. The implications of these changes to search strategies and search results will be addressed through several examples.

9:35 59 Searching the CA and CAplus files with the enhanced CAS Role Indicators.
Eva M. Hedrick1, Maria G.V. Rosenthal2, and Sandra L. Augustine2. (1) Database Quality Engineering, CAS, 2540 Olentangy River Rd., Columbus, OH 43210, Fax: 614-461-7140, ehedrick@cas.org, (2) Editorial Operations, CAS - SLIDES

CAS Role Indicators have been enhanced to provide additional access points in the leading fields of scientific research. As CAS scientists analyze the literature, they assign pertinent Role Indicators to each substance that is indexed. Role Indicators allow searchers to break down large answer sets into smaller groupings and to link these groupings to search for common trends. As the databases grow in size, this intellectual assignment by a specialist in the field is of great value to database searchers.

10:05 60 Creating a customized report in STN Express 6.0 with Discover.
Steven W. Yang, Olga Grushin, and Luray Minkiewicz, Leveraged Information Technologies, DuPont Inc, Experimental Station, Wilmington, DE 19880, Fax: 302-695-7731, steven.w.yang@usa.dupont.com

We have exploited the new report and table features in STN Express 6.0 with Discover to create customized search reports. We are able to prepare search reports with optional content from selected single or multiple session transcripts. Patent family information and patent application information can be displayed in table format for easy retrieval. The efficiency and effectiveness of postprocessing search results have been improved by using these tools. The new report features also enable collaboration across multiple activities and leveraging of information. The Statistics function has facilitated competitive analysis.

10:35 61 Uncertainty in retrieval from large databases.
Andrew Berks, Patent Dept, Merck & Co, RY 60-35, 126 E. Lincoln Ave, Rahway, NJ 07065, Fax: 732-594-5832, andrew_berks@merck.com - SLIDES

A recent talk by Sandy Lawson of Beilstein discussed a concept “Question-query-response, pick any two.” This presentation will further develop this idea and discuss query complexity and retrieval of records from large databases. Relationships between database structure, including indexing, query complexity, and retrieval are shown to depend on the relationship of the original question to the secondary indexing of the database. Queries can be closely related to the original question or the database structure. A query closely related to the database structure is addressed by the indexing of the database, but this limits the nature of the questions that can be posed directly to a database. A complex query, not directly addressed by database indexing, is shown to have limits to completeness of retrieval. An uncertainty equation is developed, relating retrieval, the original question, and a variable based on the complexity of the question and indexing in the database. Natural language interfaces provide a solution to the problem of query complexity, but at a cost of relevant retrieval.

11:05 62 DIALOG in a dot com labyrinth: Text based information retrieval in a graphical user interface culture.
James J. Heinis, 11000 Regency Parkway #10, DIALOG Corporation, Cary, NC 27511, jim_heinis@dialog.com - SLIDES

Chemical information may be accessed through graphical, text or combination interfaces which conceal the underlying database structure and strategies from nonspecialist searchers. In a traditional online service (e.g. DIALOG), the crux of search and retrieval depends on system-wide context breadth, indexing consistency and the ease of coordinating search results between databases (e.g. MAP command or equivalents). Consistent indexing is the cornerstone for multivariate cluster analysis, which is the foundation for bibliometrics and data mining. Retrieval from web sources relies on the initial credibility of the source (e.g. governmental, international or technical societies are considered more reliable) and the effectiveness of the search engine. Search engines based on web crawlers (spiders) retrieve only static web pages that are linked to other pages but do not index content which is not in flat HTML format (e.g. image, audio, video or Adobe PDF) or is dynamically generated in response to a query. This non-indexed material forms the "deep or invisible web." Web based search engine ratings of ranking or relevance may be skewed by design or economic considerations. In contrast, traditional online services offer access to an orderly selection of well defined databases with a well defined search engine, indexing structure and standardized search language, with interfaces which may be implemented to ease users into the full capabilities of the search language as implemented on the system. This paper will outline the merits of textual based retrieval on a commercial online system by providing examples of DIALOG searches in pharmaceutical data, use of lesser known files with kinetic information, isolation of prior art data, linkage to patent information, retrieval of records and generation of summary data.

11:35   Lunch Break.


Section B

Text-Based Retrieval in Chemistry
Convention Center 101 A
J. Williams, Organizer
9:00 63 Chemical information is more valuable in context: DiscoveryCenter(TM) as a chemical information scaffold.
Mitchell A. Miller, and Andrew Payne, NetGenics, Inc, 955 Ridge Hill Lane, Suite 30, Midvale, UT 84047, mmiller@netgenics.com - SLIDES

In the current research environment, data is certainly not in short supply. Every organization has an abundance of chemical structure, screening and property data. Putting data of these types together in a coherent way so researchers can make the best-informed decisions about which compounds to pursue is more of a challenge. To this end, NetGenics has developed DiscoveryCenter(TM), a software environment that provides an integrated view of chemical and biological information held in both internal and external repositories. DiscoveryCenter allows researchers to search on and view chemical structures in the context of screening data, biological sequences, analytical testing data, etc. What's more, its flexible architecture allows us to plug in the user's choice of data sources, including molecular property calculators. The right information in the right context gives researchers just what they need.

9:30 64 Molecular shape graphs.
W. Todd Wipke, John Lawton, and Holly Hendrick, Molecular Engineering Laboratory, Department of Chemistry and Biochemistry, University of California, Santa Cruz, CA 95064, wipke@chemistry.ucsc.edu

Molecular shape comparison is a complicated undertaking. A graph-based representation of molecular shape is attractive in that it may be possible to leverage preexisting graph-theoretical algorithms to simplify molecular shape comparison. In this paper, we present methodology for deriving topographical graphs, a graph-like, high-level representation of molecular shape, which are considerably simpler than the molecules from which they were derived. The nodes in a topographical graph correspond to surface segments that possess a given topography, while edges denote the adjacency of the surface segments. In addition to the graph-theoretical properties, the nodes have three-dimensional position and the edges have length. We will present examples of topographical graphs generated for a variety of molecules and will illustrate the potential benefits of this representation.
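The representation described above (nodes for surface segments with a topography and a 3D position, edges for adjacency with a length) can be pictured with a small data structure. This is only an illustrative sketch, not the authors' implementation: the class names, the topography labels, and the choice to take edge length as the straight-line distance between node positions are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class SurfaceNode:
    """One surface segment. The topography vocabulary is hypothetical."""
    name: str
    topography: str  # e.g. "convex", "concave", "saddle" (assumed labels)
    position: tuple[float, float, float]


@dataclass
class TopographicalGraph:
    """Graph whose nodes carry 3D positions and whose edges carry lengths."""
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)

    def add_node(self, node: SurfaceNode) -> None:
        self.nodes[node.name] = node

    def connect(self, a: str, b: str) -> None:
        # Assumption: edge length is the distance between node positions.
        length = dist(self.nodes[a].position, self.nodes[b].position)
        self.edges[frozenset((a, b))] = length

    def neighbors(self, name: str) -> set:
        # Names of all segments adjacent to the given segment.
        return {n for edge in self.edges if name in edge
                for n in edge if n != name}
```

Because the result is an ordinary labeled graph, generic graph algorithms (for example subgraph matching) could in principle be applied to it, which is the motivation the abstract gives for the representation.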

10:00 65 Nonlinear mapping of massive combinatorial libraries: beyond enumeration.
Dimitris K. Agrafiotis, Victor S. Lobanov, and Huafeng Xu, 3-Dimensional Pharmaceuticals, Inc, 665 Stockton Drive, Exton, PA 19341, Fax: 610-458-8249, dimitris@3dp.com - SLIDES

Nonlinear mapping (NLM) is a collection of statistical techniques that embed a set of patterns described by a dissimilarity matrix into a low-dimensional display plane in a way that best preserves their original pairwise relationships. Unfortunately, current NLM algorithms are notoriously slow, and their use is limited to small data sets. In this paper, we present a family of algorithms that combine iterative nonlinear mapping techniques with neural networks, which makes it possible to handle very large data sets that are intractable with conventional methodologies. The method employs a multidimensional scaling algorithm to project a small random sample set, and then 'learns' the underlying transform using one or more multi-layer perceptrons. The distinct advantage of this approach is that it captures the nonlinear mapping relationship in an explicit function, and allows the scaling of additional patterns as they become available, without the need to reconstruct the entire map. This methodology is broadly applicable and can be used with a wide variety of input data representations and similarity functions. It is shown that in the case of combinatorial libraries, it is possible to predict the coordinates of the products on the nonlinear map from pertinent features of their respective building blocks, and thus limit the computationally expensive steps of virtual synthesis and descriptor generation to only a small fraction of products. In effect, the method provides an explicit mapping function from reagents to products, and allows the vast majority of compounds to be projected without constructing their connection tables.
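The first stage described above, projecting a small sample so that pairwise relationships are preserved, can be sketched in a few lines. This is a hedged illustration only: the abstract does not name an error measure or an update rule, so Sammon's stress and a simple stochastic pairwise update are swapped in here as assumptions, and the function names are hypothetical. The second stage, in which a multi-layer perceptron learns the mapping from the projected sample, is not shown.

```python
import math
import random


def sammon_stress(delta, coords):
    """Sammon's stress between target dissimilarities and map distances."""
    n = len(coords)
    total = sum(delta[i][j] for i in range(n) for j in range(i + 1, n))
    err = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            err += (delta[i][j] - d) ** 2 / delta[i][j]
    return err / total


def embed(delta, dim=2, steps=2000, rate=0.05, seed=0):
    """Refine random coordinates by repeated pairwise corrections.

    Assumed scheme: pick a random pair and move both points along the
    line joining them so their distance approaches the target value.
    """
    rng = random.Random(seed)
    n = len(delta)
    coords = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        d = math.dist(coords[i], coords[j]) or 1e-9  # avoid division by zero
        scale = rate * (delta[i][j] - d) / d
        for k in range(dim):
            shift = scale * (coords[i][k] - coords[j][k])
            coords[i][k] += shift
            coords[j][k] -= shift
    return coords
```

Calling `embed(delta)` on a dissimilarity matrix returns one coordinate list per pattern, and `sammon_stress(delta, coords)` reports how well those coordinates preserve the original dissimilarities (0 means a perfect fit).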

10:30 66 Interactive exploration of high volume datasets using HiVol and HiStats.
David Baker, and Ralph Walden, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144, dabaker@tripos.com

HiVol and HiStats are new software tools for analyzing and visualizing the large datasets typical of high-throughput synthesis and screening efforts. Chemical and property data for over a million compounds can be readily calculated, filtered, sorted, and graphed. Datasets can be interactively and iteratively partitioned into subsets based on 2D structure searching, diversity/similarity, registration IDs, and property range. Multiple databases and subsets are simultaneously accessible, each displayed in a spreadsheet complete with 2D structures and associated properties. Additional visualization tools include scatter plots, histograms, and dendrograms. HiStats calculates univariate statistics, performs hierarchical clustering, and builds regression models that profile the properties of large datasets in order to guide follow-up experiments.

11:00 67 Structural class-based analysis, reasoning, and visualization.
Terence K. Brunck, Bioreason, Inc, 150 Washington Ave Ste. 303, Santa Fe, NM 87501, terry.brunck@bioreason.com - SLIDES

Given the rapidly growing body of data being generated by automated synthesis and screening technologies, analysis and decision-making processes are becoming overwhelmed. One approach to the analysis, reasoning, and visualization of such large amounts of data is the use of homogeneous structural classes as the basis for analysis. This approach enables the characterization and prioritization of groups of compounds rather than individual compounds. Methods to generate and use such classes will be presented. Benefits resulting from class-based analysis, including noise detection, predictive modeling, and similarity screening, will be described.


Text-Based Retrieval in Chemistry
Convention Center 101 A
J. Williams, Organizer
1:30   Introductory Remarks.
1:35 68 Support tool for searching the CA and CAplus files.
Ida L. Copenhaver, and Alan E. Amos, Editorial Operations, Chemical Abstracts Service, P.O. Box 3012, Columbus, OH 43210, Fax: 614-447-3906 - SLIDES

With completion of the addition of bibliographic and abstract data to the CA and CAplus Files, CAS recognizes the need for providing support tools that facilitate access to information in the 1907 to current time period. CA production policies and practices have been modified over time. Changes have occurred in the key focus areas of science. A new support tool to aid in developing broad subject-based searching strategies will be discussed.

2:05 69 Lexicon-enhanced text searching for biological information in the CAS databases.
Sabine P. Kuhn, Al E. Amos, Margaret T. Haldeman, and Cynthia Liu, CAS, Columbus, OH 43202-1505, Fax: 614 447 3713, skuhn@cas.org - SLIDES

CAS is best known for its chemistry, but its databases contain a wealth of biological information. Today nearly 40% of the document references in the CAS databases focus on biology, life sciences and medical sciences. The key to the CAS databases, and thus to accessing the world's literature, is CAS' controlled vocabulary. In the CA Lexicon on STN the controlled vocabulary is organized under specific topic areas, and provides relationships between synonyms, broader, narrower, and related terms. CAS is keeping pace with science, evolving with developing terminology in the literature and always providing users with accurate access points to the abundance of scientific knowledge. Meaningful indexing of biological chemistry requires the accurate recording of sources and targets of this chemistry. The CA Lexicon on STN captures this information in a detailed and precisely organized collection of headings: anatomical, taxonomic, biological process, biological property, and biological activity. Additional access to biological subjects is provided by an exhaustive collection of synonyms. The CA Lexicon links these biological headings to relevant chemicals. In-depth hierarchical arrangements of these headings make it easy to search for specific substances across broad classifications. The biological information collected in the CA Lexicon is an indispensable complement to its chemical content.

2:35 70 Keeping up with the Jones’s. Text based searching for competitor intelligence.
Richard W Neale1, Paul Sayer1, Gez Cross1, and Steve Hajkowski2. (1) Product Development Group Chemistry & Life Sciences, Derwent Information UK, 14 Great Queen Street, Holborn, London, United Kingdom, Fax: +44 207 344 2911, richard.neale@derwent.co.uk, gez.cross@derwent.co.uk, (2) Online Training Department, Derwent Information

The chemical industry is one of the world’s largest and most competitive industries with a total turnover in 2000 in the region of US$ 1,200 Billion. Industry organisations and governments recognise that the key to improve competitiveness is to pursue sound Research and Technical Development (R&TD) programmes. As a consequence, the chemical industry is one of the world’s largest sponsors of industrial R&TD.

With information providers producing a wide range of valuable databases, offering various text searching methodologies, text based searching has proved to be an essential tool in aiding competitor intelligence. However, with increasing volumes and varying types of chemical information, how can searchers be sure of obtaining high quality results?

This paper will concentrate on the use of varying classification & indexing systems for the search and retrieval of patent information. It will also examine their efficiency in the retrieval of chemical information when used individually and in combination.

3:05 71 Not just full text articles: A study for the Search function among chemistry electronic journal websites.
Song Yu, Libraries, Purdue University, Mellon Library of Chemistry, 1538 Wetherill, West Lafayette, IN 47907-1538, Fax: 765-494-1579, syu@purdue.edu

Besides providing full text articles online, almost all electronic journal web sites offer the Search function to their users. However, few users utilize this helpful tool.

This presentation will focus on testing, analyzing, and comparing the search features among chemistry electronic journal websites. The web sites are chosen from those to which Purdue University has full-text access.

3:35 72 Text-based chemical information locator from the Internet using commercial barcodes.
M Karthikeyan1, S Krishnan1, and Christoph Steinbeck2. (1) Scientific Management Information Systems, National Chemical Laboratory, Dr. Homi Bhabha Road, Pune 411008, India, Fax: +91-20-5893973, karthi@ems.ncl.res.in, (2) Max-Planck-Institute of Chemical Ecology, Carl-Zeiss-Promenade 10, Jena D-07745, Germany, Fax: 364-164-3665, steinbeck@ice.mpg.de

In chemistry, where most of the information is related to molecular structures, it is necessary to transform those structures into a textual equivalent before defining a query for a search. Many tools are available to transform pictorial chemical structures into equivalent chemical names. Frequently, however, there is a need to search chemicals by common/trade names. This necessitates the development of internet based tools to search common names or traditional names for given structures. As a case study, a tool, CILI (Chemical Information Locator from Internet), was developed to retrieve structure related chemical information using a text-based approach (Fig. 1). Input to the system is provided via the keyboard, a drawing applet, or barcode scanning. In each case the final output is SMILES. After an error checking procedure the user is prompted to enter information for a substructure search or superstructure search. In the substructure search the core skeleton is computed by reducing noise groups. The details are presented.


Informatics Challenges with Mergers and Acquisitions
Convention Center 101 A
R.W. Snyder, Organizer
8:00 73 UCB pharma informatics challenges: Merging databases is also sharing culture and knowledge.
Eddy Vande Water1, Didier Berckmans1, Didier Chalon1, Anna Toy-Palmer2, and David Wei2. (1) UCB S.A. Pharma Sector, Chemin du Foriest R4, Braine-l'Alleud B-1420, Belgium, Fax: +32 2 386 27 04, eddy.vandewater@ucb-group.com, (2) UCB Research Inc, 840 Memorial Drive, Cambridge, MA 02139, Fax: +01 617 5478481, david.wei@ucb-group.com

Following the acquisition of the research group of Cytomed, Inc. in Cambridge, Massachusetts in October 1998, it became necessary to provide electronic support to all of UCB Pharma Sector’s researchers. In order to facilitate sharing data between the two sites, the creation of a global database was decided upon. The ideal database would be efficient, store information about research products, their properties and their test results, and be accessible by all of the research scientists from both Cambridge and Braine-l’Alleud, Belgium. A project team, including people from Global Research and from IS (Information Systems), was mandated by the Research Committee in February 1999 to fulfill this mission.

The first step was to try to understand the research process at both sites. After many meetings and accounting for differences in company size, work methods, and cultural habits, a common database structure was agreed upon.

As a result of this key project, a single database now holds essential scientific data, with automated local copies for better access and query performance. Moreover, data specific for each site are stored in local databases that are copied regularly between the two sites. To ensure the required performance level, a new database server and a direct network line between Boston and Braine-l’Alleud were also put into service. Besides the technical improvements, one of the most important aspects of this project was the development of a new communication and work tool for Global Research, which will help us to combine our skills and our scientific know-how. It is also a very good illustration of collaboration between R&D, IS and IT (Information Technology)!

8:30 74 The long march towards integrated research IT systems.
Dieter Poppinger, R&T Information Services, Syngenta Crop Protection AG, WRO-1060.7.24, CH-4002 Basel, Switzerland, Fax: +41 61 323 2540, dieter.poppinger@syngenta.com

After the merger of the agrochemical businesses of Novartis and Zeneca, the new company Syngenta embarked upon a number of major integration initiatives in the IT area. To enable Syngenta Research to operate effectively across functions and locations, projects are underway to adapt or integrate all major software systems which support chemistry- and biology-related research. The talk will address the organizational, technical, and human challenges which Syngenta faces in these projects, and describe their current state.

9:00 75 A cheminformatics system for stereochemical structures.
Ping Du, Lexicon Pharmaceuticals, 279 Princeton-Hightstown Road, East Windsor, NJ 08520, Fax: 609-448-8299, pdu@lexpharma.com - SLIDES

Lexicon Pharmaceuticals, formerly Coelacanth Corporation, is a division of Lexicon Genetics, Inc. After the acquisition in July 2001, integrating informatics capabilities has become key to building a new enterprise-wide drug discovery platform. One difficult cheminformatics problem that we worked on was to develop a solution to register chemical libraries with multiple relative chiral centers. Structure matching functions for chemical structures with more than one relative chiral center do not exist “out of the box”. These structures represent a mixture of two or more stereoisomers. We have extended the Accelrys Accord chemistry engine to match such structures at library registration. While storing only one chemistry object in Oracle, we are able to create multiple stereoisomers using a set of structure templates. This technique is composed of a set of software components, including Oracle tables and packages, Oracle external procedures for stereoisomer generation, and a desktop manager application. This system manages the chemical structures of an internal collection of over 200,000 compounds. Chemical registration tools are in place to register new compounds with relative chiral centers. Intranet applications have been developed to query these structures and display stereo centers in color according to their absolute or relative properties.

9:30 76 Building a unified drug discovery database within Celltech.
David M. Parry, John Bird, John Rogers, and James Petts, Celltech R&D Ltd, Granta Park, Great Abington, Cambridge CB1 6GS, United Kingdom, Fax: +44 1223 896400, David.Parry@cam.celltechgroup.com - SLIDES

In 1999 the merger of Celltech and Chiroscience created a medium sized research organisation, along with all the overheads of differing ways of working with drug discovery data. Over the last year a major project within the organisation has focussed on the building of a new Unified Drug Discovery Database, shortened to UD3. The existing research organisations within both Celltech and Chiroscience used the IDBS Activitybase and MDL ISIS products, but in differing ways. Finding an interim solution to enable the research work to continue and developing a longer term solution were high on the post-merger priorities. An overview of the current status of this project, in the light of the challenges posed by the merger, will be presented.

10:00 77 Key factors and technologies in the successful deployment of integrated informatics systems.
Bill Langton, Mike Higgins, Ramesh Durvasula, and Denise Beusen, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144, Fax: 314-647-9241, bill@tripos.com

Any realization of the elusive economy of scale that has driven pharmaceutical mergers is critically dependent on sophisticated decision support systems for creating synergies not only between scientific disciplines, but also between business units. Key challenges in implementing integrated informatics systems include: multiple, often geographically disparate database installations; different database formats; the sheer quantity of data generated by high-throughput technologies; customizable tools for simultaneous browsing of chemical, biological, and physical data; project tracking to improve knowledge management and reporting of results; and applications that increase research productivity by mining enterprise knowledge of both earlier and later stages of research. We will discuss our experience in designing integrated informatics systems as well as crucial technologies that enhance the probability of their successful deployment.

10:30 78 Managing the collection of HTS compounds through suppliers.
Nanhua Yao, Shahul Nilar, Vesna Stoisavljevic, Paul Diaz, Maja Stojiljkovic, Jean-Luc Girardet, Haoyun An, Jingfan Huang, Eugene Chang, Robert Hamatake, and Zhi Hong, ICN R&D, 3300 Hyland Avenue, Costa Mesa, CA 92606, nyao@icnpharm.com - SLIDES

The parsing and selection of compounds from commercial suppliers is an integral part of High Throughput Screening (HTS) efforts in drug discovery. In dealing with multiple suppliers and regular updates, it is necessary to develop a strategy that avoids duplicate entries not only among the suppliers, but also between the updates and the historical collection. An additional issue occurs when suppliers provide target-specific focused libraries that need specific considerations. We present a strategy that narrows the number of compounds provided by the suppliers before subjecting the collection to molecular diversity based calculations. All structures from suppliers are collected in a Master Database, which also includes duplicate entries. The initial sets of criteria are modifications to Lipinski’s rule of five, with changes to the rules based on target and candidate-specific experience gathered at ICN. Structures that filter through this step are further analyzed for the presence of “reactive groups” that would make the candidate unsuitable for inclusion. The number of heavy halogen atoms in the structures, defined as Group 7A elements heavier than fluorine, is then employed to further refine the collection of compounds. As with the modifications to Lipinski’s rules, the definitions for the “reactive groups” and the number of heavy halogens are based on in-house screening experience. The resulting set of structures is fingerprinted (in a 2-D sense) using the MACCS keys / Tanimoto matrix and clustered using the Jarvis-Patrick clustering algorithm. One entry from each resulting cluster (usually the first member in a multi-structure cluster) is chosen as being representative of that cluster. The lack of stock quantities of compounds due to the merger of suppliers or the purchase of suppliers by pharmaceutical companies can lead to voids in the diversity space of the intended compound collection. In such cases, candidates from a different supplier in the same cluster can be chosen. For singleton clusters it is then necessary to search the master database, which contains all compound entries, for those that are similar (in diversity space) to the particular structure of interest. Purity and solubility issues in the choice of suppliers for the acquisition process will also be discussed.
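The clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ICN workflow: fingerprints are represented as plain sets of on-bit indices rather than real MACCS keys, the function names and the default values of `k` and `k_min` are hypothetical, and the Jarvis-Patrick variant shown (mutual membership in each other's nearest-neighbour lists plus a minimum number of shared neighbours, with a compound excluded from its own list) is one common formulation among several.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def jarvis_patrick(fps, k=3, k_min=2):
    """Cluster fingerprints with a Jarvis-Patrick style criterion.

    Compounds i and j are merged when each appears in the other's list
    of k nearest neighbours and the two lists share at least k_min members.
    """
    n = len(fps)
    neighbours = []
    for i in range(n):
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: tanimoto(fps[i], fps[j]),
                        reverse=True)
        neighbours.append(set(ranked[:k]))

    parent = list(range(n))  # union-find forest over compound indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            mutual = i in neighbours[j] and j in neighbours[i]
            if mutual and len(neighbours[i] & neighbours[j]) >= k_min:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

Members of each cluster are listed in input order, so taking `cluster[0]` from every cluster mirrors the abstract's choice of the first member as the representative; singleton clusters come back as one-element lists.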

11:00 79 Migrating Chemical Information – a vendor's perspective.
Andrew Lemon, Chemical Technology Group, IDBS, 2 Occam Court, Surrey Research Park, Guildford GU2 7QB, United Kingdom, Fax: 44 1483 595001, alemon@id-bs.com

IDBS provides a flexible and integrated set of solutions for chemical registration, and reaction knowledge management. Many of our customers have migrated from existing solutions. This has required IDBS to add specific features to the design of our product offerings to support this migration process. In these times of merger and acquisition many groups are facing the problems of merging information from multiple vendor systems into one chemical database. This raises a variety of issues that must be supported by any vendor providing a solution to manage this data. We present a set of issues raised from our experience in dealing with this problem and explain some of the solutions we have generated to address these issues.

11:30 80 The changing requirements for informatics systems during the growth of a drug discovery service company.
Sally Rose, Sittingbourne Research Centre, BioFocus plc, Sittingbourne, Kent ME9 8AZ, United Kingdom, Fax: +44 1795 471123, srose@biofocus.co.uk - SLIDES

BioFocus was launched and established as a public company in March 1997. The company raised circa $1.8M when it launched on the Ofex stock market in London. This is a relatively small amount of money with which to set up a company offering medicinal and combinatorial chemistry discovery services to the biopharmaceutical industry. The initial informatics systems needed to be very cost effective and appropriate to a small company. They centered on ISIS Base and Accord for Access.

The company has since grown to circa 160 staff. The majority of the growth has been organic, supported by earned income from clients, though some additional finance has been raised (e.g. circa $6M in August 2000 when the company moved to the AIM stock market in London). As the company grew, the demand for a more sophisticated, flexible informatics system increased. We started implementing an Oracle-based system in 2000 to support the chemistry informatics requirements.

BioFocus acquired a biology service company, Cambridge Drug Discovery, in 2001. This brought a new dimension to the business with the addition of HTS and biological assay development services to our portfolio. Needless to say, the informatics systems of the two companies were completely different and access to chemistry and biology information was required company-wide to support full drug discovery projects for our clients.

Major pharma companies have enormous legacy systems, and mergers result in a vast amount of work for their informatics departments. Small companies, such as BioFocus, have far less data to worry about; however, they face different challenges, namely limited budgets and less in-house expertise.

This presentation will discuss the evolution of the informatics systems at BioFocus and describe the chemistry databases we developed to handle medicinal chemistry and combinatorial library data. It will also consider how we approached merging the information from the biology and chemistry groups.


General Papers
Convention Center 101 A
R.W. Snyder, Organizer
2:00 81 Managing the analytical workflow – From raw material to elucidated structure.
Euan Dean, and Andrew Lemon, Chemical Technology Group, IDBS, 2 Occam Court, Surrey Research Park, Guildford GU2 7QB, United Kingdom, Fax: 44 1483 595001

ActivityBase provides an integrated framework for managing discovery data. We describe a solution to enable scientists to utilize the test, workflow and results management capabilities of the ActivityBase environment to manage spectral and chromatographic data. This manages not only the capturing and processing of data, but also the organization of data collection. The ActivityBase test management module provides features for the generation and tracking of a set of requests for services. This has been applied to analytical services, supporting sample management, analytical requests, collection and processing of the spectral data generated, validation and elucidation of structural information leading to a confirmed structure. This includes integration of spectral management software within the ActivityBase framework.

2:30 82 Predicting reaction parameters for library synthesis accelerated by an in-house reaction database.
László Ürge, Gábor Pőcze, Anna Gulyás-Forró, Ferenc Darvas, and György Dormán, ComGenex Inc, 33-34 Bem Rkp, Budapest H-1027, Hungary, dgyorgy@comgenex.hu

Until recently, organic chemists derived their knowledge of chemical reaction conditions by inductive learning from observations on a sequence or series of individual chemical reactions. During experimental design, optimal reaction parameters were estimated by analogy with the individual syntheses of related compounds. Initially, for high-throughput parallel synthesis of combinatorial libraries, this estimation from individual reactions was generally applicable, and in some cases stochastic simulation methods were also used successfully [1]. However, the rapid advancement of combinatorial chemistry has allowed an accumulation of data on various aspects, including organic reactions. Analyzing these large datasets and exploiting their predictive power is an emerging area in the combinatorial sciences. In recent years ComGenex has applied a large number of chemical reactions and automated them on parallel synthesis stations, yielding several hundred thousand compounds. All the protocols and reaction parameters, including structures, experimental conditions, yield, success rate and analytical results, are stored in a searchable format in ComGenex's proprietary SQL-based information technology system. The large datasets enable a more accurate prediction of the optimal parameter set for synthetic matrix planning, taking into account the chemical nature of the reagents in various reaction types. The presentation covers non-algorithmic components of the prediction as well as quantitative elements. Qualitatively, for each reagent class a reagent fingerprint can be determined from the observed reactivity with different substitution patterns in different types of reactions, and reaction families can be identified by finding similar fingerprints. Quantitatively, the reaction database lends itself to mathematical algorithms that calculate the optimal chemical parameter sets for the most efficient performance of high-throughput parallel synthesis.

[1] Darvas F and Kovács L, CMT: A solution phase combinatorial approach. Synthesis and yield prediction of phenazines. In: High-Throughput Screening (Ed. Devlin JP), pp. 223-242. Marcel Dekker Inc., New York, 1997.
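
The reagent fingerprint idea in abstract 82 can be illustrated with a minimal sketch. The abstract does not say how ComGenex computes or compares fingerprints, so everything below is an assumption made for illustration: the record layout, the reagent class and reaction type names, the use of per-reaction-type success rate as the fingerprint, and cosine similarity as the comparison measure.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical reaction records: (reagent_class, reaction_type, succeeded).
# These rows are invented for illustration, not taken from any real database.
RECORDS = [
    ("aryl_amine", "acylation", True),
    ("aryl_amine", "acylation", True),
    ("aryl_amine", "reductive_amination", True),
    ("aryl_amine", "reductive_amination", False),
    ("alkyl_amine", "acylation", True),
    ("alkyl_amine", "acylation", True),
    ("alkyl_amine", "reductive_amination", True),
    ("alkyl_amine", "reductive_amination", False),
    ("aldehyde", "reductive_amination", True),
]


def reagent_fingerprints(records):
    """Return {reagent_class: {reaction_type: success_rate}}."""
    # counts[reagent_class][reaction_type] = [successes, total attempts]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for reagent_class, reaction_type, succeeded in records:
        cell = counts[reagent_class][reaction_type]
        cell[0] += int(succeeded)
        cell[1] += 1
    return {
        reagent_class: {rtype: s / n for rtype, (s, n) in by_type.items()}
        for reagent_class, by_type in counts.items()
    }


def cosine_similarity(fp_a, fp_b):
    """Cosine similarity of two sparse fingerprints.

    Assumption: a reaction type absent from a fingerprint counts as 0.0.
    """
    keys = set(fp_a) | set(fp_b)
    dot = sum(fp_a.get(k, 0.0) * fp_b.get(k, 0.0) for k in keys)
    norm_a = sqrt(sum(v * v for v in fp_a.values()))
    norm_b = sqrt(sum(v * v for v in fp_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


fingerprints = reagent_fingerprints(RECORDS)
```

Calling `cosine_similarity(fingerprints["aryl_amine"], fingerprints["alkyl_amine"])` on this toy data scores the two amine classes as more alike than either is to `aldehyde`, which is the kind of grouping the abstract describes as identifying reaction families. A production system would presumably also weight by substitution pattern and by the number of observations, neither of which is modeled here.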