American Chemical Society 223rd National Meeting
Orlando, FL, April 7-11, 2002
Division of Chemical Information (CINF)
ABSTRACTS
Chemical Descriptors I
CINF 1: An efficient representation for chemical
descriptors
Jeffrey M. Blaney, and J. Kevin Lanctot,
Computational Sciences, Bristol-Myers Squibb Pharma Research Labs, Inc, 150
California St, Suite 1100, San Francisco, CA 94111, Fax: 415-732-7170,
jblaney@combichem.com
Abstract
3D-pharmacophores have become popular
in the last decade as descriptors for chemical similarity searching and
combinatorial library design. These pharmacophores are typically encoded in
large bitstrings, up to hundreds of millions of bits long. Research on other
classes of chemical descriptors suggests that "chemical space" requires on the
order of 10-20 dimensions. Many of the millions of dimensions are correlated or
irrelevant, and the large size of the bitstrings makes them a less efficient representation for
computation. We have developed an approach based on classical multidimensional
scaling to determine the inherent dimensionality of these large chemical
bitstrings (pharmacophore or otherwise), resulting in real-valued cartesian
coordinates. We have also developed a novel approach to calculate much smaller
bitstrings (hundreds of bits) that preserve the pairwise similarity of
the original large bitstrings. The new smaller space is general enough that it
can be used for many common descriptor-based library design approaches:
coverage, diversity, D-optimal, or informative design.
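As a rough illustration of the multidimensional scaling step described above, the following sketch embeds a handful of fingerprints in real-valued coordinates. It is not the authors' implementation. It assumes fingerprints stored as sets of on-bit indices, uses 1 - Tanimoto as the squared distance, and finds the leading eigenvectors by plain power iteration; the function names are hypothetical.

```python
import math

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def double_center(d2):
    """Gram matrix B = -1/2 * J * D2 * J built from squared distances D2."""
    n = len(d2)
    row = [sum(r) / n for r in d2]
    total = sum(row) / n
    return [[-0.5 * (d2[i][j] - row[i] - row[j] + total) for j in range(n)]
            for i in range(n)]

def top_eigenpairs(b, k, iters=500):
    """Leading k eigenpairs of a symmetric matrix (power iteration with deflation)."""
    n = len(b)
    b = [r[:] for r in b]  # work on a copy
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return pairs

def classical_mds(fingerprints, k):
    """Return k real-valued coordinates per fingerprint (assumed distance: 1 - Tanimoto)."""
    n = len(fingerprints)
    d2 = [[1.0 - tanimoto(fingerprints[i], fingerprints[j]) for j in range(n)]
          for i in range(n)]
    pairs = top_eigenpairs(double_center(d2), k)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs] for i in range(n)]
```

For realistic collections a library eigensolver would replace the power iteration; the sketch only shows how pairwise bitstring similarities become Cartesian coordinates.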
CINF 2: A hierarchy of structure
representation
Johann Gasteiger, Thomas Kleinöder, Jens
Sadowski, Markus Wagener, and Markus C. Hemmer, Computer-Chemie-Centrum and
Institute of Organic Chemistry, University of Erlangen-Nuremberg,
Naegelsbachstr. 25, Erlangen 91052, Germany, Fax: +49-9131-85 26566,
Gasteiger@chemie.uni-erlangen.de
Abstract
Modern drug design generates massive
amounts of data that have to be related to chemical structures. Therefore,
methods are needed that encode the physicochemical effects of chemical
structures responsible for biological activity which are simultaneously
applicable to large sets of molecules. We have developed a hierarchy of
structure representations that start from the constitution of a molecule,
proceed to 3D structures by using CORINA and then calculate molecular surfaces.
A host of physicochemical effects such as charge distribution, polarizability,
inductive and resonance effects as calculated by the PETRA package can be
combined with each level of structure representation. Application of these
structure encoding schemes to the definition of diversity, analysis of
high-throughput experiments, and quantitative structure-activity relationships
will be shown. Thus, these methods have their value in lead finding and
optimization.
CINF 3: Use of molecular descriptors based on
medicinal chemistry building blocks
Paul E. Blower
Jr.1, Kevin Cross1, Michael Fligner2, and
Joseph Verducci2. (1) LeadScope, Inc, 1275 Kinnear Rd, Columbus, OH
43212, Fax: 614-675-3732, pblower@leadscope.com, (2) Ohio State
University
Abstract
LeadScope™ provides a large set of
molecular descriptors based on structural features commonly used for
experimental design in drug discovery programs, the building blocks of medicinal
chemistry. The software performs a systematic substructure analysis using
predefined structural features stored in a feature library. The features
represent a wide range of structural specificity from very specific
substructures such as benzene, 1-hydroxymethyl, 3-methoxy- to generic features
such as pharmacophores, which are pairs of generalized physicochemical atom types.
At the present time, the feature library contains over 27,000 structural
features. We have also developed a new association coefficient for diversity
analysis that overcomes intrinsic biases of the Tanimoto and Hamming
coefficients. This paper will describe the content and organization of the
LeadScope™ molecular descriptors, give details of the new association
coefficient, and contrast the three coefficients for selecting diverse sets of
compounds from a large collection of known drugs.
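For readers unfamiliar with the two standard coefficients mentioned above, the sketch below shows how each responds to fingerprint size. It does not reproduce the new association coefficient, which the abstract does not define. Fingerprints are assumed to be sets of feature indices, and the function names are illustrative.

```python
def tanimoto_similarity(a, b):
    """Shared features divided by features present in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def hamming_distance(a, b):
    """Number of features present in exactly one of the two fingerprints."""
    return len(a ^ b)

# Two pairs that differ by the same two features but have different sizes.
small_pair = ({1}, {2})
large_pair = (set(range(1, 11)), set(range(2, 12)))

# Hamming distance is 2 for both pairs, while the Tanimoto similarity is
# 0 for the small pair and 9/11 for the large pair.
small_scores = (tanimoto_similarity(*small_pair), hamming_distance(*small_pair))
large_scores = (tanimoto_similarity(*large_pair), hamming_distance(*large_pair))
```

The same two-feature difference makes the small pair look completely dissimilar under Tanimoto, while Hamming treats both pairs alike. That size dependence is one way the two coefficients can pull a diversity selection in different directions.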
CINF 4:
Molecular descriptors as a tool for data mining the Registry
file
Jeffrey M. Wilson, and Roger J. Schenck, Authority
Database Operations, Chemical Abstracts Service, 2540 Olentangy River Road,
Columbus, OH 43202, Fax: 614-447-3713, jwilson@cas.org
Abstract
The CAS Registry file is among the
world's largest virtual screening libraries. The addition of a variety of
calculated physical properties to Registry allows data mining of this database
in ways that were never before possible. The searchable nature of these
molecular descriptors allows the user not only to refine searches based on types
of properties and specific value ranges but also to visualize the relationships
of these values for groups of similar structures.
CINF 5: Multiresolution analysis of topological
representations of structural and physico-chemical properties of pharmacological
molecules
John Binamé, Laurence Leherte, and Daniel P.
Vercauteren, Laboratoire de Physico-Chimie Informatique, Facultés Universitaires
Notre-Dame de la Paix, 61, Rue de Bruxelles, Namur B-5000, Belgium,
john.biname@fundp.ac.be
Abstract
The 3D structure of most biological
receptors may still be difficult to obtain experimentally. Theoretical methods
thus present interesting alternatives for elucidating the corresponding
pharmacophore elements. Our goal is to develop new methods to propose
pharmacophores based on reduced representation of the electron density of potent
ligands.
Such representations are expected to be composed of a few relevant elements (one or two) for each chemical function in the molecule. To determine the position and the properties of such elements, we applied a topological analysis algorithm to calculated electron density maps of the molecules. Pharmacophore elements are thus identified as the critical points of the electron density.
Preliminary results are presented for small sets of molecules at different crystallographic resolutions (2.0 to 5.0 Å) to determine the best representation. Properties of the critical points are then statistically analyzed for broader sets of molecules to evaluate their transferability to various families of molecules.
Chemical Descriptors II
CINF 6: Combinatorial descriptors for virtual
screening
Victor S. Lobanov, Dimitris K. Agrafiotis, and
Huafeng Xu, 3-Dimensional Pharmaceuticals, Inc, 665 Stockton Dr., Suite
104, Exton, PA 19341, Fax: 610-458-8249, victor@3dp.com,
hxu@3dp.com
Abstract
The advent of combinatorial chemistry
has sparked renewed interest in the use of molecular descriptors for virtual
screening. Whether it is based on molecular diversity, molecular similarity or
structure-activity correlation, the design of a combinatorial experiment usually
involves the enumeration of every product in the virtual library, and the
computation of key molecular properties that are thought to be pertinent to the
application at hand. Unfortunately, this simplistic approach collapses with
large combinatorial libraries, which often defy enumeration. Recently, we
presented a machine learning approach that allows the prediction of product
descriptors from pertinent features of their respective building blocks, thus
limiting the computationally expensive steps of enumeration and descriptor
generation to only a small fraction of products. In this paper, we present the
application of this technique to several popular sets of descriptors, introduce
hybrid schemes that reveal their mutual redundancy, and present several
algorithmic enhancements aimed at improving the quality of the predictions.
CINF 7: Prediction of drug solubility: cohesive
interactions modeled by Monte Carlo simulations
Anton
Filikov, Syrrx, Inc, 10450 Science Center Drive, San Diego, CA 92121, Fax:
(858) 623-0460, anton.filikov@syrrx.com
Abstract
Monte Carlo (MC) simulations in
torsion angle space have been used to model cohesive interactions in solid
phase. Each simulation consists of sampling 3 million conformations of a
tetramer of a compound followed by sequential fully flexible MC docking of
an additional 16 molecules onto the crystalline nucleus. The following descriptors
of the solid phase interactions are calculated: van der Waals and hydrogen bond
interaction energies, torsion angle strain energy, the difference between
hydrogen bond interaction energies in the solid phase simulation and in
solution, number of rotatable bonds and number of rotatable bonds without
symmetrical groups. The descriptors for the solution phase include solvation
energy calculated via atomic solvation model, surface tension solvation energy,
Poisson electrostatics, polar surface area, clogP, etc. The descriptors
calculated for an extensive set of drug molecules have been used to derive
several regression equations to predict solubility. The accuracy of this
approach will be compared to other methods.
CINF 8: Collection of chemically intuitive molecular
descriptors proven as highly effective and fast predictors of ADME
properties
Robert Fraczkiewicz1, Boyd
Steere1, and Michael B. Bolger2. (1) Life Sciences
Department, Simulations Plus, Inc, 1220 West Avenue J, Lancaster, CA 93534, Fax:
(661) 723-5524, (2) Department of Pharmaceutical Sciences, USC School of
Pharmacy
Abstract
The interest of pharmaceutical
companies in time- and cost-effective methods of in silico drug lead screening
has been growing rapidly. Methods for estimation of ADME properties before
the respective molecules are actually synthesized play a pivotal role in this
process. A majority of these methods use descriptors of molecular structure as
input variables to predictive mathematical models. This presentation illustrates
how novel algorithms developed at Simulations Plus for rapid computation of an
original set of unique molecular descriptors lead to high-performance predictive
models of ADME properties.
CINF 9: An
efficient bitmap container package for very high-dimensional
fingerprints
Peter Fox1, Lars
Naerum2, Henning Thogersen2, Robert Clark1, and
Trevor Heritage1. (1) Research Department, Tripos, Inc, 1699 South
Hanley Road, St. Louis, MO 63144, (2) MedChem Research, Novo Nordisk
A/S
Abstract
Most existing bitmap containers
used to store fingerprints for molecular descriptors are poorly suited for
descriptors that can span a very large solution space. We present a fingerprint
(bitmap) container that uses a compression scheme to characterize the bitmap,
while allowing on-the-fly bitmap operations on the compressed bitmap, with no
need to decompress it first. We are using this approach very successfully with
pharmacophoric multiplets, where the solution space, and consequently the bitmap
size, is extremely large. In particular, compression makes it possible to carry
out stochastic sampling of the full conformational space, obviating the need to
consider only fixed torsional increments. Examples and detailed performance
analyses will be presented.
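The abstract does not describe the compression scheme itself. As a stand-in, the sketch below uses one simple sparse representation, a sorted list of on-bit positions, to show how similarity can be computed directly on the compact form without expanding the full bitmap. The function names are hypothetical.

```python
def intersection_count(a, b):
    """Count positions set in both fingerprints (a and b are sorted index lists)."""
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count

def tanimoto_sparse(a, b):
    """Tanimoto similarity computed on the sparse form, never materializing the bitmap."""
    common = intersection_count(a, b)
    union = len(a) + len(b) - common
    return common / union if union else 1.0

# Bit positions can be arbitrarily large; memory depends only on the number of set bits.
fp1 = [3, 1_000_000, 250_000_000]
fp2 = [3, 42, 250_000_000]
similarity = tanimoto_sparse(fp1, fp2)
```

The cost of each operation scales with the number of set bits rather than the nominal bitmap length, which is what makes very large solution spaces tractable.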
CINF 10: Controlling degeneracy with the extended
valence sequence Signature molecular descriptor
Jean-Loup
Faulon1, Carla J Churchwell1, and Donald P
Visco2. (1) Computation, Computers and Mathematics, Sandia National
Laboratories, P.O. Box 969, MS 9951, Livermore, CA 94551, Fax: 925-924-3020,
jfaulon@sandia.gov, (2) Department of Chemical Engineering, Tennessee Tech.
University
Abstract
We present a new molecular
descriptor named Signature based on extended valence sequence. The new
descriptor can be computed and stored efficiently. We rigorously prove that all
topological indices (TIs) based on counts of walks, paths, and distances are
computable from signature. The degeneracy of signature and popular TIs is then
computed for homogenous series of alkanes, alcohols, fullerene-type structures,
and peptides. We believe this study to be the first where degeneracy is
systematically probed for homogenous molecular series. Results indicate that
signature is the only molecular descriptor that can fully control degeneracy. As
a general rule, we find that hydrocarbon structures comprising n non-hydrogen
atoms are uniquely characterized by signatures of height n/4, while peptides up
to 5,000 amino acids can be singled out with signatures of heights as small as 2
or 3. Aside from signature, Kier and Hall total topological index exhibits low
degeneracy as well.
ADME/Tox Informatics: Theory
CINF 11: Qualitative structure-property approach to
ADME/TOX using Idiotropic Electrostatic and Steric Field
Orientation
Philippa R. N. Jayatilleke, and Robert D.
Clark, Research Department, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO
63144, Fax: 314-647-9241, pjayat@tripos.com
Abstract
We will present a new strategy to
model the complex properties of ADME/TOX by combining three-dimensional
molecular interaction fields with soft independent modeling of chemical analogy
(SIMCA). It is necessary to consider “bulk” properties that contribute towards
solubility and penetration, as well as specific structural features when
modeling a compound’s pharmacokinetic profile. Models are constructed from data
sets aligned using a variation of the inertial field orientation (IFO) method.
The new approach uses principal axes derived from both steric and electrostatic
fields. By utilizing both fields, we can more closely relate the CoMFA or CoMSIA
field type descriptors to structural properties. We have explored several
computational schemes including testing conformational sensitivity and assessing
the influence of the molecular field types used. Models will be presented for
human intestinal absorption, blood-brain barrier penetration and oral
bioavailability, demonstrating a tool for developing leads with enhanced
therapeutic potential.
CINF 12:
Getting the ADME properties right through property-based
design
Han van de Waterbeemd, PDM, Department of Drug
Metabolism, Pfizer Global Research and Development, IPC351, Sandwich CT13 9NJ,
United Kingdom, Fax: 01304-651817, han_waterbeemd@sandwich.pfizer.com
Abstract
Physicochemical properties, including
solubility, permeability, lipophilicity and pKa, have been widely used to
optimise biopharmaceutical and pharmacokinetic properties of drug candidates.
Therefore various screens have been developed to assess such properties in high
throughput. An overview of such technologies will be given. For the evaluation
of virtual compounds and libraries computational methods have been developed and
others are currently in progress to address the various ADME properties as early
as possible. Analyses of the key molecular properties of drugs in collections
such as the World Drug Index led to the formulation of simple rules such as the
rule-of-5. These rules can now be used in property-based design of hits, leads
and drugs. Examples of relationships between molecular properties and ADME
endpoints will be presented and challenges discussed.
References
D.A. Smith, H. van de Waterbeemd and D.K. Walker, Pharmacokinetics and metabolism in drug design, Wiley-VCH (2000).
H. van de Waterbeemd, D.A. Smith, K. Beaumont and D.K. Walker, Property-based design: optimization of drug absorption and pharmacokinetics, J.Med.Chem. 44 (2001) 1313-1333.
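The rule-of-5 mentioned in this abstract can be written as a small filter. The sketch below uses the commonly cited thresholds (molecular weight above 500, logP above 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). The function name and the example property values are hypothetical and not taken from the talk.

```python
def rule_of_five_violations(mol_weight, logp, h_bond_donors, h_bond_acceptors):
    """Count how many of the four rule-of-5 thresholds a compound exceeds."""
    checks = [
        mol_weight > 500,       # molecular weight in daltons
        logp > 5,               # calculated octanol-water partition coefficient
        h_bond_donors > 5,      # hydrogen-bond donor count
        h_bond_acceptors > 10,  # hydrogen-bond acceptor count
    ]
    return sum(checks)

# Hypothetical property values for two virtual compounds.
drug_like = rule_of_five_violations(350.0, 2.5, 2, 5)
problematic = rule_of_five_violations(720.0, 6.1, 2, 12)
```

In property-based design such a count is typically used as an early flag on virtual compounds and libraries rather than as a hard cutoff.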
CINF 13: In Silico Methodologies in Accelerating Drug Development
Abstract
Computationally based predictive
models of various biopharmaceutical and ADME properties have been developed by
numerous academic, commercial, and industrial groups to address compound
availability and conservation, to perform virtual screening and design, and to
mitigate limitations of experimental throughput. Such properties include aqueous
solubility, cellular and intestinal permeability, protein binding, CNS
penetration, and oral bioavailability (although it is not often acknowledged
that the latter two properties are complex composite functions of many
properties and mechanisms). This provides a context for the hierarchical
development of predictive models of physicochemical and biopharmaceutical
properties, and ultimately the integration of such models into prediction of in
vivo performance. Predictive model development is critically dependent upon
several factors: experimental data, statistical methods, and the requirements of
the intended application. Examples and application of computational properties
models developed within Pharmacia will be presented in the context of these
issues.
CINF 14: Prospective Evaluation of Structure-based
ADME Predictions: Knowing When the Experiment is the Prudent Course of
Action
Troy Bremer, Jehangir Athwal, Kevin Holme, and Carleton
R. Sage, Computational Sciences, Lion Bioscience, 9880 Campus Point Dr, San
Diego, CA 92121, Fax: 858-410-6665,
carleton.sage@lionbioscience.com
Abstract
Statistically based computational
models that predict a biological output from structural inputs are interpolative
systems likely to perform poorly when used for extrapolation. The ADME
(Absorption, Distribution, Metabolism, Excretion) properties of a compound are
largely unrelated to its pharmacological target; the “space” described by ADME
models is largely unconstrained, and data representation within this space is
sparse. Therefore, methods to describe when a prediction is an interpolation or
an extrapolation should be useful in prospectively determining prediction
validity. We have used robust statistical methods to develop predictive models
for CYP2D6, CYP3A4, FDP, and Caco-2 effective permeability. To encourage the
appropriate use of these predictive tools, we have developed methods that
provide a measure of uncertainty associated with each prediction. When used in
the evaluation of completely external test sets, these measures of uncertainty
have been useful in the identification of poor predictions, allowing
their prospective use in prediction evaluation.
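The abstract does not say how the uncertainty measure is computed. One simple way to flag a likely extrapolation, shown here purely as an assumed illustration, is to compare a query compound's distance to its nearest training compound in descriptor space against a chosen threshold. The function names, descriptor vectors, and threshold are all hypothetical.

```python
import math

def nearest_training_distance(query, training):
    """Euclidean distance from a query descriptor vector to its closest training vector."""
    return min(math.dist(query, t) for t in training)

def is_extrapolation(query, training, threshold):
    """Flag a prediction as extrapolative when the query lies far from all training data."""
    return nearest_training_distance(query, training) > threshold

# Hypothetical two-descriptor training set and two query compounds.
training_set = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
inside = is_extrapolation((0.1, 0.1), training_set, threshold=0.5)
outside = is_extrapolation((5.0, 5.0), training_set, threshold=0.5)
```

A prediction flagged this way would be a candidate for running the experiment instead of trusting the model.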
CINF 15:
Exploring the relationships between chemical structure, in vitro profiles,
and in vivo behavior
Julie E. Penzotti1,
Boryeu Mao1, Dragos Horvath2, Jacques Migeon1,
Cecile Krejsa1, and Dave Porubek1. (1) Cerep, Inc, 15318
NE 95th St., Redmond, WA 98052, Fax: 425-895-8668, j.penzotti@cerep.com, (2)
Cerep
Abstract
The drive to identify development
liabilities for compounds earlier in the drug discovery process has made the
generation of reliable computational methods for predicting ADME/Tox properties
increasingly important. Consistent and reliable data for a large, diverse set of
compounds is needed to derive models that can generalize well to new chemical
series. We have screened nearly 2000 drugs and related compounds against a panel
of ~90 pharmaceutical and pharmacological assays to create a biological profile
or BioPrint™ for each compound. We are applying
computational chemistry methods and data mining techniques to investigate the
inter-relationships between this in vitro data, in vivo behavior,
and chemical feature based descriptors, to generate models that correlate
molecular features to patterns of in vitro and in vivo biological
activity. We will describe our efforts towards developing a series of
computational models addressing ADME/Tox properties such as metabolic stability,
bioavailability and toxicity effects that can be used to guide the selection of
compounds likely to have more favorable ADME/Tox profiles.
CINF 16:
CACO-2 permeability modeling: Feature selection via sparse support vector
machines
Curt M. Breneman1, Kristin P.
Bennett2, Jinbo Bi2, Mark J. Embrechts3, and
Minghu Song1. (1) Department of Chemistry, Rensselaer Polytechnic
Institute, 110-8th Street, Cogswell Bldg, Troy, NY 12180, Fax: 518-276-4045,
brenec@rpi.edu, (2) Department of Mathematics, Rensselaer Polytechnic Institute,
(3) Decision Sciences and Engineering Systems, Rensselaer Polytechnic
Institute
Abstract
We describe a methodology for
performing variable selection and ranking using support vector machines (SVM).
The basic idea of the method is simple: Construct a series of sparse linear SVMs
that exhibit good generalization, then create a subset of variables having
nonzero weights in the linear models. This subset of variables is then used in a
nonlinear SVM to produce the final regression or classification function. The
method exploits the fact that linear SVMs with 1-norm regularization (no
kernels) inherently perform variable selection as a side-effect of minimizing
capacity in the SVM model. In linear 1-norm SVMs, the optimal weight vector will
have relatively few nonzero weights with the degree of sparsity depending on the
SVM model parameters. The variables with nonzero weights then become potential
features to be used in the nonlinear SVM. In some sense, we trade the variable
selection problem for the model parameter selection problem in SVM.
The small number of molecules and descriptor collinearity make the results of the linear 1-norm SVMs somewhat unstable -- small changes in the training and tuning data and/or model parameters may produce very different sets of nonzero weighted attributes. Our final variable selection and ranking methodology exploits this instability. For each training partition, the data is further divided to create a tuning set used in a pattern search algorithm for parameter selection. Multiple linear models are then created based on different tuning set partitions, each producing different variable weights. The final variable subset is chosen from the superset of all nonzero weighted attributes in any of the linear models. The simple strategy of selecting the entire superset works well in practice. The distribution of the linear model weight vectors provides a mechanism for ranking and interpreting the effects of variables. Starplots or stackplots are used to visualize the magnitude and variance of the weights found by the linear models for each attribute.
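The superset selection and ranking step can be sketched as follows. The sketch assumes the sparse linear SVMs have already been trained and takes their weight vectors as given. The descriptor names, weights, and the choice to rank by absolute mean weight are illustrative assumptions, not details from the abstract.

```python
from statistics import mean, pstdev

def select_and_rank(weight_vectors, names, tol=1e-8):
    """Keep every variable with a nonzero weight in any model, ranked by |mean weight|."""
    selected = [j for j in range(len(names))
                if any(abs(w[j]) > tol for w in weight_vectors)]
    stats = []
    for j in selected:
        column = [w[j] for w in weight_vectors]
        # Mean and spread of the weight across models summarize its effect and stability.
        stats.append((names[j], mean(column), pstdev(column)))
    return sorted(stats, key=lambda s: abs(s[1]), reverse=True)

# Hypothetical weights from three sparse linear models over four descriptors.
descriptor_names = ["a", "b", "c", "d"]
model_weights = [
    [0.8, 0.0, 0.0, -0.1],
    [0.6, 0.0, 0.2, 0.0],
    [0.7, 0.0, 0.0, 0.0],
]
ranking = select_and_rank(model_weights, descriptor_names)
```

Descriptor "b" is dropped because no model gives it a nonzero weight. The surviving variables would then feed the nonlinear SVM, and the per-variable mean and spread are the quantities a starplot would display.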
New Developments in Electronic Publishing I
CINF 17:
Digital Content in its New Context
Richard T.
Kaser, Vice President, Content, Information Today, Inc, 143 Old Marlton
Pike, Medford, NJ 08055-8750, Fax: (609) 654-4309,
kaser@infotoday.com
Abstract
Over the last 30 years, we have seen
the scholarly literature (and just about every other kind of information) expand
from the confines of a book spine into flexible components that can quickly
regroup into dynamic arrangements. The document still persists as the preferred
form to contain the formal expression of knowledge. But in our networked world,
we are increasingly aware that other ways to express and convey information and
share knowledge exist as well. Today's services are blending documents, data,
visual representations, and other elements with interactive elements to create
services honed to the users' needs. All the pieces that publishers, librarians,
system operators, aggregators, developers and a host of others have worked for
over a generation to perfect are now in place. The underlying technology finally
works. And it is a rich array of digital information services that now exist.
Our technology supports everything from off-the-shelf, ready-to-wear solutions
to do-it-yourself enterprise and institutional portals. The choices are mind
boggling. If we could only make full use of everything we've got, we would feel
rich. This presentation will focus on developments in electronic publishing at
large, with an emphasis on those that support research activities.
CINF 18: The all-inclusive, totally functional, super-connected
scientific information machine
Robert D. Bovenschulte,
Director, ACS Publications, American Chemical Society, 1155 16th Street NW,
Washington, DC 20036, Fax: (202) 872-6060, rbovenschulte@acs.org
Abstract
Driven by technology, scientific,
technical, and medical (STM) publishing continues to undergo unrelenting change.
Although one could have reasonably conjectured five years ago, when this wave of
technology began to transform publishing, that our industry would have
stabilized after an initial period of rapid and sweeping change, this
expectation has not come to pass. The key questions that confronted us then
continue to challenge us: where is STM publishing going, and what will it
eventually look like? A vision of an ultimate scientific information machine is
starting to emerge, however, and considerable progress toward this fuzzy ideal is
evident. The presentation will address these questions by exploring significant
developments in STM publishing from the viewpoints of various stakeholders in
the scientific enterprise. Economic and cultural issues will be examined as well
as technological ones. The presentation will cautiously advance some hypotheses
about what the future may hold. Topics will include: pricing models for online
content, creating and providing access to electronic archives, scientists'
concerns about open access to scientific information, challenges to peer review,
problems with usage statistics, and providing content to nations that cannot
afford to pay full rate.
CINF 19:
Online publishing a chance for new alliances: a report from
Springer-Verlag
Gertraud Griepke, Journals/LINK
Director, Springer-Verlag, Tiergartenstraße 17, D-69121 Heidelberg, Germany,
Fax: 49 (0 62 21) 487-288, griepke@springer.de
Abstract
There are unique challenges involved
in setting up and running a successful online service for scientific content.
While many of the aspects handled are the same as for any online information service
on the internet, the digital production workflow for publishing scientific
content needs to be rapid and of high quality, with enhanced functionality that is
unique to science. The result of this effort is visible in the alliances between
publishing houses, abstracting and indexing services, agencies, libraries and
information users. This talk explores the progress Springer has made so far and
looks forward to the challenges to come.
CINF 20: A
'sea change' in chemical information
William G Town,
Elsevier Science, Director of Operations, ChemWeb, Inc, 84 Theobalds Road,
London WC1X 8RR, United Kingdom, Fax: +44(0)20 7611 4301,
bill.town@chemweb.com
Abstract
In the last five years, a 'sea
change' in chemical information has occurred: community websites, publisher
websites, content aggregators, preprint servers, e-commerce market places, and
scientific search engines have all been launched in this timeframe. The rate of
innovation has shown a dramatic increase, which shows no sign of abating.
Publishers are completing the transition from print-based to
electronic/print-based businesses. New players and new partnerships characterise
the scientific information market today. Consolidation of the industry has
already begun but is this just the start or the end of the process? What will
the next five years bring?
CINF 21:
The future of the 'infomediary'
Andrea Keyhani,
Chief Operating Officer, Ingenta, Inc, 23-28 Hythe Bridge Street, Oxford OX1
2ET, United Kingdom, Fax: +44(0)1865 799111, akeyhani@ingenta.com
Abstract
Ingenta is one of the world's largest
resources of academic and professional research articles online - recognizing
subscriptions and offering document delivery of 26,000 publications and the
full-text of 5,400 journals from 180 publishers.
Incorporating UnCover, CatchWord, Dynamic Diagrams and PCG, Ingenta provides publishers' solutions to empower the exchange of research content online - from a database of journal metadata to sophisticated e-communities.
Complementary services to libraries include free access to subscribed-to journals, document delivery with cost accounting, and customized library gateways. This Fall, Ingenta launched the PCG Library Consortia Sales Program, an innovative solution to consortia site licensing of the electronic content from Ingenta publishers.
The Ingenta Institute, a non-profit organization, commissions independent research into the future of scholarly publishing. Using this research as a starting-point, Andrea will predict the future of the 'infomediary' and its role in electronic publishing.
CINF 22:
Electronic information and the innovation
challenge
Robert J. Massie, Director, Chemical
Abstracts Service, American Chemical Society, 2540 Olentangy River Road,
Columbus, OH 43202-1505, Fax: (614) 447-3713, rmassie@cas.org
Abstract
Many of today's concerns in the
information industry resolve into a single broad question: can information
providers continue to innovate? An analysis of developments in electronic
information reveals that innovation occurs along two axes: delivery platforms
and content. Each platform offers its own range of tools and possibilities and we
have experienced successive waves of technical advances in recent memory. But
the excitement of new technology and media must not obscure the role of content,
the sine qua non of value in information products. It is the dynamic interaction
of platform and content that gives rise to new products. This thesis will be
explored with concrete examples and their implications for giving consumers of
scitech information what they really want in the new era of integrated access.
ADME/Tox Informatics: Applications
CINF 23: Use of robust classification techniques for
the prediction of human cytochrome P450 inhibition
Roberta G.
Susnow, and Steve Dixon, ADMET R&D, Accelrys Inc, Box 5350, Princeton,
NJ 08543-5350, rsusnow@accelrys.com
Abstract
The ability to predict the inhibition
of the cytochrome P450’s is important because of their role in the metabolism of
xenobiotics and the consequent potential for drug-drug interactions. The human
CYP 450’s are responsible for the metabolism of more than 50% of all known
drugs. We will present our latest research into the use of robust classification
techniques for predicting the ability of molecules to inhibit the P450 isozymes.
These techniques are designed to produce models with a low sensitivity to noise
and broad applicability across chemical families.
CINF 24:
Use of predictive ADME in library profiling and lead
optimization
Osman F. Güner, and Robert D. Brown,
Accelrys Inc, 9685 Scranton Road, San Diego, CA 92121, Fax: 858-799-5100,
osman@accelrys.com
Abstract
High-throughput in silico ADME models
can be used to select subsets of combinatorial libraries based on not only
diversity or similarity, but also a combination of various ADME properties as
well. The contribution of the ADME properties-based constraints can be weighted
against diversity assessment. We present how the drug-like properties of the
selected subset of the library can be improved without compromising the diversity
and coverage of the library. The process is demonstrated with several examples.
Finally, we provide an example of how this process is used in lead optimization
while both potency and pharmacokinetic properties are simultaneously optimized
to yield potent candidates with better anticipated ADME characteristics.
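The idea of weighting ADME properties against diversity can be illustrated with a greedy selection. This is an assumed scheme, not the authors' procedure. Each candidate is scored as a weighted sum of its predicted ADME score and its distance to the nearest compound already selected; the names, scores, and positions are hypothetical.

```python
def greedy_select(candidates, adme_score, distance, k, weight):
    """Pick k candidates, trading ADME score (weight) against diversity (1 - weight)."""
    selected = [max(candidates, key=adme_score)]  # seed with the best ADME score
    remaining = [c for c in candidates if c != selected[0]]
    while remaining and len(selected) < k:
        def objective(c):
            # Diversity term: distance to the closest compound already chosen.
            diversity = min(distance(c, s) for s in selected)
            return weight * adme_score(c) + (1 - weight) * diversity
        best = max(remaining, key=objective)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical compounds placed on a one-dimensional descriptor axis.
position = {"A": 0.0, "B": 0.1, "C": 1.0}
score = {"A": 0.9, "B": 0.8, "C": 0.2}

def dist(x, y):
    return abs(position[x] - position[y])

adme_only = greedy_select(list(position), score.get, dist, k=2, weight=1.0)
balanced = greedy_select(list(position), score.get, dist, k=2, weight=0.5)
```

With the weight at 1.0 the selection keeps the two best-scoring but nearly identical compounds. At 0.5 the second pick shifts to the distant compound, showing how the weight moves the subset between property quality and coverage.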
CINF 25: Computational strategies in support of early
ADME drug discovery efforts
Michelle L. Lamb, Jayashree
Srinivasan, John E. Eksterowicz, Robert V. Stanton, Kelly M. Jenkins, Robyn A.
Rourick, and Peter D. J. Grootenhuis, Bristol-Myers Squibb Pharmaceutical
Research Laboratories, 150 California Street, Suite 1100, San Francisco, CA
94111, Fax: 415-732-7170, mlamb@combichem.com
Abstract
Early identification of liabilities
associated with molecular absorption, distribution, metabolism, and excretion
(ADME) accelerates the drug discovery process by identifying poor candidates
prior to large investment in their development. As the mechanisms involved in
ADME are complex, simple filters may only be applied in limited situations.
Strategies that incorporate more complex models, such as ensembles of
pharmacophores or shape descriptors may be more successful. We will describe the
computational filters and classification models that we have developed to guide
the design and selection of libraries likely to have more favorable absorption
and metabolism profiles and to assist in the prioritization of chemical series.
CINF 26:
Conceptual models for structure-nonspecific
ADME/Tox
Stefan Balaz, and Viera Lukacova, Department
of Pharmaceutical Sciences, North Dakota State University, College of Pharmacy,
Sudro Hall 108, Fargo, ND 58105, Fax: 701-231-7606,
stefan.balaz@ndsu.nodak.edu
Abstract
For most chemicals, ADME/Tox
processes except enzymatic metabolism and active transport are governed by their
overall properties including lipophilicity, amphiphilicity, acidity, and
reactivity. Structure-nonspecific (sn-)ADME/Tox processes can be analyzed using
models of subcellular pharmacokinetics, which describe the kinetics of membrane
transport, protein binding, hydrolysis, and other reactions with cell
constituents in terms of differential equations. Using the time hierarchy of the
included processes, the equations can be simplified and solved explicitly. The
solutions (called disposition functions) represent conceptual models for
sn-ADME/Tox processes in terms of chemical properties and time. The attributes
of biological systems are kept invariant during the experiments and are
collected in adjustable coefficients of disposition functions. Once calibrated
for a given biosystem, the models provide a detailed recipe for structure
optimization of chemicals with regard to sn-ADME/Tox. The models have much
better predictivity outside the tested property space than empirical models as
demonstrated using the leave-extremes-out cross-validation procedure.
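One way to read the leave-extremes-out idea is sketched below: compounds at both ends of a property range are held out, a model is fitted to the interior, and the error is measured on the held-out extremes. This is an illustrative assumption about the procedure, not the authors' implementation; the straight-line fit stands in for any model (including a disposition function), and the names `fit_line` and `leave_extremes_out` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x (stand-in for any model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def leave_extremes_out(xs, ys, n_extreme=1):
    """Fit on the interior of the property range, test on both extremes.

    xs: property values (for example lipophilicity), ys: measured responses.
    Returns the root-mean-square error on the held-out extreme compounds.
    """
    if n_extreme < 1 or len(xs) < 2 * n_extreme + 2:
        raise ValueError("need at least two interior points for the fit")
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    held_out = order[:n_extreme] + order[-n_extreme:]
    train = order[n_extreme:-n_extreme]
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    errors = [(a + b * xs[i] - ys[i]) ** 2 for i in held_out]
    return (sum(errors) / len(errors)) ** 0.5
```

A model that extrapolates well gives a small error here, while an empirical fit that only interpolates well gives a large one.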
CINF 27:
Model for absorption of drug-like compounds based on structural features
and interfacial properties
Chihae Yang1, Ilya
Utkin1, James Rathman2, and Paul E.
Blower1. (1) LeadScope, Inc, 1245 Kinnear Rd, Columbus, OH 43212,
cyang@leadscope.com, iutkin@leadscope.com, (2) Department of Chemical
Engineering, The Ohio State University
Abstract
Predicting ADME properties from
structure-based models is still not reliable and is considered to be one of the
most difficult problems in the lead optimization process. ADME models are
typically based on physical properties and assay data from lipid vesicles or
monolayer experiments, where KD values can be experimentally determined. The
complex interfacial interactions of a compound with membrane lipids are
difficult to extract from the conventional set of molecular properties predicted
from a QSAR model. In this study, a set of drug-like compounds is selected and
their interfacial properties are predicted based on their structural features using
available correlation methods. These predicted interfacial properties, in
conjunction with structural features selected by informatics methods for their
high association with desired physical properties, are employed to build a model
for human intestinal absorption. Various informatics methods, including genetic
algorithms, k-nearest neighbors, and partial least squares, are used to
build these models.
CINF 28: Using ADME properties with SciFinder to
target new drugs
Michael McBrien1, Robert
DeWitte1, Robin Martin1, Eduard Kolovanov1, and
Kurt Zielenbach2. (1) Advanced Chemistry Development, 600-90 Adelaide
W, Toronto, ON M5H 3V9, Canada, michael@acdlabs.com, (2) Chemical Abstracts
Service
Abstract
In the last several years, biological
chemists have begun to apply physical criteria when selecting compounds for
evaluation. By avoiding compounds with extremely high (or low) lipophilicity,
and low solubility, for example, chemists hope to focus their investigations on
compounds that are more likely to be successfully absorbed by passive processes.
Recently, Chemical Abstracts Service and Advanced Chemistry Development have
collaborated to make predicted physical properties available for over eight
million organic substances in the CAS Registry database. This talk will explain
how these predicted properties are computed, and how the user may use them in
conjunction with SciFinder to narrow queries to compounds with suitable physical
properties.
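Narrowing a result set by predicted physical properties amounts to a range filter, sketched below. This is not the SciFinder interface: the record layout, the function names, and the default thresholds are hypothetical placeholders chosen only for illustration.

```python
def passes_property_filter(log_p, log_s, log_p_range=(0.0, 5.0), min_log_s=-5.0):
    """Return True if a compound's predicted properties fall in the accepted window.

    log_p: predicted lipophilicity (log of the octanol/water partition coefficient)
    log_s: predicted aqueous solubility (log of molar solubility)
    The default thresholds are placeholders, not recommended cutoffs.
    """
    low, high = log_p_range
    return low <= log_p <= high and log_s >= min_log_s

def narrow_hits(hits):
    """Keep (identifier, log_p, log_s) records that pass the filter."""
    return [h for h in hits if passes_property_filter(h[1], h[2])]
```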
![]()
New Developments in Electronic Publishing II
![]()
Abstract
Licensing of electronic databases and
journals in major academic research libraries represents a significant
investment of resources to activate, manage and maintain. It is in the best
interests of both academic libraries and their primary user communities to make
the most efficient use of these online resources possible. The University of
Chicago Library has been motivated recently to provide links among various
electronic resources in order to guide users to the appropriate information
using the most direct means available. The linking systems utilized included
ChemPort, OvidLinks and SFX. This report describes the implementation and some
early evaluation of these efforts.
CINF 30:
ACS Journals on the Web: A 5-Year
Retrospective
Lorrin R Garson, David P Martinsen, and
Ralph E Youngen, ACS Publications, American Chemical Society, 1155 16th Street
NW, Washington, DC 20036, Fax: (202) 872-4389, l_garson@acs.org
Abstract
The ACS journals have been available
to the scientific public on the World Wide Web since September 8, 1997. Prior to
1997, important work on core electronic delivery technology was accomplished
which made Web delivery practical. Several features of the ACS Web journals will
be discussed. Digitization of the backfile of ACS journals was accomplished in
2001, the earliest issues being from 1879 for the Journal of the American
Chemical Society. Creation of the backfile, with emphasis on engineering
aspects, will also be discussed.
CINF 31:
Building the digital research environment: a report from the construction
site
Harry F Boyle, Manager, Web Content, Chemical
Abstracts Service, American Chemical Society, 2540 Olentangy River Road,
Columbus, OH 43202-1505, Fax: (614) 447-7149, hboyle@cas.org
Abstract
Despite the lack of a blueprint, the
foundations of the digital research environment of the future are under
construction. Many organizations are building its components. Lack of a shared
vision or blueprint ensures that the pieces will not fit together optimally.
Meanwhile, the traditional foundation of scholarly communication, the print
journal, is in decline. CAS and the Publications Division of the ACS, along with
many other STM publishers and service providers, are working together to build
the digital research environment of the future. This presentation will provide
examples of these efforts, as seen through the eyes of research scientists and
administrators.
CINF 32: Open Meeting: Committees on Publications and
on Chemical Abstracts Service
Robert J. Massie, Director,
Chemical Abstracts Service, American Chemical Society, 2540 Olentangy River
Road, Columbus, OH 43202-1505, Fax: (614) 447-3713, rmassie@cas.org, and Robert
D. Bovenschulte, Director, ACS Publications, American Chemical
Society
Abstract
Open meeting.
![]()
ADME/Tox Informatics: Predictive
Models
![]()
Abstract
Metabolic biotransformations of
drug-like compounds depend on their structures. Dozens of enzymes metabolise
xenobiotics in the human organism, and sometimes toxic metabolic products are
generated. Thus, computer-aided prediction of metabolic biotransformations might
help to select the most promising compounds at an early stage of R&D.
The computer program PASS is shown to predict with reasonable accuracy more than 700
pharmacological effects, mechanisms of action, carcinogenicity, mutagenicity,
teratogenicity and embryotoxicity of a compound on the basis of its structural
formula (http://www.ibmh.msk.su/PASS). We applied PASS to the prediction of
the specificity of compound metabolism by different isoforms of cytochrome
P450 and to the estimation of the first step in metabolic transformation. The database
Metabolite 2001.1 (MDL Information Systems, Inc.) was used as the training set. It
was shown that the average accuracy of prediction in leave-one-out
cross-validation is satisfactory for using this approach in practice. This talk
will focus on possibilities and limitations of metabolism prediction by PASS.
CINF 34: In Silico Models for the Prediction of
Hepatotoxicity on Human
Ailan Cheng, and Steve Dixon, ADMET
R&D, Accelrys, CN 5350, Princeton, NJ 08543, Fax: 609-919-6155,
acheng@accelrys.com
Abstract
The liver has been recognized as a
target organ for xenobiotic-induced toxicity due to its crucial role in
metabolism. Hepatotoxicity has been a dose-limiting factor for many INDs. Many
drugs were withdrawn from clinical trials and even from the market due to hepatotoxicity.
“Fail early and fail fast” is the current paradigm of the pharmaceutical industry.
Eliminating compounds with poor ADME/Tox profiles at an early stage will
lead to tremendous savings. Accurate predictive methods can be used to identify
and prioritize candidates for development, to assist in designing compounds with
desirable profiles, and to prioritize and even reduce experimental studies
and animal tests. We will present our latest in silico models for the prediction
of hepatotoxicity potential in humans. The model was based on a set of diverse
compounds. It is fast enough to be used in data
mining and profiling of large synthesized or virtual libraries.
CINF 35: Data mining to identify structural alerts
for liver toxicity
Paul E. Blower Jr., Gulsevin Roberts,
and Ilya Utkin, LeadScope, Inc, 1245 Kinnear Rd, Columbus, OH 43212, Fax:
614-675-3732, pblower@leadscope.com
Abstract
Adverse liver findings are frequently
responsible for the failure of drug candidates and marketed drugs. We first
developed a grading scheme for liver toxicity that encompasses a range of
pathology findings and dose effects. Using data from the RTECS files and other
sources, we established a database containing structural information and liver
gradings. Structural alerts can be identified using data mining approaches for
investigating correlations between molecular structure and biological activity.
We have developed a new statistical search procedure that quickly identifies
specific combinations of structural features corresponding to compound sets with
high average activities. This study demonstrates that data mining tools can
identify a number of structural alerts for liver toxicity.
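The search for feature combinations with high average activity can be illustrated by brute-force enumeration of feature pairs. This is only a sketch and not the statistical search procedure described in the abstract; the function name, the feature names, the grades, and the `min_support` cutoff are hypothetical.

```python
from itertools import combinations

def top_feature_pairs(compounds, min_support=2, top=3):
    """Rank pairs of structural features by the mean grade of matching compounds.

    compounds: list of (set_of_feature_names, liver_grade) records.
    Returns up to `top` tuples of (mean_grade, n_compounds, feature_pair).
    """
    features = sorted(set().union(*(f for f, _ in compounds)))
    results = []
    for pair in combinations(features, 2):
        # Grades of all compounds that contain both features of the pair.
        grades = [g for f, g in compounds if set(pair) <= f]
        if len(grades) >= min_support:
            results.append((sum(grades) / len(grades), len(grades), pair))
    # Highest mean first, then larger support, then alphabetical for stability.
    results.sort(key=lambda r: (-r[0], -r[1], r[2]))
    return results[:top]
```

Exhaustive enumeration grows quickly with the number of features and the size of the combinations, which is presumably why a faster dedicated search procedure is of interest.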
CINF 36:
The prediction of Water Solubility and of pKa-Values by Physicochemical
Descriptors
Johann Gasteiger, Ai-Xia Yan, and Thomas
Kleinöder, Computer-Chemie-Centrum and Institute of Organic Chemistry,
University of Erlangen-Nuremberg, Naegelsbachstr. 25, Erlangen 91052, Germany,
Fax: +49-9131-85 26566, Gasteiger@chemie.uni-erlangen.de
Abstract
Water solubility and protonation
states are two important properties to be considered in drug development. We
have used a variety of physicochemical descriptors such as charge distribution,
inductive, resonance and polarizability effects that can rapidly be calculated
by empirical methods collected in the program package PETRA. The correlation of
these descriptors with water solubility and pKa-values has been investigated
with statistical methods and with unsupervised and supervised neural networks.
CINF 37: Prediction of aqueous
solubility
Flemming Steen Jørgensen1, Jørgen
Bonefeld Kristensen1, and Inge Thøger Christensen2. (1)
Department of Medicinal Chemistry, Royal Danish School of Pharmacy,
Universitetsparken 2, DK-2100 Copenhagen, Denmark, Fax: +45 35 30 60 40,
fsj@dfh.dk, (2) Novo Nordisk A/S
Abstract
Two new models for prediction of
aqueous solubility will be presented and compared with other known methods. The
first model is based on the atom-type weighted water-accessible surface area
(ATW WASA) approach. The water-accessible surface area is calculated for each
atom, and its contribution to the aqueous solubility is weighted by multiplying
it by a coefficient characteristic of each atom type. In the second method, the
group contribution method, the molecules are split up into predefined fragments
covering important functional groups and substructural units of the compounds.
The fragment counts obtained are weighted by coefficients determined by
multidimensional least-squares fitting. A set of 1292 structurally diverse
compounds was used as a training set. For this set we obtained a correlation
between experimental and predicted aqueous solubility of r = 0.87 with an average
error of 0.82 log units for the ATW WASA model and r = 0.93 with an average error
of 0.59 log units for the group contribution model.
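Both models described above are weighted sums, which the following sketch makes explicit. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the optional intercept is an assumption, and any coefficient values passed in would have to come from a fit such as the least-squares procedure mentioned in the abstract.

```python
def atw_wasa_log_s(atoms, coefficients, intercept=0.0):
    """Atom-type weighted water-accessible surface area estimate.

    atoms:        list of (atom_type, water_accessible_area) pairs, one per atom
    coefficients: dict mapping atom_type to its fitted weight
    """
    return intercept + sum(coefficients[t] * area for t, area in atoms)

def group_contribution_log_s(fragment_counts, coefficients, intercept=0.0):
    """Group contribution estimate.

    fragment_counts: dict mapping fragment name to its count in the molecule
    coefficients:    dict mapping fragment name to its fitted weight
    """
    return intercept + sum(coefficients[f] * n for f, n in fragment_counts.items())
```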
CINF 38: Accurate Prediction of Aqueous
Solubility
Michael McBrien, Robert S DeWitte, and Eduard
Kolovanov, Advanced Chemistry Development, 600-90 Adelaide W, Toronto, ON M5H
3V9, Canada, michael@acdlabs.com
Abstract
Although prediction of human
absorption is confounding, it is clear that aqueous solubility is among the key
driving factors. This talk will focus on the physical chemistry of the process
of dissolving, and describe the methods used at Advanced Chemistry Development
to produce a global predictive method for the accurate prediction of aqueous
solubility. Finally, a software product will be described that makes this
prediction technology available to every medicinal chemist in a simple and
intuitive user interface or through your company's informatics infrastructure.
CINF 39: Prediction of cytogenetic activity of
organic compounds from molecular structure
Jon R. Serra,
Chemistry Department, Pennsylvania State University, 152 Davey Lab, University
Park, PA 16802, jrs@zeus.chem.psu.edu, and Peter C. Jurs, Department of
Chemistry, Pennsylvania State University
Abstract
Computational classifiers for
cytogenetic activity are being developed with a large, diverse set of organic
compounds which have been tested with an in vitro chromosomal aberration assay
using Chinese hamster cells. Classifiers are being developed to separate active
from inactive compounds. Compounds that are common to both a 24-hour and 48-hour
exposure assay are included. Each compound is represented by descriptors
calculated from its molecular structure that encode topological, geometric,
electronic, and polar surface features. Subsets of informative descriptors are
identified with simulated annealing or genetic algorithm feature selection. The
classifiers are built with k-nearest neighbor, multiple discriminant analysis,
radial basis function neural networks, or support vector machine classifier
algorithms. In one specific investigation, classifiers working with several
hundred compounds each represented by a few topological descriptors achieve
classification rates of approximately 80 percent. The details of the study and
the classification results achieved will be described.
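Of the classifier types listed above, k-nearest neighbor is the simplest to show in a few lines. The sketch below is a generic majority-vote version, not the authors' code; the descriptor vectors, the labels, and the function name `knn_classify` are hypothetical.

```python
from collections import Counter
from math import dist

def knn_classify(train, query, k=3):
    """Assign the majority label among the k training compounds nearest to query.

    train: list of (descriptor_vector, label) pairs
    query: descriptor vector of the compound to classify
    """
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

With two classes, an odd value of k avoids tied votes.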
![]()
Sci-Mix
![]()
Abstract
Although chemical formulas per se carry important information, there are few discussions of how to search for chemical information in text files using chemical formulas. Differences in the presentation of chemical formulas in different systems (USPTO, ESPACENET, JAPIO, MICROPATENT, Delphion, STN, DIALOG, etc.) will be discussed, along with tools used to verify the correctness of a chosen search strategy (dictionary files, the expand command, examination of known examples). Examples of search strategies for piezoelectric ceramics such as (PbZrxTiy (Zn1/3Nb2/3)z (Mn1/2W1/2)tO3), where the syntax of the search command depends on the spacing between atom fragments, will be presented.
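The dependence on spacing between atom fragments can be illustrated on plain local text with a pattern that tolerates optional whitespace between fragments. This is a sketch only: it does not reproduce the command syntax of any of the systems named above, and the function name `formula_pattern` and the way the formula is split into fragments are assumptions.

```python
import re

def formula_pattern(fragments):
    """Build a regex matching the fragments in order, with optional whitespace between them."""
    return re.compile(r"\s*".join(re.escape(frag) for frag in fragments))

# Fragments of the example formula from the abstract, split at the spaces.
PZT_FRAGMENTS = ["PbZrxTiy", "(Zn1/3Nb2/3)z", "(Mn1/2W1/2)tO3"]
```

A call such as `formula_pattern(PZT_FRAGMENTS).search(text)` then finds the formula whether a source writes it with or without spaces between the fragments.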
![]()
![]()
Abstract
The American Inventors Protection Act
of 1999 (AIPA) provides for the publication of most patent applications 18
months after the earliest filing of the application. This presentation will
address details concerning the content and volume of the publications, and
access to the United States Patent and Trademark Office's (USPTO's) text
searchable database of published applications. The discussion will include
details of how published applications are usable as prior art against other
applications as of their earliest US filing, along with prior art considerations
relating to international filing dates. The new interference bar based on
published application claims will be discussed. The USPTO is providing new
opportunities to review file wrapper histories and to access computer records
related to pending published applications. The presentation will also review new
systems of using the USPTO's records related to patent term adjustment and
extension, assignments, and maintenance fee payment information. The discussion
will also touch on new publication initiatives related to lengthy disclosures.
CINF 42: Is the sky really falling? A review of the
first two years of AIPA and its effect on inventors, patent lawyers and
businesses
Jacqueline M. Hutter, and Brian C.
Meadows, Needle & Rosenberg, PC, 127 Peachtree Street N.E, 12th Floor,
Atlanta, GA 30303, Fax: 404-688-9880, Hutter@needlepatent.com,
meadows@needlepatent.com
Abstract
The provisions of the 1999 American
Inventors Protection Act (AIPA) markedly changed long-standing practices and
procedures of the U.S. patent laws. These changes have not only affected patent
practitioners, but also inventors and the entities that seek to profit from
their inventive activity. This presentation will provide an overview of some of
the provisions of AIPA that have modified the manner in which patent strategy
decisions are made by inventors, businesses and patent practitioners. In
particular, this presentation will address AIPA's effect on matters related to
patent application filing and patent application prosecution decisions, as well
as new post-issuance considerations.
CINF 43: IFI indexing of pre-grant publications: Opportunities and
challenges
Darlene Slaughter, IFI CLAIMS Patent Services,
3202 Kirkwood Highway, Wilmington, DE 19808, Fax: 302-998-0733,
darlene.slaughter@aspenpubl.com
Abstract
Publication of US patent applications
by the USPTO presents an opportunity for IFI to offer fast, accurate access to a
whole new category of patent documents. IFI has been indexing US chemical
applications since their publication began in March 2001, with the goal of
providing index term access to chemical patent documents as quickly as possible.
We are adjusting to some associated challenges, including untested and sparsely
documented source data, absence of an Official Gazette, larger records requiring
more indexing terms, and a significant increase in overall volume of chemical
patent publications, both granted and pre-grant, requiring IFI chemical
indexing. This presentation focuses on IFI's approach to meeting those
challenges, and current status of the indexed database.
![]()
Analysis and Visualization of Chemical Information: I
![]()
CINF 44:
Information management for research in the chemical
industry
L. David Rothman, Materials Science &
Information Research, The Dow Chemical Company, 1776 Building, 2nd Floor,
Midland, MI 48674, LDROTHMAN@dow.com
Abstract
Wherever one looks, people are
increasing the rate of data acquisition and seeking to use those data to answer
more complex questions. The world of chemistry is no exception. High-throughput
experimental and virtual science, growth of the public literature, demands for
ever-improved materials and manufacturing processes and the overlap of
industrial chemistry with the life sciences all create new challenges and
opportunities in the analysis of data to drive decision-making in chemical
research. Among the challenges with data are acquisition, management,
integration and analysis, with the extra complication of the culture change
researchers may undergo as these data challenges are addressed. There is much to
learn from other industries, but chemistry information certainly has its unique
problems. This talk will discuss these subjects and the needs that arise from
them.
CINF 45: Realizing
the dream: Analysis and visualization tools for today, problems and issues for
tomorrow
William F. Bartelt III, CAS and Web Content,
CAS, 2540 Olentangy River Road, Columbus, OH 43202-1505,
wbartelt@cas.org
Abstract
Ways to deal with the ever-growing
tidal wave of chemical information are sorely needed. Software tools for
analysis and visualization of data are frequently cited as holding the most
promise for solving the problem of information overflow. Different classes of
tools are needed to deal with different types of data. How are we doing? This
survey focuses on analysis and visualization tools that are commercially
available and how well they help stem the tide. In addition, technical and
economic issues affecting advances in the state-of-the art will be explored.
CINF 46:
Command and control of the drug discovery factory: Putting researchers in
the driver's seat
Christopher Ahlberg, Spotfire, Inc,
212 Elm Street, Somerville, MA 02115, Fax: 617-702-1700,
ahlberg@spotfire.com
Abstract
The last decade has seen an abundance
of novel technologies, methodologies, and research content coming into the
domain of drug discovery. High throughput technologies have the possibility of
significantly improving the results of pharmaceutical research.
However, the results have not yet been shown. The output of novel products in the marketplace has decreased rather than increased while these new technologies have been implemented in current processes.
Much of the blame for this has been put on research organizations not being ready to deal with the data explosion from novel technologies. Researchers have had to deal with 100x more data, in terms of the number of compounds as well as the number of properties. Novel visualization and analytic technologies have been successful in battling this explosion, allowing researchers who would otherwise be confined to spreadsheets to rapidly browse data in search of trends and outliers.
While these novel visualization and analytic technologies have had a big impact, I will argue that to see real improvements in research productivity we need a discontinuous change in how research organizations deal with data and decision-making.
Chemists need to be able to see their results in the context of biology; biologists need to be able to see their results in the context of chemistry, etc. Decisions need to be made cross-functionally, taking every aspect of chemistry and biology into consideration. Every decision needs to be continuously monitored and updated as new data become available.
This is easier said than done. As much as such decision-making would indeed be a discontinuous change, a discontinuous change in software infrastructure for decision-making will be needed to enable a change in methodology and put researchers in the driver's seat.
I will outline a novel architecture for analytical software for the world of drug discovery, building on previous success in data visualization and showing how integrated decision-making can be made possible through improvements at every level from the UI to the database. The presentation will include architecture as well as user interface issues and discuss the impact on pharmaceutical research.
CINF 47:
Integrating analysis of chemical information from diverse sources and data
types
Jeffrey D. Saffer, OmniViz, Inc, Two Clock Tower
Center, Suite 600, Maynard, MA 01754, jsaffer@omniviz.com
Abstract
Today's chemist deals with very large
collections of information from diverse sources. Textual information from
literature and patents, high throughput screening results, structures,
descriptors and fingerprints, and ADME/Tox results represent just a sampling of
the different types of information used by the chemist. Being able to integrate
these varied data into a cohesive understanding can lead to improved
decision-making. One of the best instruments for this integration is the human
mind, but this tool can only be fully engaged when the diverse information is
presented in a context that is easy to assimilate. To this end, we have
developed a visualization framework that integrates analysis of experimental and
computational data with conceptual analysis of textual information. The
application of these approaches to very large (hundreds of millions of data
points) chemistry data sets will be discussed in the context of discovery
research.
![]()
Living With AIPA: Impact of the American Inventors Protection Act After a Year: II
![]()
CINF 48:
Trends and impacts in chemical patent
information
Matthew J. Toussant, Chemical Abstracts
Service, 2540 Olentangy River Road, Columbus, OH 43202-1505, Fax: 614-447-3906,
mtoussant@cas.org
Abstract
CAS has observed two significant
trends in the chemistry-related patents it monitors: 1) Major patent offices are
issuing more applications and granted patents than ever before; and 2)
Bioscience patents are becoming more complex and dense with information--and
this could affect the currency and completeness of secondary databases. The
enactment of the American Inventors Protection Act heightens the need to access
patent information as soon as possible. Statistics from five major patent
offices will be reviewed and the impact of the USPTO patent application release
beginning in March 2001 will be examined. At CAS, these developments change the
requirements for chemical patent information, while trends in intellectual
property disclosure and protection are affecting chemical and pharmaceutical
companies. This presentation will discuss how a secondary information provider
adjusts to these changes, in the interest of serving researchers' evolving
needs.
CINF 49:
One change in law; a myriad of industry effects
Sarah
Hamer, Editorial Manager, Chemistry and Life Sciences, Derwent Information
Ltd, 14 Great Queen Street, London, WC2B 5DF, United Kingdom, Fax: 44 20 7344
2900, Sarah.Hamer@derwent.co.uk
Abstract
Passage of the American Inventors
Protection Act (AIPA) was greeted at Derwent Information Ltd. with a mixture of
eagerness and apprehension. Given the importance of the USPTO as a patenting
authority, we needed from day 1 to provide full coverage and value-added data
for each of the published applications in the Derwent World Patents
Index® (WPI), Derwent GENESEQ, Derwent Patents Citation Index® and
other products. To ensure achievement of this commitment, a major project was
established to recruit and train additional staff required to process the data,
as well as to secure office space for them. A brief overview of the issues
Derwent faced, and steps taken to address them, will be followed by statistics
and a trend analysis demonstrating the impact of the law changes on the
structure of Derwent WPI patent families. This will include first-to-file issues
and the proportion of patent families containing only US applications. The range
of technologies covered in US applications published during the first year will
also be reviewed.
CINF 50: Living with AIPA: A patent vendor
perspective
David T. Dickens, Questel.Orbit, 8000
Westpark Drive, McLean, VA 22102, ddickens@questel.orbit.com, and Linda
Williams, Questel S.A
Abstract
Questel-Orbit offers traditional
online and internet access to a large collection of intellectual property
databases. The pre-grant publication (PGP) of US patent applications as of 15
March 2001 has posed quality and design issues for patent searchers, database
producers, and patent vendors alike. Difficulties for producers include such
design issues as formatting of the new 11-digit patent number for
crossfile searching, the merging of PGPs and grants in a single database and/or
document, optimal handling of claims from different stages, managing
continuations, CIPs, and divisionals, and patent family definition. Other
technical issues include the handling of missing data elements and non-standard
formatting of priority numbers. This paper discusses the ways that
Questel-Orbit, as both vendor and database producer, has implemented four very
different databases: Questel-Orbit's USAPPS fulltext database, IFI's CLAIMS
IFIPAT, Derwent's World Patents Index, and Questel-Orbit's PlusPat.
CINF 51: Comparing US and European
early-publication practices
Stephen R. Adams, Magister Ltd,
Crown House, 231 Kings Road, Reading RG1 4LS, United Kingdom, Fax: +44 118 929
9516, stevea@magister.co.uk
Abstract
The United States is a late adopter
of the early publication system, which has been well-established in Western
Europe since the early 1960s. Consequently, most European patent information
specialists have long been familiar with the characteristics, content and
usefulness of such documents and the corresponding search databases. At first
sight, the US legislation results in an analogous publication. However, this
paper provides a more detailed examination of the operation of US 18-month
publication, which reveals a number of significant differences when compared to
European practice. These variations can have an impact upon the expectations of
the searcher, the nature of their search tools and the results which can be
obtained from them.
CINF 52: A
patent searcher looks at the American Inventors Protection
Act
Stuart M. Kaback, Information Research &
Analysis, Research Support Services, ExxonMobil Research & Engineering Co,
1545 Route 22 East, Annandale, NJ 08801, Fax: 908-730-3230,
stuart.m.kaback@exxonmobil.com
Abstract
Unexamined published patent
applications have been with us for a long time. Non-examining countries such as
Belgium and South Africa were early-publishing countries even before the
Netherlands started, in 1964, to publish all applications 18 months after their
priority dates. Most major patenting authorities eventually followed suit, and
one might have supposed that when the US began to publish pre-grant applications
in March of 2001 things wouldn't have changed that much, beyond the availability
of additional documents in the English language. That turns out to have been an
oversimplification; the US pre-grants are quite different from earlier published
applications, in a number of ways. This presentation will examine a number of
changes in patent information availability brought about by the American
Inventors Protection Act.
![]()
Analysis and Visualization of Chemical Information: II
![]()
CINF 53:
Evolving techniques to analyze and visualize chemical
information
Kim S. Dunwoody1, John L.
Macko1, William F. Bartelt2, and Kurt W.
Zielenbach3. (1) Research Department, Chemical Abstracts Service,
P.O. Box 3012, Columbus, OH 43210, kdunwoody@cas.org, (2) CAS and Web Content
Department, Chemical Abstracts Service, (3) Online Services Department, Chemical
Abstracts Service
Abstract
Techniques for analyzing and
visualizing chemical structures and text have become instrumental in exploring,
problem-solving, and decision-making. The purpose of evolving tools is to
organize and elevate information, so that an investigator can gain insights that
would otherwise be out of reach. This presentation includes practical tips for
using STN Express and SciFinder and for launching other products. The examples
include (1) an overview of developments in the field of chemical data analysis
and visualization and (2) a case relating activities to a structural class of
substances.
CINF 54: Integrating
chemical information & visualizations to support scientific
decisions
Mark C. Surles, MDL Information Systems, Inc,
5910 Pacific Center Blvd., Suite 310, San Diego, CA 92121, Fax: 858-658-9463,
surles@mdli.com
Abstract
Discovery projects are under pressure
to reduce the time to get compounds to clinical trials while improving their
likelihood of success. At their disposal are high throughput data, predictive
ADME-T tools, and historical, corporate know-how. Numerous applications exist
that separately either integrate data, build predictive models, visualize
chemical information, or exchange information among project members. This
non-integrated, modular approach results in few users and poor exchange of
information, because scientists must be proficient with numerous applications.
Alternatively, systems that have integrated too much into a single interface
have also had limited success because they either became unwieldy while trying
to solve too much, or too simple because they eliminated domain specific
features.
Software advances including multi-tier architectures and XML provide the tools to address this problem. This talk discusses some of the challenges of providing tools for scientific members of discovery projects that incorporate multi-disciplinary, disparate data into a usable, collaborative environment. Examples show a hybrid approach that provides a common denominator of visualizations, analyses, and data in a single front end for scientists, while supporting data exchange with other analysis applications and supporting databases. This approach can provide competitive advantage by including more scientists from more disciplines in the creative discovery process, while shortening the time to clinical trial by communicating project advances to all users in real time.
CINF 55: Visualization of chemical patents: Source titles and abstracts vs. enhanced titles and abstracts
Abstract
In recent years text-mining software
has been developed that allows analysts to organize and visualize large
collections of documents without having to read and manually place each
individual document. In particular, ThemeScape from Aurigin Systems allows large
collections of documents to be clustered based on co-occurrence of subject topics
or themes. Once similar documents are grouped based on shared content they are
visualized using a topographical map representation. These maps allow for
relative document density to be measured based on the height of the content
peaks and they allow for secondary relationships to be identified within a set
by observing the relative distance between document clusters that are spatially
close together.
In previous work, a map created from original patent titles and abstracts was compared to a map created using intellectually assigned, hierarchical classification. These maps were similar to one another, but it appeared that the original titles and abstracts were a better analysis source for clustering documents by their function or use.
The current study will continue to explore this area by comparing a map created from the original titles and abstracts to a map of the identical collection of chemical patent documents created from enhanced titles and abstracts produced by Chemical Abstracts Service and Derwent Information. The discussion will revolve around differences and similarities in each approach and will attempt to provide information on which source provides the most valuable insight under different circumstances.
CINF 56: Identification and visualization of chemical series: Finding structural series in HTS data
Abstract
The talk will present Structural Unit
Analysis, a new method to identify and visualize relevant structural series in
chemical data. The method was developed for the analysis of HTS data, but is not
limited to this. In addition to identifying the molecules that are members of a
certain series, the method also identifies the structural features that the
compounds share and that are relevant to explaining the observed activity.
CINF 57:
Visualization and data analysis with VIDA
Joseph J
Corkery, OpenEye Scientific Software, 80 Kinnaird Street, #2, Cambridge, MA
02139, jcorkery@eyesopen.com
Abstract
VIDA is a graphics program designed
to visualize, manage, and manipulate large sets of molecular data such as vendor
or corporate collections, multiconformer virtual libraries, or the results of
computational experiments -- such as docking. Facilities such as visual list
management, data filtering, SMARTS matching, spreadsheet and graphing utilities,
pharmacological property calculation, and clustering are tightly integrated to
3-D and 2-D visualization. On a PC with 2 GB of RAM, VIDA can read and
manipulate 1 million structures. Chemical property calculation features included
are log P, 2-D polar surface area, rotatable bond count, number of heavy atoms,
molecular weight, and presence/count of SMARTS patterns. Electrostatics
calculations are done by an internal Poisson-Boltzmann solver. The spreadsheet
supports typical formula creation (including chemical properties), sorting, and
graphing. Versions are available for Windows, Linux, and SGI.
Text-Based Retrieval in Chemistry: I
CINF 58: Overview of 15th Collective Index changes in policy at
CAS
Kathy J. Wolfgram, Ida Copenhaver, Mark E. Prince,
and Linda Toler, Editorial Operations, CAS, 2540 Olentangy River Rd, Columbus,
OH 43209, Fax: 614-461-7140, kwolfgram@cas.org
Abstract
At the start of the 15th Collective
Index (15CI) Period (2002-2006), CAS introduced several changes to indexing
policies and practices to better serve customers. The presentation will
summarize these changes. The implications of these changes to search strategies
and search results will be addressed through several examples.
CINF 59: Searching
the CA and CAplus files with the enhanced CAS Role
Indicators
Eva M. Hedrick1, Maria G.V.
Rosenthal2, and Sandra L. Augustine2. (1) Database Quality
Engineering, CAS, 2540 Olentangy River Rd., Columbus, OH 43210, Fax:
614-461-7140, ehedrick@cas.org, (2) Editorial Operations, CAS
Abstract
CAS Role Indicators have been
enhanced to provide additional access points in the leading fields of scientific
research. As CAS scientists analyze the literature, they assign pertinent Role
Indicators to each substance that is indexed. Role Indicators allow searchers to
break down large answer sets into smaller groupings and to link these groupings
to search for common trends. As the databases grow in size, this intellectual
assignment by a specialist in the field is of great value to database searchers.
CINF 60: Creating a customized report
in STN Express 6.0 with Discover
Steven W. Yang, Olga
Grushin, and Luray Minkiewicz, Leveraged Information Technologies, DuPont Inc,
Experimental Station, Wilmington, DE 19880, Fax: 302-695-7731,
steven.w.yang@usa.dupont.com
Abstract
We have exploited the new report and
table features in STN Express 6.0 with Discover to create customized search
reports. We are able to prepare search reports with optional content from
selected single or multiple session transcripts. Patent family information and
patent application information can be displayed in table format for easy
retrieval. These tools have improved the efficiency and effectiveness of
postprocessing search results. The new report features also support
collaboration across multiple activities and help leverage information. The
Statistics function has facilitated competitive analysis.
CINF 61:
Uncertainty in retrieval from large databases
Andrew
Berks, Patent Dept, Merck & Co, RY 60-35, 126 E. Lincoln Ave, Rahway, NJ
07065, Fax: 732-594-5832, andrew_berks@merck.com
Abstract
A recent talk by Sandy Lawson of
Beilstein discussed a concept “Question-query-response, pick any two.” This
presentation will further develop this idea and discuss query complexity and
retrieval of records from large databases. Relationships between database
structure, including indexing, query complexity, and retrieval are shown to
depend on the relationship of the original question to the secondary indexing of
the database. Queries can be closely related to the original question or the
database structure. A query closely related to the database structure is
addressed by the indexing of the database, but this limits the nature of the
questions that can be posed directly to a database. A complex query, not
directly addressed by database indexing, is shown to have limits to completeness
of retrieval. An uncertainty equation is developed, relating retrieval, the
original question, and a variable based on the complexity of the question and
indexing in the database. Natural language interfaces provide a solution to the
problem of query complexity, but at a cost of relevant retrieval.
CINF 62:
DIALOG in a dot com labyrinth: Text based information retrieval in a
graphical user interface culture
James J. Heinis, 11000
Regency Parkway #10, DIALOG Corporation, Cary, NC 27511,
jim_heinis@dialog.com
Abstract
Chemical information may be accessed
through graphical, text or combination interfaces which conceal the underlying
database structure and strategies from nonspecialist searchers. In a traditional
online service (e.g. DIALOG), the crux of search and retrieval depends on
system-wide context breadth, indexing consistency and the ease of coordinating
search results between databases (e.g. MAP command or equivalents). Consistent
indexing is the cornerstone for multivariate cluster analysis which is the
foundation for bibliometrics and data mining. Retrieval from web sources relies on
the initial credibility of the source (e.g. governmental, international or
technical societies are considered more reliable) and effectiveness of the
search engine. Spider web crawler based search engines retrieve only static web
pages that are linked to other pages but do not index content which is not in
flat HTML format (e.g. image, audio, video or Adobe PDF) or is dynamically
generated in response to a query. This non-indexed material forms the "deep or
invisible web." Web based search engine ratings of ranking or relevance may be
skewed by design or economic considerations. In contrast, traditional online
services offer access to an orderly selection of well defined databases with a
well defined search engine, indexing structure and standardized search language
with interfaces which may be implemented to ease users into the full
capabilities of the search language as implemented on the system. This paper
will outline the merits of textual based retrieval on a commercial online system
by providing examples of DIALOG searches in pharmaceutical data, use of lesser
known files with kinetic information, isolation of prior art data, linkage to
patent information, retrieval of records and generation of summary data.
Analysis and Visualization of Chemical Information: III
CINF 63:
Chemical information is more valuable in context: DiscoveryCenter(TM) as a
chemical information scaffold
Mitchell A. Miller, and
Andrew Payne, NetGenics, Inc, 955 Ridge Hill Lane, Suite 30, Midvale, UT 84047,
mmiller@netgenics.com
Abstract
In the current research environment,
data is certainly not in short supply. Every organization has an abundance of
chemical structure, screening and property data. Putting these types of data
together in a coherent way so researchers can make the best-informed decisions
about which compounds to pursue is more of a challenge. To this end, NetGenics
has developed DiscoveryCenter(TM), a software environment that provides an
integrated view of chemical and biological information held in both internal and
external repositories. DiscoveryCenter allows researchers to search on and view
chemical structures in the context of screening data, biological sequences,
analytical testing data, etc. What's more, its flexible architecture allows us
to plug in the user's choice of data sources, including molecular property
calculators. The right information in the right context gives researchers just
what they need.
CINF 64: Molecular shape graphs
W.
Todd Wipke, John Lawton, and Holly Hendrick, Molecular Engineering
Laboratory, Department of Chemistry and Biochemistry, University of California,
Santa Cruz, CA 95064, wipke@chemistry.ucsc.edu
Abstract
Molecular shape comparison is a
complicated undertaking. A graph-based representation of molecular shape is
attractive in that it may be possible to leverage preexisting graph-theoretical
algorithms to simplify molecular shape comparison. In this paper, we present
methodology for deriving topographical graphs, a graph-like, high-level
representation of molecular shape, which are considerably simpler than the
molecules from which they were derived. The nodes in a topographical graph
correspond to surface segments that possess a given topography, while edges
denote the adjacency of the surface segments. In addition to the
graph-theoretical properties, the nodes have three-dimensional position and the
edges have length. We will present examples of topographical graphs generated
for a variety of molecules and will illustrate the potential benefits of this
representation.
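The graph described above can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the class names, the topography labels ("knob", "hole", "saddle"), and the choice to take edge length as the straight-line distance between node positions are all hypothetical.

```python
from dataclasses import dataclass, field
from math import dist
from typing import Dict, List, Tuple


@dataclass
class SurfaceNode:
    """One surface segment; the topography labels used here are illustrative only."""
    topography: str                       # assumed label, e.g. "knob" or "hole"
    position: Tuple[float, float, float]  # 3D position of the segment


@dataclass
class TopographicalGraph:
    nodes: List[SurfaceNode] = field(default_factory=list)
    # Edge key is an ordered pair of node indices; the value is the edge length.
    edges: Dict[Tuple[int, int], float] = field(default_factory=dict)

    def add_node(self, topography, position):
        """Add a surface segment and return its index."""
        self.nodes.append(SurfaceNode(topography, position))
        return len(self.nodes) - 1

    def connect(self, i, j):
        """Mark segments i and j as adjacent.

        Assumption: the edge length is the Euclidean distance between the nodes.
        """
        length = dist(self.nodes[i].position, self.nodes[j].position)
        self.edges[(min(i, j), max(i, j))] = length
        return length

    def neighbors(self, i):
        """Indices of all segments adjacent to segment i, in ascending order."""
        return sorted(b if a == i else a for (a, b) in self.edges if i in (a, b))
```

Because the result is an ordinary labeled graph with node positions and edge lengths, standard graph algorithms could in principle be applied to it directly, which is the benefit the abstract points to.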
CINF 65:
Nonlinear mapping of massive combinatorial libraries: beyond
enumeration
Dimitris K. Agrafiotis, Victor S. Lobanov,
and Huafeng Xu, 3-Dimensional Pharmaceuticals, Inc, 665 Stockton Drive, Exton,
PA 19341, Fax: 610-458-8249, dimitris@3dp.com
Abstract
Nonlinear mapping (NLM) is a
collection of statistical techniques that embed a set of patterns described by a
dissimilarity matrix into a low-dimensional display plane in a way that best
preserves their original pairwise relationships. Unfortunately, current NLM
algorithms are notoriously slow, and their use is limited to small data sets. In
this paper, we present a family of algorithms that combine iterative nonlinear
mapping techniques with neural networks, which makes it possible to handle very
large data sets that are intractable with conventional methodologies. The method
employs a multidimensional scaling algorithm to project a small random sample
set, and then 'learns' the underlying transform using one or more multi-layer
perceptrons. The distinct advantage of this approach is that it captures the
nonlinear mapping relationship in an explicit function, and allows the scaling
of additional patterns as they become available, without the need to reconstruct
the entire map. This methodology is broadly applicable and can be used with a
wide variety of input data representations and similarity functions. It is shown
that in the case of combinatorial libraries, it is possible to predict the
coordinates of the products on the nonlinear map from pertinent features of
their respective building blocks, and thus limit the computationally expensive
steps of virtual synthesis and descriptor generation to only a small fraction of
products. In effect, the method provides an explicit mapping function from
reagents to products, and allows the vast majority of compounds to be projected
without constructing their connection tables.
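How well a map "preserves the original pairwise relationships" is usually measured with a stress function. The abstract does not say which one is used; the sketch below assumes Sammon's stress, one common choice, and covers only the scoring step, not the neural-network learning of the transform. The function and argument names are hypothetical.

```python
from itertools import combinations
from math import dist


def sammon_stress(dissimilarity, coords):
    """Sammon's stress of a low-dimensional map; 0 means a perfect map.

    dissimilarity[i][j] is the original dissimilarity between patterns i and j
    (assumed > 0 for i != j); coords[i] is the map position of pattern i.
    """
    pairs = list(combinations(range(len(coords)), 2))
    total = sum(dissimilarity[i][j] for i, j in pairs)
    # Each pair's squared error is weighted by 1 / original dissimilarity,
    # so small original distances are preserved preferentially.
    error = sum(
        (dissimilarity[i][j] - dist(coords[i], coords[j])) ** 2 / dissimilarity[i][j]
        for i, j in pairs
    )
    return error / total
```

An iterative mapping algorithm would adjust `coords` to reduce this value. In the approach described above, that would be done only for a small random sample, before a learned function places the remaining patterns.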
CINF 66: Interactive exploration of high volume
datasets using HiVol and HiStats
David Baker, and Ralph
Walden, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144,
dabaker@tripos.com
Abstract
HiVol and HiStats are new
software tools for analyzing and visualizing the large datasets typical of
high-throughput synthesis and screening efforts. Chemical and property data for
over a million compounds can be readily calculated, filtered, sorted, and
graphed. Datasets can be interactively and iteratively partitioned into subsets
based on 2D structure searching, diversity/similarity, registration IDs, and
property range. Multiple databases and subsets are simultaneously accessible,
each displayed in a spreadsheet complete with 2D structures and associated
properties. Additional visualization tools include scatter plots, histograms,
and dendrograms. HiStats calculates univariate statistics, performs hierarchical
clustering, and builds regression models that profile the properties of large
datasets in order to guide follow-up experiments.
CINF 67:
Structural class-based analysis, reasoning, and
visualization
Terence K. Brunck, Bioreason, Inc, 150
Washington Ave Ste. 303, Santa Fe, NM 87501, terry.brunck@bioreason.com
Abstract
Given the rapidly growing body of
data being generated by automated synthesis and screening technologies, analysis
and decision-making processes are becoming overwhelmed. One approach to the
analysis, reasoning, and visualization of such large amounts of data is the use
of homogeneous structural classes as the basis for analysis. This approach
enables the characterization and prioritization of groups of compounds rather
than individual compounds. Methods to generate and use such classes will be
presented. Benefits resulting from class-based analysis, including noise
detection, predictive modeling, and similarity screening will be
described.
Text-Based Retrieval in Chemistry: II
CINF 68: Support tool for searching the CA and CAplus files
Ida L. Copenhaver, and Alan E. Amos, Editorial
Operations, Chemical Abstracts Service, P.O. Box 3012, Columbus, OH 43210, Fax:
614-447-3906
Abstract
With completion of the addition of
bibliographic and abstract data to the CA and CAplus Files, CAS recognizes the
need for providing support tools that facilitate access to information in the
1907 to current time period. CA production policies and practices have been
modified over time. Changes have occurred in the key focus areas of science. A
new support tool to aid in developing broad subject-based searching strategies
will be discussed.
CINF 69: Lexicon-enhanced text searching for biological
information in the CAS databases
Sabine P. Kuhn, Al E.
Amos, Margaret T. Haldeman, and Cynthia Liu, CAS, Columbus, OH 43202-1505, Fax:
614 447 3713, skuhn@cas.org
Abstract
CAS is best known for its
chemistry, but its databases contain a wealth of biological information. Today
nearly 40% of the document references in the CAS databases focus on biology,
life sciences and medical sciences. The key to the CAS databases, and thus to
accessing the world's literature, is CAS' controlled vocabulary. In the CA
Lexicon on STN the controlled vocabulary is organized under specific topic
areas, and provides relationships between synonyms, broader, narrower, and
related terms. CAS is keeping pace with science, evolving with developing
terminology in the literature and always providing users with accurate access
points to the abundance of scientific knowledge. Meaningful indexing of
biological chemistry requires the accurate recording of sources and targets of
this chemistry. The CA Lexicon on STN captures this information in a detailed
and precisely organized collection of headings: anatomical, taxonomic,
biological process, biological property, and biological activity. Additional
access to biological subjects is provided by an exhaustive collection of
synonyms. The CA Lexicon links these biological headings to relevant chemicals.
In-depth hierarchical arrangements of these headings make it easy to search for
specific substances across broad classifications. The biological information
collected in the CA Lexicon is an indispensable complement to its chemical
content.
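The way a hierarchy of broader and narrower terms supports searching across broad classifications can be illustrated with a toy thesaurus. This is a sketch only: the dictionaries below are hypothetical stand-ins, not CA Lexicon data, and `expand` is not an STN command.

```python
# Hypothetical toy thesaurus; the headings and links are illustrative only.
NARROWER = {
    "Mammalia": ["Rodentia", "Primates"],
    "Rodentia": ["Mus musculus", "Rattus norvegicus"],
}
SYNONYMS = {
    "Mus musculus": ["house mouse"],
    "Rattus norvegicus": ["brown rat", "Norway rat"],
}


def expand(term):
    """Return the term, all narrower terms beneath it, and their synonyms."""
    result, stack, seen = [], [term], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against a heading reachable by two paths
        seen.add(current)
        result.append(current)
        result.extend(SYNONYMS.get(current, []))
        # reversed() keeps narrower terms in their listed order when popped
        stack.extend(reversed(NARROWER.get(current, [])))
    return result
```

Searching on the expanded list rather than the single heading is what lets a broad query such as a taxonomic order retrieve records indexed under its specific members.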
CINF 70: Keeping up with the Jones’s. Text based
searching for competitor intelligence
Richard W
Neale1, Paul Sayer1, Gez Cross1, and
Steve Hajkowski2. (1) Product Development Group Chemistry & Life
Sciences, Derwent Information UK, 14 Great Queen Street, Holborn, London, United
Kingdom, Fax: +44 207 344 2911, richard.neale@derwent.co.uk,
gez.cross@derwent.co.uk, (2) Online Training Department, Derwent
Information
Abstract
The chemical industry is one of
the world’s largest and most competitive industries with a total turnover in
2000 in the region of US$ 1,200 Billion. Industry organisations and governments
recognise that the key to improve competitiveness is to pursue sound Research
and Technical Development (R&TD) programmes. As a consequence, the chemical
industry is one of the world’s largest sponsors of industrial R&TD.
With information providers producing a wide range of valuable databases, offering various text searching methodologies, text based searching has proved to be an essential tool in aiding competitor intelligence. However, with increasing volumes and varying types of chemical information, how can searchers be sure of obtaining high quality results?
This paper will concentrate on the use of varying classification & indexing systems for the search and retrieval of patent information. It will also examine their efficiency in the retrieval of chemical information when used individually and in combination.
CINF 71: Not just full text articles: A study for the Search function among chemistry electronic journal websites
Abstract
Besides providing full text articles
online, almost all electronic journal web sites offer the Search function to
their users. However, few users utilize this helpful tool.
This presentation will focus on testing, analyzing, and comparing the search features among chemistry electronic journal websites. The web sites are chosen from those to which Purdue University has full-text access.
CINF 72: Text-based chemical information locator from
the Internet using commercial barcodes
M
Karthikeyan1, S Krishnan1, and Christoph
Steinbeck2. (1) Scientific Management Information Systems,
National Chemical Laboratory, Dr. Homi Bhabha Road, Pune 411008, India, Fax:
+91-20-5893973, karthi@ems.ncl.res.in, (2) Max-Planck-Institute of Chemical
Ecology, Carl-Zeiss-Promenade 10, Jena D-07745, Germany, Fax: 364-164-3665,
steinbeck@ice.mpg.de
Abstract
In Chemistry where most of the
information is related to molecular structures it is necessary to transform
those structures into a textual equivalent before defining a query for a
search. Many tools are available to transform pictorial chemical structure into
equivalent chemical names. Frequently, however, there is a need to search
chemicals by common/trade names. This necessitates the development of internet
based tools to search common names or traditional names for given structures. As
a case study a tool CILI (Chemical Information Locator from Internet) was
developed to retrieve structure related chemical information using a text-based
approach (Fig. 1). The input for the system is performed via the keyboard or a
drawing applet or barcode scanning. In any of the cases the final output is
SMILES. After an error checking procedure the user is prompted to enter
information for a substructure search or superstructure search. In the
substructure search the core skeleton is computed by reducing noise groups. The
details are presented.
Informatics Challenges with Mergers and Acquisitions
CINF 73: UCB pharma informatics challenges: Merging
databases is also sharing culture and knowledge
Eddy Vande
Water1, Didier Berckmans1, Didier Chalon1,
Anna Toy-Palmer2, and David Wei2. (1) UCB S.A.
Pharma Sector, Chemin du Foriest R4, Braine-l'Alleud B-1420, Belgium, Fax: +32 2
386 27 04, eddy.vandewater@ucb-group.com, (2) UCB Research Inc, 840 Memorial
Drive, Cambridge, MA 02139, Fax: +01 617 5478481, david.wei@ucb-group.com
Abstract
Following the acquisition of the
research group of Cytomed, Inc. in Cambridge, Massachusetts in October 1998, it
became necessary to provide electronic support to all of UCB Pharma Sector’s
researchers. In order to facilitate sharing data between the two sites, the
creation of a global database was decided upon. The ideal database would be
efficient, store information about research products, their properties and their
test results, and be accessible by all of the research scientists from both
Cambridge and Braine-l’Alleud, Belgium. A project team, including people from
Global Research and from IS (Information Systems), was mandated by the Research
Committee in February 1999 to fulfill this mission.
The first step was to try to understand the research process at both sites. After many meetings and accounting for differences in company size, work methods, and cultural habits, a common database structure was agreed upon.
As a result of this key project, a single database now holds essential scientific data, with automated local copies for better access and query performance. Moreover, data specific for each site are stored in local databases that are copied regularly between the two sites. To ensure the required performance level, a new database server and a direct network line between Boston and Braine-l’Alleud were also put into service. Besides the technical improvements, one of the most important aspects of this project was the development of a new communication and work tool for Global Research, which will help us to combine our skills and our scientific know-how. It is also a very good illustration of collaboration between R&D, IS and IT (Information Technology)!
CINF 74: The long march towards integrated research
IT systems
Dieter Poppinger, R&T Information Services,
Syngenta Crop Protection AG, WRO-1060.7.24, CH-4002 Basel, Switzerland, Fax: +41
61 323 2540, dieter.poppinger@syngenta.com
Abstract
After the merger of the agrochemical
businesses of Novartis and Zeneca, the new company Syngenta embarked upon a
number of major integration initiatives in the IT area. To enable Syngenta
Research to operate effectively across functions and locations, projects are
underway to adapt or integrate all major software systems which support
chemistry- and biology-related research. The talk will address the
organizational, technical, and human challenges which Syngenta faces in these
projects, and describe their current state.
CINF 75: A
cheminformatics system for stereochemical structures
Ping
Du, Lexicon Pharmaceuticals, 279 Princeton-Hightstown Road, East Windsor, NJ
08520, Fax: 609-448-8299, pdu@lexpharma.com
Abstract
Lexicon Pharmaceuticals, formerly
Coelacanth Corporation, is a division of Lexicon Genetics, Inc. After the
acquisition in July 2001, integrating informatics capabilities has become key
to building a new enterprise-wide drug discovery platform. One difficult
cheminformatics problem that we worked on was to develop a solution to register
chemical libraries with multiple relative chiral centers. Structure matching
functions for chemical structures with more than one relative chiral center do
not exist “out of the box”. These structures represent a mixture of two or more
stereoisomers. We have extended the Accelrys Accord chemistry engine to match
such structures at library registration. While storing only one chemistry object
in Oracle, we are able to create multiple stereoisomers using a set of structure
templates. This technique is composed of a set of software components, including
Oracle tables and packages, Oracle external procedures for stereoisomer
generation, and a desktop manager application. This system manages the chemical
structures of an internal collection of over 200,000 compounds. Chemical registration
tools are in place to register new compounds with relative chiral centers.
Intranet applications have been developed to query these structures and display
stereo centers in color according to their absolute or relative properties.
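The idea of storing one structure and expanding it into several stereoisomers can be sketched as follows. This is not the Accord or Oracle implementation described above. It assumes that the centers in one relative group are either all as drawn or all inverted together, and it uses R/S labels as a simplified stand-in for real stereo parities. The function name and input layout are hypothetical.

```python
from itertools import product

FLIP = {"R": "S", "S": "R"}


def enumerate_stereoisomers(centers, relative_groups):
    """Expand relative stereo groups into explicit stereoisomers.

    centers: dict mapping atom index -> "R" or "S" as drawn.
    relative_groups: list of lists of atom indices. Centers in one group are
    known only relative to each other, so the group is either kept as drawn
    or inverted as a whole. Centers in no group are treated as absolute.
    """
    isomers = []
    for choice in product((False, True), repeat=len(relative_groups)):
        isomer = dict(centers)
        for invert, group in zip(choice, relative_groups):
            if invert:
                for atom in group:
                    isomer[atom] = FLIP[isomer[atom]]
        isomers.append(isomer)
    return isomers
```

With k independent relative groups this produces 2**k candidates. A real registration system would also need to remove duplicates that arise from molecular symmetry (for example meso forms), which this sketch does not attempt.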
CINF 76:
Building a unified drug discovery database within
Celltech
David M. Parry, John Bird, John Rogers, and
James Petts, Celltech R&D Ltd, Granta Park, Great Abington, Cambridge CB1
6GS, United Kingdom, Fax: +44 1223 896400,
David.Parry@cam.celltechgroup.com
Abstract
In 1999 the merger of Celltech and
Chiroscience created a medium sized research organisation, along with all the
overheads of differing ways of working with drug discovery data. Over the last
year a major project within the organisation has focussed on the building of a
new Unified Drug Discovery Database, shortened to UD3. The existing research
organisations within both Celltech and Chiroscience used the IDBS Activitybase
and MDL ISIS products but in differing ways. Finding an interim solution to
enable the research work to continue and developing a longer term
solution were high on the post merger priorities. An overview of the current
status of this project, in the light of the challenges posed by the merger will
be presented.
CINF 77: Key factors and technologies in the
successful deployment of integrated informatics systems
Bill
Langton, Mike Higgins, Ramesh Durvasula, and Denise Beusen, Tripos, Inc,
1699 South Hanley Road, St. Louis, MO 63144, Fax: 314-647-9241,
bill@tripos.com
Abstract
Any realization of the elusive
economy of scale that has driven pharmaceutical mergers is critically dependent
on sophisticated decision support systems for creating synergies not only
between scientific disciplines, but also between business units. Key challenges
in implementing integrated informatics systems include: multiple, often
geographically disparate database installations; different database formats; the
sheer quantity of data generated by high-throughput technologies; customizable
tools for simultaneous browsing of chemical, biological, and physical data;
project tracking to improve knowledge management and reporting of results; and
applications that increase research productivity by mining enterprise knowledge
of both earlier and later stages of research. We will discuss our experience in
designing integrated informatics systems as well as crucial technologies that
enhance the probability of their successful deployment.
CINF 78:
Managing the collection of HTS compounds through
suppliers
Nanhua Yao, Shahul Nilar, Vesna
Stoisavljevic, Paul Diaz, Maja Stojiljkovic, Jean-Luc Girardet, Haoyun An,
Jingfan Huang, Eugene Chang, Robert Hamatake, and Zhi Hong, ICN R&D, 3300
Hyland Avenue, Costa Mesa, CA 92606, nyao@icnpharm.com
Abstract
The parsing and selection of
compounds from commercial suppliers is an integral part of High Throughput
Screening (HTS) efforts in drug discovery. In dealing with multiple suppliers
and regular updates, it is necessary to develop a strategy that avoids duplicate
entries not only among the suppliers, but also between the updates and the
historical collection. An additional issue occurs when suppliers provide
target-specific focused libraries that need specific considerations. We present
a strategy that narrows the number of compounds provided by the suppliers before
subjecting the collection to molecular diversity based calculations. All
structures from suppliers are collected in a Master Database, which also includes
duplicate entries. The initial sets of criteria are modifications to Lipinski’s
rule of five, with changes to the rules based on target and candidate-specific
experience gathered at ICN. Structures that filter through this step are further
analyzed for the presence of “reactive groups” that would make the candidate not
suitable to be included. The number of heavy halogen atoms in the structures,
defined as Group 7A elements heavier than Fluorine, is then employed to further
refine the collection of compounds. As with the modifications to Lipinski’s
rules, the definitions for the “reactive groups” and the number of heavy
halogens are based on in-house screening experience. The resulting set of
structures is fingerprinted (in a 2-D sense) using the MACCS keys / Tanimoto
matrix and clustered using the Jarvis-Patrick clustering algorithm. One entry from
each resulting cluster (usually the first member in a multi-structure cluster)
is chosen as being representative of each cluster. The lack of stock quantities
of compounds due to the merger of suppliers or the purchase of suppliers by
pharmaceutical companies can lead to voids in the diversity space of the
intended compound collection. In such cases, candidates from a different
supplier in the same cluster can be chosen. For singleton clusters it is then
necessary to search the master database of all compound entries for those that
are similar (in diversity space) to the particular structure of interest. Purity
and solubility issues in the choice of suppliers for the acquisition process
will also be discussed.
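The fingerprint, similarity, and clustering steps can be sketched as follows. This is a minimal illustration, not the ICN workflow: fingerprints are represented as hypothetical sets of on-bit positions rather than real MACCS keys, and the parameters `k` (neighbor list length) and `k_min` (required shared neighbors) are assumed values. The clustering rule is the common Jarvis-Patrick variant in which two items join the same cluster if each is in the other's nearest-neighbor list and they share at least `k_min` neighbors.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def jarvis_patrick(fingerprints, k, k_min):
    """Cluster by shared nearest neighbors; returns sorted lists of indices."""
    n = len(fingerprints)
    neighbors = []
    for i in range(n):
        # Rank all other items by decreasing similarity (ties broken by index).
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (-tanimoto(fingerprints[i], fingerprints[j]), j),
        )
        neighbors.append(set(ranked[:k]))

    parent = list(range(n))  # union-find forest used to merge linked items

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            mutual = j in neighbors[i] and i in neighbors[j]
            if mutual and len(neighbors[i] & neighbors[j]) >= k_min:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Taking the first member of each cluster, as the abstract describes, is then `[c[0] for c in jarvis_patrick(fps, k, k_min)]`. Singleton clusters appear as one-element lists and can be followed up with a similarity search against the full database.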
CINF 79: Migrating Chemical Information – a vendor’s
perspective
Andrew Lemon, Chemical Technology Group, IDBS, 2
Occam Court, Surrey Research Park, Guildford GU2 7QB, United Kingdom, Fax: 44
1483 595001, alemon@id-bs.com
Abstract
IDBS provides a flexible and
integrated set of solutions for chemical registration, and reaction knowledge
management. Many of our customers have migrated from existing solutions. This
has required IDBS to add specific features to the design of our product
offerings to support this migration process. In these times of merger and
acquisition many groups are facing the problems of merging information from
multiple vendor systems into one chemical database. This raises a variety of
issues that must be supported by any vendor providing a solution to manage this
data. We present a set of issues raised from our experience in dealing with this
problem and explain some of the solutions we have generated to address these
issues.
CINF 80:
The changing requirements for informatics systems during the growth of a
drug discovery service company
Sally Rose,
Sittingbourne Research Centre, BioFocus plc, Sittingbourne, Kent ME9 8AZ, United
Kingdom, Fax: +44 1795 471123, srose@biofocus.co.uk
Abstract
BioFocus was launched and established
as a public company in March 1997. The company raised circa $1.8M when it launched
on the Ofex stock market in London. This is a relatively small amount of money
with which to set up a company offering medicinal and combinatorial chemistry
discovery services to the biopharmaceutical industry. The initial informatics
systems needed to be very cost effective and appropriate to a small company.
They centered on ISIS Base and Accord for Access.
The company has since grown to circa 160 staff. The majority of the growth has been organic, supported by earned income from clients, though some additional finance has been raised (e.g. circa $6M in August 2000 when the company moved to the AIM stock market in London). As the company grew, the demand for a more sophisticated, flexible informatics system increased. We started implementing an Oracle-based system in 2000 to support the chemistry informatics requirements.
BioFocus acquired a biology service company, Cambridge Drug Discovery, in 2001. This brought a new dimension to the business with the addition of HTS and biological assay development services to our portfolio. Needless to say, the informatics systems of the two companies were completely different and access to chemistry and biology information was required company-wide to support full drug discovery projects for our clients.
Major pharma companies have enormous legacy systems, and mergers result in a vast amount of work for the informatics departments. Small companies, such as BioFocus, have far less data to worry about; however, they face different challenges, namely limited budgets and less in-house expertise.
This presentation will discuss the evolution of the informatics systems at BioFocus and describe the chemistry databases we developed to handle medicinal chemistry and combinatorial library data. It will also consider how we approached merging the information from the biology and chemistry groups.
General Papers
CINF 81: Managing the analytical workflow – From raw
material to elucidated structure
Euan Dean, and Andrew
Lemon, Chemical Technology Group, IDBS, 2 Occam Court, Surrey Research Park,
Guildford GU2 7QB, United Kingdom, Fax: 44 1483 595001
Abstract
ActivityBase provides an integrated
framework for managing discovery data. We describe a solution to enable
scientists to utilize the test, workflow and results management capabilities of
the ActivityBase environment to manage spectral and chromatographic data. This
manages not only the capturing and processing of data, but also the organization
of data collection. The ActivityBase test management module provides features
for the generation and tracking of a set of requests for services. This has been
applied to analytical services, supporting sample management, analytical
requests, collection and processing of the spectral data generated, validation
and elucidation of structural information leading to a confirmed structure. This
includes integration of spectral management software within the ActivityBase
framework.
CINF 82: Predicting reaction parameters for library
synthesis accelerated by an in-house reaction database
László
Ürge, Gábor Põcze, Anna Gulyás-Forró, Ferenc Darvas, and György Dormán,
ComGenex Inc, 33-34 Bem Rkp, Budapest H-1027, Hungary,
dgyorgy@comgenex.hu
Abstract
Until recently organic chemists have
derived their knowledge about chemical reaction conditions by inductive learning
from observations on a sequence or series of individual chemical reactions.
During experimental design, optimal reaction parameters were estimated by analogy
with the individual syntheses of related compounds. Initially, for high throughput
parallel synthesis of combinatorial libraries, this estimation from
individual reactions was generally applicable, and in some cases stochastic
simulation methods were also successfully used [1]. However, the rapid
advancement of combinatorial chemistry allowed an accumulation of data on
various aspects, including organic reactions. Analyzing these large datasets
and exploiting their predictive power is an emerging area in the combinatorial
sciences. In recent years ComGenex has applied a large number of
chemical reactions and automated them using parallel synthesis stations, yielding
several hundred thousand compounds. All the protocols and reaction parameters,
including structures, experimental conditions, yield, success rate and
analytical results, are stored in a searchable format using ComGenex's proprietary
SQL-based information technology system. The large datasets enable a more
accurate prediction of the optimal parameter set for synthetic matrix planning
taking into account the chemical nature of the reagents in various reaction
types. The presentation covers non-algorithmic components of the prediction as well
as quantitative elements. Qualitatively, for each reagent class a
reagent fingerprint can be determined based on the observed reactivity with
different substitution patterns in different types of reactions, and reaction
families can be identified by finding similar fingerprints. Quantitatively, the
reaction database is suitable for applying mathematical algorithms to calculate
the optimal chemical parameter sets important for the most efficient performance
of the high throughput, parallel synthesis.
[1] Darvas F and Kovács L, CMT: A solution phase combinatorial approach. Synthesis and yield prediction of phenazines. In: High-Throughput Screening (Ed. Devlin JP), pp. 223-242. Marcel Dekker Inc., New York, 1997.
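The abstract does not say how a reagent fingerprint is computed or compared. As a minimal sketch only, one could assume that a fingerprint is the vector of observed success rates of a reagent class across reaction types, and that "similar" means a high cosine similarity between such vectors. The records, reagent class names, reaction types, function names, and the 0.9 threshold below are all hypothetical illustrations, not ComGenex data or methods.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical reaction records: (reagent_class, reaction_type, succeeded).
RECORDS = [
    ("aliphatic amine", "acylation", True),
    ("aliphatic amine", "acylation", True),
    ("aliphatic amine", "reductive amination", True),
    ("aliphatic amine", "reductive amination", False),
    ("benzylic amine", "acylation", True),
    ("benzylic amine", "reductive amination", True),
    ("benzylic amine", "reductive amination", False),
    ("aryl boronic acid", "Suzuki coupling", True),
    ("aryl boronic acid", "Suzuki coupling", True),
]
REACTION_TYPES = ["acylation", "reductive amination", "Suzuki coupling"]


def fingerprints(records, reaction_types):
    """Map each reagent class to its success rate per reaction type.

    Assumption: reaction types never observed for a class count as 0.0.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for reagent, rtype, ok in records:
        cell = counts[reagent][rtype]
        cell[0] += int(ok)  # successes
        cell[1] += 1        # attempts
    return {
        reagent: [
            by_type[t][0] / by_type[t][1] if t in by_type else 0.0
            for t in reaction_types
        ]
        for reagent, by_type in counts.items()
    }


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_pairs(fps, threshold):
    """Return pairs of reagent classes whose fingerprints meet the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(fps), 2)
        if cosine(fps[a], fps[b]) >= threshold
    ]


if __name__ == "__main__":
    fps = fingerprints(RECORDS, REACTION_TYPES)
    for pair in similar_pairs(fps, threshold=0.9):
        print(pair)
```

In this toy data the two amine classes behave alike across reaction types and so fall into one family, while the boronic acid stands apart. A real system would presumably also encode substitution patterns and reaction conditions, which this sketch leaves out.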