#224 - Abstracts

ACS National Meeting
August 18-22, 2002
Boston, MA

9:00 1 Ultra High Throughput Screening using THINK on the Internet
E Keith Davies, Department of Chemistry, Oxford University, Central Chemistry Laboratory, South Parks Road, Oxford, United Kingdom, Fax: +44 1865 275905, Keith.Davies@Chem.ox.ac.uk, and Catherine J Davies, Treweren Consultants Ltd

The growth in the collections of small molecules available for experimental testing prompts selection of subsets and has stimulated the question "how many drug-like molecules are there?". In the CAN-DDO project we harnessed the power of over 1 million volunteered PCs to screen 3.5 billion drug-like molecules against 12 protein targets of relevance to cancer therapy. The development of the THINK software, its adaptation to run as a screen-saver and some of the data-management issues will be described in this paper.

9:30 2 Next steps for virtual screening and massively distributed computing.
Davin M. Potts, United Devices, Inc, 12675 Research Blvd., Building A, Austin, TX 78759, Fax: 512-331-6235, davin@ud.com

The recent massive distributed computing project led by W. Graham Richards' team (Oxford) to perform virtual screening of 3.5 billion drug-like molecules against a series of protein targets has identified a significant number of promising, novel small molecules which warrant further investigation and refinement into prospective drug candidates. The successful involvement of the general public (1.5 million PCs participating on the internet to date capable of producing a sustained compute power in excess of 60 teraflops) in this pioneering scientific endeavor has demonstrated the magnitude and viability of available untapped compute power for drug discovery efforts. With the recently announced availability of state of the art screening tools (e.g. LigandFit) on such distributed computing platforms comes the opportunity and challenge for pharmaceutical and drug discovery companies to apply this combination of tools to their internal development efforts. We will discuss the next steps in improving the quality of the findings from the first stage of the project led by Oxford, the continuing need for and role that distributed computing will play, and the relevance to commercial pharmaceutical discovery.

10:00 3 Evaluating protein-ligand interactions through flexible docking.
Tad Hurst, ChemNavigator, 6166 Nancy Ridge Drive, San Diego, CA 92121, Fax: 858-625-2377, thurst@chemnavigator.com

Abstract text not available.

10:30 4 Docking of diverse ligands to diverse protein sites: six degrees of application.
Teresa A. Lyons1, Michael Dooley2, Anne-Goupil Lamy1, Sunil Patel3, Remy Hoffmann4, Hughes-Olivier Bertrand4, and Marguerita Lim-Wilby5. (1) Accelrys Inc, 200 Wheeler Road, South Tower, 2nd Floor, Burlington, MA 01803-5501, Fax: (781) 229-9899, txl@accelrys.com, (2) Accelrys KK, (3) Accelrys Ltd, (4) Accelrys, (5) Lead Identification and Optimization, Accelrys Inc

The utility of a docking application in the virtual screening of libraries prior to biological assay or custom synthesis is dependent on its ability to predict ligand affinity over a wide pKi range at a specific binding site. The definition of the binding site of interest thus becomes the most critical step in setting the stage for docking.

Examples will be presented of straightforward docking/screening cases, as well as difficult cases, such as proteins with extremely large potential binding sites, proteins with induced fit or allosterism, and protein/ligand complexes whose X-ray crystal structures show alternate ligand conformations. In between are the “tunable” cases where user settings and protein preparation are critical: highly flexible ligands, steric problems or clashes, and local flexibility in the binding site. Finally, we will summarize the classes of proteins and types of ligands for which LigandFit will perform suitably as a vHTS tool.

11:00 5 eHiTS: Novel algorithm for fast, exhaustive flexible ligand docking and scoring.
Zsolt Zsoldos1, A. Peter Johnson2, Aniko Simon1, Irina Szabo1, and Zsolt Szabo1. (1) Research and Development, SimBioSys Inc, 135 Queen's Plate Dr, Unit 355, Toronto, ON M9W 6V1, Canada, Fax: 416-741-5083, zsolt@simbiosys.ca, (2) ICAMS, School of Chemistry, University of Leeds

The flexible ligand docking problem is often divided into two subproblems: pose/conformation search and scoring function. For virtual screening the search algorithm must be fast; must provide a manageable number of candidates; and be able to find the optimal pose/conformation of the complex. Algorithms employing stochastic elements or crude rotamer samplings fail to satisfy the last criterion. The eHiTS (electronic High Throughput Screening) software offers new approaches to both subproblems. The search algorithm is based on exhaustive graph matching that rapidly enumerates all possible mappings of interacting atoms between receptor and ligand. Then dihedral angles of rotatable bonds are computed deterministically as required by the positioning of the interacting atoms. Consequently, the algorithm can find the optimal conformation even if unusual rotamers are required. The scoring function includes a novel treatment of weak hydrogen bonds, aromatic pi-stacking and penalties for conflicting interactions. Validation results on over 300 complexes will be presented.
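The exhaustive mapping step described above can be illustrated with a minimal sketch (not the eHiTS implementation): enumerate type-compatible assignments of ligand atoms to receptor interaction points and keep only those whose pairwise distances are mutually consistent. The interaction types, coordinates and tolerance below are invented for the example.

```python
from itertools import permutations
import math

def compatible(site_type, ligand_type):
    # Toy complementarity rule: a donor pairs with an acceptor.
    return {site_type, ligand_type} == {"donor", "acceptor"}

def enumerate_mappings(site, ligand, tol=1.0):
    """Exhaustively enumerate mappings of ligand atoms onto receptor
    interaction points.  A mapping survives only if every pairing is
    type-compatible and all pairwise ligand distances match the
    corresponding site distances within `tol` (distance-consistency
    pruning, as in graph/clique matching)."""
    n = len(site)
    results = []
    for perm in permutations(range(len(ligand)), n):
        if any(not compatible(site[i][0], ligand[p][0])
               for i, p in enumerate(perm)):
            continue
        ok = all(abs(math.dist(site[i][1], site[j][1]) -
                     math.dist(ligand[perm[i]][1], ligand[perm[j]][1])) <= tol
                 for i in range(n) for j in range(i + 1, n))
        if ok:
            results.append(perm)
    return results

# Toy receptor site and ligand: only one assignment is both
# type-compatible and distance-consistent.
site = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0))]
ligand = [("acceptor", (0.0, 0.0, 0.0)),
          ("donor", (3.0, 0.0, 0.0)),
          ("C", (6.0, 0.0, 0.0))]
```

Once a surviving mapping fixes the positions of the interacting atoms, the rotatable-bond dihedrals can be solved deterministically, which is the step the abstract contrasts with stochastic search.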

11:30 6 Effect of electrostatic models on the accuracy of ligand docking.
Philip W. Payne, Consultant in Computational Chemistry, 660 Santa Paula Avenue, Sunnyvale, CA 94085-3416, Fax: none, PAYNES@PACBELL.NET

Clustered ensembles of various ligands bound to the estrogen receptor were built by systematic search for energetically favored ligand positions. Four different electrostatic models were employed: point charge with constant dielectric, point charge with cubic spline cutoffs in the range 8-10 Å, point charge with cubic spline cutoffs in the range 10-12 Å, and a sigmoidal dielectric screening model.

Compared to the constant dielectric model without cutoffs, dramatic shifts in the energy spectra of ligand clusters and the positions of bound ligands were observed when calculations were done with either Coulomb distance cutoffs or the sigmoidal screening model. Subsequent analysis demonstrated that the distance cutoffs or sigmoidal screening cause chaotic instability of the electric field in the binding site. Distance cutoffs for Coulomb interactions should therefore be avoided in the study of protein-ligand interactions unless such cutoffs are uniformly applied to all atoms in each polar bond.
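The cutoff artifact described above can be reproduced with a toy calculation, a sketch with arbitrary units and invented charges: a cubic-spline switching function applied atom-by-atom truncates one end of a polar bond's dipole, so a probe charge sees a spurious net monopole.

```python
def switch(r, r_on=8.0, r_off=10.0):
    """Cubic-spline (smoothstep) switching: 1 below r_on, 0 above
    r_off, smooth interpolation in between."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    t = (r_off - r) / (r_off - r_on)
    return t * t * (3.0 - 2.0 * t)

def coulomb(probe_q, atoms, cutoff=False):
    """Energy (arbitrary units, q*q/r) of a probe charge at the origin
    interacting with (charge, distance) pairs, optionally switched."""
    return sum((switch(r) if cutoff else 1.0) * probe_q * q / r
               for q, r in atoms)

# A polar bond modelled as +/-0.4 partial charges straddling the
# switching region: per-atom cutoffs damp the two ends unequally.
dipole = [(+0.4, 9.0), (-0.4, 9.5)]
e_full = coulomb(1.0, dipole)               # the dipole largely cancels
e_cut = coulomb(1.0, dipole, cutoff=True)   # a spurious monopole term remains
```

Here the switched energy is several times larger than the true dipole interaction, illustrating why the paper recommends applying cutoffs, if at all, uniformly to all atoms of each polar bond.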

9:00 7 Integration Continuum...different strokes for different folks.
Kirk Schwall, Manager, Authority Database Operations, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202, Fax: (614) 447-5471, kschwall@cas.org, and Eileen M. Shanbrom, Manager, CAS and Web Content, Chemical Abstracts Service

Scientists are challenged in today’s environment to locate the right information in a sea of information that incorporates both traditional and web services offered by government agencies and others. Information providers should move away from viewing this changing environment as a conflict of "traditional" resources versus the web. Information consumers want both available in integrated services. These services must provide access to the right information at the right time in the context of scientific research. This is especially relevant for producers of STM databases and search-and-retrieval services, because scientists will be the biggest users of services that integrate web content with professionally built databases. Recent developments by a number of information providers offer good examples of what is now possible and necessary. BLAST searching for identifying sequences was originally a free service provided by the U.S. government, but BLAST has now been incorporated into proprietary services that integrate the identification of genes, proteins and other biological entities with the retrieval of related literature and patents. To an increasing extent, a solid foundation for building the new "digital research environment" rests on three building blocks: professionally produced databases, value-added search tools, and the web.

9:30 8 Hindsight is an exact science.
Jeremy N Potter1, Chris Hardy1, Robert D. Brown2, and Julian Hayward2. (1) Accelrys, Inc, 9685 Scranton Road, San Diego, CA 92121-3752, jeremyp@accelrys.com, (2) Accelrys Inc

The value of knowing about work done by others in the field of organic synthesis goes without saying, and providing compilations of such information is the basis of the many reaction databases available on the market today. These range in size from a few hundred to over 10 million reactions and can typically be characterised as selective, thematic or comprehensive in nature. Almost without exception, however, such databases focus only on those reactions that have a successful outcome, with the goal of allowing chemists to search for tried and trusted methods that have been presented in the literature. However, our own experience of life tells us that it can be just as valuable to know in advance that something will not work. This paper will outline the ways in which such knowledge of synthetic 'failures' is used in the pharmaceutical industry, and will introduce a database of such reactions.

10:00 9 Bridging the gap between published and proprietary spectroscopic databases: an informatics system case study.
Gregory M. Banik, Informatics Division, Sadtler Software & Databases, Bio-Rad Laboratories, 3316 Spring Garden Street, Philadelphia, PA 19104-2596, Fax: 215-662-0585, gregory_banik@bio-rad.com, and Ty Abshear, Informatics Division, Sadtler Software and Databases, Bio-Rad Laboratories

A new informatics system, the KnowItAll™ Informatics System, is described that bridges the gap between published and proprietary spectroscopic information. KnowItAll offers the world's largest published collection of analytical information and the ability to manage multiple spectra and chromatograms, including 13C and 1H NMR, IR, Raman, MS, GC, and UV/Vis, along with the corresponding chemical structures and related property information. KnowItAll allows users to create their own databases and search them seamlessly with databases of published spectra as well as databases of reference spectra. Cross-referencing from one analytical technique to another is also seamless, as is the use of user-assigned NMR spectra in the database-driven prediction of NMR spectra. Finally, web addresses or file names can be added, either to published databases or proprietary databases, to permit linking to related documents that are outside the KnowItAll system.

10:30 10 Developing value-added organic chemistry databases from traditional print products.
Darla Henderson, and Colleen Finley, John Wiley & Sons, Inc, 605 Third Avenue, New York, NY 10158, dhenders@wiley.com

Various chemical databases, primarily abstracted databases, have existed in the chemical information business for three-plus decades. This presentation discusses the development of, and features offered by, John Wiley & Sons' newly released and developing chemical reaction databases. Wiley chemical reaction databases focus on offering the full content of a product, as opposed to the abstracted data found in most other reaction databases, while including the value-added features customers prefer, such as reaction searching and interoperability among databases. Critical issues, such as developing a product amenable to both academic and corporate customers, are discussed.

11:00 11 Linking reaction information from different sources.
Guenter Grethe1, Peter Loew2, Hans Kraut2, and Josef Eiblmaier2. (1) Marketing/Scientific Applications, MDL Information Systems, Inc, 14600 Catalina Street, San Leandro, CA 94577, Fax: 510-614-3616, guenter@mdli.com, (2) InfoChem GmbH

Collecting required relevant information to solve a synthetic problem is a formidable task. Unless the chemist is interested in the preparation of a known compound, it is almost never straightforward. The process usually involves consulting more than one source and going back and forth between different sources to find the most relevant answers. This can be a very time-consuming process in the hardcopy world and confusing when available electronic sources require the use of different programs. Today’s technology allows linking of information using point-and-click rather than cut-and-paste methodology. In the reaction world, the linking must foremost be based on reaction type rather than the similarity of participating molecules. As a first step in this direction we have developed a system that seamlessly links information from reactions of similar type described in reaction databases and major reference works. The latter provide important complementary information, including discussions of reaction mechanism, stereochemistry, the most suitable reagent or catalyst, and more. Linking the references from both databases and books to the primary literature augments the integration. We will describe the underlying concept of the system and demonstrate its usefulness with examples from the recent literature.

  12 Reoptimization of MDL keys for use in drug discovery.
Keith T Taylor, Joseph L. Durant Jr., Burton A Leland, Douglas R Henry, and James G Nourse, MDL Information Systems, 14600 Catalina Street, San Leandro, CA 94577, keitht@mdli.com

The use of keysets based on a variety of different descriptors has an established place within the drug discovery workflow. MDL’s keysets were optimized for substructure searching; however, their performance for clustering and diversity analysis is comparable with that of keysets based on feature trees. We will present an overview of the underlying technology supporting the definition of features in MDL’s keysets and their encoding into keysets. A keyset containing all possible combinations of our set of defined features with occurrence counts of one or more has been constructed. Standard deviations of a few percent were observed in the clustering performance of populations of similarly sized keysets. Additionally, performance is seen to be relatively insensitive to keyset size, especially for keysets larger than 1000 bits. We have also examined a variety of strategies for constructing keysets; the performance and relative merits of these strategies will be discussed.
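The occurrence-count encoding can be sketched as follows: each defined feature contributes one bit per minimum occurrence count, and keysets are compared with the Tanimoto coefficient. The feature names and thresholds below are hypothetical placeholders, not MDL's actual key definitions.

```python
def make_keyset(feature_counts, thresholds):
    """Encode feature occurrence counts into a bit keyset: one bit per
    (feature, minimum-count) pair, set when the molecule contains the
    feature at least that many times."""
    bits = set()
    for feat, mins in thresholds.items():
        n = feature_counts.get(feat, 0)
        for m in mins:
            if n >= m:
                bits.add((feat, m))
    return bits

def tanimoto(a, b):
    """Tanimoto coefficient between two bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical feature dictionary: each feature contributes bits for
# occurrence counts of 1 and 2, as in count-augmented keysets.
thresholds = {"aromatic_ring": (1, 2), "h_donor": (1, 2), "halogen": (1,)}

mol_a = make_keyset({"aromatic_ring": 2, "h_donor": 1}, thresholds)
mol_b = make_keyset({"aromatic_ring": 1, "h_donor": 1, "halogen": 1}, thresholds)
```

Clustering or diversity analysis then operates on these bit sets exactly as on substructure keys, which is why the same keyset machinery serves both purposes.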

  13 Strategies for Lead Discovery Oriented Virtual Screening.
Tudor I. Oprea, EST Chemical Computing, AstraZeneca R&D Molndal, Molndal S-43183, Sweden, Fax: 46 (0)31-776-3792, tudor.oprea@astrazeneca.com

Large numbers of virtual compounds can be evaluated in silico via Virtual Screening (VS). Some properties can be readily evaluated prior to enumeration from reactants. However, binding affinity estimates require enumerated structures. The Lipinski rule of five, the standard property filtering protocol for VS, was derived from drugs (not leads). For lead discovery oriented VS, this protocol needs to be shifted toward lower molecular weight, lower hydrophobicity and higher solubility, in order to capture high quality leads. Possible VS strategies with respect to optimizing binding affinity and pharmacokinetic properties are discussed.
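The contrast between drug-like and lead-like filtering can be made concrete with a small sketch. The rule-of-five thresholds are Lipinski's published values; the lead-like cutoffs below are illustrative assumptions only (shifted toward lower molecular weight and hydrophobicity), not the specific values proposed in this talk.

```python
def rule_of_five(mw, clogp, hbd, hba):
    """Classic Lipinski rule of five (drug-like filter): pass if at
    most one criterion is violated."""
    violations = sum([mw > 500, clogp > 5, hbd > 5, hba > 10])
    return violations <= 1

def lead_like(mw, clogp, hbd, hba):
    """Illustrative lead-like filter with thresholds shifted toward
    smaller, less hydrophobic molecules; the exact cutoffs here are
    assumptions for demonstration, not published values."""
    return mw <= 450 and clogp <= 4.0 and hbd <= 4 and hba <= 8
```

A compound can comfortably pass the rule of five yet fail the lead-like filter, which is the gap the abstract argues matters for lead discovery.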

  14 Application of pharmacophore fingerprint keys to structure-based design and data mining.
Marvin Waldman, Moises Hassan, Chien-Ting Lin, Shashidhar N. Rao, and C. M. Venkatachalam, Accelrys, 9685 Scranton Road, San Diego, CA 92121, Fax: 858-799-5100, marvin@accelrys.com

By combining technology from Ludi and Catalyst, we are conducting studies on the use of active site based pharmacophores for mining databases of compound collections for the purpose of lead identification. In contrast to more conventional approaches using 3D pharmacophore searching techniques, we explore the use of similarity comparisons of 3D fingerprint maps of the active site and candidate ligands as a means of prioritizing ligands for real or virtual high throughput screening. Various alternative approaches will be examined, including the effects of using binary vs. occurrence-count representations for pharmacophore keys, the use of different similarity metrics, and the use of different pharmacophoric feature types, including donor and acceptor projected points. Data mining studies conducted on several protein systems will be presented and analyzed in terms of the effectiveness of recovering known seeded actives from a larger ligand pool using the various approaches outlined above.
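The binary versus occurrence-count comparison can be sketched with a Tanimoto-style metric in both forms; the min/max generalization below is one common count-based variant, and the pharmacophore key strings are invented placeholders.

```python
def tanimoto_binary(a, b):
    """Tanimoto on binary pharmacophore keys (sets of key strings)."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

def tanimoto_counts(a, b):
    """Min/max (count-based) Tanimoto on occurrence-count keys,
    each a dict mapping pharmacophore key -> occurrence count."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    den = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return num / den if den else 1.0

# Hypothetical keys of the form "featureA|featureB|binned distance";
# the active-site map fires one key twice.
site_fp = {"donor|acceptor|5.2": 2, "donor|aromatic|7.0": 1}
lig_fp = {"donor|acceptor|5.2": 1, "donor|aromatic|7.0": 1,
          "acceptor|acceptor|3.1": 1}
```

The two representations score this pair differently (2/3 binary vs. 1/2 with counts), which is the kind of effect such a study would quantify when ranking ligands against an active-site map.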

  15 Quasi2: Virtual site model derivation and application to lead identification.
David G. Lloyd, Nicholas C. Perry, Nikolay P. Todorov, Iwan J. P. de Esch, and Ian L. Alberts, De Novo Pharmaceuticals, Compass House, Vision Park, Histon, Cambridge CB4 9ZR, United Kingdom, Fax: +44-(0)1223-238088, david.lloyd@denovopharma.com

Traditional pharmacophore models define the minimum requirements for activity, but not necessarily the optimum conditions. Quasi2 produces virtual site models by optimising the molecular similarity within a set of ligands with respect to those features known to be important in binding to biomolecular targets as a function of ligand conformation, ionisation state and tautomeric state. The use of virtual site models in database searching bridges the gap between pharmacophore screening and high-throughput docking for targets on which structural information is limited or unavailable. Quasi2 virtual site models have been validated experimentally, through the design of active compounds ‘tailored’ to the virtual site features and computationally, through accurate binding mode predictions for known actives and enhanced hit-rates in high-throughput database screening.

  16 Identification of Potent and Novel α4β1 Antagonists using In Silico Screening.
Juswinder Singh1, Steve Adams1, Wen-Cherng Lee1, and Herman van Vlijmen2. (1) Structural Informatics, Biogen Inc, 12 Cambridge, Cambridge, MA 02142, Fax: 617-679-2616, juswinder_singh@biogen.com, (2) Biogen, Inc

α4β1 (VLA-4) plays an important role in the migration of white blood cells to sites of inflammation, and has been implicated in the pathology of a variety of diseases. We describe a series of potent inhibitors of α4β1 that were discovered using computational screening for replacements of the peptide region of an existing tetrapeptide-based α4β1 inhibitor (1; 4-[N'-(2-methylphenyl)ureido]phenylacetyl-Leu-Asp-Val) derived from fibronectin. The search query was constructed using a model of 1 that was based upon the X-ray conformation of the related integrin-binding region of VCAM-1. The 3D search query consisted of the N-terminal cap and the carboxyl side chain of 1, since, based upon existing structure-activity data on this series, these were known to be critical for high-affinity binding to α4β1. The computational screen identified 12 reagents from a database of 8624 molecules as satisfying the model and our synthetic filters. All of the synthesized compounds tested inhibit α4β1 association with VCAM-1, with the most potent compound having an IC50 of 1 nM, comparable to the starting compound. Using CATALYST, a 3-D QSAR was generated that rationalizes the variation in activities of these α4β1 antagonists. The most potent compound was evaluated in a sheep model of asthma; a 30 mg nebulized dose inhibited early and late airway responses in allergic sheep following antigen challenge and prevented the development of nonspecific airway hyper-responsiveness to carbachol. Our results demonstrate that it is possible to rapidly identify non-peptidic replacements of integrin peptide antagonists. This approach should be useful in the identification of non-peptidic α4β1 inhibitors with improved pharmacokinetic properties relative to their peptidic counterparts.

  17 Unified virtual ADME/Tox using a hierarchy of machine learning models.
Guido Lanza, and William Mydlowec, Pharmix Corp, 200 Twin Dolphin Drive, Suite F, Redwood Shores, CA 94065, guido@pharmix.com

We present a unified virtual ADME/Tox system based on a hierarchy of machine learning models. All compounds are initially subjected to 3-D multi-conformer analysis, and numerous molecular descriptors are calculated, both conformation-specific and conformation-independent. A hierarchy of models based on these descriptors is then used to predict various physicochemical and pharmacokinetic properties. We describe a series of models, including: solubility, octanol/water partition coefficient, human intestinal passive absorption, intestinal transporter binding, P450 and related enzyme interactions, blood-brain barrier permeability, plasma protein binding, and serum transporter binding. We also describe predictive models of oral bioavailability, volume of distribution, and clearance, as well as limited models involving mechanism-of-action.

  18 Application of 1D-similarity analysis to predict plausible modes of CYP-450 metabolism.
Chaya Duraiswami, Molecular Modeling, Pharmacopeia, Inc, CN 5350, Princeton, NJ 08543, Fax: 732-422-0156, cduraisw@pharmacop.com, Steven L. Dixon, ADMET R&D Group, Accelrys, and John J. Baldwin, Concurrent Pharmaceuticals, Inc

A computationally fast, semi-quantitative, visual method to predict the plausible mode of CYP-450 metabolism based on 1D-similarity analysis against known inhibitors, inducers, and substrates of CYP-3A4, CYP-2C9 and CYP-2D6 will be presented. The advantages of this method include rapid detection of possible drug-drug interactions, as well as prediction of a plausible mode of metabolic degradation for each test compound. Since this method is semi-quantitative and fast, predictions for large combinatorial libraries as well as virtual libraries can be made in a timely fashion, making this approach a useful computational ADME filter. The results of this method as applied to a set of chemokine inhibitors will be presented.

  19 Exact chemical structure batch mode searches.
Christopher A. Lipinski, Exploratory Medicinal Sciences, Pfizer Global Research and Development, Groton Laboratories, Eastern Point Road, mail stop 8200-36, Groton, CT 06340, Fax: 860-715-3149, christopher_a_lipinski@groton.pfizer.com

Chemistry structure searching tools lag behind those of biology and genomics. Specifically, chemical structures can easily be searched within corporate databases but it is very difficult for chemists to perform structure searches on the external literature. Currently a chemist cannot simply copy a structure from an ISIS/Base corporate database and use it to search Chemical Abstracts Service (CAS) SciFinder. The same holds for a chemical structure from a virtual library sdf file. The search has to be performed by manually drawing in a chemical structure as a search query. Exact chemical structure searches cannot be done in batch mode. For example, one cannot search SciFinder for twenty-five chemically unrelated structures at a time. It is generally unrecognized that the tools are in place for chemists to solve this problem. Three software licenses are required: SciFinder from CAS; Accord for Excel from Accelrys; and Name from Advanced Chemistry Development.

  20 Integration of disparate data sources from genomics to chemistry.
Robert D. Brown, and David Benham, Accelrys Inc, 9685 Scranton Road, San Diego, CA 92121, rbrown@accelrys.com

Abstract text not available.

  21 How to build and deploy chemoinformatics applications.
Louis J. Culot Jr., CambridgeSoft Corporation, 100 Cambridge Park Drive, Cambridge, MA 02140, lculot@cambridgesoft.com - SLIDES

Rapid development tools and practices have been used by many industries to develop and deploy informatics applications. However, the chemical community has been slow to adopt these tools because of dependencies on specialized technology for handling chemical data. With the recent availability of new technologies for chemistry, such as Java and ActiveX clients, ODBC chemical drivers, and chemical Oracle Cartridges, these barriers are removed, and the chemical community can take advantage of the rapid development tools and practices available to the broader market. I review the technologies and practices, provide a framework for managing rapid-development projects, and provide a case study and example of building such an application.

  22 Hybrid methodologies for pKa prediction and database selection.
Mark J. Rice1, Ryan T. Weekley1, William K. Ridgeway2, and Paul A. Sprengeler1. (1) Structural Group, Celera Therapeutics, 180 Kimball Way, South San Francisco, CA 94080, Fax: 650-866-6654, mark.rice@celera.com, (2) University of California, Berkeley

We have developed a new methodology for pKa prediction combining empirical prediction methods with an experimental database. For any compound, the nearest experimental values from the database are used to correct the predicted value. In order to quantify similarity, we have developed a novel site-specific fingerprint based on chemical graph theory. We believe this approach offers a trainable pKa predictor especially suited to series of compounds.
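The hybrid scheme might be sketched as below: an empirical prediction is shifted by the mean residual (experimental minus predicted) of the k most similar database compounds. The set-based fingerprint and Tanimoto similarity here are generic stand-ins for the site-specific fingerprint described in the abstract.

```python
def knn_corrected_pka(query_fp, predicted, database, k=3):
    """Correct an empirical pKa prediction with the k most similar
    experimental database entries.  Each entry is a tuple
    (fingerprint, predicted_pka, experimental_pka); the correction is
    the mean residual of the nearest neighbors."""
    def tanimoto(a, b):
        return len(a & b) / len(a | b) if (a | b) else 1.0
    neighbors = sorted(database,
                       key=lambda e: tanimoto(query_fp, e[0]),
                       reverse=True)[:k]
    residuals = [exp - pred for _, pred, exp in neighbors]
    return predicted + sum(residuals) / len(residuals)

# Toy database: the empirical predictor runs ~0.4 units low
# for this (hypothetical) compound series.
db = [({1, 2, 3}, 7.0, 7.4),
      ({1, 2}, 6.8, 7.2),
      ({9}, 3.0, 3.1)]
```

"Training" such a predictor then amounts to choosing k, the fingerprint, and the similarity metric against held-out experimental data, which is what makes the approach well suited to compound series.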

8:30 23 Rule-based two-layer model for virtual high throughput screening.
Ruediger M. Flaig1, Thomas F. Kochmann2, and Roland Eils2. (1) Institute for Pharmaceutical Technology and Biopharmacy, University of Heidelberg, Im Neuenheimer Feld 366, Fax: +49 4075110-17171, flaig@sanctacaris.net, (2) Intelligent Bioinformatics Systems, German Cancer Research Center, Im Neuenheimer Feld 280, Fax: +49-6221-42-3620, t.kochmann@dkfz-heidelberg.de

Science is producing vast amounts of data from which relevant knowledge has to be extracted, a process for which suitable tools still have to be developed. A universal tool to this end would use a set of rules which it can extend on its own. It requires two layers of processing: (1) subsymbolic processing (implemented in C, C++ or Java) for transforming raw data into information, (2) symbolic processing (implemented in Haskell, Miranda or ML) for extracting knowledge from the “predigested” information. Subsymbolic processing consists largely of deconstructing source data into patterns to be distributed over multiprocessor systems, yielding an array of summary lists (abstraction). Symbolic processing evaluates these lists by further application of the underlying rules. To start, we need a primary set of rules, the bootstrap rules (Kant: “a priori”), as opposed to the deduced rules (“a posteriori”) identified by the system. The rules can be extended by employing the knowledge gathered before, leading to a “rising spiral”.

9:00 24 DNA decompiler for the establishment of bootstrapping rules.
Thomas F. Kochmann1, Ruediger M. Flaig2, Christian Busold3, and Roland Eils1. (1) Intelligent Bioinformatics Systems, German Cancer Research Center, Im Neuenheimer Feld 280, D-69120 Heidelberg, Germany, Fax: +49-6221-42-3620, t.kochmann@dkfz-heidelberg.de, (2) Institute for Pharmaceutical Technology and Biopharmacy, University of Heidelberg, Im Neuenheimer Feld 366, D-69120 Heidelberg, Germany, Fax: +49 4075110-17171, flaig@sanctacaris.net, (3) Functional Genome Analysis, German Cancer Research Center

Generally, DNA analysis is based on empirically gathered knowledge (“deduced rules”), especially sequence-sequence comparisons. By contrast, the possibility of identifying rules from single sequences without resorting to empirical knowledge has not been fully exploited yet. We propose a tool for extracting knowledge purely from a single DNA sequence. In the decompiler algorithm, any dependency is estimated stochastically, by calculating its relative information content. Such a dependency may consist of specific nucleotide arrangements and neighborhood relationships. It can be determined for any given sequence, thus providing a universal mechanism for bootstrapping autonomous knowledge systems. This knowledge can be extended by deductive evolutionary algorithms, self-organizing into higher-level systems. Here categorical dependency relations between subparts determine Darwinian selection of the most relevant interactions. These autonomous virtual systems can be integrated into the actual scientific process, thus initializing a superimposed knowledge extraction spiral.
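The stochastic estimate of a dependency's relative information content can be illustrated for the simplest nucleotide arrangements: compare each k-mer's observed frequency in a single sequence with the frequency expected from base composition alone. This is a minimal sketch of the idea, not the decompiler algorithm itself.

```python
import math
from collections import Counter

def kmer_information(seq, k=2):
    """Relative information content of each k-mer in a single sequence:
    log2 of observed frequency over the frequency expected from the
    mononucleotide composition alone.  Positive values flag arrangements
    that occur more often than base composition predicts."""
    n = len(seq)
    base_freq = {b: c / n for b, c in Counter(seq).items()}
    kmers = Counter(seq[i:i + k] for i in range(n - k + 1))
    total = n - k + 1
    scores = {}
    for kmer, count in kmers.items():
        expected = 1.0
        for b in kmer:
            expected *= base_freq[b]
        scores[kmer] = math.log2((count / total) / expected)
    return scores
```

No empirical knowledge enters the calculation: the "rules" (over-represented arrangements) come from the single sequence itself, which is the bootstrapping property the abstract emphasizes.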

9:30 25 Application of chemometric and QSAR approaches to scoring ligand-receptor binding affinity.
Alexander Tropsha1, Jun Feng2, Alexander Golbraikh1, Curt Breneman3, Wei Deng4, and Nagamani Sukumar3. (1) Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina, CB # 7360, Beard Hall, School of Pharmacy, Chapel Hill, NC 27599-7360, Fax: 919-966-0204, alex_tropsha@unc.edu, (2) Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina at Chapel Hill, (3) Department of Chemistry, Rensselaer Polytechnic Institute, (4) Department of Chemistry, RPI

Fifty-nine diverse ligand-receptor complexes have been analyzed in multidimensional chemical descriptor space. TAE/RECON descriptors of steric and electronic properties were calculated for active site atoms and ligand atoms independently. For all pairs of ligand-receptor complexes, the Euclidean distances between active sites in TAE/RECON descriptor space correlated linearly with the distances between complementary ligands (R2=0.8). Concurrently, a k-nearest-neighbor (kNN) variable-selection QSAR procedure was applied to ligands only, using binding affinity as the target property and normalized MolconnZ descriptors as independent variables. Training and test sets of different sizes were generated, and multiple models were built. The best model afforded a leave-one-out cross-validated R2 (q2) of 0.74 for the training set of 50 compounds and a predictive R2 of 0.85 for the test set of 9 compounds. Chemometric and QSAR approaches to the analysis of ligand-receptor interactions provide an important addition to current methodologies that rely on direct use of 3D molecular structures.
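A minimal kNN QSAR sketch, assuming descriptors are already normalized: activity is predicted as the mean activity of the k nearest training compounds in descriptor space, and the model is validated by the leave-one-out cross-validated q2 quoted in the abstract. The toy data below are invented.

```python
import math

def knn_qsar_predict(query, training, k=3):
    """Predict the target property of `query` as the mean property of
    its k nearest training compounds (Euclidean distance in
    descriptor space).  training: list of (descriptor_tuple, value)."""
    neighbors = sorted(training, key=lambda t: math.dist(query, t[0]))[:k]
    return sum(v for _, v in neighbors) / k

def loo_q2(data, k=3):
    """Leave-one-out cross-validated q2 for the kNN model."""
    mean = sum(v for _, v in data) / len(data)
    press = ss = 0.0
    for i, (x, y) in enumerate(data):
        pred = knn_qsar_predict(x, data[:i] + data[i + 1:], k)
        press += (y - pred) ** 2
        ss += (y - mean) ** 2
    return 1.0 - press / ss
```

In the variable-selection variant, subsets of descriptors are additionally searched to maximize q2, which is what distinguishes the procedure from plain kNN regression.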

10:00 26 Evaluation of ligand-receptor binding affinity with a novel statistical scoring function derived from Delaunay tessellation of protein-ligand interface.
Alexander Tropsha, Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina, CB # 7360, Beard Hall, School of Pharmacy, Chapel Hill, NC 27599-7360, Fax: 919-966-0204, alex_tropsha@unc.edu, and Jun Feng, Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina at Chapel Hill

A novel statistical contact scoring function for calculating ligand-receptor binding affinity has been derived by means of Delaunay tessellation. Given the full-atom representation of the protein-ligand interface, Delaunay tessellation generates a set of non-overlapping, space-filling tetrahedra, or simplices, which rigorously define nearest-neighbor atoms in sets of four vertices. For every quadruplet composition of ligand and receptor atom types found at the protein-ligand interface, a log-likelihood factor is obtained from the statistical geometry analysis of 317 complexes. For 67 diverse protein-ligand complexes, the linear regression correlation between the four-body scoring function and experimental binding affinity is characterized by an R of 0.67. The combination of four-body contact scoring and a two-body distance-dependent potential of mean force affords an R of 0.84. This novel scoring function can be used for rapid evaluation of the binding affinity of ligand orientations obtained with various docking algorithms.
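The derivation of the log-likelihood factors can be sketched assuming the Delaunay simplices have already been computed (e.g. with a computational-geometry library) and reduced to 4-tuples of atom types; toy compositions stand in for the 317-complex statistics.

```python
import math
from collections import Counter

def multinomial_freq(comp, type_counts, type_total):
    """Expected frequency of an (unordered) atom-type quadruplet under
    the overall atom-type proportions (multinomial model)."""
    mult = Counter(comp)
    coef = math.factorial(4)
    p = 1.0
    for t, m in mult.items():
        coef //= math.factorial(m)
        p *= (type_counts[t] / type_total) ** m
    return coef * p

def quadruplet_scores(simplices):
    """Log-likelihood factor for each atom-type quadruplet: log of the
    observed simplex-composition frequency over the frequency expected
    from atom-type composition alone.  simplices: iterable of 4-tuples
    of atom types from Delaunay tessellation of the interface."""
    comps = Counter(tuple(sorted(s)) for s in simplices)
    total = sum(comps.values())
    type_counts = Counter(t for s in simplices for t in s)
    type_total = sum(type_counts.values())
    return {c: math.log((n / total) /
                        multinomial_freq(c, type_counts, type_total))
            for c, n in comps.items()}
```

Scoring a docked pose then amounts to summing these factors over the quadruplets found at its interface, analogous to the pairwise potentials of mean force the abstract combines it with.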

10:30 27 Massive Virtual Library (MVL) Screening at Biogen: An Integrated Approach From Medicinal Chemistry Design to Decision.
Donovan N. Chin, Claudio Chuaqui, Herman van Vlijmen, Xin Zhang, Russell Petter, and Juswinder Singh, Structural Informatics, Biogen, 14 Cambridge Center, Cambridge, MA 02142, donovan_chin@biogen.com

This talk will describe our integrated approach to virtual chemistry design, screening, and analysis of very large small-molecule libraries. We are developing an enterprise-wide system that puts virtual-chemistry design capabilities on the desktops of medicinal chemists; links these designs with high-throughput parallel computing methods for docking, shape-based screening, and statistical modeling; and finally presents the promising “hits” on the web through a series of custom pattern recognition methods and binding-mode visualizations. By integrating the medicinal chemist into the virtual screening process, we combine their ability to design new drug-like compounds with molecular modeling and high performance computing. While throughput can be increased with more compute resources, we have also designed the system to handle the massive amount of information from the virtual screens and arrive at decisions quickly, which is essential for impacting projects with tight timelines. As the system evolves, we are developing “smart” library design rules that further enhance the value of the MVL at Biogen. The MVL is a key component that integrates and maximizes information and technologies from medicinal chemistry, structural biology, screening, and pharmacology. We will discuss the successes and failures, and the lessons learned in developing the MVL system in a pharmaceutical environment.

11:00 28 Fuzzy logic based focused libraries (FL/FL) for HTS screening: application to anti-carcinogenic compounds.
Jacques R. Chretien1, Marco Pintore1, Nadège Piclin1, and Frederic Ros2. (1) BioChemics Consulting, Centre d'Innovation, 16, rue Leonard de Vinci, Orleans cedex 2 45074, France, Fax: + 33 2 38 41 72 21, jacques.chretien@univ-orleans.fr, (2) Chemometrics & BioInformatics, University of Orleans


11:30 29 Moore's Law and the future of virtual screening.
William Mydlowec, Pharmix Corp, 200 Twin Dolphin Drive, Suite F, Redwood Shores, CA 91898, Fax: (650) 637-0199, bill@pharmix.com

This talk discusses virtual screening in the context of Moore’s Law, which projects that the number of transistors on an integrated circuit will double approximately every 18 months. We first discuss the implications of exponentially-increasing computing power on current-generation virtual screening technologies. For example, computers are more than 1000x faster than they were in 1987, yet algorithms of that era continue to dominate in simulation, optimization, and modeling in computational chemistry. We propose future directions and new algorithms based on recent advances in computer science and electrical engineering. We then project the impact of Moore’s Law on virtual screening several decades into the future, using metrics such as: cost/time/number of screens, volume/complexity/duration/accuracy of atomistic simulations, etc. We also consider relevant engineering issues, including development of multimillion-line software codebases, construction of >10,000 CPU supercomputers and multi-petabyte databases, and other large-scale issues.
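The 1000x figure quoted above follows directly from the doubling rule: a small sketch of the projection arithmetic, with the 18-month doubling period taken as stated in the abstract:

```python
def moores_law_factor(years, doubling_months=18):
    """Speedup accumulated over `years` under the doubling rule."""
    return 2 ** (years * 12 / doubling_months)

# 1987 -> 2002 is 15 years, i.e. ten 18-month doublings.
print(moores_law_factor(15))  # -> 1024.0
```

The same function projects forward: two more decades at the same rate would add another factor of roughly ten thousand, which motivates the talk's long-range metrics.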

8:35 30 100 years Houben–Weyl Methods of Organic Chemistry: Entering the New Millennium.
Guido F. Herrmann, Rolf Hoppe, and Kristina Kurz, Thieme Publishers, Rüdigerstrasse 14, Stuttgart 70469, Germany, Fax: +49 711 9831 777, guido.herrmann@thieme.de

The availability of scientific information in electronic format has significantly changed the way we select relevant information sources. Time matters! Information that is not accessible at the researcher's desktop will be overlooked, simply because the library is a walk away and other resources compete for a researcher's attention. Yet the highly competitive environment in industry and academia makes knowledge, and efficient access to it, an important performance driver. Houben-Weyl has been the standard reference work in synthetic chemistry since 1909 and comprises four editions, 140 volumes, and roughly 160,000 pages.

Thieme (www.thieme-chemistry.com) chose to accept the challenge of converting 100 years of methodology information in the field of organic chemistry into a convenient and user-friendly online system. The complete series is now available in electronic format, featuring an interactive table of contents, keyword searching using a controlled vocabulary, and a graphical interface.

9:00 31 Building digital archives for scientific information.
Leah R. Solla, Physical Sciences Library, Cornell University, 293 Clark Library, Cornell University, Ithaca, NY 14853-2501, Fax: 607-255-5288, lrm1@cornell.edu

Researchers, librarians, and publishers have valid concerns about the long-term preservation of digital information. There are many issues to be addressed in the formation of a trusted digital archive. Some parallel the more familiar preservation of print material, such as duplication and sustainability. LOCKSS (Lots Of Copies Keeps Stuff Safe) is a new acronym for an old practice in the print world of independently maintained and widely distributed collections. Digital preservation requires duplication; managed and distributed duplication is even better. Effective digital preservation models need to be self-sustaining, and adhere to format standards. The digital world does not respect traditional borders (political, corporate, publisher, content, etc.). The roles of stakeholders are changing in the digital realm. Publishers have often been the sole controllers of information, but increasingly there are authors, government agencies and other players in control. Until recently the library has been the archive and access provider, but publishers and other players are now active participants in digital preservation and access. The academic research library community is investigating a digital preservation role akin to their traditional role in print, subject based archiving. Archiving across subject areas in the academic environment complements the archiving approach of publishers in the competitive market environment. This paper will review a variety of digital preservation projects in the sciences.

9:25 32 Digital Archiving: Experiences of a major commercial publishing house.
C. Amanda Spiteri, ScienceDirect, Elsevier Science, Molenwerf 1 1014 AG, Amsterdam, Netherlands, c.spiteri@elsevier.com

Assuring the preservation of digital information is one of the highest priorities for libraries and publishers alike, particularly as more and more libraries go "electronic only" and the accessibility of traditional paper copies is reduced. For part of the life cycle of scientific information, commercial publishing practices support the most cost efficient means of maintaining electronic access to current information. For other parts of the cycle, digital preservation and access responsibilities must be supported by a designated agent. Elsevier Science has been a leader in the digital archiving of electronic journals through development of services like ScienceDirect. We continue to develop our experience in archiving issues such as policy, partnership relations, technology and creation of the digital archive itself. This presentation will cover some leading initiatives in these areas and give examples of how Elsevier Science currently addresses these challenges.

9:50 33 DSpace: MIT's Digital Repository.
Margret Branschofsky, MIT Libraries, Massachusetts Institute of Technology, Bldg. 10-500, MIT, Cambridge, MA 02139, Fax: 617-452-3000, margretb@mit.edu

DSpace, an MIT Libraries project sponsored by Hewlett-Packard Labs, is a digital repository that captures, stores and distributes the various digital products of MIT faculty and researchers. The repository will collect preprints, articles, working papers, technical reports, datasets, images, video and audio content. This web-based system provides 1) a flexible submission process for MIT contributors that captures both metadata and content files, 2) storage and preservation services for a variety of file formats, and 3) powerful search and retrieval capabilities for end users. The presentation will review DSpace design features, organizational issues surrounding development of the system in an institutional setting, and policy issues arising from implementation of the system. A review of the beta-testing experience with early adopters will also be provided.

10:15 34 Implementing the Physical Review Online Archive (PROLA).
Mark D. Doyle, Journal Information Systems, American Physical Society, 1 Research Road, P. O. Box 9000, Ridge, NY 11961, Fax: 631-591-4147, doyle@aps.org

The American Physical Society has recently completed digitizing all of our journal content back to its start in 1893. This content is available online as the Physical Review Online Archive (PROLA) at http://prola.aps.org/. The archive contains 1.6 million scanned pages for almost 300,000 articles. All bibliographic information and all reference sections have been captured in XML allowing PROLA to offer all of the features expected in a modern electronic journal. We describe the history, building, and implementation of the archive as well as some of the business concerns in making it available.

10:40 35 Journey from books to analytical informatics.
Marie Scandone, Informatics Division, Bio-Rad Laboratories, Inc, 3316 Spring Garden Street, Philadelphia, PA 19104, Fax: 215-662-0585, marie_scandone@bio-rad.com, and Deborah Kernan, Informatics Division, Bio-Rad Laboratories


11:05 36 LOCKSS: Lots of copies keeps stuff safe.
Vicky Reich, and Grace Baysinger, HighWire Press, Stanford University Libraries, 1454 Page Mill Rd, Stanford, CA 94305-8400, Fax: 650-725-4902, vreich@stanford.edu, graceb@stanford.edu

LOCKSS has the potential to become a sustainable, affordable preservation tool and archiving system for web-delivered information. LOCKSS software systematically caches content in a self-correcting P2P network. The current beta test has demonstrated that the underlying LOCKSS technology works and, in a production environment, is likely to allow libraries to maintain high-integrity persistent caches of electronic content from journal subscriptions. The beta test includes 60 caches at 50 libraries and two scholarly journals. The system has been in continuous operation for over ten months. The fault tolerance of the system has been amply demonstrated: two beta caches suffered catastrophic disk failures, and both were able to restart with new, empty disks and recover their content automatically. Forty-one publishers have expressed strong support for the LOCKSS project. The system shows the potential to preserve digital materials within current publishing systems: the cost of entry is low, and the payoffs promise to be high.

11:30 37 Combining heterogeneous physical property data sets.
Peter J. Linstrom, Physical and Chemical Properties Division, NIST, Building 221, Room A111, 100 Bureau Drive, Stop 8380, Gaithersburg, MD 20899-0830, Fax: 301-896-4020

The lack of standards for electronic storage of physical property data often makes it difficult to merge data from different data sets. Data sets often employ different conventions for identifying chemical systems, data accuracy, and data quality. It is a challenge for the archivist to ensure that the combined data set represents all data in an appropriate manner.

This talk will discuss lessons learned from the development of the NIST Chemistry WebBook (http://webbook.nist.gov/). The data set for this archive consists of the combination of work from several independent contributors. Efforts were made to produce a set that appears homogeneous to users despite its origins. This required the design of a database that was flexible enough to support the various conventions used by contributors. Examples of problems encountered and their solutions will be discussed.
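The core merge problem the talk describes, contributors identifying the same chemical system under different conventions, can be sketched as normalization onto a shared key before combining. The field names and the CAS-number key below are hypothetical illustrations, not the WebBook's actual schema:

```python
def normalize(record):
    """Map a contributor record onto a shared (key, properties) form.
    The identifier fields tried here are illustrative guesses."""
    key = record.get("cas") or record.get("cas_rn") or record["name"].lower()
    skip = ("cas", "cas_rn", "name")
    return key, {k: v for k, v in record.items() if k not in skip}

def merge(datasets):
    """Combine records from several contributors under common keys."""
    combined = {}
    for dataset in datasets:
        for record in dataset:
            key, props = normalize(record)
            combined.setdefault(key, {}).update(props)
    return combined

# Two contributors describing the same compound with different conventions.
a = [{"cas": "7732-18-5", "name": "Water", "bp_K": 373.15}]
b = [{"cas_rn": "7732-18-5", "name": "water", "mp_K": 273.15}]
water = merge([a, b])["7732-18-5"]
```

In practice the hard part is exactly what the sketch glosses over: deciding which identifier conventions are equivalent, and preserving each contributor's accuracy and quality annotations through the merge.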

38 Evaluation, Comparison and Successful Application of Virtual Screening Tools.
Romano T. Kroemer1, Joe McDonald2, Douglas Rohrer3, Anna Vulpetti1, Jean-Yves Trosset1, Shashidhar Rao4, John Irwin5, Brian Shoichet6, Colin McMartin7, and Pieter Stouten1. (1) Molecular Modelling & Design, Pharmacia, Discovery Research Oncology, Viale Pasteur, 10, Nerviano 20014, Italy, Fax: ++39-02 4838 3965, romano.kroemer@pharmacia.com, (2) Discovery Research, Pharmacia, (3) Computer-Aided Drug Discovery, Pharmacia, (4) Accelrys Inc, (5) Department of Molecular Pharmacology and Biological Chemistry, Northwestern University, (6) Department of Pharmacology and Biological Chemistry, Northwestern University, (7) Thistlesoft - SLIDES

The latest Pharmacia efforts in validating and comparing virtual screening tools are presented. Two studies were carried out to assess the performance of docking programs in reproducing correct binding modes. The first study used 20 publicly available crystal structures of protein-inhibitor complexes belonging to different protein classes; the second focused on 20 complexes with the same protein (CDK2/Cyclin A). The docking programs evaluated and compared comprise the latest versions of DOCK (Brian Shoichet’s NWU incarnation), Colin McMartin's QXP, Tripos’ FlexX, CCDC’s Gold, Accelrys’ LigandFit, MolSoft's ICM, and the in-house Mosaic2 program. We also present a case study in which docking was used to identify hits for a project in the absence of HTS. After pre-selection, 3,000 compounds were docked to the target. The top-scoring compounds were inspected visually and 22 molecules were selected. The best binding compound, as verified by NMR screening and isothermal titration calorimetry, had a Kd of 450 nM.

39 Assessing the quality of virtual screening results for combinatorial libraries.
Dennis G. Sprous, Robert D. Clark, Joseph M. Leonard, and Trevor W. Heritage, Research, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144, Fax: 314-647-9241, dsprous@tripos.com

Recent developments in virtual screening tools now make it possible to do enough experiments on the same library to allow critical evaluation of the quality of the results. CombiFlexX incorporates both OptiDock and FlexX(c) methods, and takes advantage of structural redundancies in combinatorial libraries to dramatically speed up docking. Numerous computational experiments can be done in a reasonable period of time, allowing an investigation of the thoroughness of conformational and positional sampling under different protocols and parameters. Metrics and strategies for assessing the quality of the virtual screening results will be presented.

40 Virtual high throughput screening using LigandFit as an accurate and very fast tool for docking, scoring, and ranking.
Marguerita Lim-Wilby1, Jeff Jiang2, Marvin Waldman2, and C. M. Venkatachalam2. (1) Lead Identification and Optimization, Accelrys Inc, 9685 Scranton Rd, San Diego, CA 92121, rwilby@accelrys.com, (2) Rational and Combinatorial Drug Design, Accelrys Inc

The imperative for virtual high throughput screening arises from the availability of multiple targets, millions of compounds in screening libraries, and limited resources for even the best-endowed pharmaceutical enterprises. The docking application LigandFit has been developed to address this need. A suite of algorithms is provided that (1) aids the user in the detection and definition of binding sites, (2) provides various docking modes with user-defined options, and (3) scores dock poses using proprietary and published scoring functions. We will present considerations that affect accuracy in docking and in scoring, as well as the effects of disproportionately large binding sites, extremely flexible ligands, metal ions, and the presence of flexible protein side chains. Recent advances have allowed reasonably large (~50,000-compound) ligand libraries to be screened in a matter of hours, such that the bottleneck in virtual screening is no longer docking but the preparation and analysis of the datasets.

41 EasyDock: a new docking program for high-throughput screening and binding-mode search.
Nikolay P. Todorov1, Ricardo L. Mancera1, Per Kallblad1, and Philippe Monthoux2. (1) De Novo Pharmaceuticals, Compass House, Vision Park, Chivers Way, Histon, Cambridge CB4 9ZR, United Kingdom, Fax: 1223-238088, nikolay.todorov@denovopharma.com, ricardo.mancera@denovopharma.com, (2) Department of Physics, University of Cambridge

We have implemented the stochastic tunneling global optimization method within a ligand docking application software, easyDock. By using a novel multiple ligand copy approach and adding a new hydration penalty function, we have optimized various scoring functions and have achieved excellent results in the prediction of protein-ligand binding modes. We have run easyDock on the GOLD data set of protein-ligand complexes and nearly always found the correct ligand binding mode as observed in the corresponding crystal structures. Furthermore, we have achieved a 76% success rate when searching for the correct binding mode using an energy score criterion. These results show that easyDock can be used effectively both for the high-throughput screening of large datasets of compounds and for searching for the correct binding mode of a given ligand.
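Stochastic tunneling, the global-optimization method named above, can be sketched in one dimension: the energy surface is transformed so that barriers above the best energy found so far are compressed toward a constant, letting a Metropolis walker move between minima. The transformation e = 1 - exp(-gamma * (f - f_best)) follows the published method in general form, but the parameters and test function below are illustrative, not easyDock's:

```python
import math
import random

def stun(f, x0, steps=5000, gamma=1.0, step=0.5, temp=1.0, seed=0):
    """Minimize f by Metropolis sampling on the tunnelled surface
    e = 1 - exp(-gamma * (f(x) - f_best)), which flattens barriers
    lying above the best energy found so far."""
    rng = random.Random(seed)
    x = best_x = x0
    best = f(x0)
    e = 0.0  # transformed energy of the current point
    for _ in range(steps):
        x_new = x + rng.uniform(-step, step)
        f_new = f(x_new)
        if f_new < best:
            best, best_x = f_new, x_new   # new reference energy
        e_new = 1.0 - math.exp(-gamma * (f_new - best))
        if e_new < e or rng.random() < math.exp((e - e_new) / temp):
            x, e = x_new, e_new           # Metropolis acceptance
    return best_x, best

# Toy usage on a smooth quadratic; real docking scores are far rougher.
x_best, f_best = stun(lambda x: (x - 3.0) ** 2, x0=0.0)
```

In a docking context the coordinate would be the ligand's position, orientation, and torsions, and f would be the scoring function.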

42 Glide: a new paradigm for rapid, accurate docking and scoring in database screening.
Thomas A. Halgren1, Robert B. Murphy2, Jay Banks1, Daniel Mainz1, Jasna Klicic2, Jason K. Perry2, and Richard A. Friesner3. (1) Schrödinger, 120 West Forty-Fifth Street, New York, NY 10036, Fax: 646-366-9550, halgren@schrodinger.com, (2) Schrodinger, Inc, (3) Department of Chemistry, Columbia University - SLIDES

Glide uses a novel algorithm for rapid conformation generation that allows an efficient systematic search of conformational space to be performed during docking. A second key to Glide's efficiency is a series of "filters" that rapidly reduce the possible ligand positions and orientations in the search space to a manageable number for detailed examination. In addition, Glide uses a novel GlideScore function for scoring that ensures chemical sensibility by penalizing docked poses that include non-physical juxtapositions of polar and nonpolar groups.

In tests of docking accuracy, Glide achieves root-mean-square deviations between docked and co-crystallized ligand geometries that are half those reported for Gold and FlexX for test sets of 100-200 co-crystallized complexes defined by the developers of these methods. In addition, Glide achieves enrichment factors ranging from 12 to 91 in database screens for 9 diverse receptor systems. Such a high level of reliability is not typical of current-generation docking programs and scoring functions.

43 RACHEL: A new tool for structure-based lead optimization.
Chris M.W. Ho, Drug Design Methodologies, LLC, 700 S. Euclid Ave., St. Louis, MO 63110

Lead optimization is still something of an art. Structural modifications that logically should enhance affinity can decrease it. The timelines can be long, the process uncertain and frustrating, and the progress hit-or-miss. RACHEL is software designed to streamline lead optimization through automated combinatorial optimization of substituents on a lead scaffold. Starting from a ligand/receptor structure, substitutions are made systematically at user-defined points on the ligand core. Custom substituent databases built from in-house sources can be used, allowing the incorporation of enterprise and project experience. The impact of these substitutions on affinity is assessed using RACHEL's general scoring function or a custom scoring function generated by PLS analysis of user-supplied ligand/receptor affinity data. This presentation will discuss RACHEL's unique capabilities along with specific applications that demonstrate its value in lead optimization.
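The combinatorial step described above, systematically substituting at user-defined points and ranking by score, can be sketched as an enumerate-and-rank loop. The SMILES-like placeholder strings and the toy scoring function are hypothetical illustrations, not RACHEL's actual representations:

```python
from itertools import product

def enumerate_and_rank(scaffold, substituent_sets, score):
    """Attach every combination of substituents at the [R1], [R2], ...
    attachment points and return (score, candidate) pairs, best first."""
    ranked = []
    for combo in product(*substituent_sets):
        candidate = scaffold
        for site, group in enumerate(combo, start=1):
            candidate = candidate.replace(f"[R{site}]", group, 1)
        ranked.append((score(candidate), candidate))
    ranked.sort(reverse=True)
    return ranked

# Toy scoring: prefer oxygen-rich candidates. A real run would call a
# receptor-based scoring function instead of counting atoms.
ranked = enumerate_and_rank(
    "c1ccccc1[R1][R2]",
    [["O", "N"], ["C", "CO"]],
    score=lambda s: s.count("O"),
)
best_score, best_candidate = ranked[0]
```

Exhaustive enumeration grows multiplicatively with the number of sites and substituents, which is why tools in this space pair it with pruning or directed search.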

44 HostDesigner: a program for the de novo structure-based design of molecular receptors with binding sites that complement metal ion guests.
Timothy K. Firman, and Benjamin P. Hay, W. R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, PO BOX 999, Richland, WA 99352, Fax: 509-375-6631, Timothy.Firman@pnl.gov

To bring the powerful concepts embodied in de novo structure-based drug design to the field of coordination chemistry, we have devised computer algorithms for building millions of potential host structures from molecular fragments and rapid methods for prioritizing the resulting candidates with respect to their complementarity for a targeted metal ion guest. The result is HostDesigner, the first structure-based design software that is specifically created for the discovery of novel metal ion receptors. In this talk we describe the molecular structure building and scoring algorithms, and provide several examples to demonstrate their usage.

1:00 45 Collaborative eR&D - what is it and how do electronic notebooks fit into it ?
Rich Lysakowski Jr., The Collaborative Electronic Notebook Systems Association, 800 West Cummings Park, Suite 5400, Woburn, MA 01801, Fax: 781-935-3113, rich@censa.org

"Collaborative eR&D" is a new computing paradigm for scientific research, engineering, product development, and testing. Collaborative eR&D has two major aspects: 1) collaborative software environments, and 2) cultural support for collaboration with these tools. The software infrastructure aspect of Collaborative eR&D is that software applications have standardized, intelligently self-integrating interfaces. Software components in this paradigm may require some configuration, but no extra programming, to integrate into new business processes. Integration becomes a dynamic, end-user-driven process rather than one that requires custom coding. The cultural aspect of Collaborative eR&D encourages R&D teams and enterprises to use collaborative tools (collaborative electronic notebooks, meeting tools, and others) to be more effective and efficient. This talk will define and explain CENSA’s new work, which goes beyond “Collaborative Electronic Notebooks” to catalyze the markets to deliver “Collaborative eR&D” environments, and their potentially huge impact on the practice and productivity of Research and Development.

1:30 46 Components of Research Laboratory Notebooks Policy.
Sylvia C. Diaz, Knowledge Integration Resources, Bristol-Myers Squibb, P.O. Box 4000, Princeton, NJ 08543-4000, Fax: 609-252-6743, sylvia.diaz@bms.com

Records management has long been a core function in a pharmaceutical company's handling of its records. The management of research laboratory notebooks and their ancillary supporting data is essential for establishing priority of invention, upholding the validity of a patent, and memorializing scientific practices and work. A good laboratory notebook policy sets the boundaries for preparing, signing, witnessing, protecting, and storing research notebooks. A thorough policy establishes the fundamentals and standards of good records management practice for the storage of the paper, hardcopy version of the research notebook. These same principles translate into the electronic laboratory notebook world.

This presentation will outline the essential parts of a good research laboratory notebook management policy.

2:00 47 An E-Notebook success story, a roadmap for future trips.
Christopher J. Ruggles1, Jim Rizzi2, and Jorge Manrique1. (1) CambridgeSoft Corp, 100 CambridgePark Drive, Cambridge, MA 02140, Fax: 617-588-9190, cruggles@cambridgesoft.com, (2) Array BioPharma

A successful Electronic Laboratory Notebook has been deployed in a drug discovery company that invents new small-molecule drugs through the integration of chemistry, biology, and informatics. We report a methodology in which legal, technological, and scientific issues were addressed.

Through the use of directed discussion, needs analysis, and process abstraction, many seemingly insurmountable problems were resolved. The result is that a fully functional Electronic Notebook has been deployed throughout the enterprise, and is acting as the primary repository of scientific data for Array BioPharma Inc., dovetailing appropriately with preexisting protocols. We believe that this methodology, when properly applied, is scalable to organizations of varied sizes and complexities. We report here the results of our implementation of this methodology, and explore suggestions for modifications to optimize the methodology for future implementation.

2:30 48 LabBook Incorporated's eLabBook knowledge management solution.
Tom Zupancic, LabBook, Inc, 2501 9th Street, Suite 102, Berkeley, CA 94710, Fax: 614-846-2243, thomas.zupancic@labbook.com

LabBook's eLabBook solution is a flexible integration system designed to facilitate knowledge management within an organization by simplifying the processes required to access, capture, organize and manipulate information. This capability creates an environment within the organization where knowledge is generated at an enhanced rate and captured with a high degree of efficiency. The eLabBook environment provides a versatile computer interface between people and information so that it becomes much easier to create a layer of "knowledge" (understanding, interpreted information, rationales for decisions, actions and plans) and to superimpose this layer on an organized information collection. The accessible, user configurable presentation and delivery of this knowledge integrates the intellectual assets of the organization and accelerates knowledge transfer. That is, the system by design makes organized, interpreted information widely and effectively available and actionable.

3:00 49 Roundtable discussion focused on implementation successes and issues for collaborative electronic notebooks and collaborative eR&D environments.
Rich Lysakowski Jr., Executive Director, Collaborative Electronic Notebook Systems Association, 800 West Cummings Park, Suite 5400, Woburn, MA 01801, Fax: 781-935-3113, rich@censa.org

This last session will be a Facilitated Roundtable Discussion focused on implementation successes and open issues using electronic notebooks, collaborative applications, standardized software application interfaces, agents, component integration tools and frameworks to tie together the many software packages in common usage in constantly changing R&D environments. This roundtable discussion will raise issues, identify the problems and introduce prudent paths forward for their elimination. It will provide a panel of experts for the audience to get many of their questions answered.

4:00 50 CINF Division Business Meeting.
Andrew Berks, Patent Dept, Merck & Co, RY 60-35, 126 E. Lincoln Ave, Rahway, NJ 07065, Fax: 732-594-5832, andrew_berks@merck.com

This is the open meeting for discussion of CINF business.

4:30 51 Open Meeting: Committees on Publications and on Chemical Abstracts Service.
Robert J. Massie, and Robert D. Bovenschulte, Director, Chemical Abstracts Service, American Chemical Society, 2540 Olentangy River Road, Columbus, OH 43202-1505, Fax: (614) 447-3713, rmassie@cas.org, rbovenschulte@acs.org

This is an open meeting for the Committee on Publications and for the Chemical Abstracts Service.

8:00pm 52 Development of a polymer property database from traditional print products.
Maggie Johnson, Science and Engineering Libraries, University of Kentucky, 150 C/P Bldg, Lexington, KY 40506-0055, Fax: 859-257-4074, mjohnson@uky.edu, and Darla Henderson, John Wiley & Sons, Inc

The polymer community has for years depended on the value and reliability of the data found in The Polymer Handbook, a print product containing data about polymers and their properties. Moving forward with Wiley’s chemical databases, we have developed a polymer property database from The Polymer Handbook, adding features such as the ability to run full-text or fielded searches, to search the entire database for properties by polymer name, and to search the entire database for polymers by property ranges. Additionally, cross-reference and linking capabilities have been added. This presentation will focus on the development of this database and its usability for the polymer academic and corporate communities.

8:00pm 53 Teaching and learning of structural organic chemistry with nomenclature/structure software.
Bert Ramsay1, Antony John Williams2, Andrey Erin2, and Robin Martin2. (1) Department of Chemistry, Eastern Michigan University, Ypsilanti, MI 48197, Fax: 734-487-1496, Bert.Ramsay@emich.edu, (2) Scientific Development, Advanced Chemistry Development

Many organic chemistry students have difficulty in determining and "seeing" the configuration about a stereogenic carbon presented in 2D structures. A true understanding comes when these diagrams are converted to 3D pictures or models that can be rotated to correspond to the diagram's perspective. Much of this confusion can be avoided if students use nomenclature/structure software programs to compare 2D and 3D renderings and the names of chemical structures. A Student Guide to the Use of Nomenclature/Structure Software has been developed for inclusion with ACD's ChemSketch and ACD/Name software. The Guide also helps students recognize the location and naming of functional groups.

8:00pm 54 Homogenizing analytical data from multiple vendors into a unified workspace.
Antony John Williams, Scientific Development, Advanced Chemistry Development, 90 Adelaide Street West, Suite 600, Toronto, ON M5H 3V9, Canada, Fax: 416-368-5596, tony@acdlabs.com

Today a plethora of analytical techniques are used to characterize a particular chemical compound or material as it migrates from research and discovery through scale-up to manufacturing. These include the multiple forms of spectroscopy and chromatography, hyphenated techniques, and other techniques that produce “curves,” such as electrochemistry and thermal analysis. The lifecycle of any particular compound can originate with spectra to identify the structure, chromatograms to separate the material, and other technologies to characterize its performance. To date it has not been possible to manage all this analytical data, together with the associated chemical structure information, in a single unifying interface, and the need for an integrated system for processing and managing all such data persists. This talk will provide an overview of how to address the diverse needs in processing and data management for multiple forms of analytical data and make the results available across an enterprise.

8:00pm 55 Aventis Competitor Tracking Database.
Christine Rudolph, DI & A Lead Generation Chemoinformatics, Aventis Pharma Deutschland GmbH, Industrial Park, Building G879, D-65926 Frankfurt/Main, Germany, Holger Heitsch, DI & A, Medicinal Chemistry, Aventis Pharma Deutschland GmbH, and Raul Munoz-Sanz, DI & A Information Solutions, Aventis Pharma Deutschland GmbH

A Competitor Tracking Database has been designed to facilitate and accelerate the task of disease-program chemistry experts in tracking the activities of Aventis' competitors. The arduous process of manually extracting information from online publications, selecting the interesting details (text and structure), and assembling them into report documents has been replaced by a simple procedure for flagging relevant competitor records in a central raw-data pool fed by our selected news providers (currently IDDB3 and Prous).

The system has been designed to be flexible enough not only to store and annotate the information from various providers but also to store the knowledge about our own compounds, such that we can inspect them in a common view with the structures of our competitors. Annotation by mechanism, target, and target family with controlled vocabularies enables us to link this data repository with other sources of information within Aventis.

Currently, this database covers the following Aventis Pharma Frankfurt disease programs: thrombosis, osteoarthritis, heart failure, vascular disease, arrhythmia, diabetes, obesity, and lipid disorders. We estimate that this tool may cover up to 80% of the relevant competitor information, and we expect to include more information providers in the future.

This application has been designed with standard client/server tools (ISIS/Oracle). In a second phase, the content of the database will be made available through a web front-end to enable the integration into Aventis information portals.

8:00pm 56 Knowledge management in the spectral laboratory.
Marie Scandone, Informatics Division, Bio-Rad Laboratories, Inc, 3316 Spring Garden Street, Philadelphia, PA 19104, Fax: 215-662-0585, marie_scandone@bio-rad.com, and Gregory M. Banik, Bio-Rad Laboratories, Informatics Division

In a spectral laboratory, knowledge management is the identification, collection, and active management of analytical information. The goal is to make existing knowledge resources available to everyone and to manage that data effectively. In managing analytical data, we have moved from the archiving and warehousing of spectral data to tools that help identify and evaluate information. This approach is necessitated by the business need to analyze all available data as rapidly as possible, both to facilitate decision-making and to provide required information for regulatory compliance. There has been strong impetus, especially from the pharmaceutical industry, to share information across diverse analytical disciplines. This need has arisen from the realization that escalating drug-development costs dictate a new “fail early, fail often” paradigm. Some companies have come to realize that parallel efforts in analytical chemistry, for instance the use of NMR and mass spectrometry, could have yielded earlier, more cost-effective decisions on drug candidates if these data types had been combined earlier in a single knowledge management system. As the amount of spectral data increases, so does the need to access, process, and examine that data.

8:00pm 57 Molecular docking for generating peptide inhibitors of thrombin.
Cristina C. Clement1, Julian Gingold2, and Manfred Philipp1. (1) Chemistry Department, Lehman College and Biochemistry Ph.D. Program, City University of New York, 365 Fifth Avenue, New York City, NY 10016-4309, Fax: 212-817-1503, cclement_us@yahoo.com, (2) New Rochelle H.S

A promising method of rational drug design involves the molecular modeling of peptides or small molecules that might bind to the active site of a target protein. The goal of this investigation is to discover peptides that reversibly inhibit thrombin. The approach combines in silico docking using Sculpt (from MDL) with automated chemical synthesis of candidate compounds using standard Fmoc chemistry. Initial molecular docking experiments were used to generate candidate compounds (with both L- and D-amino acids) characterized by predicted free interaction energies ranging from −20 to −50 kcal/mol. Candidate competitive inhibitors were selected from two classes of sequences: X-Pro-Arg-dPro-Y and X-dPhe-Pro-dArg-Y. The experimental results showed that D-Phe-Pro-D-Arg-Gly-Asp and D-Phe-Pro-D-Arg-Gly-Asn have Ki values of 156 µM and 112 µM, respectively. D-Phe-Pro-D-Arg-Gly has a Ki of 6 µM. A library of tetrapeptides with other L- and D-amino acids at the P1′ position (Y = P1′) is under study.
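For orientation, the measured Ki values can be put on the same energy scale as the predicted interaction energies via the standard relation ΔG = RT ln Ki. The short Python sketch below is illustrative only and not part of the authors' work; it performs the conversion for the three peptides quoted above:

```python
import math

R = 1.987e-3   # gas constant, kcal/(mol*K)
T = 298.15     # room temperature, K

def dG_from_Ki(Ki_molar):
    """Standard-state binding free energy from an inhibition constant."""
    return R * T * math.log(Ki_molar)

for peptide, ki_uM in [("D-Phe-Pro-D-Arg-Gly-Asp", 156),
                       ("D-Phe-Pro-D-Arg-Gly-Asn", 112),
                       ("D-Phe-Pro-D-Arg-Gly", 6)]:
    print(f"{peptide}: {dG_from_Ki(ki_uM * 1e-6):.1f} kcal/mol")
```

At 298 K the 6 µM inhibitor corresponds to roughly −7 kcal/mol, well short of the −20 to −50 kcal/mol predicted scores, a familiar gap between docking energies and experimental affinities.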

8:00pm 58 Visualization of results in markush structure database searches.
Andrew H. Berks, Merck & Co, 126 E. Lincoln Ave RY60-35, Rahway, NJ 07065-0900, Fax: 732-594-5832

Visualization of Markush structures in Markush database search results is problematic because the results are often complex and difficult to interpret. This talk presents a method for representing Markush structures in search results: a representation of the query structure is overlaid on each hit, and a Markush analysis is provided for each database record, so that each substituent in the record that corresponds to a part of the query structure is displayed in a distinctive manner in the overlaid query structure, for example by using colors.

9:05 59 Digging Deeper: from holes in cards to whole structures - indexing chemistry at Derwent.
Peter Norton, (retired), 17 Woodstock Road, Balby, DN4 0UF Doncaster, England

This paper gives the author’s personal reminiscences about the trials and tribulations involved in the evolution of the various Derwent retrieval systems, beginning with the Farmdoc codes, which provided simple manual and punch card retrieval of Pharmaceutical and Veterinary patents. It then moves on to the extension of coverage to non-patent pharmaceutical literature (RINGDOC) and the various patent services (Agdoc, Plasdoc, Chemdoc, CPI and WPI). The paper concludes with the author’s involvement in the start-up of the Markush DARC graphics retrieval system.

9:35 60 Polymer searching: a capability in progress.
Stuart M. Kaback, Information Research & Analysis, Research Support Services, ExxonMobil Research & Engineering Co, 1545 Route 22 East, Annandale, NJ 08801, Fax: 908-730-3230, stuart.m.kaback@exxonmobil.com

From time to time this speaker has had the privilege of reporting to a session of the ACS Division of Chemical Information on the capabilities and shortcomings of systems used to search for information about polymers. One notable instance was the 1984 Herman Skolnik Award Symposium honoring Monty Hyams. Another was a 1991 symposium on Polymer Information Storage and Retrieval. This presentation examines progress that has been made, and points to areas in which further advances would be desirable.

10:05 61 Polymer indexing by IFI – past, present, and future.
Harry M. Allcock, and Darlene Slaughter, IFI CLAIMS Patent Services, 102 Eastwood Road, Wilmington, NC 28403, Fax: 910-392-0240, allcock@ificlaims.com, darlene.slaughter@aspenpubl.com

IFI has been indexing polymer chemistry in US patents since 1955, and since that time has developed a powerful retrieval system for polymers. Patent searchers currently use the IFI polymer indexing and associated search tools to locate polymers by structure, modification, and component monomers. Future enhancements to the system will be driven by searchers’ needs, and IFI’s intellectual and technological solutions to those needs.

10:35 62 Broadening horizons, sharpening the focus: The challenges of searching multiple datasets to obtain focused recall.
Richard W Neale1, Steve Hajkowski2, Linda Clark3, and Gez Cross1. (1) Product Development Group Chemistry & Life Sciences, Derwent Information UK, 14 Great Queen Street, Holborn, London, United Kingdom, Fax: +44 207 344 2911, richard.neale@derwent.co.uk, (2) Online Training Department, Derwent Information, (3) IT R&D Group, Derwent Information UK

The chemical industry continues to be one of the largest investors in R&TD. In today's marketplace an R&TD budget can exceed £1 million per day, so it is imperative that patented inventions are not duplicated. With R&TD spending continuing to spiral upwards, the industry has become dependent on precise patent information to aid the development of effective R&TD strategies.

As the information professional's requirements broaden, the information provider must evolve to meet user needs. Searching chemical structure and text data in combination has become a necessity, both for accurate retrieval and to limit results within larger databases. This paper will examine combination search approaches currently used in the chemical information industry and investigate how Thomson Scientific, as an information vendor, is developing future products and content with combination searching in mind.

63 Chemical patent indexing and Gresham's Law.
Edlyn S. Simmons, SourceOne-Business Information Services, Procter & Gamble Co, 5299 Spring Grove Ave., Cincinnati, OH 45217, Fax: 513-627-6854, simmons.es@pg.com

For many years, fragmentation coding was the gold standard of patent indexing. Fragmentation coding schemes, such as the one applied to Derwent's Chemical Patents Index, are applied to both specific and generic chemical structures and serve as keys to retrieval of documents through searches for chemical structures or substructures. By providing codes for structural fragments, they allow the searcher to find molecular structures rather than chemical names.

In recent years, value-added patent databases have been joined by many databases whose indexing is generated automatically from the original text. Gresham's Law tells us that bad money drives out good money. As searchers substitute full-text searching for value-added indexing, they lose the capacity to search for chemistry expressed in Markush structures and other structural diagrams. Gresham's Law may thus also tell us that bad indexing drives out good indexing.

64 Chemical structures and reactions in CAS databases – searching for prior art.
Matthew J. Toussant, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202-150, Fax: 614-447-3906, mtoussant@cas.org

Chemical information in CAS databases takes many forms. Structure information is one form that links many databases through a connection table identifier system, the CAS Registry Number. The nature of prior art information in the CAS Registry, CASREACT, and CHEMCATS databases will be described, and the pivotal role of the CAS chemical identifier system in linking those collections will be detailed. Further, the MARPAT database will be examined. CAS approaches to covering chemical information and the effect of these approaches on efforts to create exhaustive prior art collections, including from patents, journals, chemical supply catalogs, and web disclosures, will be assessed.

65 Biotechnology patent searching: past, present and future.
Sandy Burcham, Service Is Our Business, Inc, 111 Lincoln Terrace, Norristown, PA 19403-3317, Fax: 610-630-0863, cass123@earthlink.net - SLIDES

In the last two decades, the importance of biotechnology has increased dramatically, moving from straightforward enzyme-catalysed reactions to the complexities of the human genome project. Similarly, the application of biotechnology has spread from simple fermentation processes to many complex, previously non-biological technologies. During this time the number of biotechnology patents has also increased dramatically as organisations have sought to protect their research and discoveries.

To cope with the increasing importance of biotech and the increasing volume of patent and journal literature, various abstracting and indexing services together with software suppliers and online hosts, have developed resources providing increasingly powerful retrieval and display capabilities.

This paper will discuss the searching of biotech patents - where we were, where we are and where we seem to be going.

66 Back for the future: making coding cool.
Gez Cross, and Katharine Hancox, Product Development Group Chemistry & Life Sciences, Derwent Information UK, 14 Great Queen Street, Holborn, London, United Kingdom, Fax: +44 207 344 2911, gez.cross@derwent.co.uk

The chemical indexing systems introduced at Derwent by Peter Norton have been used for many years by Information Professionals to retrieve chemical information from the patent literature. When structure indexing of Markush compounds was made available, discontinuation of the structural codes was proposed – and strongly opposed by professional searchers. However, despite the introduction of software to help generate the strategies for searching these codes, they remain a tool used mainly by experienced, professional patent searchers.

With the advent of in-house and online browser-based information retrieval tools, a new generation of information users has arisen: scientists who formerly relied on information professionals for their searching requirements. To encourage these new users, intuitive, user-friendly interfaces have been created, which have further raised the expectations of both old and new users. This paper will examine attempts to bring the older, code-based systems into the internet era with new user-friendly tools and interfaces.

9:00 67 Developing HT Information Systems, a modular design.
Steve Coles, Database Applications Developer, Tripos Receptor Research, Bude-Stratton Business Park, Bude EX23 8LY, United Kingdom, Fax: +44 1288 359222, stcoles@tripos.com

It is possible to develop information systems for high-throughput design, chemistry, analysis, and purification by taking a modular approach built on best-of-breed scientific and information technologies. By working iteratively in close collaboration with the system's users, it is possible to streamline integration projects, reconcile process issues, and provide customer-facing support. A modular approach encapsulates domain knowledge, permits easier introduction of new modules and increments, and can be shared between different applications.

9:30 68 Automating Library Design.
Mark J. Duffield, and Kevin Daniels, EST Lead Informatics, AstraZeneca R&D Boston, 35 Gatehouse Drive, Waltham, MA 02451, Fax: 781-839-4580, mark.duffield@astrazeneca.com

The library design process is generally performed differently by every participant. Each chemist has a number of "favorite" parameters with which to evaluate a potential library. The process usually involves a large number of manual steps including the reformatting, collating, and integration of data from disparate sources. This process is time consuming and requires the chemist to perform complex computing tasks, often across multiple environments. The end result is that the chemist must spend significant time away from the bench planning their library.

This session will summarize our work in the area of streamlining the library design process through automation. We will describe our library design workflow and present the details of how we have automated many of the steps in the process. The chemist is now able to get the computational aspects done side by side with the actual synthetic work, while maintaining control over the end result.

10:00 69 On a new model for cheminformatics: Learning the classes of compounds.
Dmitry Korkin, Faculty of Computer Science, University of New Brunswick, 540 Windsor St., Fredericton, NB E3B 5A3, Canada, dkorkin@unb.ca

We have outlined a radically new approach to cheminformatics, the ChemETS model. It is based on the first general formalism for structural (or symbolic) object representation and classification, which we have proposed and call the evolving transformations system (ETS) framework. The central features of the ETS framework are: 1) a new structural form of class representation that can be constructed (and modified) inductively, and 2) a new structural form of object representation, which incorporates the constructive (or synthetic) history of an object and is directly related to the representation of the corresponding class of objects containing it.

I will first outline the basic principles of the ChemETS model, together with the central problem of an inductive approach to cheminformatics and computer-aided drug design (CADD). I will then discuss the application of the ChemETS model to basic problems in cheminformatics and CADD, such as virtual lead discovery and the design and screening of virtual combinatorial libraries of compounds. In particular, I will discuss constructing the class of androgen-like compounds (based on a small set of known androgens), constructing new androgen-like compounds (based on that class representation), and the resulting classification of compounds as belonging or not belonging to this class.

10:30 70 Choosing the proper grid resolution for cell-based diversity estimation.
Dmitrii N. Rassokhin, and Dimitris K. Agrafiotis, 3-Dimensional Pharmaceuticals, Inc, 665 Stockton Drive, Exton, PA 19341, rassokhin@3dp.com

Although cell-based methods are becoming increasingly popular for diversity analysis, the choice of grid resolution is still guided primarily by intuition and lacks any theoretical or empirical support. Here we present a systematic analysis of several typical chemical data sets, and propose a simple technique for identifying a suitable bin size for cell-based diversity estimation using an algorithm inspired by the field of fractal analysis. We demonstrate that the relative variance of the diversity score as a function of resolution exhibits a characteristic bell shape that depends on the size, distribution, and dimensionality of the data set under consideration, and whose maximum represents the optimum resolution for a given data set. Even though box counting can be performed in an algorithmically efficient manner, the ability of cell-based methods to distinguish between subsets of different spread falls sharply with dimensionality, and the method becomes useless beyond a few dimensions.
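The resolution scan described above can be sketched in a few lines of Python. Everything in this toy version is an assumption for illustration (the occupancy-count diversity score, the random-subset scheme, the function names), not the published algorithm: each grid resolution is scored by the relative variance of the diversity score over random subsets, and the resolution where that variance peaks is kept.

```python
import random
from statistics import mean, pvariance

def occupied_cells(points, bins):
    """Cell-based diversity score: the number of occupied grid cells.
    Points are assumed to be scaled into [0, 1) in every dimension."""
    return len({tuple(min(int(x * bins), bins - 1) for x in p) for p in points})

def optimal_resolution(points, resolutions, n_subsets=50, frac=0.5, seed=0):
    """Return the bin count where the relative variance of the score,
    taken over random subsets, is largest (the bell curve's maximum)."""
    rng = random.Random(seed)
    k = max(2, int(frac * len(points)))
    best = None
    for bins in resolutions:
        scores = [occupied_cells(rng.sample(points, k), bins)
                  for _ in range(n_subsets)]
        rv = pvariance(scores) / mean(scores) ** 2  # relative variance
        if best is None or rv > best[0]:
            best = (rv, bins)
    return best[1]

rng = random.Random(42)
pts = [(rng.random(), rng.random()) for _ in range(400)]  # toy 2-D data
print(optimal_resolution(pts, [2, 4, 8, 16, 32, 64]))
```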

11:00 71 Quantification of drug-likeness and similarity for combinatorial follow-on libraries.
Mark J. Rice, Ryan T. Weekley, and Paul A. Sprengeler, Structural Group, Celera Therapeutics, 180 Kimball Way, South San Francisco, CA 94080, Fax: 650-866-6654, mark.rice@celera.com

Striking a balance between good physical properties and similarity to the initial hit often poses a problem in the design of follow-on libraries. Good physical properties are needed to improve both ADME characteristics and drug-likeness, while similarity is needed to maintain an adequate pharmacophore for binding. These requirements are often at odds and difficult to quantify. Therefore, we have developed a site-specific fingerprint based on chemical graph theory as a basis for sidechain similarity. We have also developed a continuous drug-likeness metric, using multivariate statistical analysis. We combine these measures to suggest sidechain selection and more efficiently develop follow-on libraries.
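As a rough illustration of how the two measures might be combined, the sketch below ranks candidate sidechains by a weighted blend of fingerprint similarity to the initial hit and a drug-likeness score. The Tanimoto coefficient, the equal weighting, and all names here are assumptions for the example; the authors' actual fingerprint is graph-theoretic and their drug-likeness metric is multivariate.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as bit-index sets."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

def rank_sidechains(candidates, hit_fp, drug_likeness, w_sim=0.5):
    """Rank by a weighted blend of similarity-to-hit and drug-likeness,
    both assumed to lie in [0, 1]."""
    scored = sorted(((w_sim * tanimoto(fp, hit_fp)
                      + (1 - w_sim) * drug_likeness[name], name)
                     for name, fp in candidates.items()), reverse=True)
    return [name for _, name in scored]

# hypothetical sidechain fingerprints and drug-likeness scores
hit = {1, 2, 3, 4}
pool = {"benzyl": {1, 2, 3}, "acyl": {7, 8}}
dl = {"benzyl": 0.2, "acyl": 0.9}
print(rank_sidechains(pool, hit, dl))
```

Shifting `w_sim` trades pharmacophore retention against property improvement, which is exactly the balance the abstract describes.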

11:30 72 Predicting generic methods and retention times for high-throughput chromatography.
Daria Jouravleva, Scott Macdonald, Michael McBrien, and Eduard Kolovanov, Advanced Chemistry Development, Inc, 90 Adelaide St.West, Suite 600, Toronto, ON M5H 3V9, Canada, daria@acdlabs.com

In the experimental validation of combinatorial libraries, speed and throughput are key. For chromatographic separation or LC/MS of newly synthesized compounds, generic chromatographic methods have been designed to accommodate the widest possible diversity of samples. However, when a sample is not suited to the method, costly instrument downtime slows the analytical process and often results in rejection of the whole plate or series of compounds. The new ACD/ChromGenius software advises whether methods are viable and selects among the available methods. This presentation describes the retention time and method selection algorithms that power the new software, as well as the physicochemical parameters used to model the chromatographic separation.

9:00 73 Copyright and the EU Database Directive: Issues for chemistry.
John R. Rumble Jr., Office of Measurement Services, National Institute of Standards and Technology, 100 Bureau Drive MS 2310, Gaithersburg, MD 20899-2310, Fax: 301-926-0416, john.rumble@nist.gov

The computerization of scientific information continues to change the scientific communication process. As we approach the end of the first decade of the Internet era, ownership issues still loom large with respect to the communication process itself as well as the economics of the process. In this presentation, I will review some of the issues related to traditional ownership of authored material (copyright) as well as new ownership rights (sui generis) as created by the European Union. Both rights are under review, and possible changes could affect the communication process in many ways. This talk also provides an introduction to more detailed talks on this subject later in this session.

9:30 74 Pressures on the public domain in scientific data and information.
Paul F. Uhlir, Office of International S&T Information, The National Academies, 2101 Constitution Avenue NW, Washington, DC 20418, Fax: 202-334-2231, puhlir@nas.edu

The public domain in scientific and technical data and information (STI) is massive and has played a major role in the success of the research enterprise in the United States. The "public domain" may be defined in legal terms as sources and types of data and information whose uses are not restricted by statutory intellectual property regimes or by other legal constraints, and that are accordingly available to the public without authorization. Various legal, economic, and technological pressures in recent years have narrowed the scope of the public domain in STI, with poorly understood and perhaps significantly under-appreciated consequences to our nation's preeminent research capabilities. This presentation will discuss the background of public-domain information in research and review some of the many constraints that are being placed on open access to and use of such resources.

10:00 75 IPR and modern scientific society publishing.
Eric S. Slater, Publications Division, Copyright Office, American Chemical Society, 1155 Sixteenth Street, NW, Washington, DC 20036, Fax: 202-776-8112, e_slater@acs.org

This presentation will provide basic information about United States Copyright Law and its application to modern scientific publishing. Included will be the major issues surrounding publishing today such as protecting content against piracy, protecting works that appear online, and how recent court decisions have shaped the copyright landscape.

10:30 76 Copyright and the information industry.
Dan Duncan, Executive Director, NFAIS, 1518 Walnut Street, Suite 307, Philadelphia, PA 19102, Fax: 215-893-1564, danduncan@nfais.org

A review of major developments in copyright and related law, with particular emphasis on U.S. activities of special importance to information database providers. The presentation will focus on how policy developments may affect the delivery and use of online information databases.

11:00 77 Database protection and academic research.
Harlan J. Onsrud, Department of Spatial Information Science and Engineering, University of Maine, 5711 Boardman Hall, Room 340, Orono, ME 04469-5711, Fax: 207-581-2206, onsrud@spatial.maine.edu

Many economic and legal scholars argue that the current, relatively open access-to-data environment in the United States is beneficial to advancing knowledge and the economy. If so, the traditional method of scientific advancement, extending from and building upon the data and works of others, may be substantially burdened if the U.S. moves to a database-protection legal environment similar to that recently instituted throughout much of Europe. This talk explores the evidence to date of the effect of the European Database Directive, including its effect on scientific and technical databases. Provisions of the Directive and their implications for expanding or constraining scientific discourse are discussed. Likely responses of the scientific community to similar legislation in the U.S. are hypothesised, several alternatives for working around such a default law are suggested, and several illustrative examples already being pursued are highlighted.

11:30 78 An academic chemist looks at copyright.
S. Scott Zimmerman, Department of Chemistry and Biochemistry, Brigham Young University, C205 BNSN, Provo, UT 84602-5700, Fax: 801-422-5474, scott_zimmerman@byu.edu

Most academic chemists think little about copyright issues. They treat copyrighted materials like their mentors and colleagues do, often without questioning the legality of their actions. But academicians should know the answers to a few common copyright questions, for example: Can I photocopy book chapters and research papers for my personal files? Can I photocopy these materials, include them in a course packet, and pass them out to my classes? Can I use copyrighted materials in my PowerPoint presentations at meetings and in classes? When my students write a paper describing research done in my laboratory, who owns the copyright? Can my students publish research results in theses and dissertations, and then publish the same materials in a journal? If I prepare and publish a graph in a journal article, can I re-publish the same graph in another journal or review article? Can I post my published research papers on my Web page? This presentation will try to answer these and other questions about copyright in academia.

79 Integration of Combinatorial Chemistry Analyses with Other Relevant Information.
Jeff Saffer, OmniViz, Inc, Two Clock Tower Place, Suite 600, Maynard, MA 01754, saffer@omniviz.com

Today's chemist deals with very large collections of information from diverse sources. Integration of the analysis of textual information (patents or scientific literature), high throughput screening results, structures, descriptors and fingerprints is prerequisite for the comprehensive understanding required for improved decision-making. One of the best instruments for this integration is the human mind, but this can only be fully engaged when the diverse information is presented in a context that is easy to assimilate. To this end, we have developed a visualization framework that integrates analysis of experimental and computational data with conceptual analysis of textual information while maintaining the data in the context in which it was generated. Tools enabling exploration across the multiple data types and detailed exploration within specific data types increase understanding and decrease the time required to reach decisions. The application of these approaches to very large (hundreds of millions of data points) chemistry data sets will be discussed in the context of discovery research.

80 Barriers to effective integration in chemical experiment management software.
J. Christopher Phelan, Marketing, MDL, 1550 Bryant St., Suite 739, San Francisco, CA 94103, Fax: 415-252-8610, phelan@mdli.com

During the past twenty years, computers have become ubiquitous in chemical research, for instrument control and for data collection, management, and analysis. However, despite a pressing need, general software solutions that integrate these functions are not yet widely available. We present an analysis of several significant obstacles to the implementation of effective integrated software solutions in the chemical experiment management arena. Specific topics will include: compartmentalization of domain specific expertise, lack of a consistent data model for chemical information beyond simple structure data, idiosyncratic workflows in the research environment, and complexity issues in design and architecture.

81 Application of statistical design tools for improved efficiency in chemistry development for high-throughput parallel synthesis.
Jean E. Patterson, and Robb Nicewonger, Department of Library Optimization, ArQule, 19 Presidential Way, Woburn, MA 01801, Fax: 781-994-0677, jpatterson@arqule.com

Although there are multiple techniques for selecting structurally diverse subsets of virtual library products, there remains a need for a practical method to identify reagents that represent the range of reactivity needed to build a library. Chemical intuition has been the predominant driver for selecting such reagents, but it has a number of shortfalls: it is not consistently predictive, it cannot be automated, and it cannot be described quantitatively to enhance the chemistry development of future projects. This presentation will focus on ArQule's statistics-based approach to selecting experimental test reactions using a commercially available software package from Umetrics. We will describe the identification, using multivariate statistics, of the chemical descriptors that most closely describe reagent reactivity, followed by experimental design techniques to choose a diverse sampling of reagents that represents the reactivity of the entire virtual library.
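The "diverse sampling of reagents" step can be approximated by a simple greedy max-min selection in descriptor space. This is an illustrative stand-in under stated assumptions (Euclidean distance, a fixed starting reagent), not the Umetrics design algorithm:

```python
def maxmin_select(descriptors, k, seed_index=0):
    """Greedy max-min: start from one reagent, then repeatedly add the
    reagent whose minimum Euclidean distance to the chosen set is largest."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    chosen = [seed_index]
    while len(chosen) < k:
        nxt = max((i for i in range(len(descriptors)) if i not in chosen),
                  key=lambda i: min(dist(descriptors[i], descriptors[j])
                                    for j in chosen))
        chosen.append(nxt)
    return chosen

# four reagents in a toy 2-D reactivity-descriptor space
reagents = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
print(maxmin_select(reagents, 2))
```

Greedy max-min is a common cheap surrogate for space-filling experimental designs; real D-optimal or onion designs would replace this loop.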

82 Library design using multi-dimensional SAR analysis: Incorporating structure-based predictions.
Carleton Sage, Kevin Holme, and Manish Sud, Cheminformatics Research, LION Bioscience Inc, 9880 Campus Point Drive, San Diego, CA 92121, carleton.sage@lionbioscience.com

After screening results are available for a compound library, SAR analysis is often used to determine which R-groups contribute favorably to activity. After a chemical core and R-group positions are specified, SAR analysis involves identifying R-groups and generating a SAR table. We have implemented a system that takes this analysis one step further. In addition to activity data, we have integrated structure-based models to predict the ADME and specificity properties of compounds and have developed methods to consider multiple properties simultaneously in R-group analysis. A critical component of these analyses is the number and weighting of the properties when they are combined, and how changes in these parameters affect the final prioritization of compounds and R-groups. We will present results from using different strategies for simultaneous parameter combination.

83 Use of recursive partitioning/simulated annealing (RP/SA) for mining combinatorial libraries.
Paul Blower, LeadScope, Inc, 1245 Kinnear Rd, Columbus, OH 43212, pblower@leadscope.com, and Petr Kocis, Enabling Science & Technology, Chemistry, AstraZeneca R&D Boston

Recursive partitioning is a powerful tool for mining large, diverse data sets encountered in drug discovery. It is useful for explaining a complex, nonlinear response, and it can handle very large descriptor sets with continuous, discrete, or categorical variables. At each node, we use simulated annealing to optimize several variables simultaneously and find good combinations of descriptors. The search is incorporated into a recursive partitioning design to produce a regression tree on the space of descriptors. We used RP/SA for mining combinatorial libraries to identify combinations of structural features and reaction parameters that give superior yields. In this talk, we will describe statistical techniques used in this new method and illustrate its application in mining a combinatorial library.
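The node-level search can be sketched as simulated annealing over combinations of binary descriptors, scored by the variance reduction of the induced split. All details below are assumptions for illustration (linear cooling, a single-flip move, yield as the response, made-up descriptor names), not the authors' implementation:

```python
import math
import random
from statistics import pvariance

def split_score(rows, subset):
    """Yield-variance reduction for splitting on 'has every descriptor in subset'."""
    left = [y for feats, y in rows if subset <= feats]
    right = [y for feats, y in rows if not subset <= feats]
    if not left or not right:
        return 0.0
    ys = [y for _, y in rows]
    return (pvariance(ys) * len(ys)
            - pvariance(left) * len(left) - pvariance(right) * len(right))

def anneal_split(rows, features, steps=200, t0=1.0, seed=0):
    """Simulated annealing over descriptor combinations at one tree node."""
    rng = random.Random(seed)
    current = {rng.choice(features)}
    best = (split_score(rows, current), frozenset(current))
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9              # linear cooling
        cand = set(current) ^ {rng.choice(features)}    # flip one descriptor
        if not cand:
            continue
        delta = split_score(rows, cand) - split_score(rows, current)
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current = cand
            if split_score(rows, current) > best[0]:
                best = (split_score(rows, current), frozenset(current))
    return best

# toy library rows: (set of structural/reaction descriptors, percent yield)
rows = [({"NO2", "para"}, 92), ({"NO2"}, 88), ({"OMe"}, 35), (set(), 30)]
print(anneal_split(rows, ["NO2", "OMe", "para"]))
```

In the full method this search would run recursively at every node to grow the regression tree.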

84 NMR Prediction Software and Applications to the Screening of Combinatorial Libraries.
Antony John Williams, and Sergey Golotvin, Scientific Development, Advanced Chemistry Development, 90 Adelaide Street West, Suite 600, Toronto, ON M5H 3V9, Canada, Fax: 416-368-5596, tony@acdlabs.com

Coupling automation with flow NMR technology now allows NMR spectra to be acquired for the materials populating a combinatorial plate in only a few hours. This routine acquisition of large amounts of spectral data can indeed increase analytical throughput, but it can also produce an inordinate amount of data with no facile way to track and database the information. We will present software that allows the user to process NMR data directly from the spectrometer and display it in a 96-well plate format. 1H NMR prediction algorithms generate a spectrum for each suggested structure and display it on screen for direct visual comparison with the experimental spectrum. Verification algorithms for matching experimental and predicted spectra can be applied based on the differences in shifts, integrals, and multiplicities between the spectra.
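A crude stand-in for the verification step: greedily pair each predicted shift with the nearest unused experimental peak within a tolerance and report the matched fraction. The tolerance and the scoring are assumptions for illustration; the actual algorithms also weigh integrals and multiplicities.

```python
def match_score(predicted, experimental, tol=0.2):
    """Fraction of predicted 1H shifts (ppm) matched to a distinct
    experimental peak within 'tol' ppm (greedy nearest-peak matching)."""
    unused = list(experimental)
    hits = 0
    for shift in predicted:
        near = [p for p in unused if abs(p - shift) <= tol]
        if near:
            unused.remove(min(near, key=lambda p: abs(p - shift)))
            hits += 1
    return hits / len(predicted) if predicted else 0.0

# predicted shifts for a suggested structure vs. peaks picked from one well
print(match_score([7.26, 3.70, 1.20], [7.30, 3.65, 2.10]))
```

A per-well score like this could be thresholded to flag suspect wells on the 96-well plate display.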

2:00 85 Computational proteomics: Genome-scale analysis of protein structure, function, & evolution.
Mark Gerstein, P Harrison, J Qian, R Jansen, V Alexandrov, P Bertone, R Das, D Greenbaum, W Krebs, Y Liu, H Hegyi, N Echols, J Lin, C Wilson, A Drawid, Z Zhang, Y Kluger, N Lan, N Luscombe, and S Balasubramanian, MB&B Department, Yale University, Bass Building, 266 Whitney Avenue, New Haven, CT 06520, Fax: 360-838-7861, Mark.Gerstein@yale.edu - SLIDES

My talk will address two major post-genomic challenges: trying to predict protein function on a genomic scale and interpreting intergenic regions. I will approach both of these through analyzing the properties and attributes of proteins in a database framework. The work on predicting protein function will discuss the strengths and limitations of a number of approaches: (i) using sequence similarity; (ii) using structural similarity; (iii) clustering microarray experiments; and (iv) data integration. The last approach involves systematically combining information from the other three and holds the most promise for the future. For the sequence analysis, I will present a similarity threshold above which functional annotation can be transferred, and for the microarray analysis, I will present a new method of clustering expression timecourses that finds "time-shifted" relationships. In the second part of the talk, I will survey the occurrence of pseudogenes in several large eukaryotic genomes, concentrating on grouping them into families and functional categories and comparing these groupings with those of existing "living" genes. In particular, we have found that duplicated pseudogenes tend to have a very different distribution than one would expect if they were randomly derived from the population of genes in the genome. They tend to lie on the end of chromosomes, have an intermediate composition between that of genes and intergenic DNA, and, most importantly, have environmental-response functions. This suggests that they may be resurrectable protein parts, and there is a potential mechanism for this in yeast.

2:30 86 Federated databases: The next level.
Peter M. Smith, Discovery Research Applications, Wyeth Ayerst Research, CN 8000, Princeton, NJ 08543, Fax: 732-274-4733, smithp@war.wyeth.com

Accessing data across diverse databases is a major issue in pharmaceutical research, and several solutions have been proposed, ranging from the creation of large data warehouses to a federation of separate databases. In this talk we will present a new approach to the federated data model, based on distributed computing and a network-centric application server engine built on Java components, J2EE servers, and Oracle data sources. By moving the business logic to a middle tier, a new level of generalization can be realized which provides flexible, adaptable, and richly functional access to the various data sources. For example, in scientific areas, the databases we need to federate can include chemical structures, reactions, biological activity results, proteins, and genetic sequences. A set of “rich objects” in the middle tier can map these complex data types and be queried to provide a cross-database view. The practical implementation of this model will be discussed in the cheminformatics domain, and a demo of such a system will be given.

This technology is also the foundation of the next generation of scientific applications. It provides modular, “plug-and-play” functionality. The implications of this new approach for scientific software development will be discussed.
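
A minimal sketch of the middle-tier idea, in Python for brevity rather than the Java/J2EE stack the abstract describes: a "rich object" fans a single query out to several source adapters and merges the results into one cross-database record. The source classes, field names, and values below are hypothetical stubs.

```python
# Hypothetical source adapters; real ones would wrap JDBC/Oracle queries.
class StructureSource:
    def query(self, compound_id):
        return {"smiles": "c1ccccc1"}      # stubbed chemical data

class AssaySource:
    def query(self, compound_id):
        return {"ic50_nM": 42.0}           # stubbed biological data

class CompoundObject:
    """Middle-tier 'rich object': one fetch fans out to every registered
    source and merges the results into a cross-database view."""
    def __init__(self, *sources):
        self.sources = sources

    def fetch(self, compound_id):
        record = {"id": compound_id}
        for src in self.sources:
            record.update(src.query(compound_id))
        return record

rec = CompoundObject(StructureSource(), AssaySource()).fetch("CPD-1")
```

The point of the middle tier is that adding a new data source, say a genetic-sequence database, means registering one more adapter with no change to client code.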

3:00 87 Practical meta data solutions for the large data warehouse.
Tom Gransee, Paul Vosters, and Ronda Duncan, Knightsbridge Solutions LLC, 500 W. Madison Street, Suite 3100, Chicago, IL 60661, Fax: 413-669-2358, rduncan@knightsbridge.com - SLIDES

For enterprises with large data warehouses, implementing a comprehensive meta data solution can seem like a formidable task. There are no industry standards and no off-the-shelf tool suites that can meet all of an enterprise's meta data objectives. However, by carefully gathering requirements, mapping them to meta data sources, and choosing a solution that achieves the right balance between standardization and customization, an enterprise can develop an approach to meta data that meets its business and technical needs. Enterprises that implement successful meta data solutions will benefit from reduced development costs, user acceptance of the data warehouse, and the ability to make faster business decisions.

3:30 88 So you have a data warehouse - Now What?
William Langton, Ramesh Durvasula, and Julie Pitney, Software Consultant Manager, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144, Fax: 314 647 9241, jamih@tripos.com

In the current informatics-enabled research environment, drug companies often pursue data warehousing as a one-stop solution to data management. However, data warehousing alone may not return the value expected. We believe proper tools and processes are essential for leveraging the warehouse to extract meaningful relationships and create knowledge. In this talk, several examples of such tools and processes will be presented, based on Tripos' experience in designing and deploying successful informatics systems.

4:00 89 Using OLAP and data mining technologies for trending, knowledge discovery, and collaborative commerce.
Jane Griffin, Data Management Group, Arthur Andersen, LLP, 225 Peachtree Street, Suite 1800, Atlanta, GA 30303, Fax: 404-954-7980, s.jane.griffin@us.andersen.com - SLIDES

Data mining tools can facilitate knowledge discovery and the construction of predictive models that reveal new opportunities across the value chain and build greater knowledge of customers. This presentation will cover how to extend Business Intelligence beyond traditional boundaries to: 1) discover and pursue new business opportunities; 2) enhance the relationships between value-chain partners; and 3) use real-time intelligence to monitor the value chain. It will also cover the architectures required to build predictive models that: a) enhance revenue growth and customer and product profitability; and b) recognize fraud and enable immediate action.

9:05 90 Challenges of information provision in a dynamic genomic landscape.
Rachel V. Buckley, Head of Product Development (Life Sciences), Derwent Information, 14 Great Queen Street, London WC2B 5DF, United Kingdom, rachel.buckley@derwent.co.uk, and Giles Stokes, Product Manager (Life Sciences), Derwent Information

Significant intellectual and monetary investment in therapeutic and diagnostic research means that keeping completely up to date with technology trends is critical to both scientific and commercial success. As well as the publication of increasingly large sequence patents and the questions raised about the patenting strategies that lie behind them, another challenge posed by the growth of genomics is simply the new kinds of information available and the ability to track and monitor business information in such a highly dynamic industry. Another impact has been the change in searching skills required, as the rise of the “-omics” disciplines has meant the introduction of a new scientific language: new technical terms, new subject areas, and new relationships. The post-genomic landscape continues to change. We believe there will be a need for more attention to detail, a greater focus on qualitative information, and the interrelation of different data sources. Information providers will need to accept the requirement for better linking between sources to allow thought processes to flow more naturally in research. Information provision will need to keep up with the developments of a rapidly evolving discipline.

9:35 91 Integration of genomic, biological, and chemical data in drug discovery.
Thomas Laz, Bioinformatics, Schering-Plough, 2015 Galloping Hill Rd, Kenilworth, NJ 07033, thomas.laz@spcorp.com

There are many database mining strategies in place at the Schering-Plough Research Institute (SPRI) that are being used to support programs in the therapeutic areas. The common element in these mining strategies is that they generate lists of human genes that, while satisfying the initial criteria of the mining strategy, require further prioritization before the initiation of detailed biological evaluation. To accurately evaluate these projects, scientists must be able to analyze large amounts of disparate information relevant to the gene sequences under investigation. To facilitate the accumulation and analysis of this wide range of data at SPRI, the Bioinformatics group has developed the Discovery Data Library (DDL). We began with the optimized catalog of human genes and then developed a strategy to locate and integrate genomic, chemical, and biological information. We have developed a Web Browser-based user interface that allows the database to be queried in a variety of ways and delivers the results in a concise and comprehensive set of reports. The DDL allows SPRI scientists to rapidly obtain information on human genes in a context that is most relevant to their drug discovery programs.

10:05 92 Genomics gorilla... handling sequence overload.
Dr. Bernard French, Manager, Molecular Biology and Genetics, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202, Fax: 614-447-3713, bfrench@cas.org, Dr. Balvinder Sidhu, Product Manager, Life Sciences, Chemical Abstracts Service, and Eileen M. Shanbrom, Marketing, CAS

Advances in the genomics arena have spurred an impassioned debate on the daunting challenge of managing sequence information overload. Questions concern how to process the explosion of genomic data, including intellectual analysis, packaging, and delivering the information in an efficient manner. Information providers must not only keep pace with ever-evolving bioinformatics needs, but also design and maintain databases that can handle complex and large sequence data sets. CAS has created a digital research environment for genomic information, and accomplishes this by creating algorithmic sequence feeds from public data sources including patents, integrating sequence data from multiple data sources, creating intellectual content, and providing value-added search tools. More and more, a solid foundation for building a comprehensive digital research environment rests on bridging biology, chemistry, and information technology.

10:35 93 Managing and providing biosequence information in the STN host environment.
Ilka Schindler, and Rainer Stuike-Prill, FIZ-Karlsruhe, Karlsruhe, Germany, ilka.schindler@FIZ-karlsruhe.de

Biosequence information as part of intellectual property has recently become a rapidly developing field. The number of patent publications containing sequence information has increased substantially, as has the number of sequences published in a single publication. The almost exponential increase in published biosequence data poses a challenge for any supplier of such information. FIZ Karlsruhe, in its role as STN Europe, has established itself as a provider of biosequence information. Our objectives as a host are to provide comprehensive, high-quality information implemented in a unified and integrated way. The provision of biosequence information in a host environment requires specific solutions with respect to the retrieval system, the user interfaces, and the integration with other related databases. In particular, the sophisticated homology search functionality needed to be enhanced to provide the excellent performance required by our customers. Furthermore, questions related to the appropriate hardware, database systems, and update procedures need to be addressed.
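
One common way to make homology search fast enough for a host environment is to pre-filter database sequences with a cheap similarity measure before attempting a full alignment. The k-mer Jaccard score below is a generic illustration of such a pre-filter, not FIZ Karlsruhe's actual implementation.

```python
def kmers(seq, k=3):
    """Set of overlapping k-letter subsequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_similarity(a, b, k=3):
    """Jaccard similarity of the two k-mer sets: a cheap pre-filter used
    to shortlist candidates before a full alignment is attempted."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0
```

In a real pipeline only sequences scoring above a cutoff would be passed on to an expensive alignment algorithm such as Smith-Waterman.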

11:05 94 Multiscale hierarchical classifications of genes for genomics HTS analysis
Chihae Yang, and Limin Yu, LeadScope, Inc, Columbus, OH 43212, cyang@leadscope.com

Although the application of high-throughput technologies to genomics has greatly increased the amount of available information, it has not yet led to dramatic increases in productivity in the drug discovery process. The challenge of inferring biological target information is formidable given the size of the data set. Conventional data handling techniques include clustering of the gene sets for sub-categorization and mapping the classifications for visualization. In this paper, a unique gene hierarchy, based on annotations from the Gene Ontology Consortium, is used to differentiate gene expression patterns of various cell types. The gene hierarchy dynamically queries individual genes for classification and annotation based on a relational database for genes, ESTs, mRNA, clones, proteins, enzymes, receptors, and pathways. The gene family classes obtained by hierarchical classification correlate gene functions to expression levels. The result from the biological hierarchy analysis is compared to other computational methods for extracting subsets of genes for differentiation. For example, the same expression patterns were also differentiated using recursive partitioning (RP), a well-known tree-splitting method for classifying complex nonlinear data. A novel approach is presented in which the hierarchical classification is used to provide a rational gene order for subsequent multiscale principal component analysis (MSPCA), which is essentially an integration of wavelet and PCA methods. Identifying subsets of genes is discussed in the context of identifying specific targets in the drug discovery process.

11:35 95 Management, integration and cross-referencing of genomic information.
Anthony Caruso, LION bioscience Inc, 141 Portland Street, 10th floor, Cambridge, MA 02139, Fax: 617 245 5401, acaruso@lbri.lionbioscience.com

Through the genomics revolution, the quantity of data generated in the biological sciences has become clearly overwhelming. However, these data are in many cases redundant, ambiguous, and of varying quality. In addition, with the accelerated data output, they quite often end up in data graves. Some data sources are known for their high quality but tend to be of low relative quantity, whereas other sources are of lower quality but much more plentiful. When such data are parsed and organized with various filtering and management techniques, they can all be merged, presented, and interpreted in a much more useful manner. For example, low-throughput, high-quality PCR-based expression data can be used to help validate lower-quality, high-throughput gene-chip expression data. Another, much simpler, example is associating the highly regarded SwissProt protein data with the first-pass sequence reads of dbEST to look for alternatively spliced forms of proteins of interest. We have developed an integration scaffold coupled with a management and decision-support system to make better sense of the data in the early yet crucial stages of the drug discovery pipeline: target identification and target validation.

9:00 96 Life after the lab (or how to never leave university).
Patricia E. Meindl, Department of Chemistry, University of Toronto, 80 St George St., Toronto, ON M5S 3H6, Canada, Fax: 416-946-8059, pmeindl@chem.utoronto.ca

Are you scared of the librarians at your university library? You shouldn't be! The main job of a good librarian is to make sure you find the information you need. The stereotype of the prissy old librarian who only goes "Shhhhhhh..." doesn't fit our new electronic age. No longer is it just a matter of pointing people to the stacks. With such a wealth of information, librarians must also have some subject knowledge to guide them. My chemistry degree has proven invaluable in answering questions on subjects ranging from clinical medicine to high-energy physics. For an academic librarian, the focus is slightly different than in industry: we are teaching students how to find the information they need. Patrons range from high school students with vague requests, to undergrads and graduate students with more specific needs and short timetables, to faculty who have very detailed queries and no time at all. Learning to meet all these needs and keeping up with all the new resources available is a challenging but rewarding task.

9:30 97 What to expect in a small corporate R&D library.
Scott C. Boito, North American Library, Rhodia, Inc, 259 Prospect Plains Rd, Cranbury, NJ 08512-7500, Fax: 609-860-0165, scott.boito@us.rhodia.com - SLIDES

Are you adept at finding literature references that your colleagues can't? Do you love chemistry, but feel as comfortable in the library as you do in the lab? Making the jump from bench chemist to information specialist is a dramatic transition, but the rewards are many and your career can be very fulfilling if you commit to the change. I will discuss a little about my conversion to the information profession, including some of the reasons I chose to make it and how I did so successfully. I will also try to highlight some of the differences between academic and corporate libraries to help you decide on your correct path. The goal of the talk is to give some idea of what to expect in your new exciting career and how to prepare for it.

10:00 98 Chemical information careers in the government.
John R. Rumble Jr., Office of Measurement Services, National Institute of Standards and Technology, 100 Bureau Drive MS 2310, Gaithersburg, MD 20899-2310, Fax: 301-926-0416, john.rumble@nist.gov

The United States Government is intimately involved in virtually every aspect of the chemical sciences. It funds research and development, operates laboratories, manufactures a wide variety of chemical substances, issues chemical-related regulations, and maintains large chemical databases. In all these efforts, modern chemical informatics plays an important role. The need for chemical information specialists within the government has never been higher. In this talk, I will describe some of the career opportunities that are available to chemists and chemical informatics experts within the government, with emphasis on emerging needs for the future.

10:30 99 Searching Patents: Background, careers and the future.
Ron Kaminecki, Dialog, Suite 2930, 180 North LaSalle Street, Chicago, IL 60601, Fax: 312-726-3550 - SLIDES

Patent information involves the best of chemistry and the law. Patents are legal documents that are made to be defended in court and are thus written with that intent in mind. Patents also contain in-depth technical discussions that incorporate the leading edge of chemistry though written under the rules of statutory and case law. Thus, searching the prior art involves the best of chemical and legal skills to find the appropriate information to obtain, enforce, or invalidate a patent. This session will cover the skills and background that are needed in the search profession, the career path and typical role in industry, and the typical salary and expectations of patent search professionals.

11:00 100 Creating content and selling it: a career in publishing.
Kristina Kurz, Thieme Publishers, 333 7th Avenue, New York, NY 10001, Fax: 212 947 1112, kkurz@thieme.com

Chemists have many skills that are in high demand in various industries. Publishing is one of them.

Scientific information is valuable only when it is read and used by fellow scientists. Being part of the process to capture, select, edit, archive, and distribute information is challenging and fun. Especially at smaller publishing houses, one is exposed to many aspects of the business, and every skill you picked up at graduate school will be used. On the editorial side, a thorough understanding of the scientific content of the publication and a deep insight into the scientific community, with a good working network, are crucial. They allow the selection of material that is of high quality and of special interest to the readers. On the business development side, out-of-the-box thinking, healthy skepticism, and analytical ability are the most sought-after skills.

11:30 101 Some novel perspectives with a computational chemistry degree.
Jeffrey L. Nauss, Accelrys, Inc, 9685 Scranton Road, San Diego, CA 92121-3752, Fax: 858-799-5100, jnauss@accelrys.com - SLIDES

A degree in computational chemistry is often considered to be narrow and specialized. In some regards, it is; yet there are many other opportunities with such a background in commercial, academic, government, and non-profit organizations. This talk will examine several of these opportunities. Drawing from personal experience spanning nearly two decades, the speaker will paint a story of a multifaceted career and one that is still evolving. The goal for the presentation is to show that varied opportunities are out there; you just need to be open-minded when searching.

1:00 102 Teaching and learning of structural organic chemistry with nomenclature/structure software.
Bert Ramsay1, Antony John Williams2, Andrey Erin2, and Robin Martin2. (1) Department of Chemistry, Eastern Michigan University, Ypsilanti, MI 48197, Fax: 734-487-1496, Bert.Ramsay@emich.edu, (2) Scientific Development, Advanced Chemistry Development - SLIDES

Many organic chemistry students have difficulty in determining and "seeing" the configuration about a stereogenic carbon presented in 2-D structures. A true understanding comes when these diagrams are converted to 3-D pictures or models that can be rotated to correspond to the diagram's perspective. Much of this confusion can be avoided if students use nomenclature/structure software programs to compare 2-D and 3-D renderings and names of chemical structures. A Student Guide to the Use of Nomenclature/Structure Software has been developed for inclusion with ACD's ChemSketch and ACD/Name software. The Guide also helps students recognize the location and naming of functional groups.

1:30 103 Application integration: Providing coherent drug discovery solutions.
Mitchell Miller, and Manish Sud, Cheminformatics Research, LION Bioscience Inc, 9880 Campus Point Drive, San Diego, CA 92121, Fax: 858-410-6501, mmiller@netgenics.com - SLIDES

Over the last couple of decades, the number of computational tools available for drug discovery has undergone rapid growth. Most of these tools are designed to address a specific drug discovery task. In addition to the need to learn multiple software packages with very different user interfaces, transferring data between the various applications can be difficult or impossible. To address these issues, we have developed an application integration framework that interconnects a variety of third-party and in-house applications to support drug discovery efforts. This allows users in a single application to perform a variety of tasks and seamlessly transfer data from one to another. We will present solutions developed to support lead identification and optimization efforts which help discovery scientists identify analogs and optimize their ADME properties using structure-based models.

2:00 104 The APRILSTM (Automated Plate Re-Mapping and Integrated Library Services) System: Using Open Source Tools to Solve Thorny Informatics Problems Inexpensively.
Manton R Frierson III1, Boliang Lou2, and Shawn Beltz1. (1) Computational Chemistry and Informatics, Advanced SynTech, LLC, Louisville, KY 40299, Fax: 561-258-5783, m.frierson@advsyntech.com, (2) Department of Chemistry, Advanced SynTech, LLC

Within many small companies (and even large ones), the expense of proprietary software solutions to cheminformatics problems can often be prohibitive. The "Open Source" revolution has provided many tools for building highly functional and robust systems on inexpensive hardware platforms. In our own organization, we have used an Apache web server in conjunction with the open-source scripting languages Perl and PHP to develop many tools, accessible to both our informatics group and our bench chemists, for constructing and manipulating the data of their combinatorial library syntheses. This paper will discuss the construction and capabilities of the APRILS system, which integrates functions such as plate re-mapping (generating a variety of formats for different HTS instrumentation), dispense lists for automated synthesizers, and filters for tracking "drug-like" properties of proposed or newly synthesized libraries.
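
Plate re-mapping of the sort APRILS performs reduces to well-coordinate arithmetic. The sketch below is not the APRILS code itself; it shows two typical transforms, naming a well from its row-major index and mapping a 96-well position into an interleaved quadrant of a 384-well plate.

```python
import string

COLS = 12  # standard 96-well plate: rows A-H, columns 1-12

def well_name(index):
    """0-based row-major plate index -> well label 'A1'..'H12'."""
    row, col = divmod(index, COLS)
    return f"{string.ascii_uppercase[row]}{col + 1}"

def to_384_quadrant(well, quadrant):
    """Map a 96-well label into one of the four interleaved quadrants
    (0-3) of a 384-well plate, as when consolidating four source plates."""
    row = string.ascii_uppercase.index(well[0])
    col = int(well[1:]) - 1
    dr, dc = divmod(quadrant, 2)  # quadrant -> (row offset, column offset)
    return f"{string.ascii_uppercase[2 * row + dr]}{2 * col + dc + 1}"
```

Dispense lists and instrument-specific export formats are then just iterations over these mappings.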

2:30 105 Homogenizing analytical data from multiple vendors into a unified workspace.
Antony John Williams, Scientific Development, Advanced Chemistry Development, 90 Adelaide Street West, Suite 600, Toronto, ON M5H 3V9, Canada, Fax: 416-368-5596, tony@acdlabs.com - SLIDES

Today a plethora of analytical techniques is used to characterize a particular chemical compound or material as it migrates from research and discovery through scale-up to manufacturing. These techniques include the multiple forms of spectroscopy and chromatography, hyphenated techniques, and other analytical techniques that produce “curves”, including electrochemistry and thermal analysis. The lifecycle of any particular compound can originate with spectra to identify the structure, chromatograms to separate the material, and other technologies to characterize its performance. To date it has not been possible to manage all this analytical data, together with the associated chemical structure information, in a single unifying interface, and the need for an integrated system for processing and management of all associated data persists. This talk will provide an overview of how to address the diverse needs in processing and data management for multiple forms of analytical data and make the results available across an enterprise.

3:00 106 Effective chemical information.
Jonathan M Goodman, Department of Chemistry, Cambridge University, Lensfield Road, Cambridge CB2 1EW, United Kingdom, Fax: +44 1223 336362, jmg11@cam.ac.uk - SLIDES

We have more chemical information than we can handle well. How can we use it most effectively? Databases are hard work to create and maintain, because: (i) they need constant curation to keep up to date, (ii) the information within them needs to be validated, (iii) a rationale for trusting it must be available, and (iv) the information must be accessible. A series of information sources will be presented, which break some of these rules, but remain useful. Most of the data are available on the Cambridge Department of Chemistry web site (http://www.ch.cam.ac.uk/MMRG/CIL/ ; http://www.ch.cam.ac.uk/c2k/ ; http://www.ch.cam.ac.uk/magnus/ ; http://www.ch.cam.ac.uk/today/)

3:30 107 Snapshot of content, retrieval, and quality of some chemical information systems.
Dieter Rehm, Department of Chemical and Pharmaceutical Sciences, Johann Wolfgang GOETHE University, Marie-Curie-Strasse 11, Frankfurt am Main D - 60439, Germany, Fax: ++49-69-798-29248, REHM@chemie.uni-frankfurt.de - SLIDES

Traditional primary printed information is rapidly being supplemented, and will in future be replaced, by primary e-information. Secondary e-information systems make primary information accessible through multidimensional retrieval profiles. Despite the possibilities of full-text searching and chemical compound searching by structure, precise procedures to excerpt and index the primary information (print and/or electronic) can be improved at the programming level by taking into account the actual content of the database, increasing the precision of a search. The quality of the content, as well as the experience of the searcher, ultimately determines the result of a retrieval session. Examples of deficits in chemical information systems are given, and the necessary consequences are shown: improving the quality of secondary information is the obligation not only of producers but also of editors and, last but not least, the publishing scientists. This likewise feeds back into the education of students.

4:00 108 Battling the data avalanche – a chemical data management solution for the smallcap company.
Kevin K Turnbull, Advanced Chemistry Development, 90 Adelaide St. W., Suite 702, Toronto, ON M5H 3V9, Canada, Fax: 416-368-5596, kevin@acdlabs.com - SLIDES

The pharmaceutical industry is well acquainted with the challenges of managing various forms of chemical data across an organization. These challenges are augmented when considering the plight of smallcap companies, whose monetary and human resources are often severely out of sync with the volumes of chemical data they are generating.

This talk will discuss the emergence of a novel database software system designed for standardizing and consolidating chemical information company-wide. The software integrates chemical structures with images, reaction diagrams, documents, and text in a manner that is customizable to the user, and thus is malleable to the specific data management needs of an organization. Databases that are built in this system are searchable by chemical structure, sub-structure, text, and other user-defined data fields. Such databases are easily accessible by all beneficiaries in the company, and can be connected to commercial tools for physical property prediction, chemical naming, and analytical data management (NMR, MS, IR, UV, HPLC, and GC).

4:30 109 Command and control of the drug discovery factory: Putting chemists in the driver's seat.
David Hadfield, Chemistry, Spotfire, 212 Elm Street, Somerville, MA 02144

The last decade has seen an abundance of novel technologies, methodologies, and research content coming into the domain of drug discovery. High-throughput technologies have the potential to significantly improve the results of pharmaceutical research. However, those results have not yet been demonstrated: the output of novel products in the marketplace has decreased rather than increased while these new technologies have been implemented in current processes. Much of the blame has been placed on research organizations not being ready to deal with the data explosion from novel technologies. Researchers have had to deal with 100x more data, in terms of number of compounds as well as number of properties. Novel visualization and analytic technologies have been successful in battling this explosion, allowing researchers who would otherwise be confined to spreadsheets to rapidly browse data in search of trends and outliers. While these technologies have had a big impact, I will argue that to see real improvements in research productivity we need a discontinuous change in how research organizations deal with data and decision-making. Chemists need to be able to see their results in the context of biology; biologists need to be able to see their results in the context of chemistry. Decisions need to be made cross-functionally, taking every aspect of chemistry and biology into consideration, and every decision needs to be continuously monitored and updated as new data become available. This is easier said than done. Just as such decision-making would indeed be a discontinuous change, a discontinuous change in the software infrastructure for decision-making will be needed to enable the change in methodology and put researchers in the driver's seat.
I will outline a novel architecture for analytical software for the world of drug discovery, building on previous success in data visualization, and show how integrated decision-making can be made possible through improvements at every level from the UI to the database. The presentation will cover architecture as well as user interface issues, and discuss the impact on pharmaceutical research.


