Final Program, 224th ACS National Meeting, Boston, MA, Fall, 2002

CINF 1 :  Ultra High Throughput Screening using THINK on the Internet
E Keith Davies, Department of Chemistry, Oxford University, Central Chemistry Laboratory, South Parks Road, Oxford, United Kingdom, Fax: +44 1865 275905, Keith.Davies@Chem.ox.ac.uk, and Catherine J Davies, Treweren Consultants Ltd

Abstract
The growth in the collections of small molecules available for experimental testing prompts selection of subsets and has stimulated the question "how many drug-like molecules are there?". In the CAN-DDO project we harnessed the power of over 1 million volunteered PCs to screen 3.5 billion drug-like molecules against 12 protein targets of relevance to cancer therapy. The development of the THINK software, its adaptation to run as a screen-saver and some of the data management issues will be described in this paper.



CINF 2 :  Next steps for virtual screening and massively distributed computing
Davin M. Potts, United Devices, Inc, 12675 Research Blvd., Building A, Austin, TX 78759, Fax: 512-331-6235, davin@ud.com

Abstract
The recent massive distributed computing project led by W. Graham Richards' team (Oxford) to perform virtual screening of 3.5 billion drug-like molecules against a series of protein targets has identified a significant number of promising, novel small molecules which warrant further investigation and refinement into prospective drug candidates. The successful involvement of the general public (1.5 million PCs participating on the internet to date capable of producing a sustained compute power in excess of 60 teraflops) in this pioneering scientific endeavor has demonstrated the magnitude and viability of available untapped compute power for drug discovery efforts. With the recently announced availability of state of the art screening tools (e.g. LigandFit) on such distributed computing platforms comes the opportunity and challenge for pharmaceutical and drug discovery companies to apply this combination of tools to their internal development efforts. We will discuss the next steps in improving the quality of the findings from the first stage of the project led by Oxford, the continuing need for and role that distributed computing will play, and the relevance to commercial pharmaceutical discovery.



CINF 3 :  Evaluating protein-ligand interactions through flexible docking
Tad Hurst, ChemNavigator, 6166 Nancy Ridge Drive, San Diego, CA 92121, Fax: 858-625-2377, thurst@chemnavigator.com

Abstract
As the global research emphasis shifts from genomics to proteomics, the question of how copious amounts of bioinformation will ultimately be used to accelerate the discovery of therapeutic compounds becomes more prominent. At the same time, the number of commercially accessible compounds that can be tested for pharmaceutical efficacy has exploded into the millions.

ChemNavigator is addressing this need by offering advanced docking technology that more efficiently evaluates protein-ligand interactions. ChemNavigator has developed ultra-fast 3-D technology that will allow millions of structures to be docked into thousands of protein targets. In addition to rapid analysis, this technology will allow flexible ligand docking against the entire surface of a protein, not requiring specification of an active site.

This presentation details how ChemNavigator’s novel 3-D flexible docking technology can assist life science researchers by allowing them to quickly and efficiently filter millions of structures in their search for novel therapeutic compounds.



CINF 4 :  Docking of diverse ligands to diverse protein sites: six degrees of application
Teresa A. Lyons1, Michael Dooley2, Anne-Goupil Lamy1, Sunil Patel3, Remy Hoffmann4, Hughes-Olivier Bertrand4, and Marguerita Lim-Wilby5. (1) Accelrys Inc, 200 Wheeler Road, South Tower, 2nd Floor, Burlington, MA 01803-5501, Fax: (781) 229-9899, txl@accelrys.com, (2) Accelrys KK, (3) Accelrys Ltd, (4) Accelrys, (5) Lead Identification and Optimization, Accelrys Inc

Abstract
The utility of a docking application in the virtual screening of libraries prior to biological assay or custom synthesis is dependent on its ability to predict ligand affinity over a wide pKi range at a specific binding site. The definition of the binding site of interest thus becomes the most critical step in setting the stage for docking.

Examples will be presented of straightforward docking/screening cases, as well as difficult cases, such as proteins with extremely large potential binding sites, proteins with induced fit or allosterism, and protein/ligand complexes with alternate ligand conformations from X-ray crystal structures. In between are the “tunable” cases where user settings and protein preparation are critical: highly flexible ligands, steric problems or clashes, and local flexibility in the binding site. Finally we will summarize the classes of proteins and types of ligands for which LigandFit will perform suitably as a vHTS tool.



CINF 5 :  eHiTS: Novel algorithm for fast, exhaustive flexible ligand docking and scoring
Zsolt Zsoldos1, A. Peter Johnson2, Aniko Simon1, Irina Szabo1, and Zsolt Szabo1. (1) Research and Development, SimBioSys Inc, 135 Queen's Plate Dr, Unit 355, Toronto, ON M9W 6V1, Canada, Fax: 416-741-5083, zsolt@simbiosys.ca, (2) ICAMS, School of Chemistry, University of Leeds

Abstract
The flexible ligand docking problem is often divided into two subproblems: pose/conformation search and scoring function. For virtual screening the search algorithm must be fast, must provide a manageable number of candidates, and must be able to find the optimal pose/conformation of the complex. Algorithms employing stochastic elements or crude rotamer samplings fail to satisfy the last criterion. The eHiTS (electronic High Throughput Screening) software offers new approaches to both subproblems. The search algorithm is based on exhaustive graph matching that rapidly enumerates all possible mappings of interacting atoms between receptor and ligand. Then dihedral angles of rotatable bonds are computed deterministically as required by the positioning of the interacting atoms. Consequently, the algorithm can find the optimal conformation even if unusual rotamers are required. The scoring function contains novel treatment of weak hydrogen bonds, aromatic pi-stacking and penalties for conflicting interactions. Validation results on over 300 complexes will be presented.



CINF 6 :  Effect of electrostatic models on the accuracy of ligand docking
Philip W. Payne, Consultant in Computational Chemistry, 660 Santa Paula Avenue, Sunnyvale, CA 94085-3416, Fax: none, PAYNES@PACBELL.NET

Abstract
Clustered ensembles of various ligands bound to the estrogen receptor were built by systematic search for energetically favored ligand positions. Four different electrostatic models were employed: point charge with constant dielectric, point charge with cubic spline cutoffs in the range 8-10 Å, point charge with cubic spline cutoffs in the range 10-12 Å, and a sigmoidal dielectric screening model.

Compared to the constant dielectric model without cutoffs, dramatic shifts in the energy spectra of ligand clusters and the positions of bound ligands were observed when calculations were done with either Coulomb distance cutoffs or the sigmoidal screening model. Subsequent analysis demonstrated that the distance cutoffs or sigmoidal screening cause chaotic instability of the electric field in the binding site. Distance cutoffs for Coulomb interactions should therefore be avoided in the study of protein-ligand interactions unless such cutoffs are uniformly applied to all atoms in each polar bond.
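The effect of a per-atom distance cutoff on a polar bond can be illustrated with a small Python sketch. It is not the author's code: the switching polynomial, the probe-plus-dipole geometry and all function names are assumptions chosen for illustration, and the sigmoidal dielectric model is omitted. The constant 332.06 is the usual conversion factor for Coulomb energies in kcal/mol with charges in electron units and distances in angstroms.

```python
# Conversion factor for Coulomb energies: kcal*angstrom / (mol * e^2).
COULOMB_CONSTANT = 332.06


def cubic_switch(r, r_on, r_off):
    """Smooth cubic switch: 1 below r_on, 0 beyond r_off (assumed form)."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    t = (r - r_on) / (r_off - r_on)
    return 1.0 - 3.0 * t * t + 2.0 * t ** 3


def coulomb_energy(q1, q2, r, dielectric=1.0, cutoff=None):
    """Point-charge Coulomb energy, optionally scaled by a cubic switch.

    cutoff is None (no cutoff) or a (r_on, r_off) pair in angstroms.
    """
    energy = COULOMB_CONSTANT * q1 * q2 / (dielectric * r)
    if cutoff is not None:
        r_on, r_off = cutoff
        energy *= cubic_switch(r, r_on, r_off)
    return energy


def probe_dipole_energy(cutoff=None):
    """Energy of a +1 probe charge with a hypothetical polar bond.

    The bond is modelled as charges -0.5 and +0.5 at 9.5 and 10.5 angstroms
    from the probe, so an 8-10 angstrom cutoff treats its two atoms differently.
    """
    dipole = [(-0.5, 9.5), (0.5, 10.5)]
    return sum(coulomb_energy(1.0, q, r, cutoff=cutoff) for q, r in dipole)


if __name__ == "__main__":
    print("no cutoff:      ", probe_dipole_energy())
    print("8-10 A cutoff:  ", probe_dipole_energy(cutoff=(8.0, 10.0)))
```

In this toy geometry the cutoff removes the positive end of the bond entirely and only partly attenuates the negative end, so the bond no longer looks like a neutral dipole to the probe. This is the kind of artifact that applying the cutoff uniformly to all atoms of a polar bond is meant to avoid.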



CINF 7 :  Integration Continuum...different strokes for different folks
Kirk Schwall, Manager, Authority Database Operations, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202, Fax: (614) 447-5471, kschwall@cas.org, and Eileen M. Shanbrom, Manager, CAS and Web Content, Chemical Abstracts Service

Abstract
Scientists are challenged in today’s environment to locate the right information in a sea of information that incorporates both traditional and web services offered by government agencies and others. Information providers should move away from viewing this changing environment as a conflict of "traditional" resources versus the web. Information consumers want both available in integrated services. These services must provide access to the right information at the right time in the context of scientific research. This is especially relevant for producers of STM database and search-and-retrieval services, because scientists will be the biggest users of services that integrate web content with professionally built databases. Recent developments by a number of information providers offer good examples of what is now possible and necessary. BLAST searching for identifying sequences was originally a free service provided by the U.S. government but BLAST has now been incorporated into proprietary services that integrate the identification of genes, proteins and other biological entities with the retrieval of related literature and patents. To an increasing extent, a solid foundation for building the new "digital research environment" rests on three building blocks: professionally produced databases, value-added search tools, and the web.



CINF 8 :  Hindsight is an exact science
Jeremy N Potter1, Chris Hardy1, Robert D. Brown2, and Julian Hayward2. (1) Accelrys, Inc, 9685 Scranton Road, San Diego, CA 92121-3752, jeremyp@accelrys.com, (2) Accelrys Inc

Abstract
The value of knowing about work done by others in the field of organic synthesis goes without saying, and providing compilations of such information is the basis of the many reaction databases available on the market today. These range in size from a few hundred to over 10 million reactions and can typically be characterised as selective, thematic or comprehensive in nature. Almost without exception, however, such databases focus only on those reactions that have a successful outcome, with the goal of allowing chemists to search for tried and trusted methods that have been presented in the literature. However, our own experience of life tells us that it can be just as valuable to know in advance that something will not work. This paper will outline the ways in which such knowledge of synthetic 'failures' is used in the pharmaceutical industry, and will introduce a database of such reactions.



CINF 9 :  Bridging the gap between published and proprietary spectroscopic databases: an informatics system case study
Gregory M. Banik, Informatics Division, Sadtler Software & Databases, Bio-Rad Laboratories, 3316 Spring Garden Street, Philadelphia, PA 19104-2596, Fax: 215-662-0585, gregory_banik@bio-rad.com, and Ty Abshear, Informatics Division, Sadtler Software and Databases, Bio-Rad Laboratories

Abstract
A new informatics system, the KnowItAll™ Informatics System, is described that bridges the gap between published and proprietary spectroscopic information. KnowItAll offers the world's largest published collection of analytical information and the ability to manage multiple spectra and chromatograms, including 13C and 1H NMR, IR, Raman, MS, GC, and UV/Vis, along with the corresponding chemical structures and related property information. KnowItAll allows users to create their own databases and search them seamlessly with databases of published spectra as well as databases of reference spectra. Cross referencing from one analytical technique to another is also seamless, as is the use of user-assigned NMR spectra in the database-based prediction of NMR spectra. Finally, web addresses or file names can be added, either to published databases or proprietary databases, to permit linking to related documents that are outside the KnowItAll system.



CINF 10 :  Developing value-added organic chemistry databases from traditional print products
Darla Henderson, and Colleen Finley, John Wiley & Sons, Inc, 605 Third Avenue, New York, NY 10158, dhenders@wiley.com

Abstract
Various chemical databases, primarily abstracted databases, have existed in the chemical information business for three-plus decades. This presentation discusses the development and features offered by John Wiley & Sons in their newly released and developing chemical reaction databases. Wiley chemical reaction databases focus on offering the full content of a product, as opposed to the abstracted data found in most other reaction databases, while still including the value-added features customers prefer, such as reaction searching and interoperability among databases. Critical issues, such as developing a product amenable to both academic and corporate customers, are discussed.



CINF 11 :  Linking reaction information from different sources
Guenter Grethe1, Peter Loew2, Hans Kraut2, and Josef Eiblmaier2. (1) Marketing/Scientific Applications, MDL Information Systems, Inc, 14600 Catalina Street, San Leandro, CA 94577, Fax: 510-614-3616, guenter@mdli.com, (2) InfoChem GmbH

Abstract
Collecting required relevant information to solve a synthetic problem is a formidable task. Unless the chemist is interested in the preparation of a known compound, it almost never is straightforward. The process usually involves consulting more than one source and going back and forth between different sources to find the most relevant answers. This can be a very time-consuming process in the hardcopy world and confusing when available electronic sources require the use of different programs. Today’s technology allows linking of information using point-and-click rather than cut-and-paste methodology. In the reaction world, the linking must foremost be based on reaction type rather than the similarity of participating molecules. As a first step in this direction we have developed a system that seamlessly links information from reactions of similar type described in reaction databases and major reference works. The latter provide important complementary information, including discussions about reaction mechanism, stereochemistry, the most suitable reagent or catalyst, and others. Linking the references from both databases and books to the primary literature augments the integration. We will describe the underlying concept of the system and demonstrate the usefulness with examples from the recent literature.



CINF 12 :  Reoptimization of MDL keys for use in drug discovery
Keith T Taylor, Joseph L. Durant Jr., Burton A Leland, Douglas R Henry, and James G Nourse, MDL Information Systems, 14600 Catalina Street, San Leandro, CA 94577, keitht@mdli.com

Abstract
The use of keysets based on a variety of different descriptors has an established place within the drug discovery workflow. MDL’s keysets were optimized for substructure searching; however, their performance for clustering and diversity analysis is comparable with that of keysets based on feature trees. We will present an overview of the underlying technology supporting the definition of features in MDL’s keysets and their encoding into keysets. A keyset containing all possible combinations of our set of defined features with occurrence counts of one or more has been constructed. Standard deviations of a few percent were observed in the clustering performance of populations of similarly sized keysets. Additionally, performance is seen to be relatively insensitive to keyset size, especially for keysets larger than 1000 bits. We have also examined a variety of strategies to construct keysets; the performance and relative merits of these strategies will be discussed.



CINF 13 :  Strategies for Lead Discovery Oriented Virtual Screening
Tudor I. Oprea, EST Chemical Computing, AstraZeneca R&D Molndal, Molndal S-43183, Sweden, Fax: 46 (0)31-776-3792, tudor.oprea@astrazeneca.com

Abstract
Large numbers of virtual compounds can be evaluated in silico via Virtual Screening (VS). Some properties can be readily evaluated prior to enumeration from reactants. However, binding affinity estimates require enumerated structures. The Lipinski rule of five, the standard property filtering protocol for VS, was derived from drugs (not leads). For lead discovery oriented VS, this protocol needs to be shifted toward lower molecular weight, lower hydrophobicity and higher solubility, in order to capture high quality leads. Possible VS strategies with respect to optimizing binding affinity and pharmacokinetic properties are discussed.
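The shift from a drug-derived filter to a stricter lead-oriented one can be sketched in Python as below. The rule-of-five cutoffs are the commonly cited ones (molecular weight 500, logP 5, 5 hydrogen-bond donors, 10 acceptors). The lead-like thresholds, the `Compound` record and the example compounds are hypothetical values chosen only to illustrate the idea, not figures from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    mol_weight: float  # g/mol
    logp: float        # octanol/water partition coefficient (log scale)
    h_donors: int
    h_acceptors: int


@dataclass(frozen=True)
class PropertyFilter:
    max_mol_weight: float
    max_logp: float
    max_h_donors: int
    max_h_acceptors: int

    def violations(self, c: Compound) -> int:
        """Count how many property limits the compound exceeds."""
        return sum([
            c.mol_weight > self.max_mol_weight,
            c.logp > self.max_logp,
            c.h_donors > self.max_h_donors,
            c.h_acceptors > self.max_h_acceptors,
        ])

    def accepts(self, c: Compound, allowed_violations: int = 0) -> bool:
        return self.violations(c) <= allowed_violations


# Commonly cited rule-of-five cutoffs.
RULE_OF_FIVE = PropertyFilter(500.0, 5.0, 5, 10)
# Hypothetical stricter cutoffs for lead-oriented screening (assumption).
LEAD_LIKE = PropertyFilter(350.0, 3.0, 3, 6)

if __name__ == "__main__":
    library = [
        Compound("drug_sized", 420.0, 4.2, 2, 6),
        Compound("lead_sized", 250.0, 1.5, 1, 4),
    ]
    for c in library:
        print(c.name, RULE_OF_FIVE.accepts(c), LEAD_LIKE.accepts(c))
```

Because all four properties can be estimated from reactants or simple structure counts, a filter of this kind can run before the more expensive binding affinity estimates.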



CINF 14 :  Application of pharmacophore fingerprint keys to structure-based design and data mining
Marvin Waldman, Moises Hassan, Chien-Ting Lin, Shashidhar N. Rao, and C. M. Venkatachalam, Accelrys, 9685 Scranton Road, San Diego, CA 92121, Fax: 858-799-5100, marvin@accelrys.com

Abstract
By combining technology from Ludi and Catalyst, we are conducting studies on the use of active site based pharmacophores for mining databases of compound collections for the purpose of lead identification. In contrast to more conventional approaches using 3D pharmacophore searching techniques, we explore the use of similarity comparisons of 3D fingerprint maps of the active site and candidate ligands as a means of prioritizing ligands for real or virtual high throughput screening. Various alternative approaches will be examined including the effects of using binary vs. occurrence counts representation for pharmacophore keys, the use of different similarity metrics, and the use of different pharmacophoric feature types including donor and acceptor projected points. Data mining studies conducted on several protein systems will be presented and analyzed in terms of the effectiveness of recovering known seeded actives from a larger ligand pool using the various approaches outlined above.
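The difference between binary and occurrence-count representations of pharmacophore keys can be made concrete with a short Python sketch. It uses the Tanimoto coefficient as one possible similarity metric. The key labels and the two fingerprints are invented for illustration and do not reflect the actual Ludi or Catalyst encodings.

```python
from collections import Counter


def tanimoto_binary(a: Counter, b: Counter) -> float:
    """Tanimoto similarity using only the presence or absence of each key."""
    keys_a = {k for k, n in a.items() if n > 0}
    keys_b = {k for k, n in b.items() if n > 0}
    union = keys_a | keys_b
    if not union:
        return 0.0
    return len(keys_a & keys_b) / len(union)


def tanimoto_counts(a: Counter, b: Counter) -> float:
    """Count-based Tanimoto: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    total = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return shared / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical keys: feature pair plus a distance bin.
    site = Counter({"donor-acceptor-3": 2, "acceptor-hydrophobe-5": 1})
    ligand = Counter({"donor-acceptor-3": 1,
                      "acceptor-hydrophobe-5": 1,
                      "hydrophobe-hydrophobe-4": 1})
    print(tanimoto_binary(site, ligand))
    print(tanimoto_counts(site, ligand))
```

The two representations can rank the same ligand differently, since the count-based form penalizes a ligand that matches a key fewer times than the site presents it.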



CINF 15 :  Quasi2: Virtual site model derivation and application to lead identification
David G. Lloyd, Nicholas C. Perry, Nikolay P. Todorov, Iwan J. P. de Esch, and Ian L. Alberts, De Novo Pharmaceuticals, Compass House, Vision Park, Histon, Cambridge CB4 9ZR, United Kingdom, Fax: +44-(0)1223-238088, david.lloyd@denovopharma.com

Abstract
Traditional pharmacophore models define the minimum requirements for activity, but not necessarily the optimum conditions. Quasi2 produces virtual site models by optimising the molecular similarity within a set of ligands with respect to those features known to be important in binding to biomolecular targets as a function of ligand conformation, ionisation state and tautomeric state. The use of virtual site models in database searching bridges the gap between pharmacophore screening and high-throughput docking for targets on which structural information is limited or unavailable. Quasi2 virtual site models have been validated experimentally, through the design of active compounds ‘tailored’ to the virtual site features and computationally, through accurate binding mode predictions for known actives and enhanced hit-rates in high-throughput database screening.



CINF 16 :  Identification of Potent and Novel α4β1 Antagonists using In Silico Screening
Juswinder Singh1, Steve Adams1, Wen-Cherng Lee1, and Herman van Vlijmen2. (1) Structural Informatics, Biogen Inc, 12 Cambridge, Cambridge, MA 02142, Fax: 617-679-2616, juswinder_singh@biogen.com, (2) Biogen, Inc

Abstract
α4β1 (VLA-4) plays an important role in the migration of white blood cells to sites of inflammation, and has been implicated in the pathology of a variety of diseases. We describe a series of potent inhibitors of α4β1 that were discovered using computational-based screening for replacements of the peptide region of an existing tetrapeptide-based α4β1 inhibitor (1; 4-[N'-(2-methylphenyl)ureido]phenylacetyl-Leu-Asp-Val) derived from fibronectin. The search query was constructed using a model of 1 that was based upon the X-ray conformation of the related integrin-binding region of VCAM-1. The 3D search query consisted of the N-terminal cap and the carboxyl side chain of 1 because, based upon existing structure-activity data on this series, these were known to be critical for high-affinity binding to α4β1. The computational screen identified 12 reagents from a database of 8624 molecules as satisfying the model and our synthetic filters. All of the synthesized compounds tested inhibit α4β1 association with VCAM-1, with the most potent compound having an IC50 of 1 nM, comparable to the starting compound. Using CATALYST, a 3-D QSAR was generated that rationalizes the variation in activities of these α4β1 antagonists. The most potent compound was evaluated in a sheep model of asthma, and a 30 mg nebulized dose was able to inhibit early and late airway responses in allergic sheep following antigen challenge, and prevented the development of nonspecific airway hyper-responsiveness to carbachol. Our results demonstrate that it is possible to rapidly identify non-peptidic replacements of integrin peptide antagonists. This approach should be useful in identification of non-peptidic α4β1 inhibitors with improved pharmacokinetic properties relative to their peptidic counterparts.



CINF 17 :  Unified virtual ADME/Tox using a hierarchy of machine learning models
Guido Lanza, and William Mydlowec, Pharmix Corp, 200 Twin Dolphin Drive, Suite F, Redwood Shores, CA 94065, guido@pharmix.com

Abstract
We present a unified virtual ADME/Tox system based on a hierarchy of machine learning models. All compounds are initially subjected to 3-D multi-conformer analysis, and numerous molecular descriptors are calculated, both conformationally specific and not. A hierarchy of models based on these descriptors is then used to predict various physicochemical and pharmacokinetic properties. We describe a series of models, including: solubility, octanol/water partition coefficient, human intestinal passive absorption, intestinal transporter binding, P450 and related enzyme interactions, blood-brain barrier permeability, plasma protein binding, and serum transporter binding. We also describe predictive models of oral bioavailability, volume of distribution, and clearance, as well as limited models involving mechanism-of-action.



CINF 18 :  Application of 1D-similarity analysis to predict plausible modes of CYP-450 metabolism
Chaya Duraiswami, Molecular Modeling, Pharmacopeia, Inc, CN 5350, Princeton, NJ 08543, Fax: 732-422-0156, cduraisw@pharmacop.com, Steven L. Dixon, ADMET R&D Group, Accelrys, and John J. Baldwin, Concurrent Pharmaceuticals, Inc

Abstract
A computationally fast, semi-quantitative, visual method to predict the plausible mode of CYP-450 metabolism based on 1D-similarity analysis to known inhibitors, inducers and substrates of CYP-3A4, CYP-2C9 and CYP-2D6 will be presented. The advantages of this method include rapid detection of the possibility of drug-drug interactions, as well as prediction of a plausible mode of metabolic degradation for each test compound. Since this method is semi-quantitative and fast, predictions for large combinatorial libraries as well as virtual libraries can be made in a timely fashion, making this approach a useful computational ADME filter. The results of this method as applied to a set of chemokine inhibitors will be presented.



CINF 19 :  Exact chemical structure batch mode searches
Christopher A. Lipinski, Exploratory Medicinal Sciences, Pfizer Global Research and Development, Groton Laboratories, Eastern Point Road, mail stop 8200-36, Groton, CT 06340, Fax: 860-715-3149, christopher_a_lipinski@groton.pfizer.com

Abstract
Chemistry structure searching tools lag behind those of biology and genomics. Specifically, chemical structures can easily be searched within corporate databases but it is very difficult for chemists to perform structure searches on the external literature. Currently a chemist cannot simply copy a structure from an ISIS/Base corporate database and use it to search Chemical Abstracts Service (CAS) SciFinder. The same holds for a chemical structure from a virtual library SDF file. The search has to be performed by manually drawing a chemical structure as a search query. Exact chemical structure searches cannot be done in batch mode. For example, one cannot search SciFinder for twenty-five chemically unrelated structures at a time. It is generally unrecognized that the tools are in place for chemists to solve this problem. Three software licenses are required: SciFinder from CAS; Accord for Excel from Accelrys and Name from Advanced Chemistry Development.



CINF 20 :  Integration of disparate data sources from genomics to chemistry
Robert D. Brown, and David Benham, Accelrys Inc, 9685 Scranton Road, San Diego, CA 92121, rbrown@accelrys.com

Abstract: Abstract text not available.



CINF 21 :  How to build and deploy chemoinformatics applications
Louis J. Culot Jr., CambridgeSoft Corporation, 100 Cambridge Park Drive, Cambridge, MA 02140, lculot@cambridgesoft.com

Abstract
Rapid development tools and practices have been used by many industries to develop and deploy informatics applications. However, the chemical community has been slow to adopt these tools because of dependencies on specialized technology for handling chemical data. With the recent availability of new technologies for chemistry, such as Java and Active-X clients, ODBC chemical drivers, and chemical Oracle Cartridges, these barriers are removed, and the chemical community can take advantage of the rapid development tools and practices available to the broader market. I review the technologies and practices, provide a framework for managing rapid-development projects, and provide a case study and example of building such an application.



CINF 22 :  Hybrid methodologies for pKa prediction and database selection
Mark J. Rice1, Ryan T. Weekley1, William K. Ridgeway2, and Paul A. Sprengeler1. (1) Structural Group, Celera Therapeutics, 180 Kimball Way, South San Francisco, CA 94080, Fax: 650-866-6654, mark.rice@celera.com, (2) University of California, Berkeley

Abstract
We have developed a new methodology for pKa prediction combining empirical prediction methods with an experimental database. For any compound, the nearest experimental values from the database are used to correct the predicted value. In order to quantify similarity, we have developed a novel site-specific fingerprint based on chemical graph theory. We believe this approach offers a trainable pKa predictor especially suited to series of compounds.
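One way to read the correction step is sketched below in Python. The abstract does not give the formula, so everything here is an assumption: fingerprints are plain sets of feature identifiers, similarity is the Tanimoto coefficient, and the correction is a similarity-weighted mean of the (experimental minus predicted) residuals of the nearest database entries. The parameters `k` and `min_similarity` are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two fingerprints stored as feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def corrected_pka(query_fp, predicted, database, k=3, min_similarity=0.5):
    """Correct an empirical pKa prediction using similar experimental entries.

    database is a list of (fingerprint, predicted_pka, experimental_pka)
    tuples. If no entry is similar enough, the prediction is returned
    unchanged.
    """
    scored = sorted(
        ((tanimoto(query_fp, fp), exp - pred) for fp, pred, exp in database),
        reverse=True,
    )
    neighbours = [(s, resid) for s, resid in scored[:k] if s >= min_similarity]
    if not neighbours:
        return predicted
    total_weight = sum(s for s, _ in neighbours)
    correction = sum(s * resid for s, resid in neighbours) / total_weight
    return predicted + correction


if __name__ == "__main__":
    # Hypothetical database entries: (fingerprint, predicted, experimental).
    db = [
        ({1, 2, 3, 4}, 4.0, 4.5),
        ({1, 2, 3}, 5.0, 5.2),
        ({9, 10}, 9.0, 8.0),
    ]
    print(corrected_pka({1, 2, 3, 4}, 4.2, db, k=2))
```

Adding new experimental values to the database changes the corrections without refitting the empirical predictor, which is one sense in which such a scheme is trainable on a compound series.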



CINF 23 :  Rule-based two-layer model for virtual high throughput screening
Ruediger M. Flaig1, Thomas F. Kochmann2, and Roland Eils2. (1) Institute for Pharmaceutical Technology and Biopharmacy, University of Heidelberg, Im Neuenheimer Feld 366, Fax: +49 4075110-17171, flaig@sanctacaris.net, (2) Intelligent Bioinformatics Systems, German Cancer Research Center, Im Neuenheimer Feld 280, Fax: +49-6221-42-3620, t.kochmann@dkfz-heidelberg.de

Abstract
Science is producing vast amounts of data from which relevant knowledge has to be extracted, a process for which suitable tools still have to be developed. A universal tool to this end would use a set of rules which it can extend on its own. It requires two layers of processing: (1) subsymbolic processing (implemented in C, C++ or Java) for transforming raw data into information, (2) symbolic processing (implemented in Haskell, Miranda or ML) for extracting knowledge from the “predigested” information. Subsymbolic processing consists largely of deconstructing source data into patterns to be distributed over multiprocessor systems, yielding an array of summary lists (abstraction). Symbolic processing evaluates these lists by further application of the underlying rules. To start, we need a primary set of rules, the bootstrap rules (Kant: “a priori”), as opposed to the deduced rules (“a posteriori”) identified by the system. The rules can be extended by employing the knowledge gathered before, leading to a “rising spiral”.




CINF 24 :  DNA decompiler for the establishment of bootstrapping rules
Thomas F. Kochmann1, Ruediger M. Flaig2, Christian Busold3, and Roland Eils1. (1) Intelligent Bioinformatics Systems, German Cancer Research Center, Im Neuenheimer Feld 280, D-69120 Heidelberg, Germany, Fax: +49-6221-42-3620, t.kochmann@dkfz-heidelberg.de, (2) Institute for Pharmaceutical Technology and Biopharmacy, University of Heidelberg, Im Neuenheimer Feld 366, D-69120 Heidelberg, Germany, Fax: +49 4075110-17171, flaig@sanctacaris.net, (3) Functional Genome Analysis, German Cancer Research Center

Abstract
Generally, DNA analysis is based on empirically gathered knowledge (“deduced rules”), especially sequence-sequence comparisons. By contrast, the possibility of identifying rules from single sequences without resorting to empirical knowledge has not been fully exploited yet. We propose a tool for extracting knowledge purely from a single DNA sequence. In the decompiler algorithm, any dependency is estimated stochastically, by calculating its relative information content. Such a dependency may consist of specific nucleotide arrangements and neighborhood relationships. It can be determined for any given sequence, thus providing a universal mechanism for bootstrapping autonomous knowledge systems. This knowledge can be extended by deductive evolutionary algorithms, self-organizing into higher-level systems. Here categorical dependency relations between subparts determine Darwinian selection of the most relevant interactions. These autonomous virtual systems can be integrated into the actual scientific process, thus initializing a superimposed knowledge extraction spiral.
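The abstract does not define "relative information content", so the following Python sketch shows only one plausible reading: the mutual information, in bits, between the nucleotides found a fixed distance apart within a single sequence. The function name and the choice of mutual information are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(seq: str, distance: int = 1) -> float:
    """Mutual information (bits) between positions i and i + distance.

    A value near zero means that knowing one nucleotide says little about
    the one `distance` positions downstream. Larger values indicate a
    dependency within the sequence itself.
    """
    pairs = [(seq[i], seq[i + distance]) for i in range(len(seq) - distance)]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * log2((c / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), c in joint.items()
    )


if __name__ == "__main__":
    for sequence in ("ACACACAC", "AAAAAAAA"):
        print(sequence, mutual_information(sequence))
```

Scanning several distances with a function like this gives a profile of neighborhood relationships for one sequence without reference to any other sequence.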




CINF 25 :  Application of chemometric and QSAR approaches to scoring ligand-receptor binding affinity
Alexander Tropsha1, Jun Feng2, Alexander Golbraikh1, Curt Breneman3, Wei Deng4, and Nagamani Sukumar3. (1) Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina, CB # 7360, Beard Hall, School of Pharmacy, Chapel Hill, NC 27599-7360, Fax: 919-966-0204, alex_tropsha@unc.edu, (2) Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina at Chapel Hill, (3) Department of Chemistry, Rensselaer Polytechnic Institute, (4) Department of Chemistry, RPI

Abstract
Fifty-nine diverse ligand-receptor complexes have been analyzed in multidimensional chemical descriptor space. TAE/RECON descriptors of steric and electronic properties were calculated for active site atoms and ligand atoms independently. For all pairs of ligand-receptor complexes, the Euclidean distances between active sites in TAE/RECON descriptor space correlated linearly with the distances between complementary ligands (R² = 0.8). Concurrently, a k-nearest-neighbor (kNN) variable selection QSAR procedure was applied to ligands only, using binding affinity as a target property and normalized MolconnZ descriptors as independent variables. Training and test sets of different sizes were generated, and multiple models were built. The best model afforded leave-one-out cross-validated R² (q²) = 0.74 for the training set of 50 compounds and predictive R² = 0.85 for the test set of 9 compounds. Chemometric and QSAR approaches to the analysis of ligand-receptor interactions provide an important addition to current methodologies that rely on direct use of 3D molecular structures.
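For readers unfamiliar with the leave-one-out statistic quoted above, the Python sketch below computes q² for a plain kNN regression, using q² = 1 − PRESS / SS_total. It is a minimal illustration that assumes unweighted neighbor averaging and Euclidean distance, and it omits the variable selection step of the authors' procedure. The toy data are invented.

```python
from math import dist


def knn_predict(train_x, train_y, query, k):
    """Predict a property as the mean over the k nearest training points."""
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def loo_q2(xs, ys, k=1):
    """Leave-one-out cross-validated R^2 (q^2) for kNN regression."""
    mean_y = sum(ys) / len(ys)
    press = 0.0  # predictive residual sum of squares
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        predicted = knn_predict(rest_x, rest_y, xs[i], k)
        press += (ys[i] - predicted) ** 2
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


if __name__ == "__main__":
    # Invented one-descriptor data set with a linear trend.
    descriptors = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
    affinities = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    print(loo_q2(descriptors, affinities, k=2))
```

Each compound is predicted from a model that never saw it, so q² is a stricter measure than the fitted R² on the same training set. The predictive R² on an external test set is stricter still.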



CINF 26 :  Evaluation of ligand-receptor binding affinity with a novel statistical scoring function derived from Delaunay tessellation of protein-ligand interface
Alexander Tropsha, Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina, CB # 7360, Beard Hall, School of Pharmacy, Chapel Hill, NC 27599-7360, Fax: 919-966-0204, alex_tropsha@unc.edu, and Jun Feng, Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina at Chapel Hill

Abstract
A novel statistical contact scoring function for calculating ligand-receptor binding affinity has been derived by means of Delaunay tessellation. Given the full atom representation of the protein-ligand interface, Delaunay tessellation generates a set of non-overlapping, space-filling tetrahedra or simplices, which rigorously define nearest neighbor atoms in sets of four vertices. For every quadruplet composition of ligand and receptor atom types found at the protein-ligand interface, a log likelihood factor is obtained from the statistical geometry analysis of 317 complexes. For 67 diverse protein-ligand complexes, the linear regression correlation between the four-body scoring function and experimental binding affinity is characterized by R of 0.67. The combination of four-body contact scoring and a two-body distance-dependent potential of mean force affords R of 0.84. This novel scoring function can be used for rapid evaluation of binding affinity of ligand orientations obtained with various docking algorithms.
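The log likelihood step can be sketched in Python as follows. The abstract does not state the formula, so the form used here is an assumption: q = log10(f / p), where f is the observed frequency of a quadruplet composition and p is the frequency expected from the individual atom-type frequencies, multiplied by the number of distinct orderings of the quadruplet. The tessellation itself is not shown. The input is assumed to be a list of tetrahedra already reduced to four atom-type labels each, and the labels are invented.

```python
from collections import Counter
from math import factorial, log10, prod


def log_likelihood_factors(tetrahedra):
    """Map each quadruplet composition to log10(observed / expected)."""
    quads = [tuple(sorted(t)) for t in tetrahedra]
    observed = Counter(quads)
    n_quads = len(quads)
    type_counts = Counter(atom for quad in quads for atom in quad)
    n_atoms = sum(type_counts.values())

    factors = {}
    for quad, count in observed.items():
        f = count / n_quads
        # Number of distinct orderings of the four atom types in this quad.
        orderings = factorial(4) // prod(
            factorial(m) for m in Counter(quad).values()
        )
        p = orderings * prod(type_counts[atom] / n_atoms for atom in quad)
        factors[quad] = log10(f / p)
    return factors


def contact_score(tetrahedra, factors):
    """Sum the factors over an interface; unseen compositions contribute 0."""
    return sum(factors.get(tuple(sorted(t)), 0.0) for t in tetrahedra)


if __name__ == "__main__":
    # Invented training tetrahedra labelled by atom type.
    training = [("C", "C", "N", "O")] * 3 + [("C", "C", "C", "C")]
    table = log_likelihood_factors(training)
    print(contact_score([("O", "N", "C", "C")], table))
```

Sorting the labels makes the score independent of vertex order, so a tetrahedron is scored by its composition alone.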



CINF 27 :  Massive Virtual Library (MVL) Screening at Biogen: An Integrated Approach From Medicinal Chemistry Design to Decision
Donovan N. Chin, Claudio Chuaqui, Herman van Vlijmen, Xin Zhang, Russell Petter, and Juswinder Singh, Structural Informatics, Biogen, 14 Cambridge Center, Cambridge, MA 02142, donovan_chin@biogen.com

Abstract
This talk will describe our integrated approach to virtual chemistry design, screening, and analysis of very large small-molecule libraries. We are developing an enterprise-wide system that puts virtual-chemistry design capabilities on the desktops of medicinal chemists; links these designs with high-throughput parallel computing methods for docking, shape-based screening, and statistical modeling; and finally presents the promising “hits” on the web through a series of custom pattern recognition methods and binding mode visualizations. By integrating the medicinal chemist into the virtual screening process, we are combining their ability to design new drug-like compounds with molecular modeling and high-performance computing. While throughput can be increased with more compute resources, we have also designed a system to handle the massive amount of information from the virtual screens and arrive at decisions quickly, which is essential for impacting projects with tight timelines. As the system evolves, we are developing “smart” library design rules that further enhance the value of the MVL at Biogen. The MVL is a key component that integrates and maximizes information and technologies from medicinal chemistry, structural biology, screening, and pharmacology. We will discuss the successes and failures, and the lessons learned in developing the MVL system in a pharmaceutical environment.


CINF 28 :  Fuzzy logic based focused libraries (FL/FL) for HTS screening: application to anti-carcinogenic compounds
Jacques R. Chretien1, Marco Pintore1, Nadège Piclin1, and Frederic Ros2. (1) BioChemics Consulting, Centre d'Innovation, 16, rue Leonard de Vinci, Orleans cedex 2 45074, France, Fax: + 33 2 38 41 72 21, jacques.chretien@univ-orleans.fr, (2) Chemometrics & BioInformatics, University of Orleans

Abstract
A global strategy of Database Mining was applied to classify a data set of 1294 anti-carcinogenic compounds, divided into 8 classes according to their mechanism of action. After computing a set of 165 molecular descriptors, the most relevant parameters were selected with the help of a procedure combining Genetic Algorithm concepts and a Stepwise method. Subsequently, an Adaptive Fuzzy Partition algorithm was implemented on the training set, distributed in the hyperspace of the most relevant descriptors, to build robust structure-activity relationships. The best model correctly predicted the anti-carcinogenic activity of the test set molecules, with a very satisfactory success rate of about 85%. Finally, this model was employed to screen three different types of databases: (i) commercially available compounds of synthetic origin, (ii) natural substances derived from the Dictionary of Plant Toxins and (iii) natural substances potentially active as anti-carcinogenic agents. Statistics of these virtual HTS will be given.



CINF 29 :  Moore’s Law and the future of virtual screening
William Mydlowec, Pharmix Corp, 200 Twin Dolphin Drive, Suite F, Redwood Shores, CA 91898, Fax: (650) 637-0199, bill@pharmix.com

Abstract
This talk discusses virtual screening in the context of Moore’s Law, which projects that the number of transistors on an integrated circuit will double approximately every 18 months. We first discuss the implications of exponentially-increasing computing power on current-generation virtual screening technologies. For example, computers are more than 1000x faster than they were in 1987, yet algorithms of that era continue to dominate in simulation, optimization, and modeling in computational chemistry. We propose future directions and new algorithms based on recent advances in computer science and electrical engineering. We then project the impact of Moore’s Law on virtual screening several decades into the future, using metrics such as: cost/time/number of screens, volume/complexity/duration/accuracy of atomistic simulations, etc. We also consider relevant engineering issues, including development of multimillion-line software codebases, construction of >10,000 CPU supercomputers and multi-petabyte databases, and other large-scale issues.
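
The "more than 1000x" figure follows directly from the 18-month doubling period, as this short arithmetic sketch shows. It assumes, as the abstract does, that computing speed tracks the doubling rate of transistor counts; the function name is hypothetical.

```python
def moore_factor(years, doubling_months=18.0):
    """Growth factor after `years` if capacity doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


# 1987 to 2002 is 15 years, i.e. 10 doublings of 18 months each.
factor_1987_to_2002 = moore_factor(2002 - 1987)  # 1024.0

# Thirty years ahead at the same rate is 20 doublings, about a million-fold.
factor_30_years = moore_factor(30)  # 1048576.0
```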



CINF 30 :  100 years Houben–Weyl Methods of Organic Chemistry: Entering the New Millennium
Guido F. Herrmann, Rolf Hoppe, and Kristina Kurz, Thieme Publishers, Rüdigerstrasse 14, Stuttgart 70469, Germany, Fax: +49 711 9831 777, guido.herrmann@thieme.de

Abstract
The availability of scientific information in electronic format has significantly changed the way we select relevant information sources. Time matters! Information that is not accessible at the researcher’s desktop will be overlooked simply because the library is a walk away and other resources compete for a researcher's attention. But a highly competitive environment in industry and academia makes knowledge and efficient access to it an important performance driver. Houben-Weyl has been the standard reference work in synthetic chemistry since 1909 and comprises four editions, 140 volumes and roughly 160,000 pages.

Thieme (www.thieme-chemistry.com) chose to accept the challenge of converting 100 years of methodology information in the field of organic chemistry into a convenient and user-friendly online system. The complete series is now available in electronic format, featuring an interactive table of contents, keyword search using a controlled vocabulary, and a graphical interface.



CINF 31 :  Building digital archives for scientific information
Leah R. Solla, Physical Sciences Library, Cornell University, 293 Clark Library, Cornell University, Ithaca, NY 14853-2501, Fax: 607-255-5288, lrm1@cornell.edu

Abstract
Researchers, librarians, and publishers have valid concerns about the long-term preservation of digital information. There are many issues to be addressed in the formation of a trusted digital archive. Some parallel the more familiar preservation of print material, such as duplication and sustainability. LOCKSS (Lots Of Copies Keeps Stuff Safe) is a new acronym for an old practice in the print world of independently maintained and widely distributed collections. Digital preservation requires duplication; managed and distributed duplication is even better. Effective digital preservation models need to be self-sustaining, and adhere to format standards. The digital world does not respect traditional borders (political, corporate, publisher, content, etc.). The roles of stakeholders are changing in the digital realm. Publishers have often been the sole controllers of information, but increasingly there are authors, government agencies and other players in control. Until recently the library has been the archive and access provider, but publishers and other players are now active participants in digital preservation and access. The academic research library community is investigating a digital preservation role akin to their traditional role in print, subject based archiving. Archiving across subject areas in the academic environment complements the archiving approach of publishers in the competitive market environment. This paper will review a variety of digital preservation projects in the sciences.



CINF 32 :  Digital Archiving: Experiences of a major commercial publishing house
C. Amanda Spiteri, ScienceDirect, Elsevier Science, Molenwerf 1 1014 AG, Amsterdam, Netherlands, c.spiteri@elsevier.com

Abstract
Assuring the preservation of digital information is one of the highest priorities for libraries and publishers alike, particularly as more and more libraries go "electronic only" and the accessibility of traditional paper copies is reduced. For part of the life cycle of scientific information, commercial publishing practices support the most cost efficient means of maintaining electronic access to current information. For other parts of the cycle, digital preservation and access responsibilities must be supported by a designated agent. Elsevier Science has been a leader in the digital archiving of electronic journals through development of services like ScienceDirect. We continue to develop our experience in archiving issues such as policy, partnership relations, technology and creation of the digital archive itself. This presentation will cover some leading initiatives in these areas and give examples of how Elsevier Science currently addresses these challenges.



CINF 33 :  DSpace: MIT's Digital Repository
Margret Branschofsky, MIT Libraries, Massachusetts Institute of Technology, Bldg. 10-500, MIT, Cambridge, MA 02139, Fax: 617-452-3000, margretb@mit.edu

Abstract
DSpace, an MIT Libraries project sponsored by Hewlett-Packard Labs, is a digital repository that captures, stores and distributes the various digital products of MIT faculty and researchers. The repository will collect preprints, articles, working papers, technical reports, datasets, images, video and audio content. This web-based system provides 1) a flexible submission process for MIT contributors that captures both metadata and content files, 2) storage and preservation services for a variety of file formats, and 3) powerful search and retrieval capabilities for end users. The presentation will review DSpace design features, organizational issues surrounding development of the system in an institutional setting, and policy issues arising from implementation of the system. A review of the beta-testing experience with early adopters will also be provided.



CINF 34 :  Implementing the Physical Review Online Archive (PROLA)
Mark D. Doyle, Journal Information Systems, American Physical Society, 1 Research Road, P. O. Box 9000, Ridge, NY 11961, Fax: 631-591-4147, doyle@aps.org

Abstract
The American Physical Society has recently completed digitizing all of our journal content back to its start in 1893. This content is available online as the Physical Review Online Archive (PROLA) at http://prola.aps.org/. The archive contains 1.6 million scanned pages for almost 300,000 articles. All bibliographic information and all reference sections have been captured in XML allowing PROLA to offer all of the features expected in a modern electronic journal. We describe the history, building, and implementation of the archive as well as some of the business concerns in making it available.



CINF 35 :  Journey from books to analytical informatics
Marie Scandone, Informatics Division, Bio-Rad Laboratories, Inc, 3316 Spring Garden Street, Philadelphia, PA 19104, Fax: 215-662-0585, marie_scandone@bio-rad.com, and Deborah Kernan, Informatics Division, Bio-Rad Laboratories

Abstract
Taking spectral information from a number of different analytical instruments, presenting it in a digital format and archiving it can be an enormous undertaking. Sadtler Research Laboratories has been producing quality spectral information for the analytical laboratory since 1947. The history is fantastic and the process is unusual. The journey that Sadtler Research Laboratories has taken to become Bio-Rad Laboratories, Informatics Division is a part of the history of chemical information. Along the way, Bio-Rad changed their method of spectral data delivery but always focused on the quality of the analytical information. This paper examines that history and the transition from print to digital media.



CINF 36 :  LOCKSS: Lots of copies keeps stuff safe
Vicky Reich, and Grace Baysinger, HighWire Press, Stanford University Libraries, 1454 Page Mill Rd, Stanford, CA 94305-8400, Fax: 650-725-4902, vreich@stanford.edu, graceb@stanford.edu

Abstract
LOCKSS has the potential to become a sustainable, affordable preservation tool and archiving system for web-delivered information. LOCKSS software systematically caches content in a self-correcting P2P network. The current beta test has demonstrated that the underlying LOCKSS technology works and that, in a production environment, it is likely to allow libraries to maintain high-integrity persistent caches of electronic content from journal subscriptions. The beta test includes 60 caches at 50 libraries and two scholarly journals. The system has been in continuous operation for over ten months. The fault tolerance of the system has been amply demonstrated: two beta caches suffered catastrophic disk failures, and both were able to restart with new, empty disks and recover their content automatically. Forty-one publishers have expressed strong support for the LOCKSS project. The system shows the potential to preserve digital materials with current publishing systems; the cost of entry is low, and the payoffs promise to be high.
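
The idea of a self-correcting network of caches can be illustrated with a toy model. The sketch below is not the LOCKSS protocol, which the abstract does not describe in detail. It only assumes that peers compare content digests and that a damaged or empty cache is refilled from the copy held by a strict majority. All names are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content):
    """SHA-256 hex digest of one cached copy (bytes)."""
    return hashlib.sha256(content).hexdigest()


def poll_and_repair(caches):
    """Toy poll: `caches` maps cache name -> bytes, or None for an empty disk.

    Copies that disagree with a strict majority of the non-empty caches are
    overwritten with the majority copy. Returns the sorted names repaired.
    """
    held = {name: c for name, c in caches.items() if c is not None}
    votes = Counter(digest(c) for c in held.values())
    if not votes:
        return []
    winner, count = votes.most_common(1)[0]
    if 2 * count <= len(held):
        return []  # no strict majority: leave everything untouched
    good_copy = next(c for c in held.values() if digest(c) == winner)

    repaired = []
    for name, content in caches.items():
        if content is None or digest(content) != winner:
            caches[name] = good_copy
            repaired.append(name)
    return sorted(repaired)
```

In this model a cache restarted with a new, empty disk is represented by `None` and is refilled at the next poll, which mirrors the recovery described in the abstract.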



CINF 37 :  Combining heterogeneous physical property data sets
Peter J. Linstrom, Physical and Chemical Properties Division, NIST, Building 221, Room A111, 100 Bureau Drive, Stop 8380, Gaithersburg, MD 20899-0830, Fax: 301-896-4020

Abstract
The lack of standards for electronic storage of physical property data often makes it difficult to merge data from different data sets. Data sets often employ different conventions for identifying chemical systems, data accuracy, and data quality. It is a challenge for the archivist to ensure that the combined data set represents all data in an appropriate manner.

This talk will discuss lessons learned from the development of the NIST Chemistry WebBook (http://webbook.nist.gov/). The data set for this archive consists of the combination of work from several independent contributors. Efforts were made to produce a set that appears homogeneous to users despite its origins. This required the design of a database that was flexible enough to support the various conventions used by contributors. Examples of problems encountered and their solutions will be discussed.



CINF 38 :  Evaluation, Comparison and Successful Application of Virtual Screening Tools
Romano T. Kroemer1, Joe McDonald2, Douglas Rohrer3, Anna Vulpetti1, Jean-Yves Trosset1, Shashidhar Rao4, John Irwin5, Brian Shoichet6, Colin McMartin7, and Pieter Stouten1. (1) Molecular Modelling & Design, Pharmacia, Discovery Research Oncology, Viale Pasteur, 10, Nerviano 20014, Italy, Fax: ++39-02 4838 3965, romano.kroemer@pharmacia.com, (2) Discovery Research, Pharmacia, (3) Computer-Aided Drug Discovery, Pharmacia, (4) Accelrys Inc, (5) Department of Molecular Pharmacology and Biological Chemistry, Northwestern University, (6) Department of Pharmacology and Biological Chemistry, Northwestern University, (7) Thistlesoft

Abstract
The latest Pharmacia efforts in validating and comparing virtual screening tools are presented. Two studies were carried out in order to assess the performance of docking programs with respect to reproducing correct binding modes. The first of these studies contained 20 publicly available crystal structures of protein-inhibitor complexes belonging to different protein classes. The second study focused on 20 complexes with the same protein (CDK2/Cyclin A). The docking programs evaluated and compared comprise the latest versions of DOCK (Brian Shoichet’s NWU incarnation), Colin McMartin's QXP, Tripos’ FlexX, CCDC’s Gold, Accelrys’ LigandFit, MolSoft's ICM and the in-house Mosaic2 program. We also present a case study where docking was used in order to identify hits for a project in the absence of HTS. After pre-selection, 3,000 compounds were docked to the target. The top-scoring compounds were inspected visually and 22 molecules were selected. The best binding compound, as verified by NMR screening and isothermal titration calorimetry, had a Kd of 450 nM.



CINF 39 :  Assessing the quality of virtual screening results for combinatorial libraries
Dennis G. Sprous, Robert D. Clark, Joseph M. Leonard, and Trevor W. Heritage, Research, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144, Fax: 314-647-9241, dsprous@tripos.com

Abstract
Recent developments in virtual screening tools now make it possible to do enough experiments on the same library to allow critical evaluation of the quality of the results. CombiFlexX incorporates both OptiDock and FlexX(c) methods, and takes advantage of structural redundancies in combinatorial libraries to dramatically speed up docking. Numerous computational experiments can be done in a reasonable period of time, allowing an investigation of the thoroughness of conformational and positional sampling under different protocols and parameters. Metrics and strategies for assessing the quality of the virtual screening results will be presented.



CINF 40 :  Virtual high throughput screening using LigandFit as an accurate and very fast tool for docking, scoring, and ranking
Marguerita Lim-Wilby1, Jeff Jiang2, Marvin Waldman2, and C. M. Venkatachalam2. (1) Lead Identification and Optimization, Accelrys Inc, 9685 Scranton Rd, San Diego, CA 92121, rwilby@accelrys.com, (2) Rational and Combinatorial Drug Design, Accelrys Inc

Abstract
The imperative for virtual high throughput screening arises from the availability of multiple targets, millions of compounds in screening libraries, and limited resources for even the best-endowed pharmaceutical enterprises. The docking application LigandFit has been developed to address this need. A suite of algorithms is provided that (1) aids the user in the detection and definition of binding sites, (2) provides various docking modes with user-defined options, and (3) scores docked poses using proprietary and published scoring functions. We will present considerations that affect accuracy in docking and in scoring, as well as the effects of disproportionately large binding sites, extremely flexible ligands, metal ions, and the presence of flexible protein side chains. Recent advances have allowed reasonably large (~50k) ligand libraries to be screened in a matter of hours, such that the bottleneck in virtual screening is no longer docking, but the preparation and analysis of the datasets.




CINF 41 :  EasyDock: a new docking program for high-throughput screening and binding-mode search
Nikolay P. Todorov1, Ricardo L. Mancera1, Per Kallblad1, and Philippe Monthoux2. (1) De Novo Pharmaceuticals, Compass House, Vision Park, Chivers Way, Histon, Cambridge CB4 9ZR, United Kingdom, Fax: 1223-238088, nikolay.todorov@denovopharma.com, ricardo.mancera@denovopharma.com, (2) Department of Physics, University of Cambridge

Abstract
We have implemented the stochastic tunneling global optimization method within a ligand docking application, easyDock. By using a novel multiple ligand copy approach and adding a new hydration penalty function, we have optimized various scoring functions and have achieved excellent results in the prediction of protein-ligand binding modes. We have run easyDock on the GOLD data set of protein-ligand complexes and nearly always found the correct ligand binding mode as observed in the corresponding crystal structures. Furthermore, we have achieved a 76% success rate when searching for the correct binding mode using an energy score criterion. These results show that easyDock can be used effectively both for the high-throughput screening of large datasets of compounds and for searching for the correct binding mode of a given ligand.
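
For readers unfamiliar with stochastic tunneling, the following minimal sketch shows the core idea on a one-dimensional function. It assumes the usual transformation f_STUN(x) = 1 - exp(-gamma (f(x) - f_best)), where f_best is the lowest value found so far, combined with Metropolis sampling on the transformed surface. It does not represent easyDock's scoring functions, multiple ligand copy approach, or hydration penalty, and all names and parameter values are hypothetical.

```python
import math
import random


def stun(f_value, f_best, gamma):
    """Stochastic tunneling transform: 0 at the best value, below 1 everywhere else."""
    return 1.0 - math.exp(-gamma * (f_value - f_best))


def stun_minimize(f, x0, steps=5000, step_size=0.5, gamma=1.0, beta=10.0, seed=0):
    """Metropolis walk on the transformed surface; returns (best_x, best_f)."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for _ in range(steps):
        cand = x + rng.uniform(-step_size, step_size)
        f_cand = f(cand)
        if f_cand < best_f:
            # A new best value re-anchors the transform.
            best_x, best_f = cand, f_cand
        delta = stun(f_cand, best_f, gamma) - stun(fx, best_f, gamma)
        # Barriers far above f_best are flattened, so uphill moves stay cheap.
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            x, fx = cand, f_cand
    return best_x, best_f
```

Because the transform compresses every value well above the current best toward 1, high barriers between minima cost little to cross, which is what lets the walk "tunnel" out of local minima.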



CINF 42 :  Glide: a new paradigm for rapid, accurate docking and scoring in database screening
Thomas A. Halgren1, Robert B. Murphy2, Jay Banks1, Daniel Mainz1, Jasna Klicic2, Jason K. Perry2, and Richard A. Friesner3. (1) Schrödinger, 120 West Forty-Fifth Street, New York, NY 10036, Fax: 646-366-9550, halgren@schrodinger.com, (2) Schrodinger, Inc, (3) Department of Chemistry, Columbia University

Abstract
Glide uses a novel algorithm for rapid conformation generation that allows an efficient systematic search of conformational space to be performed during docking. A second key to Glide's efficiency is a series of "filters" that rapidly reduce the possible ligand positions and orientations in the search space to a manageable number for detailed examination. In addition, Glide uses a novel GlideScore function for scoring that ensures chemical sensibility by penalizing docked poses that include non-physical juxtapositions of polar and nonpolar groups.

In tests of docking accuracy, Glide achieves root-mean-square deviations between docked and co-crystallized ligand geometries that are half those reported for Gold and FlexX for test sets of 100-200 co-crystallized complexes defined by the developers of these methods. In addition, Glide achieves enrichment factors ranging from 12 to 91 in database screens for 9 diverse receptor systems. Such a high level of reliability is not typical of current-generation docking programs and scoring functions.
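
The two metrics quoted in this abstract can be made concrete with a short sketch. The definitions are assumptions, since the abstract does not state them: RMSD is taken over matched atoms without superposition (both poses are assumed to be in the receptor frame), and the enrichment factor is the hit rate in the top-ranked fraction divided by the hit rate in the whole database. Function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) atoms."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def enrichment_factor(ranked_is_active, top_fraction):
    """Hit rate in the top-ranked fraction divided by the overall hit rate.

    `ranked_is_active` lists booleans in ranking order, best score first.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_actives == 0:
        raise ValueError("no actives in the database")
    n_top = max(1, int(n_total * top_fraction))
    hits_in_top = sum(ranked_is_active[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)
```

An enrichment factor of 1 corresponds to random selection, so the reported range of 12 to 91 means that actives were concentrated at the top of the ranked lists far more strongly than chance would allow.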



CINF 43 :  RACHEL: A new tool for structure-based lead optimization
Chris M.W. Ho, Drug Design Methodologies, LLC, 700 S. Euclid Ave., St. Louis, MO 63110

Abstract
Lead optimization is still something of an art. Structural modifications that logically should enhance affinity can decrease it. The time lines can be long, the process uncertain and frustrating, and the progress hit-or-miss. RACHEL is software designed to streamline lead optimization by automated combinatorial optimization of substituents on a lead scaffold. Starting from a ligand/receptor structure, substitutions are systematically done at user-defined points on the ligand core. Custom substituent databases based on in-house sources can be used, allowing the incorporation of enterprise and project experience. The impact of these substitutions on affinity is assessed using RACHEL's general scoring function or a custom scoring function generated by PLS analysis of user-supplied ligand/receptor affinity data. This presentation will discuss RACHEL's unique capabilities along with specific applications that demonstrate its value in lead optimization.
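
Systematic substitution at user-defined points amounts to enumerating the Cartesian product of the substituent lists and ranking the results. The sketch below illustrates only that enumeration step. It is not RACHEL's algorithm or scoring function: the `score` callable stands in for whatever affinity estimate is available, and all names are hypothetical.

```python
import itertools


def enumerate_and_rank(sites, score, top_n=3):
    """Try every combination of substituents and keep the best-scoring ones.

    `sites` maps an attachment point name to its list of candidate substituents;
    `score` maps an assignment dict to a number, where higher is better.
    """
    names = sorted(sites)
    candidates = []
    for combo in itertools.product(*(sites[name] for name in names)):
        assignment = dict(zip(names, combo))
        candidates.append((score(assignment), assignment))
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return candidates[:top_n]
```

The number of combinations grows as the product of the list sizes, so exhaustive enumeration like this is only practical for a few attachment points and modest substituent lists.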



CINF 44 :  HostDesigner: a program for the de novo structure-based design of molecular receptors with binding sites that complement metal ion guests
Timothy K. Firman, and Benjamin P. Hay, W. R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, PO BOX 999, Richland, WA 99352, Fax: 509-375-6631, Timothy.Firman@pnl.gov

Abstract
To bring the powerful concepts embodied in de novo structure-based drug design to the field of coordination chemistry, we have devised computer algorithms for building millions of potential host structures from molecular fragments and rapid methods for prioritizing the resulting candidates with respect to their complementarity for a targeted metal ion guest. The result is HostDesigner, the first structure-based design software that is specifically created for the discovery of novel metal ion receptors. In this talk we describe the molecular structure building and scoring algorithms, and provide several examples to demonstrate their usage.



CINF 45 :  Collaborative eR&D - what is it and how do electronic notebooks fit into it?
Rich Lysakowski Jr., The Collaborative Electronic Notebook Systems Association, 800 West Cummings Park, Suite 5400, Woburn, MA 01801, Fax: 781-935-3113, rich@censa.org

Abstract
"Collaborative eR&D" is a new computing paradigm for scientific research, engineering, product development, and testing. Collaborative eR&D has two major aspects to it: 1) collaborative software environments, and 2) cultural support for collaboration with these tools. The software infrastructure or environmental aspect of Collaborative eR&D is that software applications have standardized, intelligently self-integrating interfaces. Software components in this paradigm may require some configuration, but no extra programming, to integrate into new business processes. Integration becomes a dynamic, end-user driven process, rather than one that requires custom coding. The cultural aspect of Collaborative eR&D beckons R&D teams and enterprises to use collaborative tools (collaborative electronic notebooks, meeting tools, and others) to be more effective and efficient. This talk will define and explain CENSA’s new work beyond “Collaborative Electronic Notebooks” to catalyze the markets to deliver “Collaborative eR&D” environments and their huge impact on the practice and productivity of Research and Development.



CINF 46 :  Components of Research Laboratory Notebooks Policy
Sylvia C. Diaz, Knowledge Integration Resources, Bristol-Myers Squibb, P.O. Box 4000, Princeton, NJ 08543-4000, Fax: 609-252-6743, sylvia.diaz@bms.com

Abstract
Records management has long been a core function in pharmaceutical companies. The management of research laboratory notebooks and their ancillary supporting data is essential for establishing priority of invention, upholding the validity of a patent, and memorializing scientific practices and work. A good laboratory notebook policy sets the boundaries for preparing, signing, witnessing, protecting, and storing the research notebooks. A thorough policy establishes the fundamentals and standards of good records management practices for the storage of the paper, hardcopy version of the research notebook. These same principles translate into the electronic laboratory notebook world.

This presentation will outline the essential parts of a good research laboratory notebook management policy.



CINF 47 :  An E-Notebook success story, a roadmap for future trips
Christopher J. Ruggles1, Jim Rizzi2, and Jorge Manrique1. (1) CambridgeSoft Corp, 100 CambridgePark Drive, Cambridge, MA 02140, Fax: 617-588-9190, cruggles@cambridgesoft.com, (2) Array BioPharma

Abstract
A successful Electronic Laboratory Notebook has been deployed in a drug discovery company that invents new small-molecule drugs through the integration of chemistry, biology, and informatics. We report a methodology in which legal, technological, and scientific issues were addressed.

Through the use of directed discussion, needs analysis, and process abstraction, many seemingly insurmountable problems were resolved. The result is that a fully functional Electronic Notebook has been deployed throughout the enterprise, and is acting as the primary repository of scientific data for Array BioPharma Inc., dovetailing appropriately with preexisting protocols. We believe that this methodology, when properly applied, is scalable to organizations of varied sizes and complexities. We report here the results of our implementation of this methodology, and explore suggestions for modifications to optimize the methodology for future implementation.



CINF 48 :  LabBook incorporated's eLabBook knowledge management solution
Tom Zupancic, LabBook, Inc, 2501 9th Street, Suite 102, Berkeley, CA 94710, Fax: 614-846-2243, thomas.zupancic@labbook.com

Abstract
LabBook's eLabBook solution is a flexible integration system designed to facilitate knowledge management within an organization by simplifying the processes required to access, capture, organize and manipulate information. This capability creates an environment within the organization where knowledge is generated at an enhanced rate and captured with a high degree of efficiency. The eLabBook environment provides a versatile computer interface between people and information so that it becomes much easier to create a layer of "knowledge" (understanding, interpreted information, rationales for decisions, actions and plans) and to superimpose this layer on an organized information collection. The accessible, user configurable presentation and delivery of this knowledge integrates the intellectual assets of the organization and accelerates knowledge transfer. That is, the system by design makes organized, interpreted information widely and effectively available and actionable.



CINF 49 :  Roundtable discussion focused on implementation successes and issues for collaborative electronic notebooks and collaborative eR&D environments
Rich Lysakowski Jr., Executive Director, Collaborative Electronic Notebook Systems Association, 800 West Cummings Park, Suite 5400, Woburn, MA 01801, Fax: 781-935-3113, rich@censa.org

Abstract
This last session will be a Facilitated Roundtable Discussion focused on implementation successes and open issues using electronic notebooks, collaborative applications, standardized software application interfaces, agents, component integration tools and frameworks to tie together the many software packages in common usage in constantly changing R&D environments. This roundtable discussion will raise issues, identify the problems and introduce prudent paths forward for their elimination. It will provide a panel of experts for the audience to get many of their questions answered.



CINF 50 :  CINF Division Business Meeting
Andrew Berks, Patent Dept, Merck & Co, RY 60-35, 126 E. Lincoln Ave, Rahway, NJ 07065, Fax: 732-594-5832, andrew_berks@merck.com

Abstract
This is the open meeting for discussion of CINF business.



CINF 51 :  Open Meeting: Committees on Publications and on Chemical Abstracts Service
Robert J. Massie, and Robert D. Bovenschulte, Director, Chemical Abstracts Service, American Chemical Society, 2540 Olentangy River Road, Columbus, OH 43202-1505, Fax: (614) 447-3713, rmassie@cas.org, rbovenschulte@acs.org

Abstract
This is an open meeting for the Committee on Publications and for the Chemical Abstracts Service.



CINF 52 :  Development of a polymer property database from traditional print products
Maggie Johnson, Science and Engineering Libraries, University of Kentucky, 150 C/P Bldg, Lexington, KY 40506-0055, Fax: 859-257-4074, mjohnson@uky.edu, and Darla Henderson, John Wiley & Sons, Inc

Abstract
The polymer community has for years depended on the value and reliability of data found in The Polymer Handbook, a print product containing data about polymers and their properties. Moving forward with Wiley’s chemical databases, we have developed a polymer property database from The Polymer Handbook, adding features such as the capability to search by full text or fielded searches, search the entire database for properties by polymer name, and search the entire database for polymers by property ranges. Additionally, cross-reference and linking capabilities have been added. This presentation will focus on the development of this database and its usability for the academic and corporate polymer communities.



CINF 53 :  Teaching and learning of structural organic chemistry with nomenclature/structure software
Bert Ramsay1, Antony John Williams2, Andrey Erin2, and Robin Martin2. (1) Department of Chemistry, Eastern Michigan University, Ypsilanti, MI 48197, Fax: 734-487-1496, Bert.Ramsay@emich.edu, (2) Scientific Development, Advanced Chemistry Development

Abstract
Many organic chemistry students have difficulty in determining and "seeing" the configuration about a stereogenic carbon presented in 2-D structures. A true understanding comes when these diagrams are converted to 3-D pictures or models that can be rotated to correspond to the diagram's perspective. Much of this confusion can be avoided if students use Nomenclature/Structure software programs to compare 2-D and 3-D renderings and names of chemical structures. A Student Guide to the Use of Nomenclature/Structure software has been developed for inclusion with ACD's ChemSketch and ACD/Name software. The Guide also helps students recognize the location and naming of functional groups.



CINF 54 :  Homogenizing analytical data from multiple vendors into a unified workspace
Antony John Williams, Scientific Development, Advanced Chemistry Development, 90 Adelaide Street West, Suite 600, Toronto, ON M5H 3V9, Canada, Fax: 416-368-5596, tony@acdlabs.com

Abstract
Today a plethora of analytical techniques are used to characterize a particular chemical compound or material as it migrates from research and discovery through scale-up to manufacturing. These techniques include the multiple forms of spectroscopy and chromatography, hyphenated techniques and other analytical techniques that produce “curves” including electrochemistry and thermal analysis. The lifecycle of any particular compound can originate with spectra to identify the structure, chromatograms to separate the material and other technologies to characterize its performance. To date it has not been possible to manage all this associated analytical data, together with associated chemical structure information, in a single unifying interface and the need for an integrated system for processing and management of all associated data persists. This talk will provide an overview of how to address the diverse needs in processing and data management for multiple forms of analytical data and make the results available across an enterprise.



CINF 55 :  Aventis Competitor Tracking Database
Christine Rudolph, DI & A Lead Generation Chemoinformatics, Aventis Pharma Deutschland GmbH, Industrial Park, Building G879, D-65926 Frankfurt/Main, Germany, Holger Heitsch, DI & A, Medicinal Chemistry, Aventis Pharma Deutschland GmbH, and Raul Munoz-Sanz, DI & A Information Solutions, Aventis Pharma Deutschland GmbH

Abstract
A Competitor Tracking Database has been designed to facilitate and accelerate the task of disease program chemistry experts to track the activities of Aventis' competitors. The arduous task of extracting information from online-publications and transferring the interesting details (text and structure) by manual selection and putting them into report documents has been replaced by a simple flagging selection procedure of relevant competitor records in a central raw data pool which is fed by our selected news providers (currently: IDDB3, Prous).

The system has been designed to be flexible enough not only to store and annotate the information from various providers but also to store the knowledge about our own compounds, so that we can inspect them in a common view with the structures of our competitors. Annotations by mechanism, target, and target families with controlled vocabularies enable us to link this data repository with other sources of information within Aventis.

Currently, this database covers the following Aventis Pharma Frankfurt disease programs: thrombosis, osteoarthritis, heart failure, vascular disease, arrhythmia, diabetes, obesity and lipid disorders. We estimate that this tool may cover up to 80% of the relevant competitor information, and we expect to include more information providers in the future.

This application has been designed with standard client/server tools (ISIS/Oracle). In a second phase, the content of the database will be made available through a web front-end to enable the integration into Aventis information portals.



CINF 56 :  Knowledge management in the spectral laboratory
Marie Scandone, Informatics Division, Bio-Rad Laboratories, Inc, 3316 Spring Garden Street, Philadelphia, PA 19104, Fax: 215-662-0585, marie_scandone@bio-rad.com, and Gregory M. Banik, Bio-Rad Laboratories, Informatics Division

Abstract
In a spectral laboratory, knowledge management is the identification, collection and active management of analytical information. The goal is to make existing knowledge resources available to everyone and to manage that data effectively. In managing analytical data, we have moved from the archiving and warehousing of spectral data to tools that help identify and evaluate information. This approach is necessitated by the business need to effectively analyze all available data as rapidly as possible to facilitate decision-making and to provide required information for regulatory compliance. There has been strong impetus, especially from the pharmaceutical industry, to share information from diverse analytical disciplines. This need has arisen from the realization that escalating costs for drug development dictate a new “fail early, fail often” paradigm. Some companies have come to realize that parallel efforts in analytical chemistry, for instance the use of NMR and mass spectrometry, could have yielded earlier, more cost-effective decisions on drug candidates if these data types had been combined earlier into a single knowledge management system. As the amount of spectral data increases, so does the need for accessing, processing, and examining that data.



CINF 57 :  Molecular docking for generating peptide inhibitors for thrombin
Cristina C. Clement1, Julian Gingold2, and Manfred Philipp1. (1) Chemistry Department, Lehman College and Biochemistry Ph.D. Program, City University of New York, 365 Fifth Avenue, New York City, NY 10016-4309, Fax: 212-817-1503, cclement_us@yahoo.com, (2) New Rochelle H.S

Abstract
A promising method of rational drug design involves the molecular modeling of peptides or small molecules that might bind to the active site of a target protein. The goal of this investigation is to discover peptides that reversibly inhibit thrombin. The approach combines in silico docking using Sculpt (from MDL) with automated chemical synthesis of candidate compounds using standard Fmoc chemistry. Initial molecular docking experiments were used to generate candidate compounds (with both L- and D-amino acids) that were characterized by predicted free interaction energies that range from -20 to -50 kcal/mol. Candidate competitive inhibitors were selected from two classes of sequences: X-Pro-Arg-dPro-Y and X-dPhe-Pro-dArg-Y. The experimental results showed that D-Phe-Pro-D-Arg-Gly-Asp and D-Phe-Pro-D-Arg-Gly-Asn have Ki values of 156 µM and 112 µM, respectively. D-Phe-Pro-D-Arg-Gly has a Ki of 6 µM. A library of tetrapeptides with other L- and D-amino acids at the P1’ position (Y=P1’) is under study.



CINF 58 :  Visualization of results in markush structure database searches
Andrew H. Berks, Merck & Co, 126 E. Lincoln Ave RY60-35, Rahway, NJ 07065-0900, Fax: 732-594-5832

Abstract
Visualization of Markush structures in Markush database search results is problematic because results are often complex and difficult to interpret. This talk presents a method for representing Markush structures in database search results. The method overlays a representation of the query structure on the search results and provides a Markush analysis for each database hit, so that each substituent in the database record that corresponds to a part of the query structure is displayed in a distinctive manner, for example by using colors, in the overlaid query structure.



CINF 59 :  Digging Deeper: from holes in cards to whole structures - indexing chemistry at Derwent
Peter Norton, (retired), 17 Woodstock Road, Balby, DN4 0UF Doncaster, England

Abstract
This paper gives the author’s personal reminiscences about the trials and tribulations involved in the evolution of the various Derwent retrieval systems, beginning with the Farmdoc codes, which provided simple manual and punch card retrieval of Pharmaceutical and Veterinary patents. It then moves on to the extension of coverage to non-patent pharmaceutical literature (RINGDOC) and the various patent services (Agdoc, Plasdoc, Chemdoc, CPI and WPI). The paper concludes with the author’s involvement in the start-up of the Markush DARC graphics retrieval system.



CINF 60 :  Polymer searching: a capability in progress
Stuart M. Kaback, Information Research & Analysis, Research Support Services, ExxonMobil Research & Engineering Co, 1545 Route 22 East, Annandale, NJ 08801, Fax: 908-730-3230, stuart.m.kaback@exxonmobil.com

Abstract
From time to time this speaker has had the privilege of reporting to a session of the ACS Division of Chemical Information on the capabilities and shortcomings of systems used to search for information about polymers. One notable instance was the 1984 Herman Skolnik Award Symposium honoring Monty Hyams. Another was a 1991 symposium on Polymer Information Storage and Retrieval. This presentation examines progress that has been made, and points to areas in which further advances would be desirable.


CINF 61 :  Polymer indexing by IFI – past, present, and future
Harry M. Allcock, and Darlene Slaughter, IFI CLAIMS Patent Services, 102 Eastwood Road, Wilmington, NC 28403, Fax: 910-392-0240, allcock@ificlaims.com, darlene.slaughter@aspenpubl.com

Abstract
IFI has been indexing polymer chemistry in US patents since 1955, and since that time has developed a powerful retrieval system for polymers. Patent searchers currently use the IFI polymer indexing and associated search tools to locate polymers by structure, modification, and component monomers. Future enhancements to the system will be driven by searchers’ needs, and IFI’s intellectual and technological solutions to those needs.



CINF 62 :  Broadening horizons, sharpening the focus: The challenges of searching multiple datasets to obtain focused recall
Richard W Neale1, Steve Hajkowski2, Linda Clark3, and Gez Cross1. (1) Product Development Group Chemistry & Life Sciences, Derwent Information UK, 14 Great Queen Street, Holborn, London, United Kingdom, Fax: +44 207 344 2911, richard.neale@derwent.co.uk, (2) Online Training Department, Derwent Information, (3) IT R&D Group, Derwent Information UK

Abstract
The chemical industry continues to be one of the largest investors in R&TD. In today’s marketplace the R&TD budget can extend beyond £1 million per day. It is therefore imperative that patented inventions are not duplicated. With R&TD spending continuing to spiral upwards, the industry has become dependent on the provision of precise patent information to aid the development of effective R&TD strategies.

As the Information Professional’s requirements broaden, the information provider must evolve to meet the user needs. Searching chemical structure and text data in combination has become a necessity, for accurate retrieval and to limit results within larger databases. This paper will examine combination search approaches currently used in the chemical information industry and investigate how Thomson Scientific as an information vendor is developing future products and content with the combination search in mind.



CINF 63 :  Chemical patent indexing and Gresham's Law
Edlyn S. Simmons, SourceOne-Business Information Services, Procter & Gamble Co, 5299 Spring Grove Ave., Cincinnati, OH 45217, Fax: 513-627-6854, simmons.es@pg.com

Abstract
For many years, fragmentation coding was the gold standard of patent indexing. Fragmentation coding schemes, such as the one applied to Derwent's Chemical Patents Index, are applied to both specific and generic chemical structures and serve as keys to retrieval of documents through searches for chemical structures or substructures. By providing codes for structural fragments, they allow the searcher to find molecular structures rather than chemical names.

In recent years, value-added patent databases have been joined by many databases for which indexing is generated automatically from the original text. Gresham's Law tells us that bad money drives out good money. As searchers begin to substitute full text searching for the use of value-added indexing, they lose the capacity to search for chemistry expressed in Markush structures and other structural diagrams. If this is true, Gresham's Law may also tell us that bad indexing drives out good indexing.



CINF 64 :  Chemical structures and reactions in CAS databases – searching for prior art
Matthew J. Toussant, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202-150, Fax: 614-447-3906, mtoussant@cas.org

Abstract
Chemical information in CAS databases takes many forms. Structure information is one form that links many databases through a connection table identifier system, the CAS Registry Number. The nature of prior art information in the CAS Registry, CASREACT, and CHEMCATS databases will be described, and the pivotal role of the CAS chemical identifier system in linking those collections will be detailed. Further, the MARPAT database will be examined. CAS approaches to covering chemical information and the effect of these approaches on efforts to create exhaustive prior art collections, including from patents, journals, chemical supply catalogs, and web disclosures, will be assessed.



CINF 65 :  Biotechnology patent searching: past, present and future
Sandy Burcham, Service Is Our Business, Inc, 111 Lincoln Terrace, Norristown, PA 19403-3317, Fax: 610-630-0863, cass123@earthlink.net

Abstract
In the last 2 decades, the importance of biotechnology has increased dramatically, moving from straightforward enzyme catalysed reactions to the complexities of the human genome project. Similarly, the application of biotechnology has spread from simple fermentation processes to many complex previously non-biological technologies. During this time the number of biotechnology patents has also increased dramatically as organisations have sought to protect their research and discoveries.

To cope with the increasing importance of biotech and the increasing volume of patent and journal literature, various abstracting and indexing services together with software suppliers and online hosts, have developed resources providing increasingly powerful retrieval and display capabilities.

This paper will discuss the searching of biotech patents - where we were, where we are and where we seem to be going.



CINF 66 :  Back for the future: making coding cool
Gez Cross, and Katharine Hancox, Product Development Group Chemistry & Life Sciences, Derwent Information UK, 14 Great Queen Street, Holborn, London, United Kingdom, Fax: +44 207 344 2911, gez.cross@derwent.co.uk

Abstract
The chemical indexing systems introduced at Derwent by Peter Norton have been used for many years by Information Professionals to retrieve chemical information from the patent literature. When structure indexing of Markush compounds was made available, discontinuation of the structural codes was proposed – and strongly opposed by professional searchers. However, despite the introduction of software to help generate the strategies for searching these codes, they remain a tool used mainly by experienced, professional patent searchers.

With the advent of inhouse and online browser based information retrieval tools, a new generation of information users has arisen – scientists, who formerly relied on IPs for their searching requirements. To encourage these new users, intuitive, user-friendly interfaces have been created, which have further raised the expectations of both old and new users. This paper will examine attempts to bring the older, code-based systems into the internet era with new user-friendly tools and interfaces.



CINF 67 :  Developing HT Information Systems, a modular design
Steve Coles, Database Applications Developer, Tripos Receptor Research, Bude-Stratton Business Park, Bude EX23 8LY, United Kingdom, Fax: +44 1288 359222, stcoles@tripos.com  

Abstract
It is possible to develop information systems for high-throughput design, chemistry, analysis and purification by incorporating a modular approach using best-of-breed scientific and information technologies. Working iteratively in close collaboration with users of the system, it is possible to streamline integration projects, reconcile process issues, and provide customer-facing support. A modular approach encapsulates domain knowledge, permits easier introduction of new modules and increments, and can be shared between different applications.



CINF 68 :  Automating Library Design
Mark J. Duffield, and Kevin Daniels, EST Lead Informatics, AstraZeneca R&D Boston, 35 Gatehouse Drive, Waltham, MA 02451, Fax: 781-839-4580, mark.duffield@astrazeneca.com

Abstract
The library design process is generally performed differently by every participant. Each chemist has a number of "favorite" parameters with which to evaluate a potential library. The process usually involves a large number of manual steps including the reformatting, collating, and integration of data from disparate sources. This process is time consuming and requires the chemist to perform complex computing tasks, often across multiple environments. The end result is that the chemist must spend significant time away from the bench planning their library.

This session will summarize our work in the area of streamlining the library design process through automation. We will describe our library design workflow and present the details of how we have automated many of the steps in the process. The chemist is now able to get the computational aspects done side by side with the actual synthetic work, while maintaining control over the end result.



CINF 69 :  On a new model for cheminformatics: Learning the classes of compounds
Dmitry Korkin, Faculty of Computer Science, University of New Brunswick, 540 Windsor St., Fredericton, NB E3B 5A3, Canada, dkorkin@unb.ca

Abstract
We have outlined a radically new approach to cheminformatics called the ChemETS model. It is based on the first general formalism for structural (or symbolic) object representation and classification proposed by us, called the evolving transformations system (ETS) framework. The central features of the ETS framework are: 1) a new structural form of class representation that can be constructed (and modified) inductively and 2) a new structural form of object representation, which incorporates the constructive (or synthetic) history of the object and is directly related to the above representation of the corresponding class of objects (containing this object).

I will first outline the basic principles of the ChemETS model, together with the central problem of the inductive approach to cheminformatics and computer-aided drug design (CADD). Then I will discuss the application of the ChemETS model to basic problems in cheminformatics and CADD, such as virtual lead discovery, design and screening of virtual combinatorial libraries of compounds, and others. In particular, I will discuss the construction of the class of androgen-like compounds (based on a small set of known androgens), the construction of new androgen-like compounds (based on the above class representation), and the resulting classification of compounds as either belonging or not belonging to this class.



CINF 70 :  Choosing the proper grid resolution for cell-based diversity estimation
Dmitrii N. Rassokhin, and Dimitris K. Agrafiotis, 3-Dimensional Pharmaceuticals, Inc, 665 Stockton Drive, Exton, PA 19341, rassokhin@3dp.com

Abstract
Although cell-based methods are becoming increasingly popular for diversity analysis, the choice of grid resolution is still guided primarily by intuition and lacks any theoretical or empirical support. Here we present a systematic analysis of several typical chemical data sets, and propose a simple technique for identifying a suitable bin size for cell-based diversity estimation using an algorithm inspired by the field of fractal analysis. We demonstrate that the relative variance of the diversity score as a function of resolution exhibits a characteristic bell shape that depends on the size, distribution and dimensionality of the data set under consideration, and whose maximum represents the optimum resolution for a given data set. Even though box counting can be performed in an algorithmically efficient manner, the ability of cell-based methods to distinguish between subsets of different spread falls sharply with dimensionality, and the method becomes useless beyond a few dimensions.
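
As a rough illustration of the cell-based idea in this abstract, the sketch below counts occupied grid cells at several resolutions and reports how much the resulting diversity score varies across random subsets. The function names, the synthetic two-dimensional data, the subset-sampling scheme, and the definition of relative variance as variance divided by squared mean are all assumptions made for illustration; the abstract does not specify the authors' algorithm.

```python
import random

def occupied_cells(points, bins):
    """Count distinct grid cells hit by points lying in the unit hypercube."""
    cells = set()
    for p in points:
        # Map each coordinate to a bin index; a value of exactly 1.0 goes in the last bin.
        cells.add(tuple(min(int(x * bins), bins - 1) for x in p))
    return len(cells)

def diversity_score(points, bins):
    """Fraction of all grid cells that are occupied at this resolution."""
    dim = len(points[0])
    return occupied_cells(points, bins) / bins ** dim

def relative_variance(scores):
    """Population variance of the scores divided by the squared mean (assumed definition)."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    return var / mean ** 2 if mean else 0.0

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data set: 2000 compounds described by two descriptors scaled to [0, 1).
    data = [(rng.random(), rng.random()) for _ in range(2000)]
    for bins in (2, 4, 8, 16, 32, 64):
        # Score 20 random subsets of 200 compounds at this resolution.
        scores = [diversity_score(rng.sample(data, 200), bins) for _ in range(20)]
        print(bins, relative_variance(scores))
```

Scanning the printed values for the resolution at which the relative variance peaks mirrors, in spirit only, the selection criterion the abstract describes.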



CINF 71 :  Quantification of drug-likeness and similarity for combinatorial follow-on libraries
Mark J. Rice, Ryan T. Weekley, and Paul A. Sprengeler, Structural Group, Celera Therapeutics, 180 Kimball Way, South San Francisco, CA 94080, Fax: 650-866-6654, mark.rice@celera.com

Abstract
Striking a balance between good physical properties and similarity to the initial hit often poses a problem in the design of follow-on libraries. Good physical properties are needed to improve both ADME characteristics and drug-likeness, while similarity is needed to maintain an adequate pharmacophore for binding. These requirements are often at odds and difficult to quantify. Therefore, we have developed a site-specific fingerprint based on chemical graph theory as a basis for sidechain similarity. We have also developed a continuous drug-likeness metric, using multivariate statistical analysis. We combine these measures to suggest sidechain selection and more efficiently develop follow-on libraries.



CINF 72 :  Predicting generic methods and retention times for high-throughput chromatography
Daria Jouravleva, Scott Macdonald, Michael McBrien, and Eduard Kolovanov, Advanced Chemistry Development, Inc, 90 Adelaide St.West, Suite 600, Toronto, ON M5H 3V9, Canada, daria@acdlabs.com

Abstract
In experimental validation of combinatorial libraries, speed and high throughput are key. For chromatographic separation or LCMS of the newly synthesized compounds, generic chromatographic methods have been designed to accommodate the widest possible diversity of samples. However, when the sample is not suited to the method, costly instrument downtime slows the analytical process and often results in rejection of the whole plate or series of compounds. New ACD/ChromGenius software will advise whether methods are viable and select among multiple available methods. This presentation describes the retention time and method selection algorithms that power the new software, as well as the physicochemical parameters used to model the chromatographic separation.



CINF 73 :  Copyright and the EU Database Directive: Issues for chemistry
John R. Rumble Jr., Office of Measurement Services, National Institute of Standards and Technology, 100 Bureau Drive MS 2310, Gaithersburg, MD 20899-2310, Fax: 301-926-0416, john.rumble@nist.gov

Abstract
The computerization of scientific information continues to change the scientific communication process. As we approach the end of the first decade of the Internet era, ownership issues still loom large with respect to the communication process itself as well as the economics of the process. In this presentation, I will review some of the issues related to traditional ownership of authored material (copyright) as well as new ownership rights (sui generis) as created by the European Union. Both rights are under review, and possible changes could affect the communication process in many ways. This talk also provides an introduction to more detailed talks on this subject later in this session.



CINF 74 :  Pressures on the public domain in scientific data and information
Paul F. Uhlir, Office of International S&T Information, The National Academies, 2101 Constitution Avenue NW, Washington, DC 20418, Fax: 202-334-2231, puhlir@nas.edu

Abstract
The public domain in scientific and technical data and information (STI) is massive and has played a major role in the success of the research enterprise in the United States. The "public domain" may be defined in legal terms as sources and types of data and information whose uses are not restricted by statutory intellectual property regimes or by other legal constraints, and that are accordingly available to the public without authorization. Various legal, economic, and technological pressures in recent years have narrowed the scope of the public domain in STI, with poorly understood and perhaps significantly under-appreciated consequences to our nation's preeminent research capabilities. This presentation will discuss the background of public-domain information in research and review some of the many constraints that are being placed on open access to and use of such resources.



CINF 75 :  IPR and modern scientific society publishing
Eric S. Slater, Publications Division, Copyright Office, American Chemical Society, 1155 Sixteenth Street, NW, Washington, DC 20036, Fax: 202-776-8112, e_slater@acs.org

Abstract
This presentation will provide basic information about United States Copyright Law and its application to modern scientific publishing. Included will be the major issues surrounding publishing today such as protecting content against piracy, protecting works that appear online, and how recent court decisions have shaped the copyright landscape.



CINF 76 :  Copyright and the information industry
Dan Duncan, Executive Director, NFAIS, 1518 Walnut Street, Suite 307, Philadelphia, PA 19102, Fax: 215-893-1564, danduncan@nfais.org

Abstract
A review of major developments in copyright and related law, with particular emphasis on U.S. activities, that are of special importance to information database providers. The presentation will focus on how policy developments may affect the delivery and use of online information databases.



CINF 77 :  Database protection and academic research
Harlan J. Onsrud, Department of Spatial Information Science and Engineering, University of Maine, 5711 Boardman Hall, Room 340, Orono, ME 04469-5711, Fax: 207-581-2206, onsrud@spatial.maine.edu

Abstract
Many economic and legal scholars argue that the current, relatively open, access to data environment in the United States is beneficial to advancing knowledge and the economy. If so, the traditional method of scientific advancement by extending from and building upon the data and works of others may be substantially burdened if the U.S. moves to a database protection legal environment similar to that instituted recently throughout much of Europe. This talk explores evidence to date of the effect of the European Database Directive including its effect on scientific and technical databases. Provisions of the Directive and the implications for expanding or constraining scientific discourse are discussed. Likely responses of the scientific community to similar legislation in the U.S. are hypothesised. Several alternatives for working around such a default law are suggested and several illustrative examples already being pursued are highlighted.



CINF 78 :  An academic chemist looks at copyright
S. Scott Zimmerman, Department of Chemistry and Biochemistry, Brigham Young University, C205 BNSN, Provo, UT 84602-5700, Fax: 801-422-5474, scott_zimmerman@byu.edu

Abstract
Most academic chemists think little about copyright issues. They treat copyrighted materials like their mentors and colleagues do, often without questioning the legality of their actions. But academicians should know the answers to a few common copyright questions, for example: Can I photocopy book chapters and research papers for my personal files? Can I photocopy these materials, include them in a course packet, and pass them out to my classes? Can I use copyrighted materials in my PowerPoint presentations at meetings and in classes? When my students write a paper describing research done in my laboratory, who owns the copyright? Can my students publish research results in theses and dissertations, and then publish the same materials in a journal? If I prepare and publish a graph in a journal article, can I re-publish the same graph in another journal or review article? Can I post my published research papers on my Web page? This presentation will try to answer these and other questions about copyright in academia.



CINF 79 :  Integration of Combinatorial Chemistry Analyses with Other Relevant Information
Jeff Saffer, OmniViz, Inc, Two Clock Tower Place, Suite 600, Maynard, MA 01754, saffer@omniviz.com

Abstract
Today's chemist deals with very large collections of information from diverse sources. Integration of the analysis of textual information (patents or scientific literature), high throughput screening results, structures, descriptors and fingerprints is a prerequisite for the comprehensive understanding required for improved decision-making. One of the best instruments for this integration is the human mind, but this can only be fully engaged when the diverse information is presented in a context that is easy to assimilate. To this end, we have developed a visualization framework that integrates analysis of experimental and computational data with conceptual analysis of textual information while maintaining the data in the context in which it was generated. Tools enabling exploration across the multiple data types and detailed exploration within specific data types increase understanding and decrease the time required to reach decisions. The application of these approaches to very large (hundreds of millions of data points) chemistry data sets will be discussed in the context of discovery research.



CINF 80 :  Barriers to effective integration in chemical experiment management software
J. Christopher Phelan, Marketing, MDL, 1550 Bryant St., Suite 739, San Francisco, CA 94103, Fax: 415-252-8610, phelan@mdli.com

Abstract
During the past twenty years, computers have become ubiquitous in chemical research, for instrument control and for data collection, management, and analysis. However, despite a pressing need, general software solutions that integrate these functions are not yet widely available. We present an analysis of several significant obstacles to the implementation of effective integrated software solutions in the chemical experiment management arena. Specific topics will include: compartmentalization of domain specific expertise, lack of a consistent data model for chemical information beyond simple structure data, idiosyncratic workflows in the research environment, and complexity issues in design and architecture.



CINF 81 :  Application of statistical design tools for improved efficiency in chemistry development for high-throughput parallel synthesis
Jean E. Patterson, and Robb Nicewonger, Department of Library Optimization, ArQule, 19 Presidential Way, Woburn, MA 01801, Fax: 781-994-0677, jpatterson@arqule.com

Abstract
Although there are multiple techniques to select structurally diverse subsets of virtual library products, there remains a need for a practical method to identify reagents that represent the range of reactivity needed to build a library. Chemical intuition has been the predominant driver for selection of such reagents, but it has a number of shortfalls. Chemical intuition is not consistently predictive, it is not an automated process, and it is not possible to quantitatively describe the process to enhance the chemistry development of future projects. This presentation will focus on ArQule’s statistics-based approach to the selection of experimental test reactions using a commercially available software package from Umetrics. Identification of chemical descriptors that most closely describe reagent reactivity using multivariate statistics followed by experimental design techniques to choose a diverse sampling of reagents that represents the reactivity of the entire virtual library will be described.
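
To make the idea of choosing a diverse sampling of reagents concrete, here is a minimal sketch that uses a greedy max-min distance selection over descriptor vectors. This is a simple stand-in technique chosen for illustration, not the Umetrics-based experimental design procedure the abstract refers to; the function name, the reagent labels, and the descriptor values are hypothetical.

```python
import math

def maximin_select(descriptors, k):
    """Greedily pick k reagents from a dict mapping reagent name to descriptor vector.

    The first pick is the reagent farthest from the centroid; each later pick
    maximizes its distance to the nearest reagent already chosen.
    """
    names = sorted(descriptors)
    dim = len(descriptors[names[0]])
    centroid = [sum(descriptors[n][i] for n in names) / len(names) for i in range(dim)]
    chosen = [max(names, key=lambda n: math.dist(descriptors[n], centroid))]
    while len(chosen) < min(k, len(names)):
        rest = [n for n in names if n not in chosen]
        chosen.append(max(
            rest,
            key=lambda n: min(math.dist(descriptors[n], descriptors[c]) for c in chosen),
        ))
    return chosen

if __name__ == "__main__":
    # Hypothetical reagents with made-up two-component descriptor vectors.
    reagents = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (1.0, 0.0), "d": (0.5, 0.0)}
    print(maximin_select(reagents, 3))
```

In practice the descriptor vectors would first be reduced to the few dimensions that best track reactivity, which is the role the abstract assigns to multivariate statistics.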



CINF 82 :  Library design using multi-dimensional SAR analysis: Incorporating structure-based predictions
Carleton Sage, Kevin Holme, and Manish Sud, Cheminformatics Research, LION Bioscience Inc, 9880 Campus Point Drive, San Diego, CA 92121, carleton.sage@lionbioscience.com

Abstract
After screening results are available for a compound library, SAR analysis is often used to determine which R-groups add favorably to activity. After a chemical core and R-group positions are specified, SAR analysis involves identifying R-groups and generating a SAR table. We have implemented a system that takes this analysis one step further. In addition to activity data, we have integrated structure-based models to predict the ADME and specificity properties of compounds and have developed methods to simultaneously consider multiple properties in R-group analysis. A critical component of these analyses is the number and weighting of the properties when they are combined, and how changes in these parameters affect the final prioritization of compounds and R-groups. We will present results from using different strategies for simultaneous parameter combination.
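
The sensitivity of a prioritization to property weighting can be seen in a small sketch. The example below assumes a plain weighted sum of min-max scaled properties, which is only one of many possible combination strategies and is not stated in the abstract; the function names, compound labels, property names, and values are hypothetical.

```python
def normalize(values):
    """Min-max scale a list of values to [0, 1]; constant lists map to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def rank_compounds(table, weights):
    """Rank compounds by a weighted sum of scaled properties (higher is better).

    table maps compound name to a dict of property values;
    weights maps property name to a non-negative weight.
    """
    names = list(table)
    total = sum(weights.values())
    scores = {n: 0.0 for n in names}
    for prop, w in weights.items():
        scaled = normalize([table[n][prop] for n in names])
        for n, s in zip(names, scaled):
            scores[n] += (w / total) * s
    return sorted(names, key=lambda n: scores[n], reverse=True)

if __name__ == "__main__":
    # Hypothetical property table: predicted activity and an ADME score.
    table = {
        "A": {"activity": 9.0, "adme": 2.0},
        "B": {"activity": 6.0, "adme": 8.0},
        "C": {"activity": 3.0, "adme": 5.0},
    }
    print(rank_compounds(table, {"activity": 1.0, "adme": 1.0}))
    print(rank_compounds(table, {"activity": 3.0, "adme": 1.0}))
```

With equal weights compound B ranks first, while weighting activity three times as heavily moves compound A to the top, which is the kind of parameter dependence the abstract highlights.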



CINF 83 :  Use of recursive partitioning/simulated annealing (RP/SA) for mining combinatorial libraries
Paul Blower, LeadScope, Inc, 1245 Kinnear Rd, Columbus, OH 43212, pblower@leadscope.com, and Petr Kocis, Enabling Science & Technology, Chemistry, AstraZeneca R&D Boston

Abstract
Recursive partitioning is a powerful tool for mining large, diverse data sets encountered in drug discovery. It is useful for explaining a complex, nonlinear response, and it can handle very large descriptor sets with continuous, discrete, or categorical variables. At each node, we use simulated annealing to optimize several variables simultaneously and find good combinations of descriptors. The search is incorporated into a recursive partitioning design to produce a regression tree on the space of descriptors. We used RP/SA for mining combinatorial libraries to identify combinations of structural features and reaction parameters that give superior yields. In this talk, we will describe statistical techniques used in this new method and illustrate its application in mining a combinatorial librar