ACS Washington, DC, 2009 - CINF & Related Abstracts

Please note: abstracts are provided "as is" by ACS from their OASYS registration system.

Chemical Information Division Abstracts


CINF 1

U.S. EPA computational toxicology programs: Central role of chemical-annotation efforts and molecular databases

Ann M. Richard1, richard.ann@epa.gov, Maritja A. Wolf2, wolf.marti@epa.gov, ClarLynda R. Williams-Devane3, williams.clarlynda@epa.gov, and Richard Judson1, judson.richard@epa.gov. (1) National Center for Computational Toxicology, U.S. EPA, Research Triangle Park, NC 27711, (2) Lockheed Martin, Contractor to the U.S. EPA, Research Triangle Park, NC 27711, (3) National Health & Environmental Effects Research Lab, U.S. EPA, Research Triangle Park, NC 27711

EPA's National Center for Computational Toxicology is engaged in high-profile research efforts to prioritize and screen thousands of environmental chemicals for potential toxicity more efficiently and effectively. A central component of these efforts involves the construction and integration of top-level chemically indexed, structure-searchable databases of historical toxicology data, including: 1) high-quality structure-annotated toxicity data files from public resources (DSSTox); 2) a relational database of detailed toxicology studies from EPA regulatory programs (ToxRef DB); and 3) a publicly available relational database broadly spanning chemical data resources pertaining to environmental toxicology on the Internet (ACToR). Challenges of chemical structure annotation and indexing of public resources broadly pertaining to environmental toxicology will be described, highlighting DSSTox, ACToR, and recent efforts to annotate and publish chemical (treatment)-experiment index files for the primary public microarray database repositories, GEO and ArrayExpress. This abstract does not necessarily reflect EPA policy.


CINF 2

Linking public and commercial chemical data: ChemSpider and SureChem

Nicko Goncharoff, nicko.goncharoff@surecheminc.com, SureChem, Inc, 2255 Van Ness Avenue, Suite 101, San Francisco, CA 94109

The scientific community is calling for better integration of public and commercial databases, particularly in chemistry. This presentation discusses the linking of ChemSpider, the Open Access internet chemistry database, with SureChem, a proprietary online structure-searchable patent database. Topics will include a technical review of the integration, use of chemical identifiers to ensure consistent searches across both databases, the end user search experience and benefits to the scientific community.
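
The idea of using a shared chemical identifier to keep searches consistent across two databases can be sketched as a simple key-based join. This is only an illustration, not the ChemSpider or SureChem implementation: the abstract does not say which identifier is used, so the sketch assumes a structure-derived key (such as an InChIKey), and the function name, record fields, keys and patent labels below are all hypothetical placeholders.

```python
def link_by_identifier(public_records, patent_records):
    """Group patent hits under each public record that shares a structure key."""
    patents_by_key = {}
    for rec in patent_records:
        # Several patents may mention the same structure, so collect a list.
        patents_by_key.setdefault(rec["key"], []).append(rec["patent"])
    # A public record with no patent hits maps to an empty list.
    return {
        rec["name"]: patents_by_key.get(rec["key"], [])
        for rec in public_records
    }


# Placeholder records; "KEY-0001" etc. stand in for real structure keys.
public = [
    {"name": "compound A", "key": "KEY-0001"},
    {"name": "compound B", "key": "KEY-0002"},
]
patents = [
    {"patent": "PATENT-X", "key": "KEY-0001"},
    {"patent": "PATENT-Y", "key": "KEY-0001"},
]
links = link_by_identifier(public, patents)
```

Because both sides are matched on the same key rather than on names, a compound listed under different synonyms in the two sources would still be linked, provided both sources derive the key from the structure in the same way.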


CINF 3

Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

A J Williams, tony@chemspider.com, ChemZoo, 904 Tamaras Circle, Wake Forest, NC 27587

The extraction of chemical entities from documents such as patents and publications has been pursued for a number of years. We wish to report on ChemMantis, an integrated system for chemistry-based entity extraction and document mark-up enabling access to the rich resource of online chemistry known as ChemSpider. We will discuss the development of the platform from its inception as a series of dictionaries to the integration of an entity extraction algorithm and its expansion to a public deposition and publishing platform for chemistry. Chemistry articles can now be deposited, marked up and exposed to the public, in many cases within a few minutes, making it an ideal platform for communicating research and providing integrated access to data sources including PubChem, ChEBI, Wikipedia and Entrez.


CINF 4

Turning mining inside out

Colin R Batchelor, batchelorc@rsc.org, Royal Society of Chemistry, Thomas Graham House, Milton Road, Cambridge CB4 0WF, United Kingdom

The Royal Society of Chemistry now has several years of experience in identifying chemistry through text mining and combining this with editorial QA and publishing standards to enhance our publications. We are using our award-winning project RSC Prospect to show some of the benefits of applying new standards such as ontologies and the InChI to our journal articles. By applying them specifically to our areas of chemical science publishing, we have added a layer of semantic enrichment to articles that enables them to be found and linked to other sources, whether other publications, databases or reference information. The movement to extend the results of traditional text mining, done on a limited internal set of documents, to the wider scientific information world through the application of standards will be hugely significant for the publication and use of scientific information in the years to come.


CINF 5

Chemreader: A tool for extracting chemical structure information from digital raster images

Jungkap Park, jungkap@umich.edu, Michigan Alliance for Cheminformatic Exploration, University of Michigan, Department of Mechanical Engineering, 3211 EECS, 2350 Hayward Street, Ann Arbor, MI 48109, Kazu Saitou, kazu@umich.edu, Michigan Alliance for Cheminformatic Exploration, University of Michigan, Department of Mechanical Engineering, 3211 EECS, 2350 Hayward Street, Ann Arbor, MI 48109, Kerby Shedden, kshedden@umich.edu, Michigan Alliance for Cheminformatic Exploration, University of Michigan, Department of Statistics, 461 West Hall, 1085 S University, Ann Arbor, MI 48109, and Gus R. Rosania, grosania@umich.edu, Michigan Alliance for Cheminformatic Exploration, University of Michigan College of Pharmacy, Department of Pharmaceutical Sciences, 428 Church Street, Ann Arbor, MI 48109

Annotation of virtual libraries of small molecules ultimately involves linking entries in a database to relevant patents and research articles. Chemreader is a machine vision tool to automate conversion of chemical diagrams in analogue images into standard chemical file formats. Chemreader builds on advances in chemical object character recognition made over the past fifteen years. To facilitate database annotation, algorithms for recognizing lines and letters representing bonds and atoms in chemical structure diagrams are run independently and in sequence, so that input parameters can be tailored to a desired chemical database annotation scheme. A chemical spell-checker can be used to automatically assess errors in Chemreader's output. Furthermore, pre-processing filters can be used to eliminate sub-standard (i.e., low-resolution or noisy) images from the input. For a database annotation task, Chemreader can be adjusted to a user-defined level of accuracy, to optimize the relevance or number of useful links.


CINF 6

Exploiting a hidden treasure: Automated chemical entity recognition in Chemisches Zentralblatt

Valentina Eigner-Pitto, ve@infochem.de, Heinz Saller, and Peter Loew, InfoChem GmbH, Landsberger Strasse 408, Munich 81241, Germany

The German publication Chemisches Zentralblatt, which began in 1830, was the first chemistry abstract collection in history and contains 140 years of research progress in chemistry and chemical knowledge. Modern scanning and OCR software was used to make the entire content of this unique reference work available for full-text retrieval, but a solution offering chemical structure search seemed infeasible because the work is written in German, the original document quality is not consistent, and numerous obsolete compound names occur. This talk describes our approach to identifying and extracting chemical compounds automatically from the text and converting them into a structure database. The process is based on the systematic training and enhancement of the OCR, annotation and name-to-structure processes using specifically developed German dictionaries. A web-based prototype application has been implemented that provides structure, substructure and similarity search, with the hits linked back directly to the original pages of Chemisches Zentralblatt.


CINF 7

NIH public access policy

Neil M. Thakur, Special Assistant to the NIH Deputy Director for Extramural Research, National Institutes of Health, One Center Drive, Building One - Room 140, Bethesda, MD 20892-0152, Fax: 301-402-3469

This presentation will provide an overview of the NIH Public Access Policy and compliance strategies for NIH authors and awardees. It will describe key policy details, methods to submit papers in compliance with the Policy, and methods to document compliance with the Policy. Time will be allotted for audience comments and questions.


CINF 8

Three revolutions

George O. Strawn, Office of Information and Resource Management, National Science Foundation, Arlington, VA 22230

The digital computer has spawned many important developments, one of which, electronic communication of scholarly information, may be maturing at this time. This talk will compare electronic communication of scholarship with two other computer-related disruptions, review recent developments in this area, and speculate on its importance for the future of science and engineering. In addition to this historical perspective, NSF policies and activities will also be described.


CINF 9

STM publishers and author rights

Eric S. Slater, e_slater@acs.org, Publications Division, Copyright Office, American Chemical Society, 1155 Sixteenth Street, NW, Washington, DC 20036, Fax: 202-776-8112

This session will feature discussion of author rights and of the ways that ACS and other STM publishers are addressing this broad issue. Many publishers that require transfer of copyright grant a number of rights back to authors and their employers; however, there is a perception that publishers don't do this. In reality, publishers are more “generous” than is perceived, and they achieve this by carefully attempting to balance their own rights as copyright holders with the rights of authors and the user community at large.


CINF 10

Perils of parallel publishing systems: The ramp-up of institutional and subject-matter repositories and potential impact on journal subscriptions

Mark Seeley, Legal Department, Elsevier, 30 Corporate Drive - Suite 400, Burlington, MA 01803, Fax: 781-313-4814

The copying and sharing of journal articles by authors has been accepted by journal publishers, either expressly (journal publishing agreements which permit sharing/posting of some versions of articles) or implicitly (by lack of enforcement or objection) for many years. Generally journal publishers accept that scholars need to quickly share their work with colleagues and researchers in the field, as part of the “informal” communication systems, and the view has been that the risk to traditional subscription and “pay by the drink” document delivery businesses from such informal sharing is low to moderate. The development of major pre-print servers such as arXiv.org, subject repositories run by funding agencies such as NIH (the Public Access database in PMC), and institutional repositories run on a systematic basis (with mandates such as those proposed at Harvard, MIT, Southampton and others) raises the question of whether such repositories, and the versions of articles posted on them, are “good enough” to serve as compelling substitutes for journal subscriptions and individual article sales.


CINF 11

Trends in rights management: A copyright clearance perspective

Edward Colleran, Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, Fax: 978-646-8600

Trends in the Use of Copyrighted Works: A Copyright Clearance Center Perspective.


CINF 12

Online chemical modeling environment: database

Sergii Novotarskyi1, Iurii Sushko1, Robert Körner1, Anil Kumar Pandey1, and Igor V. Tetko2, itetko@vcclab.org. (1) Helmholtz Zentrum Muenchen German Research Center for Environmental Health, Institute of Bioinformatics and Systems Biology, Ingolstaedter Landstrasse 1, Neuherberg D-85764, Germany, (2) Helmholtz Zentrum Muenchen German Research Center for Environmental Health, Institute of Bioinformatics and Systems Biology, Ingolstaedter Landstrasse 1, Neuherberg 85764, Germany

The main goal of our database http://qspr.eu is to collect, store and manipulate chemical data for use in model development (see our presentation at COMP). Its main features, which distinguish it from other available databases, are the following: 1) The database is open and based on Wiki-style principles. We encourage users to submit data and to correct inaccurate submitted data. 2) The database is aimed at collecting high-quality data. To achieve this, we require users to submit a reference to the article in which the data were published. The reference may include the article name, journal name, date of publication, page number, line number, etc. 3) Since compound properties may vary depending on the conditions under which they were measured, we store the measurement conditions with the data to give users more accurate information about each data point. Examples of the use of the database within national and EU projects will be presented.
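
The three features listed above suggest a record that carries its value, its measurement conditions and its literature reference together. The following is a minimal sketch of such a record, not the actual qspr.eu schema: the class names, field names and the `validate` helper are assumptions made for illustration, and the reference shown is a placeholder rather than a real citation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reference:
    """Where a value was published (fields named after those in the abstract)."""
    article: str
    journal: str
    year: int
    page: str = ""


@dataclass
class DataPoint:
    compound: str            # structure identifier, e.g. a SMILES string
    property_name: str
    value: float
    unit: str
    reference: Reference
    # Conditions are stored with the value, since properties depend on them.
    conditions: dict = field(default_factory=dict)


def validate(point):
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    if not point.reference.article or not point.reference.journal:
        problems.append("missing reference")
    if not point.unit:
        problems.append("missing unit")
    return problems


# Placeholder record: the reference is hypothetical, not a real citation.
ref = Reference(article="Example article title", journal="Example Journal", year=2008)
point = DataPoint("c1ccccc1", "melting point", 5.5, "degC", ref,
                  conditions={"pressure_kPa": 101.325})
```

Keeping the conditions in an open-ended mapping is one possible design choice; it lets different properties carry different condition sets without changing the record type.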


CINF 13

Public molecular databases: How can their value be increased by generation of additional data in silico?

Vladimir V. Poroikov1, vladimir.poroikov@ibmc.msk.ru, Dmitry Filimonov1, dmitry.filimonov@ibmc.msk.ru, and Marc C. Nicklaus2, mn1@helix.nih.gov. (1) Russian Academy of Medical Science, Institute of Biomedical Chemistry, Pogodinskaya Str., 10, Moscow 119121, Russia, Fax: 007-499-245-0857, (2) Laboratory of Medicinal Chemistry, National Cancer Institute, National Institutes of Health, Frederick, MD 21702

Public molecular databases (PubChem, ChemIDplus, NCI, DSSTox, ChemBank, PDSP, libraries of commercially available samples, etc.) contain data on compounds' identifiers, structural formulae, physicochemical characteristics, biological activities, etc. In addition to experimentally determined properties, various calculated data (log P, number of H-donors and acceptors, drug-likeness, biological activity spectra, among others) are also included in some databases. Since for the majority of compounds in public molecular databases numerous experimentally determined parameters are unknown, the question arises: Could one generate the missing data in silico to obtain the total data profile for each molecule? Due to the continued progress in both accuracy of computational methods and performance of computer techniques, this task looks quite realistic for the near future. However, the possibility of transforming such calculated data into useful information and thence into true knowledge depends not only on the accuracy of the data itself but also on the ability of the end users to perceive these data. Thus, the increase in value of calculated data could be achieved through the development of intelligent computational-informational systems, which should combine reliable computational methods with explanatory scenarios suitable for solving practical tasks. We discuss the possibilities and limitations of creating such systems.


CINF 14

Chemical space management of large libraries for new active small molecules selection for prostate cancer treatment

Andrew V. Scorenko, avs@iihr.ru, Computational Chemistry, Chemical Diversity Research Institute, Rabochaya St. 2-a, Khimki, Moscow Region 141400, Russia, Fax: +7-495-626-9780, Andrei A. Gakh, gakhaa@ornl.gov, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6242, Andrey V. Sosnov, sva@iihr.ru, Chemical Diversity Research Institute, 2a Rabochaya St., Khimki, Moscow Region 141400, Russia, and Mikhail Yu. Krasavin, MedChem, Chemical Diversity Research Institute, Rabochaya St. 2-a, Khimki, Moscow Region 141400, Russia

This work describes the identification of primary hit compounds in large databases (over a million compounds) and their further optimization into high-content series in early drug discovery. We continuously monitor new biotargets and active compounds related to cancer treatments worldwide, update our bank of compounds accordingly, and use it for effective drug development and chemical space management. We used available approaches to select new small molecules for prostate cancer treatment. The more effective biotargets were selected, classified and studied. Based on this knowledge, we developed computational algorithms for bioisosteric transformation rules, comprehensive chemical space filters, ligand-based model search, etc. The selected set of compounds was classified by chemotype. The final selection was performed by ORNL specialists. As a result, about 5,000 compounds were recommended for high-throughput screening. About 300 compounds were identified as active and went on to further optimization. This research was supported by the Global IPP program through the International Science and Technology Center (ISTC). Oak Ridge National Laboratory is managed and operated by UT-Battelle, LLC, under U.S. Department of Energy contract DE-AC05-00OR22725. This paper is a contribution from the Discovery Chemistry Project.


CINF 15

Crowdsourcing nonaqueous solubility and synthesis using Open Notebook Science

Jean-Claude Bradley1, bradlejc@drexel.edu, Khalid Mirza1, Rajarshi Guha2, rguha@indiana.edu, Andrew Lang3, gameshoncho@hotmail.com, and A. Williams4, antony.williams@chemspider.com. (1) Department of Chemistry, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, (2) School of Informatics, Indiana University, 1130 Eigenmann Hall, 1900 E 10th Street, Bloomington, IN 47406, (3) Department of Mathematics, Oral Roberts University, 7777 S. Lewis Ave, Tulsa, OK 74171, (4) ChemZoo Inc, 904 Tamaras Circle, Wake Forest, NC 27587

The use of Open Notebook Science to collect and make publicly available the solubility measurements of aldehydes, primary amines and carboxylic acids will be described. This involves the real time sharing of all experiments and associated raw data by a community of collaborators who are geographically distributed and may have never communicated using channels other than this project. Monthly cash prizes were awarded to participating students by means of the ONS Challenge Submeta Awards. The laboratory notebook pages are recorded on a public wiki and the solubility measurements, including relevant calculations, are stored in public Google Spreadsheets. A combination of ChemSpider, the GoogleDoc visualization API and web services is used to enable flexible searching and display of desired subsets of the data. The utility of the project will be illustrated by exploring optimal solvent selection for a Ugi reaction.

CINF 16

ChemXSeer: A cyberinfrastructure for environmental chemical kinetics

Karl T. Mueller1, ktm2@psu.edu, William J. Brouwer1, wjb19@psu.edu, C. Lee Giles2, clg20@psu.edu, Prasenjit Mitra2, pum10@psu.edu, and Carl Lagoze3, lagoze@cs.cornell.edu. (1) Department of Chemistry, Penn State University, 104 Chemistry Building, University Park, PA 16802, Fax: (814) 863-8403, (2) College of Information Sciences and Technology, Penn State University, 104 Information Sciences and Technology Building, University Park, PA 16802, (3) Computing and Information Science, Cornell University, 301 College Avenue, Ithaca, NY 14850

A main goal of interdisciplinary geochemistry and environmental chemistry research at Penn State has been the integration of experimental, analytical, and simulation results from the molecular to the field scales. Such an undertaking requires the synthesis of large amounts of data, especially those data related to measurements and modeling of chemical kinetics. We will report here on our development of the ChemXSeer architecture as a portal for academic researchers in the area of environmental chemical kinetics, which integrates the scientific literature with experimental, analytical and simulation datasets. ChemXSeer (chemxseer.ist.psu.edu) offers unique aspects of search not yet present in other scientific search services: for example, we will demonstrate tools for the extraction of tables, figures, equations and formulae from scientific documents. Included in ChemXSeer are searchable databases of computational results on molecular and surface structures using a number of methods (DFT, molecular dynamics, Monte Carlo, etc.). Future directions will include the deployment of oreChem, the chemistry domain implementation of the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) Project, a developing standardized, interoperable, and machine-readable methodology to express information about compound information objects on the web. oreChem will provide a model to aggregate documents, data, and metadata in chemistry.


CINF 17

Mining a large reaction database with name reaction patterns

Matthew A. Kayala1, mkayala@ics.uci.edu, Qian-Nan Hu1, qhu@uci.edu, Jonathan H. Chen1, chenjh@uci.edu, James S. Nowick2, jsnowick@uci.edu, and Pierre Baldi1, pfbaldi@uci.edu. (1) Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, Irvine, CA 92697, (2) Department of Chemistry, University of California, Irvine, 4126 Natural Sciences 1, Irvine, CA 92697-2025

Over the past several years, comprehensive data sets of small chemical compounds, such as our own ChemDB (http://cdb.ics.uci.edu), have been made publicly available for statistical analysis and data mining purposes. However, access to reaction data resources is comparatively restricted. With data largely unavailable, how to approach knowledge discovery in reaction databases is an open question. One potential method for data mining is to classify reactions using pattern matching rules. We present initial results on mining 2,000,000+ well-annotated reactions from a database of published reactions (SPRESI). Here, we have hand-composed 500+ SMIRKS language patterns to cover 306 common 'Name Reactions'. The rules provide a broad classification of the database into a small number of classes based on net structural changes. To facilitate future research, a tool to classify reactions using the patterns has been made available as part of ChemDB.
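
Rule-based reaction classification of this general kind can be illustrated with a toy first-match classifier. This is not the authors' tool and does not use SMIRKS: real SMIRKS matching requires a cheminformatics toolkit that performs substructure matching, whereas this sketch substitutes plain regular expressions over the reactant and product sides of a reaction SMILES string, which is far cruder. The rule table, rule names and `classify` function are hypothetical.

```python
import re

# Hypothetical rule table: (class name, reactant-side regex, product-side regex).
# Plain text matching on SMILES is only a stand-in for substructure matching.
RULES = [
    ("esterification", re.compile(r"C\(=O\)O(\.|$)"), re.compile(r"C\(=O\)OC")),
    ("amide formation", re.compile(r"C\(=O\)O(\.|$)"), re.compile(r"C\(=O\)N")),
]


def classify(reaction_smiles):
    """Return the name of the first rule whose two patterns both match."""
    # Assumes the "reactants>>products" form with no agents section.
    reactants, _, products = reaction_smiles.partition(">>")
    for name, reactant_pattern, product_pattern in RULES:
        if reactant_pattern.search(reactants) and product_pattern.search(products):
            return name
    return "unclassified"
```

For example, `classify("CC(=O)O.OCC>>CC(=O)OCC")` returns `"esterification"`, while a reaction that matches no rule falls through to `"unclassified"`. Ordering the rules from most to least specific is one way to resolve cases where several patterns match.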


CINF 18

Reliable reactions and stable structures

Jonathan M Goodman, J.M.Goodman@ch.cam.ac.uk, Unilever Centre for Molecular Science Informatics, Cambridge University, Department of Chemistry, Lensfield Road, Cambridge CB2 1EW, United Kingdom, Fax: +44 1223 336362

Databases are not useful unless they are reliable. Public databases have the potential to be more reliable than ones that are protected by carefully designed license conditions, because it may be possible to check the data and publish the results without the constraint of restrictive intellectual property agreements. I will describe ways in which we look at the reliability of databases and try to improve the quality of chemical information. The small on-line molecular databases available on our website have been used to test the utility of InChI-based descriptions of molecules and how they can be effectively searched and retrieved.


CINF 19

Copyright and student theses: Challenges of the modern world

Kevin P. Gable, kevin.gable@oregonstate.edu, Department of Chemistry, Oregon State University, 153 Gilbert Hall, Corvallis, OR 97331-4003, Fax: 541-737-2062

Migration of publishing and distribution to electronic media has a number of important implications for preparation and publication of student theses. This presentation will frame the challenges from the perspective of the academic interest in training the student.


CINF 20

Implementing an open access policy at Trinity University

Steven M. Bachrach, sbachrach@trinity.edu, Department of Chemistry, Trinity University, 1 Trinity Place, San Antonio, TX 78212, Fax: 210-999-7569, Jorge Gonzalez, Department of Economics, Trinity University, 1 Trinity Place, San Antonio, TX 78212, and Diane Graves, Coates Library, Trinity University, 1 Trinity Place, San Antonio, TX 78212

Ever-increasing costs of journals, particularly in the STM fields, have reduced the ability of scientists and laymen to access needed information. The Open Access initiatives were developed in part to create universal access to publications. Following the lead of the Faculty of the School of Liberal Arts and Sciences at Harvard University, the talk presents how Trinity University is maneuvering to adopt a similar policy. This policy will require Trinity faculty to assign limited copyrights to the University. This limited set of rights includes the ability to distribute the articles at no cost, principally through an institutional repository. Efforts to coordinate adoption of this policy at other undergraduate institutions will be discussed.


CINF 21

Copyrights, contracts, and the common good: Making noncongressional law, and making it work for us

Sherwin Siy, Global Knowledge Initiative, Public Knowledge, 1875 Connecticut Avenue, NW, Suite 650, Washington, DC 20009, Fax: 202-986-2539

Most often, copyright affects us not through the operation of the law directly, but in contracts and licenses that leverage the powers that those laws grant to authors. In evaluating whether those licenses are good policy, we need to examine their effects on the two entities that copyright law exists to benefit: authors and the public. Copyright law confers a benefit on authors by granting them limited monopoly rights. This benefit, though an integral part of copyright, is a means to a larger end—a benefit to the public by incentivizing the creation of new works that the public can access. Contracts and licensing agreements that manipulate these rights can have positive or negative policy effects—all without having any effect on the underlying law. The NIH open access policy works as just such an agreement, and can be evaluated in comparison with other contracts and licenses—like free/open source licenses or end user license agreements.


CINF 22

SPARC: The Scholarly Publishing and Academic Resources Coalition

Heather Joseph, SPARC, Association of Research Libraries (ARL), 21 Dupont Circle, Suite 800, Washington, DC 20036

SPARC, the Scholarly Publishing and Academic Resources Coalition


CINF 23

Science Commons: A project of Creative Commons

Michael W. Carroll1, Thinh Nguyen2, and John Wilbanks2. (1) Washington College of Law, American University, 4801 Massachusetts Ave., N.W, Washington, DC 20016, Fax: 202-730-4756, (2) Science Commons, Creative Commons, 171 Second Street, Suite 300, San Francisco, CA 94105

Science Commons.


CINF 24

An integrated approach in the search of GABA aminotransferase inhibitors

Savita Bhutoria, savita_rs@iicb.res.in and Nanda Ghoshal, nghoshal@iicb.res.in, Structural Biology and Bioinformatics Division, Indian Institute of Chemical Biology, Jadavpur, Kolkata, India

Gamma-aminobutyric acid (GABA) is the inhibitory neurotransmitter in the mammalian central nervous system. The major pathway for its degradation involves the pyridoxal phosphate (PLP)-dependent enzyme GABA aminotransferase (GABA-AT). Designing GABA-AT inhibitors is not a trivial task, first because the enzyme active site is very small and second because the inhibitor must first react with PLP to inactivate the enzyme. Inhibitors can attack the enzyme reversibly or irreversibly, depending on whether the inhibitor binds only to PLP or to both PLP and the protein. The solution applied here combined multiple approaches for designing new inhibitors. Using a virtual library created from LUDI-based fragments, substructures and subsequent isosteric group replacement, molecules were screened with structure-guided multiple pharmacophores having the reversible and irreversible attacking functionalities. A set of similarity assessment methods and clustering was employed to recommend compounds for screening in a prospective docking experiment. Because the inhibitors should first react with PLP and then with the enzyme, a strategy was used in which hits were analyzed and validated by their tendency to react with PLP and form a ternary complex. The selected hits were further evaluated and prioritized using QSAR analysis, which took into account the shape of the molecule and other important electronic and structural attributes. Thus a combined virtual screening and QSAR methodology is used here to target the GABA-AT enzyme, reversibly and irreversibly. The new actives had a different underlying chemical architecture from the known inhibitors, a result indicative of successful scaffold hopping.


CINF 25

Descriptor importance of HIV-1 protease crystal structures for QSAR using random forest

Gene M. Ko1, gko@sciences.sdsu.edu, A. Srinivas Reddy2, asvreddy@gmail.com, Sunil Kumar3, skumar@mail.sdsu.edu, and Rajni Garg1, rgarg@mail.sdsu.edu. (1) Computational Science Research Center, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182-1245, (2) Electrical and Computer Engineering Department, San Diego State University, 5500 Campanile Drive, C/O Sunil Kumar, San Diego, CA 92182-1309, (3) Electrical and Computer Engineering Department, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182-1309

Random forest (RF) is a machine learning classifier that consists of a collection of unpruned classification trees generated by using bootstrap samples of the data with random feature selection. Unlike many other machine learning techniques, RF has the advantage of determining the importance of all the variables in the dataset. The crystal structures of 62 HIV-1 protease binding pockets, each complexed with one of the nine FDA-approved protease inhibitors and deposited in the Protein Data Bank, were studied. A quantitative understanding of the nature of the binding pockets would help us design novel inhibitors for HIV-1 protease. Descriptors were computed for the binding pocket of each crystal structure, yielding 462 constitutional, topological, geometric, electrostatic, and quantum mechanical descriptors that can be used to derive a quantitative structure-activity relationship (QSAR). The optimal number of trees (ntree) using the default sampling parameter (mtry) of 21 was determined to be 334, with an out-of-bag error of 45.2%. Adjusting the mtry parameter while using 334 trees consistently produced the same highly ranked descriptors in the top-ranked group of features, which confirms the stability of the classifier trees. The top-ranked descriptors will be used to derive a QSAR model for bioactivity prediction.
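
The two sampling steps named in the abstract, bootstrap samples of the data and random feature selection, can be sketched with the standard library alone. This is an illustrative sketch, not the authors' code or a full random forest: the function names are hypothetical, and the default mtry is assumed to follow the common classification convention of the floor of the square root of the number of descriptors, which is consistent with the reported value of 21 for 462 descriptors.

```python
import math
import random


def default_mtry(n_features):
    """Assumed convention: floor(sqrt(p)) features tried at each split."""
    return math.isqrt(n_features)


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; the rest are out-of-bag."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


def random_feature_subset(n_features, mtry, rng):
    """Pick mtry distinct candidate features for one split."""
    return rng.sample(range(n_features), mtry)


rng = random.Random(0)                       # fixed seed for repeatability
mtry = default_mtry(462)                     # 462 descriptors in the abstract
in_bag, out_of_bag = bootstrap_split(62, rng)  # 62 crystal structures
candidates = random_feature_subset(462, mtry, rng)
```

Each tree is grown on its own in-bag sample, and the structures left out of that sample are what an out-of-bag error estimate is computed on.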


CINF 26

Finding renewable energy materials one screensaver at a time

Roel S. Sánchez-Carrera1, rsanchez@fas.harvard.edu, Leslie Vogt2, lvogt@fas.harvard.edu, Roberto Olivares-Amaya2, olivares@fas.harvard.edu, and Alán Aspuru-Guzik2, aspuru@chemistry.harvard.edu. (1) Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138, (2) Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford St, Cambridge, MA 02138

Renewable energy technologies rely on materials with the ability to efficiently harness and transport energy from renewable sources. Recent advances in the field of computational chemistry have brought us closer to an accurate prediction of the photovoltaic properties of a given molecular material even before experimental synthesis. However, scanning the vast chemical space on a single computer is a difficult proposition. Working together with IBM's World Community Grid effort, we developed a screensaver (http://cleanenergy.harvard.edu), which allows individual users anywhere in the world to contribute their idle computer time to perform electronic structure calculations on combinatorial molecular libraries derived from fused aromatic molecules. The deployment of such a worldwide distributed computational engine has the potential to quickly find novel materials for the next generation of solar cells. The preliminary results of our combinatorial strategy will be presented. The preparation of a publicly available database of molecular structures and calculated properties will also be discussed.


CINF 27

Mining a large reaction database with name reaction patterns

Matthew A. Kayala1, mkayala@ics.uci.edu, Qian-Nan Hu1, qhu@uci.edu, Jonathan H. Chen1, chenjh@uci.edu, James S. Nowick2, jsnowick@uci.edu, and Pierre Baldi1, pfbaldi@uci.edu. (1) Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, Irvine, CA 92697, (2) Department of Chemistry, University of California, Irvine, 4126 Natural Sciences 1, Irvine, CA 92697-2025

Over the past several years, comprehensive data sets of small chemical compounds, such as our own ChemDB (http://cdb.ics.uci.edu), have been made publicly available for statistical analysis and data mining purposes. However, access to reaction data resources is comparatively restricted. With data largely unavailable, how to approach knowledge discovery in reaction databases is an open question. One potential method for data mining is to classify reactions using pattern matching rules. We present initial results on mining 2,000,000+ well-annotated reactions from a database of published reactions (SPRESI). Here, we have hand-composed 500+ SMIRKS language patterns to cover 306 common 'Name Reactions'. The rules provide a broad classification of the database into a small number of classes based on net structural changes. To facilitate future research, a tool to classify reactions using the patterns has been made available as part of ChemDB.


CINF 28

Predicting metabolic transformation by cytochrome P450 main isoforms

Maayan Elias1, maayan.elias@mail.huji.ac.il, David Marcus2, david.marcus1@mail.huji.ac.il, and Amiram Goldblum2, amiram@vms.huji.ac.il. (1) Department of Medicinal Chemistry, School of Pharmacy, The Hebrew University of Jerusalem, Jerusalem 91120, Israel, (2) Department of Medicinal Chemistry, Hebrew University of Jerusalem, Grass Center for Drug Design and Synthesis, and Sudarsky Center for Computational Biology, Jerusalem 91120, Israel

Cytochrome P450 is a heme-containing protein superfamily responsible for most of the metabolic transformations taking place in the human body. Several isoforms are responsible for most xenobiotic transformations in the liver. Iterative Stochastic Elimination (ISE) was used to build classification models for predicting substrates and inhibitors of the isoforms 3A4, 2D6, 1A2, and 2C9. We constructed curated databases of substrates and inhibitors from compounds published in the literature. The models used molecular properties (2D descriptors) that were selected by ISE, which optimizes the huge combinatorial problem of choosing a small subset of properties and their ranges from a large set of descriptors. ISE models may be applied to molecular databases of any size and used to score each molecule's fitness to a specific model. We applied the models of P450 isoforms to search for new substrates and inhibitors and constructed a library of molecules that have high probabilities of being substrates or inhibitors of these isoenzymes. Isoform selectivity was studied by providing a matrix containing cross information from individual models. With this tool we can predict the metabolic potentials of investigated compounds and use these models as a screening tool for molecules in the drug discovery pipeline.


CINF 29

Sphericity and oblate-prolate indices: 3D shape descriptors for fast shape comparison

Sunghwan Kim, kimsungh@ncbi.nlm.nih.gov, Evan Bolton, bolton@ncbi.nlm.nih.gov, and Stephen H. Bryant, bryant@ncbi.nlm.nih.gov, National Center for Biotechnology Information, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894

If molecules are structurally similar to each other, they are likely to have similar biological and physicochemical properties. This so-called "similarity principle" is an important concept in rational drug design, which is applied to find potential drug candidates at the initial stage of drug discovery and development. Although the shape-Tanimoto (ST) value between two molecules is considered to be an accurate measure for 3-dimensional (3D) molecular shape comparison, the ST value computation is not fast enough to screen a huge molecular library, which typically contains billions of conformers. In the present study, the shape quadrupole moments of a molecule were used to devise two 3D shape descriptors (the sphericity index and the oblate-prolate index) that allow a simple and fast shape comparison between molecules.


CINF 30

Federated search. An in-depth introduction

Abe Lederman, abe@deepwebtech.com, Deep Web Technologies, 301 N. Guadalupe Street, Ste. 201, Santa Fe, NM 87501

What is federated search? How do its users benefit? How does the technology unleash access to high quality scientific content hidden in the "deep web" and why is it that Google and the other "web crawlers" don't find much of this content? This comprehensive introduction to federated search will answer these questions and more. Topics include how the technology works and also the finer points of how quality of results, user interface, and results delivery matter. The important concepts of federated search will be reinforced through chemistry research via a demonstration using Deep Web Technologies' recently released science search portal, ScienceResearch.com.


CINF 31

Scitopia.org: Case study on using federated search to enable science and engineering research

Naveen K. Maddali, n.maddali@ieee.org, Product Management and Business Development, IEEE, 59 Woodhill St, Somerset, NJ 08873

Scitopia.org is a federated search portal that searches the websites of engineering and science society publishers. It was designed to provide a single location for researchers to retrieve high-quality research articles. Scitopia was created and is managed by 20+ societies, whose primary motivations for creating the partnership were to promote the value of society literature and to bring new traffic into their society libraries. To date, Scitopia has had success, but there have been some challenges and obstacles. Integrating the content of 20+ diverse societies, competing in a highly competitive market, and working with limited resources are some of the challenges the partnership has faced. But in the end, has Scitopia been able to address the needs of the research community, and has the federated search model worked? These questions will be evaluated.


CINF 32

SeerSuite for distributed indexing, federated search, and meta search

C. Lee Giles, giles@ist.psu.edu, Information Sciences and Technology, Pennsylvania State University, 101 Information Sciences and Technology Building, University Park, PA 16802, Fax: 814-865-7882, Prasenjit Mitra, Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, and Karl Mueller, Chemistry, Pennsylvania State University, University Park, PA 16802

Scalability has always been a challenge for search and retrieval systems. As more information becomes available, the indexing and retrieval mechanisms needed to provide accurate and precise coverage of the objects and stores become more complex and less reliable. An example of this can be seen in the existing search engines on the World Wide Web. Scalability for information retrieval with search has been addressed by various methods, prominent among them distributed indexing. This approach is in contrast to federated search, where the goal is to assemble multiple indices covering different data sources and make them accessible through a single interface. Federated search improves both the performance and the scope of the information presented. While it has been argued that metasearch is a form of federated search since both span multiple data sources, a distinction can be made: in federated search the data sources may not overlap, and, therefore, coverage of the indices may not necessarily overlap. While different approaches exist to address the scalability issue, the tasks performed are consistent: query assembly and transformation, communication with multiple indices, extraction of information from query results, and mapping and merging of these results into ordered lists in user-defined formats. These approaches to scalability and distribution are not mutually exclusive; for example, a federated search can occur over multiple distributed indices. To address many issues in academic document search and indexing, we have developed many unique search tools which we call SeerSuite. These tools have allowed us to build a collection of academic search services – CiteSeerX, ChemXSeer, ArchSeer – which offer some of the largest publicly available collections of scientific literature and data on the web plus many unique metadata extraction features.
We propose incorporating the SeerSuite services to give access to niche scalable search through a scalable federated search interface that has unique trainable metadata.
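
The task sequence listed in this abstract (send the query to multiple indices, extract scored results, then map and merge them into one ordered list) can be sketched in a few lines of Python. This is a toy illustration, not SeerSuite code: the in-memory indices, the term-count scoring, and the per-source score normalization are all assumptions made for the example.

```python
def search_index(index, query):
    """Toy per-source search: score = occurrences of query terms in the text."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in index.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            hits.append((doc_id, score))
    return hits


def federated_search(indices, query):
    """Query every source, rescale scores per source, merge into one ranking."""
    merged = []
    for source, index in indices.items():
        hits = search_index(index, query)
        if not hits:
            continue
        top = max(score for _, score in hits)
        for doc_id, score in hits:
            # Divide by the source's best score so sources are comparable.
            merged.append((score / top, source, doc_id))
    # Highest normalized score first; ties broken by source, then document id.
    merged.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return merged


# Two hypothetical, non-overlapping sources.
indices = {
    "chem": {"c1": "zeolite crystal structure database", "c2": "zeolite synthesis"},
    "cs": {"s1": "distributed indexing of structure data", "s2": "federated search"},
}
results = federated_search(indices, "zeolite structure")
for score, source, doc_id in results:
    print(f"{score:.2f} {source}:{doc_id}")
```

Normalizing within each source is only one of several possible merging policies; it is used here because raw scores from independent indices are generally not on a common scale.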


CINF 33

Delivering content to end users at their point of need means going beyond federated search

Brian P. Cannan, brian_cannan@oclc.org, Licensed Content Portfolio, OCLC, Inc, 6565 Kilgour Place, Dublin, OH 43017, Fax: 614-718-7073, Matthew Goldner, matthew_goldner@oclc.org, End User Services, OCLC, Inc, 6565 Kilgour Place, Dublin, OH 43017, and Mindy Pozenel, WorldCat Discovery Services, OCLC, Inc, 6565 Kilgour Place, Dublin, OH 43017

In a world where users only want to view the content that most directly relates to their research requirements, OCLC has undertaken a major initiative to better connect library audiences globally with the content libraries license for them. Beginning in April 2007, over 60 million article citations have been added to WorldCat from NLM, ERIC, GPO, Elsevier, the British Library and the ArticleFirst® databases. This action reflects the need to connect users with the content licensed for them where users are working – web destinations such as search engines, Facebook, Google Book Search, Google Scholar or Yahoo! Search, as well as their library. Continuing this effort to enrich the search experience of WorldCat users and improve the discoverability of this authoritative content means addressing the challenge of increasing its visibility to these users where they are working, while protecting the intellectual property rights of content providers.


CINF 34

21st century library: The preferred starting point for serious research?

Helle Lauridsen, helle.lauridsen@serialssolutions.com, SerialsSolutions, Discovery Services, ProQuest, Kastedvej 37, 8200 Aarhus, Denmark

The move from print library to e-library in the past 10 years is causing huge and rapid changes in access to information. Increasingly complicated web pages have been built by both publishers and libraries in order to showcase the virtual cornucopia in the best possible way. But do they work? Can researchers and students find their way to the best possible resource instead of just the best known or the most convenient? Why is it that Google is barging ahead as the preferred starting point for search, when research clearly shows that users do know that the library has the most reliable and trustworthy resources? This talk will investigate some of the attempts that have been made to solve this problem and discuss the latest solution.


CINF 35

Using federated search to improve your ROI and boost research capabilities

Stephen R. DiStasio III, stephen.distasio@serialssolutions.com, Product Management- Resource Discovery, Serials Solutions/ProQuest, 501 N. 34th Street, Suite 200, Seattle, WA 98103

Federated search exists today to serve a basic need: allowing the search of many resources from a single search box, saving time and effort. However, by solving one problem federated search has created another: how is a researcher supposed to navigate through thousands of results from many resources with disparate areas of expertise? How many types of results will a researcher see if they search for "Magellan" in 300 different resources? "Magellan" means a lot of things... This session will explore methods of results handling that will help your patrons dig out from underneath the "avalanche" of results and find the information they need with minimal clicks. We will also discuss how federated search enables the "discoverability" of the expensive subscription resources in your library and ensures that you see a return on investment for those subscription dollars.


CINF 36

100 Years of Houben-Weyl and Science of Synthesis: Why you should care

Thomas Krimmer, thomas.krimmer@thieme.de, Thieme Chemistry, Georg Thieme Verlag, Ruedigerstrasse 14, Stuttgart 70469, Germany, Fax: +49-711-8931777

Today's researchers are overwhelmed by the myriad of synthetic methods available. Their personal experience as practicing chemists usually covers only a few narrow fields. This dilemma cannot be solved by studying the journal literature alone. To fully assess the utility of a published method for lab use, ideally one must personally try and test a method. This is what Theodor Weyl wrote in the preface to the first edition of 'Weyl's Methods in Organic Chemistry' in 1909. 100 years later the medium of publication has changed, but the basic problem remains the same. To benefit from the wealth of chemical information resources available, you need to understand them. This talk will discuss the chemical information landscape using the 100-year history of Houben-Weyl and Science of Synthesis as a thread to provide a clearer picture of what is out there and what it is good for in chemical information.


CINF 37

Ninety editions and still going strong: The CRC Handbook of Chemistry and Physics

Fiona Macdonald, Fiona.macdonald@taylorandfrancis.com, Taylor and Francis/CRC Press, 6000 Broken Sound Parkway NW, Boca Raton, FL 33411, Fax: 561-998-2559

Publishing the 90th edition of the CRC Handbook of Chemistry and Physics is a true milestone in the history of CRC Press. Since its first publication in 1913 – as a 116-page pocket-sized book priced at $2 – the Handbook has developed into a 2800-page tome that no longer fits anyone's pocket but still finds a place on every scientist's bookshelf. This journey, and other milestones in the 96-year history of the book, will be discussed, and along the way we will take a look at the Editors who shaped the book over the years.


CINF 38

Chemical handbooks in the electronic age: Assuring data quality

David R. Lide, drlide@post.harvard.edu, CRC Press, Editor, 13901 Riding Loop Dr, Gaithersburg, MD 20878, Fax: 301-738-7147

While compilations of data in the form of printed handbooks have served chemists for almost two centuries, computer technology has produced major changes in data access in the last 20 years. Recent trends and future expectations in data dissemination will be discussed. Particular emphasis will be given to the questions of quality control that are raised by the ease of posting chemical data on the Internet and the highly effective search tools for retrieving the data. A case will be made for the continued utility of concise, carefully documented handbooks as data sources.


CINF 39

CAS databases: Where do we get all that chemistry?

Roger J. Schenck and Rebecca Kopelman, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202, Fax: 614-461-7140

Over the years, CAS has published many handbooks, from the CA Index Guide™ to the Registry Number Handbook. Concurrent with the waning of such printed products at CAS, more and more chemical information, beyond the traditional sources such as journals and patents, has been included in the CAS databases. This presentation will focus on the types of handbook information being added to the various CAS databases and the challenges in harmonizing these eclectic collections. Examples of how CAS has added its own handbook information to its electronic collections as well as examples of adding other handbook information, from both printed and electronic sources, to the CAS databases will be showcased. In conclusion, the future of chemical handbooks at CAS will be explored.


CINF 40

Seeking solutions with federated search tools

Grace Baysinger, graceb@stanford.edu, Swain Library of Chemistry and Chemical Engineering, Stanford University Libraries, 364 Lomita Drive, Organic Chemistry Building, Stanford, CA 94305-5081, Fax: 650-725-2274

Multidisciplinary approaches are needed to address increasingly complex problems. Federated and multidatabase search tools reduce barriers and enable users to discover information resources outside the boundaries of traditional academic fields. While technologies used to provide federated search services continue to evolve and improve at a rapid rate, outstanding issues remain. This presentation will highlight tools and services being used by the Stanford Libraries to support interdisciplinary collaboration and learning on campus.


CINF 41

Fedora: A network overlay approach to federated searching

Leah R. Solla, lrm1@cornell.edu, Physical Sciences Library, Cornell University, 293 Clark Hall, Ithaca, NY 14853-2501, Fax: 607-255-5288

Fedora (Flexible Extensible Digital Object Repository Architecture) is a very flexible framework for aggregating, organizing, and making use of a mix of metadata and content records. It has been used by the National Science Digital Library (NSDL) project to aggregate metadata records from over a hundred OAI-PMH providers in order to provide a central search service over that metadata (and a limited amount of crawled text from the resources themselves) with search results that link out to the original web-based resources. Search exposure is critical as increasing numbers of repositories become available and cyber-research expands across traditional disciplines. The need for standards-based search solutions that can flexibly aggregate and combine information about resources from multiple repositories and other information sources is becoming increasingly evident. This talk will give an overview of the current status of using Fedora-based network overlays to search across repositories.


CINF 42

Fee-based abstracting and indexing services vs. free federated searching

Valerie K. Tucci, vtucci@tcnj.edu, Library, The College of New Jersey, 2000 Pennington Road, Ewing, NJ 08628, Fax: 609-637-5177

Library budgets are facing drastic cuts given the current economic crisis. Unfortunately, the fee-based abstracting and indexing services are now in the spotlight and in many cases, on the chopping block. These A&I services were considered sacred and essential in the seventies and the birth of online searching only strengthened their position. However, in the intervening forty years a new paradigm has evolved. Open access and free federated search services such as Google Scholar, Scitopia, Scirus and CiteSeer are now making librarians question the need for fee-based services. This presentation will examine the changing landscape for secondary A&I services and explore the possibility that these services may indeed follow the downward spiral that newspapers are on today.


CINF 43

Application of the Modular Chemical Descriptor Language (MCDL) methodology to SAR and QSAR in prostate cancer chemotherapy

Michael N. Burnett and Andrei A. Gakh, gakhaa@ornl.gov, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6242

In the Modular Chemical Descriptor Language (MCDL), the atomic composition of a molecule is specified with structural fragments, each consisting of a nonterminal atom and all terminal atoms attached to it. For example, the MCDL composition module of 2-bromobutane is CBrH;CHH;2CHHH, which shows there are three different structural fragments. In a study of how MCDL structure fragments containing halogen versus hydrogen might contribute to biological activity, a Free-Wilson analysis was performed on 200-300 literature examples of compounds studied for potential use in cancer chemotherapy. The results of this study will be presented and compared with SAR and QSAR results taken from the literature on the effects of halogenated fragments. This research was supported by the Global IPP program. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, under contract DE-AC05-00OR22725 for the U.S. Department of Energy. This paper is a contribution from the Discovery Chemistry Project.
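
The fragment definition given in this abstract (a nonterminal atom plus all terminal atoms attached to it) can be reproduced with a short Python sketch. It is an illustration only, not the MCDL reference implementation: the alphabetical ordering of terminal atoms and of fragments is an assumption chosen because it matches the 2-bromobutane example, and the function name is hypothetical.

```python
from collections import Counter


def composition_module(atoms, bonds):
    """Build an MCDL-style composition string from an explicit-hydrogen graph.

    atoms: list of element symbols; bonds: list of (i, j) atom index pairs.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # A terminal atom has exactly one neighbor (H and Br in the example).
    terminal = {i for i, nbrs in neighbors.items() if len(nbrs) == 1}
    fragments = Counter()
    for i, nbrs in neighbors.items():
        if i in terminal:
            continue
        attached = sorted(atoms[j] for j in nbrs if j in terminal)
        fragments[atoms[i] + "".join(attached)] += 1
    # Assumed ordering: alphabetical; repeated fragments get a count prefix.
    parts = []
    for frag in sorted(fragments):
        count = fragments[frag]
        parts.append(frag if count == 1 else f"{count}{frag}")
    return ";".join(parts)


# 2-bromobutane, CH3-CHBr-CH2-CH3, with explicit hydrogens.
atoms = ["C", "C", "C", "C", "Br"] + ["H"] * 9
bonds = [(0, 1), (1, 2), (2, 3), (1, 4),
         (0, 5), (0, 6), (0, 7),        # H on C0
         (1, 8),                        # H on C1
         (2, 9), (2, 10),               # H on C2
         (3, 11), (3, 12), (3, 13)]     # H on C3
print(composition_module(atoms, bonds))  # CBrH;CHH;2CHHH
```

The two methyl groups collapse into the single entry 2CHHH, which is why the abstract counts three different structural fragments for a molecule with four carbon atoms.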


CINF 44

3D QSPR for general use: Structure standardization

George D. Purvis III1, gpurvis@us.fujitsu.com, David T. Stanton2, stanton.dt@pg.com, William D Laidig3, and John D. Shaffer3. (1) Biosciences Group, Fujitsu, 15455 NW Greenbrier Pkwy, Suite 125, Beaverton, OR 97006, (2) Procter & Gamble, Miami Valley Innovation Center, 11810 East Miami River Road, Cincinnati, OH 45252, (3) Modeling and Simulations Group, Procter & Gamble, Miami Valley Innovation Center, 11810 E. Miami River Road, Cincinnati, OH 45253

Quantitative structure-property relationship (QSPR) models are increasingly used to estimate properties of chemicals and to screen them for new product applications. Chemists who are not expert in modeling often use these models. Consequently, the models must be robust. In particular, predictions must be insensitive to structure entry. Ideally, the same prediction is produced whether the QSPR model is given a structure in linear notation (e.g. SMILES), connection table format, or any of a number of 3D conformations, regardless of the order of atom entry. Arguably, in the hands of experts, the best predictions and mechanistic interpretations are produced when the most structural information is available, such as a fully optimized 3D conformation or an ensemble of conformations. However, 3D models come at the risk of sensitivity to structural input, not only for conformations of the same structure, but also through variability of conformations for similar structures. Here we address the question, "Can 3D-based QSPR models be robust enough for general use by non-experts, or do the advantages of more unambiguous structure information of topological methods offset their possible lower accuracy?"


CINF 45

Chemical space network topology through atom typing

N. Sukumar1, nagams@rpi.edu, Mike Krein2, kreinm2@rpi.edu, and Curt M. Breneman2, brenec@rpi.edu. (1) Department of Chemistry and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute / RECCR Center, 110 8th St., Troy, NY 12180-3590, Fax: 518-276-4887, (2) Department of Chemistry / RECCR Center, Rensselaer Polytechnic Institute, 110-8th Street, Center for Biotechnology and Interdisciplinary Studies, Troy, NY 12180

The topological characteristics of chemical spaces and structure-activity landscapes set upper bounds to the predictivity of models constructed within these spaces. Here we analyzed the PubChem and ZINC databases (about 19 million and 2.5 million molecules, respectively) and the topological characteristics of the resulting networks. These are defined independent of biological activity, with nodes (molecules) within a preset level of 2-D similarity being connected by edges. Pairwise “Atomtyper distances” (the number of atoms in one molecule that are different from any atom in the other, to within a specified level of similarity) and “alchemical distances” (the number of atoms that have to be added, deleted or substituted to “transmute” one molecule into another) between molecules were determined, with pairs randomly sampled until the network characteristics converged. We also study the degree distributions of various subspaces at different similarity thresholds and the effects of employing other standard similarity measures.
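
The network construction described here (molecules as nodes, an edge wherever two molecules fall within a preset similarity level, then a degree distribution at each threshold) can be sketched in Python as follows. The sketch swaps in a Tanimoto coefficient on hypothetical feature sets as a stand-in similarity measure; the abstract's Atomtyper and alchemical distances are not defined in enough detail to reproduce, and the molecule names and features are invented for the example.

```python
from collections import Counter
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets (stand-in similarity measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_network(fingerprints, threshold):
    """Connect every pair of molecules whose similarity reaches the threshold."""
    edges = set()
    for m1, m2 in combinations(sorted(fingerprints), 2):
        if tanimoto(fingerprints[m1], fingerprints[m2]) >= threshold:
            edges.add((m1, m2))
    return edges


def degree_distribution(nodes, edges):
    """Map each degree value to the number of nodes that have that degree."""
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


# Hypothetical molecules described by sets of feature identifiers.
fingerprints = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 5},
    "C": {1, 2, 6, 7},
    "D": {8, 9},
}
for threshold in (0.5, 0.3):
    edges = similarity_network(fingerprints, threshold)
    print(threshold, sorted(edges), dict(degree_distribution(fingerprints, edges)))
```

Lowering the threshold from 0.5 to 0.3 turns a single edge into a triangle, which shows on a small scale how the degree distribution depends on the chosen similarity level. The all-pairs loop is quadratic, which is why random sampling of pairs, as in the abstract, becomes necessary for databases with millions of molecules.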


CINF 46

Screening databases of hypothetical porous materials

Maciej Haranczyk1, mharanczyk@lbl.gov, Kevin Theisen2, Bei Liu2, and Berend Smit3, Berend-Smit@Berkeley.edu. (1) Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Mail Stop 50F-1650, Berkeley, CA 94720, Fax: 510-486-5812, (2) Chemical Engineering, UC Berkeley, Berkeley, CA 94720, (3) Chemical Engineering, UC Berkeley, 101B Gilman #1462, Berkeley, CA 94720-1462

Porous materials, e.g. zeolites, have many applications in the chemical industry. The number of possible zeolite structures has been estimated to be larger than 2.5 million. Databases of hypothetical zeolite structures are being developed, and they could in principle be screened for zeolites with any desired property. Current state-of-the-art molecular simulations allow for accurate prediction of zeolite properties, but the computational cost of such calculations prohibits their application to the characterization of the entire database of hypothetical structures, which would be required to perform brute-force screening for novel structures with useful properties. Our work focuses on the development of an efficient screening technique that requires such expensive characterization only for a carefully selected and statistically relevant subset of a database. Then, the database is screened employing the similarity principle. The developed screening technique, structural descriptors and similarity measures will be presented. This work is supported by the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.


CINF 47

Data mining cluster analyses of zeolite crystals

Mohammed Lach-hab1, mlachhab@gmail.com, Shujiang Yang1, syangf@gmu.edu, Iosif Vaisman2, ivaisman@gmu.edu, and Estela Blaisten-Barojas3, blaisten@gmu.edu. (1) Computational Materials Science Center, George Mason University, 4400 University Dr., MSN 6A2, Fairfax, VA 22030, (2) Department of Bioinformatics and Computational Biology, George Mason University, 10900 University Dr, Manassas, VA 20110, (3) Computational Materials Science Center, George Mason University, 4400 University Dr, MS 6A2, Fairfax, VA 22030

Computationally predicted inorganic solid materials are becoming increasingly available. Zeolite crystals are one example. Mining such sources of information is challenging since the hypothetical compounds lack a crystallographic description. Unsupervised classification of these compounds is useful for the material designer. In this work we train an unsupervised clustering model for grouping zeolites into four superclasses sharing common structural properties. The clustering algorithm is based on probabilistic expectation maximization and is trained on a set of 1400 zeolite crystals from the Inorganic Crystal Structure Database. A thorough feature importance analysis is carried out, resulting in two groups of features allowing classifications with up to 97% accuracy. (Work supported under the National Science Foundation grant CHE-0626111. ICSD data are courtesy of the National Institute of Standards and Technology.)


CINF 48

Knowledge acquisition from reaction database for metabolic profiling

Lothar Terfloth1, terfloth@molecular-networks.com, Thomas Kleinöder1, Jörg Marusczyk1, Christof H. Schwab2, schwab@molecular-networks.com, and Johann Gasteiger2, gasteiger@molecular-networks.com. (1) Molecular Networks GmbH, Henkestrasse 91, D-91052 Erlangen, Germany, (2) Molecular Networks GmbH, Henkestrasse 91, Erlangen D-91052, Germany

In the drug discovery process multiple – partly competing – objectives have to be optimized in order to come up with a new lead structure. The identification of a potent and selective compound is not sufficient. Furthermore, a lead compound should possess a favourable pharmacokinetic profile. Many papers on the in silico prediction of ADMET (absorption, distribution, metabolism, elimination, toxicity) properties have been published. In comparison to the number of models available for the prediction of absorption, less interest seems to have been devoted to the modeling of metabolism. This paper focuses on knowledge acquisition from reaction databases and its application to the metabolic profiling of drugs. The performance of metabolite prediction is investigated on an external validation data set of drugs and their metabolites reported in the literature. The merit of considering the intrinsic reactivity of the substrates, estimated by physico-chemical descriptors, will be presented.


CINF 49

Crystal structure information aids drug discovery and development

Frank H. Allen, allen@ccdc.cam.ac.uk, Cambridge Crystallographic Data Centre (CCDC), 12 Union Road, Cambridge CB2 1EZ, United Kingdom, Fax: 44-1223-336-033

Experimental observations of conformational preferences and intermolecular interactions in small-molecule crystal structures have been of fundamental importance in computational drug discovery since the early 1980s. The Cambridge Structural Database (CSD) now contains almost half a million crystal structures, and is used to generate two searchable knowledge-based libraries: Mogul, containing substructure-based distributions and statistics derived from more than 20 million bond lengths, angles and torsions, and IsoStar, containing over 20,000 scatterplots of non-bonded interactions between chemical functional groups. The talk will summarise applications of structural knowledge in chemistry, drug discovery and, more recently, in drug development and formulation. The inter-relationship between experimental information and computational results will also be discussed.


CINF 50

Creating data resources for biology: Lessons from the PDB and the PSI SGKB

Helen M. Berman, Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 610 Taylor Road, Piscataway, NJ 08854

Many issues need to be considered when building resources that enable a variety of scientific communities. One is the necessity of a scalable infrastructure that can handle vast amounts and different types of data. This infrastructure must also be extensible to handle new and evolving technologies. Another concern is how to solicit and incorporate the needs and wants of a variety of user communities. Two global resources for science – the Protein Data Bank (PDB) and the Protein Structure Initiative Structural Genomics Knowledgebase (PSI SGKB) – will be presented. The PDB has been the archive for the three-dimensional coordinates for experimentally-determined biological structures for the last 30 years. The PSI SGKB, launched in 2007, expands upon this information by integrating available structural, experimental, biological, and modeling data for all protein sequences. Today, both resources are used by researchers and students in a variety of disciplines who are studying these biological macromolecules and their relationships to sequence, function, and disease.


CINF 51

Community Structure-Activity Resource: Data repository to improve docking and scoring.

James B. Dunbar Jr., jbdunbar@umich.edu, College of Pharmacy Department of Medicinal Chemistry, University of Michigan, 428 Church Street, Ann Arbor, MI 48109

The Community Structure-Activity Resource is a center at the University of Michigan funded by the National Institutes of Health, specifically the National Institute of General Medical Sciences. The function of this center is to collect, curate, and disseminate data sets of crystal structures, biological binding affinities, and thermodynamic data to aid in the refinement of docking and scoring methodologies. These data sets are to come from in-house projects at the University of Michigan, other academic labs, and, most importantly, from industrial (large pharma) sources. Part of our remit is to fill in the gaps with synthesis, crystallography and biology targeted to augment, as best we can, what is currently available in terms of the full range of properties, binding affinities and other relevant characteristics involved in docking and scoring. This presentation will detail what we have done so far and what we plan for the future.


CINF 52

Bio-Activity databases

Albert J. Leo, aleo@biobyte.com and Alka Kurup, akurup60@gmail.com, BioByte Corp, 201 W. 4th St. #204, Claremont, CA 91711

Following the Hansch-Fujita research that established the usefulness of a hydrophobic parameter in biological QSARs, the early databases constructed and compiled at Pomona College concentrated on collecting measured partition coefficients of as many solutes in as many solvent pairs as possible. Octanol/water, oil/water, and ether/water proved to be the solvent pairs most useful in the biological field, but other pairs soon found applications in fields such as ore enrichment for rare earths and uranium. The QSAR database of Hammett-type free energy equations, explaining bio-activity in quantitative terms using hydrophobic, steric, and electronic parameters, was at first kept separate from the properties database (Masterfile) of log Ps and pKas. The merger of the two, allowing for parameter "range searching" and automatic loading, has been accomplished, which makes user training much easier. The final product was called BioLoom to emphasize the need to "weave into fabric" the current "meteoric shower of facts."


CINF 53

Trust…but verify! On the importance of experimental data curation prior to building (Q)SAR models

Alexander Tropsha, alex_tropsha@unc.edu, Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina, CB # 7360, Beard Hall, School of Pharmacy, Chapel Hill, NC 27599-7360, Fax: 919-966-0204, Eugene Muratov, 00dqsar@ukr.net, Laboratory of Molecular Modeling, School of Pharmacy, The University of North Carolina at Chapel Hill, CAMPUS BOX 7360, Chapel Hill, NC 27599, and Denis Fourches, fourches@email.unc.edu, Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina, Beard Hall, Chapel Hill, NC 27599

Molecular modelers are always at the mercy of the primary data providers. We argue and illustrate with examples that the quality of data predefines the accuracy and predictive power of models irrespective of the rigor and thoroughness used in building (Q)SAR models. The primary data may contain errors in chemical structures, in the values of the biological data, and in the associations between structure and bioassay results; frequently, there are duplicates. We show that many publicly available datasets, including those recently used for QSAR competitions, contain erroneous information that is sometimes sufficient to undermine the virtue of the competition. We further show that the data errors significantly, if not dramatically, influence the accuracy of the resulting models. Conversely, we demonstrate that rigorously built (Q)SAR models can help identify and correct gaps and possible errors in primary datasets. Finally, we propose simple protocols for primary data analysis and curation.
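
As a rough illustration of the duplicate problem mentioned above, the following Python sketch groups records by a structure key and separates duplicates whose activity values agree from those that conflict. It is not the authors' protocol: the function names, the `tolerance` threshold, and the assumption that each structure has already been reduced to a standardized key (for example a canonical identifier string) are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def find_duplicates(records, tolerance=0.5):
    """Group records by structure key and split duplicates by agreement.

    records: list of (structure_key, activity) pairs. The key is assumed to
    be an already-standardized structure identifier (hypothetical input).
    Returns (concordant, discordant), each mapping key -> list of activities.
    """
    groups = defaultdict(list)
    for key, activity in records:
        groups[key].append(activity)

    concordant, discordant = {}, {}
    for key, values in groups.items():
        if len(values) < 2:
            continue  # a single record is not a duplicate
        if max(values) - min(values) <= tolerance:
            concordant[key] = values
        else:
            discordant[key] = values
    return concordant, discordant


def curate(records, tolerance=0.5):
    """Average concordant duplicates and drop discordant ones entirely."""
    concordant, discordant = find_duplicates(records, tolerance)
    curated = {}
    for key, activity in records:
        if key in discordant:
            continue  # conflicting measurements: exclude until resolved
        curated[key] = mean(concordant[key]) if key in concordant else activity
    return curated
```

Dropping discordant duplicates is only one possible policy; a curator might instead go back to the primary literature to decide which value is correct.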


CINF 54

Learning from a drug guru: Part of a new wave of cheminformatic analysis

Kent D. Stewart, kent.d.stewart@abbott.com, Global Pharmaceutical Research & Development, Abbott Laboratories, 100 Abbott Park Road, Abbott Park, IL 60064, Fax: 847-937-2625

DRUG GURU™ (Drug Generation Using Rules) is a computer program that applies medicinal chemistry “rules-of-thumb” to an input structure to design new analogs [K. D. Stewart et al., Bioorg. Med. Chem. 14, 7011-7022, 2006]. This presentation will review the basics of the Drug Guru program and compare and contrast results with related software programs BROOD™ (OpenEye), BIOSTER™ (Accelrys), and EMIL™ (CompuDrug). This work will be placed into the context of current research in “Compound Pairs Analysis” that is under active investigation in many research groups.


CINF 55

Hyperparametric modeling, i.e. modeling.

Anthony Nicholls, OpenEye Scientific Software, Inc, 9d Bisbee Court, Santa Fe, NM 87508

The field of molecular modeling, from quantum mechanics to QSAR, is beset with parameters. Is this a problem? Even good physical theory requires a parameter or two, but forty, a hundred, a thousand? This talk will propose that the entire field of molecular modeling is over-parameterized, hyper-parameterized even, because methods of assessing a judicious number of constraints are either unknown, ignored or improperly applied. The consequences are profound but not irredeemable.


CINF 56

Template-constrained topomer CoMFA

Richard D. Cramer, cramer@tripos.com, Tripos Inc, 1699 South Hanley Road, St. Louis, MO 63144, Fax: 314-647-9241

While topomer CoMFA (using topomer poses in 3D-QSAR) is showing remarkable promise (1), in particular by providing R-group virtual screening for lead optimization projects (2), there remains a concern, if only when depicting the results, about the physicochemical improbability of many topomer poses. Our belief is that the success of topomer CoMFA results from the self-consistency of topomer poses, and therefore that other rigorously self-consistent pose generation methodologies should also succeed. Constrained topomer CoMFA allows the user to provide template conformations. Under this circumstance, wherever there is a suitable mapping of 2D structures between a template conformation and the fragment whose topomer is being generated, the template coordinates are copied directly to the topomer, with the topomer rules used as usual elsewhere. In this context, “suitable mappings” must begin at the fragment root, include exact matches of (heavy) atom and bond types to the maximum possible extent, and then finish whenever heavy atom topologies no longer conform. Results from using such constrained topomer poses in the 25 published topomer CoMFA models and 50 R-group searches will be presented.

1) Cramer, R. D. Topomer CoMFA: A Design Methodology for Rapid Lead Optimization, J. Med. Chem., 2003, 46, 374-389.

2) Cramer, R. D.; Cruz, P.; Stahl, G.; Curtiss, W. C.; Campbell, B.; Masek, B. B.; Soltanshahi, F. Virtual Screening for R-groups, including Predicted pIC50 Contributions, within Large Structural Databases, Using Topomer CoMFA. J. Chem. Inf. Model., 2008, 48, 2180-2196.


CINF 57

Power to the people: Integrating data and analysis in one easy application

Derek A. Debe, Discovery Informatics, Abbott Laboratories, Mailstop AP10/R42T, 100 Abbott Park Rd., Abbott Park, IL 60064, Fax: 847-937-2625

This talk will discuss the successful development and deployment of a Drug Discovery data integration and analysis platform at Abbott Laboratories. This application serves as the data analysis centerpiece for Abbott's Discovery chemists and biologists. Specific use case examples will be presented, including functionality useful for Hit-to-Lead analysis and Lead Optimization efforts. Attendees will gain an understanding of 1) the successful deployment of a very well-received data integration and analysis platform to our research scientists and 2) how tools available from software vendors can be integrated to produce a successful centerpiece application for small molecule discovery research.


CINF 58

A probabilistic approach to compound subset selection for virtual and high-throughput screening

Philip Hajduk, Advanced Technology Division, Abbott Laboratories, AP10 LL, 100 Abbott Park Rd, Abbott Park, IL 60064, Fax: 847-937-2625

A probabilistic approach to compound subset selection is described using a belief theory framework for chemical similarity and bioactivity. The approach outperforms conventional methods of subset selection and enables a quantitative assessment of the risk of missing bioactives when testing only a subset of the available compounds. Applications of this approach in assessing chemical diversity in various compound collections will be described.


CINF 59

Application of belief theory to similarity data fusion for use in analog searching and lead hopping

Steven Muchmore, R4DG, Abbott Laboratories, 100 Abbott Park Rd., Abbott Park, IL 60064

Computational approaches to detecting chemical similarity have been developed using diverse strategies that strive to capture the features of molecules that are salient to some activity. Methods have been developed that exploit both 2D and 3D descriptions of molecules, and it has long been recognized that different measures of similarity will give rise to different rankings of a collection of molecules to the query. The use of the similarity measures to find effective substitutions for a known active molecule is commonly undertaken, and this technique has been referred to as “lead hopping”. One difficulty in effective lead hopping is in combining results from different measures of similarity in a meaningful and productive way. This work presents a probabilistic approach, which attempts to reconcile different lead hopping techniques by establishing a common framework for comparison.


CINF 60

Ligand-based drug discovery in an era of structure-based drug discovery

Yvonne C. Martin, yvonnecmartin@comcast.net, 2230 Chestnut St., Waukegan, IL 60087, Fax: 847-937-2625

With increasing numbers and types of 3D ligand-macromolecule structures becoming available every year, it is time to ask whether ligand-based methods are obsolete when one has a structure on which to base a design. This presentation will offer observations suggesting that careful analysis of ligand structure-activity relationships provides independent information that contributes to the discovery of ligands with the desired profile of potency, novelty, selectivity, etc.


CINF 61

One search, many answers: Bringing together results from multiple databases through the DiscoveryGate platform

Carmen I. Nitsche, Carmen.Nitsche@symyx.com, Vice President Content, Symyx Technologies, 254 Rockhill Drive, San Antonio, TX 78209

Despite technological advances, chemists are still faced with having to learn a myriad of online search systems from which they are trying to retrieve pertinent chemical information. In this paper we will review various approaches employed on the DiscoveryGate platform to bring together over a dozen different commercial and no-fee databases across various vendors. In particular we will discuss the Compound Index as a means of retrieving related information. We will also explore how newly developed technologies based on web services readily bring together information from varied sources and deliver the sought information into standard search/browse applications, into customer-built applications, and directly into scientific workflow applications.


CINF 62

Federated search in commercial and noncommercial structure and reaction databases: A flexible approach

Valentina Eigner-Pitto, ve@infochem.de and Josef Eiblmaier, InfoChem GmbH, Landsberger Strasse 408, Munich 81241, Germany

Chemically relevant databases are often located either on the company's intranet or on the internet. The approach described here provides access to multiple structure and reaction databases, both commercial and noncommercial. User access is provided via an intuitive, easy-to-use web interface. The search can be conducted as a structure, reaction, or factual data search. A challenge faced in the implementation of a federated structure search is the heterogeneity of the different data sources with regard to the technical interface and the content and format of the results. Moreover, the query must be translated into the specific query language of each of the foreign target systems. Our approach connects to any database that provides either an Oracle cartridge or a web service and that can handle a structure query in MDL Molfile, SMILES, or ROSDAL format. The meta search engine utilizes database-specific connectors that use one of multiple protocols such as SOAP, SRU, or SQL*Net/Net8. The query is sent to the distributed search services. Results in different native formats are collected, consolidated, and presented to the user. In a results overview, hits are grouped in different contexts such as “Structures”, “Reactions”, or “Documents”, which gives the user the possibility to view the hit in the desired context. The hit lists themselves provide hyperlinks for direct access to the original display or document page.
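
The connector pattern described in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not InfoChem's implementation: the `Hit` record, the `Connector` type, and the `federated_search` function are hypothetical names, and each connector is assumed to hide its own protocol and query translation behind a plain function call.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Hit:
    source: str   # name of the database that returned the hit
    context: str  # e.g. "Structures", "Reactions", "Documents"
    link: str     # hyperlink back to the original record


# A connector accepts a structure query string and returns a list of hits.
# How it talks to its database (SOAP, SQL, ...) is its own concern.
Connector = Callable[[str], List[Hit]]


def federated_search(query: str,
                     connectors: Dict[str, Connector],
                     timeout: float = 30.0) -> Dict[str, List[Hit]]:
    """Send one query to every connector in parallel; group hits by context."""
    grouped = defaultdict(list)
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, query)
                   for name, fn in connectors.items()}
        for name, future in futures.items():
            try:
                hits = future.result(timeout=timeout)
            except Exception:
                continue  # one failing source should not sink the search
            for hit in hits:
                grouped[hit.context].append(hit)
    return dict(grouped)
```

Keeping the protocol details inside each connector means a new data source can be added by registering one more function, without touching the fan-out and consolidation logic.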


CINF 63

Oops and downs of resolving InChIs for the chemistry community

A J Williams, tony@chemspider.com, ChemZoo, 904 Tamaras Circle, Wake Forest, NC 27587

The InChI resolver was rolled out to the community in March 2009 with the purpose of providing a centralized resource for chemists to resolve InChIs (International Chemical Identifiers). This presentation will provide an overview of the development of the underlying technologies associated with the InChI resolver, and how the resolver is being used, integrated and enhanced to provide additional value to the chemistry community. We will discuss present limitations to application of the resolver for providing access to databases and chemistry information distributed across the internet and define our vision for enhancing interconnectivity across Open databases using the InChI resolver as the glue.


CINF 64

BioMart: Federating public and proprietary data

Arek Kasprzyk, arek.kasprzyk@oicr.on.ca, Bioinformatics and Biocomputing, Ontario Institute for Cancer Research, 101 College Street, suite 800, Toronto, ON M5G 0A3, Canada

BioMart is an open source data management system focused on 'data mining'-like searches of complex descriptive data. The power of the system comes from integrated querying of data sources regardless of their geographical locations through a single web interface. BioMart Central Portal (www.biomart.org) offers a one-stop shop for access to over 20 biological databases distributed in multiple locations. BioMart's capabilities are extended by integration with several widely used software packages such as BioConductor, DAS, Galaxy, and Cytoscape. The system also supports programmatic access through a Perl API as well as RESTful and SOAP-oriented web services. Recently, BioMart has been adapted as a data management platform for the International Cancer Genome Consortium (ICGC). The BioMart-based ICGC portal will provide unified access to new generation sequencing data from 50,000 genomes distributed among different cancer research institutes around the world. Additional data sources from the public domain will be federated in order to add more annotations to the data generated by the ICGC. BioMart can easily be adapted as an in-house data management solution. Furthermore, once deployed it will facilitate federation with publicly available data sources, thus bringing the wealth of public domain data into integrated querying of proprietary data.


CINF 65

Half a million crystal structures in the CSD: A unique teaching resource in 3D structural chemistry

Frank H. Allen, allen@ccdc.cam.ac.uk, Cambridge Crystallographic Data Centre (CCDC), 12 Union Road, Cambridge CB2 1EZ, United Kingdom, Fax: 44-1223-336-033

Crystallography is the method of choice for characterising chemical structures. The Cambridge Structural Database (CSD) now contains data for half a million crystal structures, representing a massive library of 3D chemical information that can be interrogated and displayed using state of the art software. Apart from its well known research applications in pharmaceutical and structural chemistry, the CSD System provides chemistry teachers with a unique opportunity to incorporate all aspects of 3D chemistry into their courses. These include, inter alia, molecular dimensions, the conformations and stereochemistry of cyclic and acyclic moieties, reaction pathways, and the geometrical and directional aspects of hydrogen bonds and other non-bonded interactions.


CINF 66

Bond lengths, crystal structure determinations, and research in the undergraduate classroom

Guy Crundwell, CrundwellG@mail.ccsu.edu, Department of Chemistry, Central Connecticut State University and STaRBURSTT CyberDiffraction Consortium, 1615 Stanley St., New Britain, CT 06050, Neil M. Glagovich, glagovichn@ccsu.edu, Department of Chemistry, Central Connecticut State University, 1615 Stanley Street, PO Box 4010, New Britain, CT 06050, and Barry L Westcott, westcottb@ccsu.edu, Department of Chemistry, Central Connecticut State University, New Britain, CT 06050

At CCSU, the Cambridge Structural Database (CSD) is used in undergraduate research, inorganic laboratory, and in our special topics course in crystallography. When encountering topics for the first time in textbooks, students often find hand-picked data aimed to illustrate fundamental topics in structure and bonding. However, the raw data mined from the CSD challenges students to think more critically about these fundamental topics of bonding and molecular structure since the data does not present itself as neatly as a vetted table in a textbook. The use of the CSD allows a professor to test student backgrounds of previously learned material, to highlight to students the limitations in methods of data collection, and to work with students to gain the ability to synthesize broader applications and connections between bonding and structure.


CINF 67

CRYSTMET: Inorganic crystal structures in chemical education and materials design

J Rodgers, jrodgers@innovativematerials.com, Innovative Materials Technologies Inc, 12B Charles Bagot Street, Gatineau, QC J8X4E1, Canada

CRYSTMET is a database of crystal structures of compounds that do not contain a C-H bond – metals, alloys, minerals and other inorganic compounds. CRYSTMET contains 130,000 crystal structure entries classified according to structure type and other criteria. Software for searching and structure visualisation is supplied with the database. CRYSTMET is a rich source of examples of both common and uncommon structure types in inorganic and materials chemistry that are essential in chemical education at various levels. The talk will describe the database and its information content, and also indicate how the accumulated data is being used in modern materials design.


CINF 68

Using the Cambridge Structural Database to explore concepts of symmetry

Dean H. Johnston, djohnston@otterbein.edu, Department of Chemistry and Biochemistry, Otterbein College, Westerville, OH 43081, Fax: (614) 823-1968

The Cambridge Structural Database provides a rich and virtually unlimited source of example molecules for teaching concepts of symmetry. Various exercises have been developed for use in basic and advanced undergraduate courses in Inorganic Chemistry. In one exercise, students used the CSDSymmetry database along with the Cambridge Structural Database to identify molecules with interesting point group symmetry and then presented their findings to the other students in the class. Several of these examples have been incorporated into an online symmetry gallery that includes an interactive display of the full set of symmetry elements and operations for each molecule.


CINF 69

Teaching crystallography in physical chemistry

Virginia B. Pett, pett@wooster.edu, Department of Chemistry, The College of Wooster, 943 College Mall, Wooster, OH 44691, Fax: 330-263-2386

In a computer-based laboratory session, physical chemistry students visualized the packing diagram of a crystal structure. They examined both "real space" (the unit cell of the crystal) and "reciprocal space" (the diffraction pattern). The students calculated the density of the crystal, measured bond lengths and bond angles to compare the experimental measurements with valence bond ideas of hybridization, and found hydrogen bonds in the crystal. In an advanced physical chemistry topics course the students accessed the Cambridge Structural Database to visualize the three-dimensional crystal structure of molecules and to investigate symmetry, packing, and intermolecular interactions in the solid state. Each project was organized so that the students made discoveries, drew conclusions, and presented their results in writing. They examined an organic bicyclic ring structure to investigate ring symmetry and ring pucker; they were challenged to find an example of unusual hydrogen bonding in the packing diagram of another organic molecule.
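
The density calculation mentioned above follows from the unit cell contents: density = Z·M / (N_A·V), where Z is the number of formula units per cell, M the molar mass, and V the cell volume. The Python sketch below shows one way a student might carry it out; the function names are illustrative assumptions and are not taken from the course materials.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1 (exact by SI definition)


def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit cell volume in cubic angstroms (general triclinic formula).

    a, b, c are cell edges in angstroms; alpha, beta, gamma are in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def crystal_density(z, molar_mass, volume_A3):
    """Calculated density in g/cm^3.

    z: formula units per unit cell; molar_mass: g/mol;
    volume_A3: unit cell volume in cubic angstroms.
    """
    volume_cm3 = volume_A3 * 1e-24  # 1 cubic angstrom = 1e-24 cm^3
    return z * molar_mass / (AVOGADRO * volume_cm3)
```

For rock salt (cubic, a ≈ 5.64 Å, Z = 4, M ≈ 58.44 g/mol), `crystal_density(4, 58.44, cell_volume(5.64, 5.64, 5.64))` gives about 2.16 g/cm^3, close to the measured density of NaCl.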


CINF 70

Teaching molecular structure using Jmol

Robert M. Hanson, hansonr@stolaf.edu, Department of Chemistry, St. Olaf College, 1520 St. Olaf Avenue, Northfield, MN 55057

In this presentation the current principal programmer and project director of the Jmol molecular visualization applet will illustrate recent advances in Jmol that are particularly relevant to crystal structure visualization.


CINF 71

Modeling and simulation in biochemistry: A guide for users and consumers of crystallographic information

Katherine A. Kantardjieff, kkantardjieff@fullerton.edu, Department of Chemistry and Biochemistry, California State University Fullerton, 800 N. State College Blvd., Fullerton, CA 92834-6866, Fax: 734-939-4225

A biomolecular crystal structure is a hypothesis based upon model agreement with the diffraction data. Models validated by established criteria present an opportune starting point for additional computation that may provide further insights into biochemical function and mechanism, as well as successfully guide drug discovery efforts, including target selection, synthesis, and design modification to optimize binding affinity and pharmacokinetic properties. Crystal structures provide the basis for comparative protein structure modeling which, by matching accuracy with intended use, may be used for virtual screening, defining antibody epitopes, protein engineering, rational mutagenesis, molecular replacement phasing, and fitting low resolution electron density. Given a structure, molecular dynamics or QM/MM approaches may further elucidate catalytic mechanism and contribute meaningfully to inhibitor design. As we shall see in this presentation, exploiting biomolecular crystal structure in modeling and simulation can be quite powerful in addressing a research problem or learning about fundamental chemistry. However, caveat emptor.


CINF 72

Education and certification of patent information professionals in Europe

Bob Stembridge, Bob.Stembridge@thomsonreuters.com, Customer Relations, Thomson Scientific, 77 Hatton Garden, EC1N 8JS London, United Kingdom

The work of the patent information professional is central to the patent system from identifying the prior art necessary to establish the patentability of an invention, through determining freedom to operate within a given territory, to helping to detect infringement of IP rights and providing support for proceedings against alleged infringers. But how does one learn the skills involved and, perhaps more importantly, how can an individual demonstrate that they possess the necessary knowledge and experience to conduct patent information work competently and reliably? Although University courses exist which include modules for IP education, these are scant basis on which to equip the student with the wide range of knowledge about search systems and languages, databases, patent systems, claims construction etc. required to be considered a competent professional. In practice, this knowledge has traditionally been acquired and accumulated through experience and learning “on the job”. In today's fast-moving environment, there is a need for trained professionals ready to step up to the plate straight out of training. This presentation will describe initiatives in Europe to formalize the education of tomorrow's patent information professionals and put in place a system to assess and certify both existing and aspiring patent information professionals to assure the necessary quality required for the future health of the patent system.


CINF 73

PERI Patent Information Course

Edlyn S. Simmons, edlyns@earthlink.net, Simmons Patent Information Service LLC, 5528 Brewer Rd., Mason, OH 45040, Fax: 513-398-3660

In 1989, the Patent Committee of the Pharmaceutical Manufacturers Association's Information Management Subsection introduced a course on the fundamentals of patent law and patent information resources. The course was developed because existing patent search training covered only database content and search techniques, while training in basic patent law and principles was left to informal interactions with mentors and colleagues. The course continues to be presented by PERI, filling the training gap for new patent searchers in the 21st century. This presentation will review the content of the course.


CINF 74

Law librarianship

Renate Chancellor, School of Library Science, Catholic University of America, 620 Michigan Avenue, NE, 246 Marist Hall, Washington, DC 20064

Intellectual property education in law librarianship.


CINF 75

USPTO: Education of the inventor community

John Calvert, Supervisory Patent Examiner, USPTO, Alexandria, VA 22313-1450

The USPTO has offered education and assistance to individual inventors for many years. Recently, the concept of the individual inventor has progressed from the mom and pop garage inventor to small business inventors and university inventors. From this change has come a need to educate a growing number of individuals. With the increased need for education and limited resources, the USPTO has begun to provide many education opportunities using the vast resources of the electronic age. The USPTO now provides online chats, video links on its website, educational sessions from various web sources, and live webcasts of inventor conferences.


CINF 76

Copyright basics

Eric S. Slater, e_slater@acs.org, Publications Division, Copyright Office, American Chemical Society, 1155 Sixteenth Street, NW, Washington, DC 20036, Fax: 202-776-8112

This session will feature a general discussion of basic United States Copyright Law, including, but not limited to, such topics as subject matter of copyright, exclusive rights of copyright, duration of copyright and application of copyright law to new technology and methods of distribution. Additionally, the speaker will discuss different “movements” (e.g., Open Access, Creative Commons, etc.) and how these have affected copyright law and practices of publishers. Finally, the session will conclude with a primer on the permissions process, and why it is important to be aware of copyright when using material posted on the Internet.


CINF 77

Recent developments in intellectual property

Hans Sauer, Biotechnology Industry Organization, Washington, DC 20024, and Pamela J. Scott, Pamela.J.Scott@pfizer.com, Legal Division, Pfizer, Inc, Eastern Point Road, MS 8260-1611, Groton, CT 06340

Overview of intellectual property education.


CINF 78

Using crystallographic databases in the ACA summer course in small molecule crystallography

John C. Woolcock, woolcock@iup.edu, Department of Chemistry, Indiana University of Pennsylvania, 239C Weyandt Hall, Indiana, PA 15705

The American Crystallographic Association (ACA) Summer Course is a ten-day intensive program that teaches both single-crystal and powder diffraction. Participants are encouraged to bring their own samples for structure determination, and during the course they have access to both the Cambridge Structural Database (CSD) and the Powder Diffraction File (PDF). This presentation will focus on the ways the CSD and the PDF are incorporated into the lecture and lab components of the ACA course. The previous knowledge that participants have about crystallographic databases and how they use them to support structure determination in the course will also be examined.


CINF 79

Conceptualizing reaction mechanisms using crystallographic data

Kraig A. Wheeler, kawheeler@eiu.edu, Department of Chemistry, Eastern Illinois University, 600 N Lincoln Avenue, Charleston, IL 61920

Classroom discussions of organic reaction mechanisms offer students useful opportunities to explore the intimate details of reaction processes. The advantage of having students study chemical reactions from a mechanistic view rather than pattern recognition (memory recall) is obvious; students with a fundamental understanding of mechanisms are more able to predict reaction outcomes and can transfer prior mechanistic insight to new reaction schemes. In general, attention to such course material is limited to 2-D drawings and arrow-pushing exercises. Since the Cambridge Structural Database contains a wealth of structural information that has served to support existing reaction theories and unravel mechanistic details, this resource should also provide a valuable teaching tool. Well-placed discussions that combine the advantages of crystallographic data and traditional approaches help students gain a more lucid grasp of this material. This presentation will highlight several examples of the application of crystallographic data to reaction mechanisms in the organic classroom.


CINF 80

An interactive online teaching subset of the Cambridge Structural Database

Gary M Battle, battle@ccdc.cam.ac.uk, Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, United Kingdom

The Cambridge Structural Database (CSD) serves as the worldwide repository of small-molecule crystal structure data. As such, this unique database of approaching one-half million molecules is a crucially important resource for chemical education. However, despite the obvious benefits of using experimentally measured 3D structures, this powerful resource is underutilised in undergraduate teaching. This talk will illustrate how, through use of a free interactive online teaching subset of the CSD, crystal structure information can be made readily accessible to students. A range of associated teaching exercises will also be discussed. Together, these online resources aimed at non-crystallographers can be used to enhance learning across the chemistry curriculum.


CINF 81

Teaching with the Cambridge Structural Database from general chemistry to advanced inorganic chemistry

Stephen A. Koch, stephen.koch@sunysb.edu, Department of Chemistry, State University of New York at Stony Brook, Stony Brook, NY 11794-3400, Fax: 631-632-7960

Stony Brook has had a site license for many years for the Windows version of the Cambridge Structural Database. This has enabled the author to use the CSD in his classes including Honors General Chemistry, Organic Lab, and Advanced Inorganic Chemistry as well as Graduate Inorganic Chemistry. General Chemistry students can easily learn how to use this research level database and use it to explore the diverse structural chemistry of molecular inorganic compounds. I also use the CSD to introduce 2nd and 3rd year chemistry majors to the concept of structure and substructure searching before teaching them to use more expensive, seat limited databases. The Web version of the CSD will make it much easier for undergraduate students to use the CSD in their classes.

CINF 82

Utilization of the Cambridge Structural Database system in the undergraduate chemistry curriculum.

Gregory M. Ferrence, ferrence@ilstu.edu, Department of Chemistry, Illinois State University, Campus Box 4160, Normal, IL 61790-4160

Spatial ability, the ability to manipulate 3-D objects in our heads, has a relationship to undergraduate chemists' performance. Commonly, weakness in this area impedes their progress. Technological advances both in e-learning tools and information availability have greatly enhanced the chemical education community's ability to address this skill set. The Cambridge Structural Database System includes an atomic coordinate database for nearly half a million compounds. The information available is fundamentally 3-D in nature and may be easily rendered as visual graphics using CSDS programs. Academic access to the CSD has been available since the 1960s; however, it was primarily used as a tool for crystallographers. During the past decade, the tools used to extract and manipulate information from the CSD have evolved into a powerful, user-friendly suite of software (the CSDS) which industrial sector researchers in the life sciences have come to regard as invaluable. In academia, by contrast, this powerful set of research and teaching tools is still predominantly regarded as useful to “hard-core” crystallographers alone. Through a Discovery Corps Senior Fellowship (DCF) from the National Science Foundation, the speaker has been facilitating site license access to the CSDS for more than 30 Primarily Undergraduate Institutions and providing workshops at these institutions to help and encourage faculty to integrate use of the CSDS into their undergraduate chemistry curriculum. This talk will discuss the overall DCF project and illustrate examples of how CSDS information can be used to enhance chemistry learning throughout the span of organic, analytical, physical, biochemical, inorganic, and other areas of chemistry.

CINF 83

Using the Cambridge Structural Database as a resource for undergraduate research and teaching

Barbara A. Reisner, reisneba@jmu.edu, Department of Chemistry and Biochemistry, James Madison University, MSC 4501, Harrisonburg, VA 22807

The Cambridge Structural Database (CSD) has become an important resource for undergraduate research and teaching in the Department of Chemistry and Biochemistry at James Madison University, a Primarily Undergraduate Institution (PUI). This presentation will focus on the role that the CSD plays in early research experiences in inorganic chemistry, both through the Integrated Inorganic/Organic Laboratory and in undergraduate research. CSD-centered learning objects (small instructional units) that have been developed for implementation in the sophomore- and senior-level inorganic chemistry lecture courses during the 2009-2010 academic year will be presented. Finally, the role of the Virtual Inorganic Pedagogical Electronic Resource (VIPEr, http://www.ionicviper.org) as a platform for dissemination and as a mechanism for obtaining community feedback will be discussed.

CINF 84

Evaluation of FTrees in terms of scaffold hopping on different targets by retrospective and prospective virtual screening

Róbert Kiss, r.kiss@richter.hu, Gedeon Richter Plc, Gyömrői út 30-32, Budapest H-1103, Hungary, and György M. Keserű, gy.keseru@richter.hu, Gedeon Richter Plc, P.O.Box 27, Budapest H-1475, Hungary

FTrees has been reported as a useful approach for finding novel hits by using information about known actives. FTrees can be classified as a mixed 2D/3D approach. It uses a molecular descriptor (Feature Tree) that is a reduced graph representation of molecules containing connectivity information and pharmacophore features only. Several publications have suggested that FTrees is efficient in scaffold hopping. Our group evaluated the efficiency of FTrees on four different targets in comparison with simple 2D fingerprint similarity searching. The evaluation was carried out by analyzing the highest enrichment factors, the speed, and the diversity of the active compounds discovered. The influence of the query reference compound was also investigated. We also conducted prospective virtual screens and subsequent pharmacological evaluation of the virtual hits.
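
The enrichment factor used in retrospective evaluations of this kind can be illustrated with a short sketch. The abstract does not give the authors' exact definition, so the code below assumes the common one: the active rate in the top-ranked fraction of a screened library divided by the active rate in the whole library. The function name and the toy ranking are hypothetical.

```python
def enrichment_factor(ranked_labels, fraction):
    """Enrichment factor for the top `fraction` of a ranked screening list.

    ranked_labels: list of 1 (active) / 0 (inactive), best-scored compound first.
    fraction: share of the library inspected, e.g. 0.01 for the top 1%.
    Assumed definition (not taken from the abstract):
        EF = (actives in top / size of top) / (all actives / library size)
    """
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list containing at least one active")
    # Inspect at least one compound, even for very small fractions.
    n_top = max(1, int(round(n_total * fraction)))
    hits_in_top = sum(ranked_labels[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)


# Toy ranking: 3 actives among 10 compounds, two of them ranked first.
toy_ranking = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
ef_top20 = enrichment_factor(toy_ranking, 0.2)
```

An enrichment factor of 1 corresponds to random selection; the maximum attainable value depends on the ratio of actives to library size, which is one reason such values are usually compared only within the same data set.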

CINF 85

Anticancer activity of SERT binding sulfur-substituted α-alkyl phenethylamines

Andrew JS. Knox1, andrew.knox@tcd.ie, Suzanne Cloonan2, cloonans@tcd.ie, John J Keating3, jj.keating@ucc.ie, Stephen G Butler4, butlersg@tcd.ie, Georgia Golfis5, ggolfis@tcd.ie, Anne M Jørgensen6, anmj@lundbeck.com, Gunther H. Peters6, Dilip Rai7, dilip.rai@ucd.ie, David G. Lloyd1, david.lloyd@tcd.ie, D Clive Williams2, and Mary J Meegan4. (1) Molecular Design Group, School of Biochemistry and Immunology, Trinity College Dublin, College Green, Dublin, Dublin D2, Ireland, Fax:353-676-2400, (2) School of Biochemistry and Immunology, Trinity College Dublin, Dublin D2, Ireland, (3) School of Pharmacy and Department of Chemistry, University College Cork, Cork, Ireland, (4) School of Pharmacy and Pharmaceutical Sciences, Trinity College Dublin, Dublin D2, Ireland, (5) Molecular Design Group, School of Biochemistry and Immunology, Trinity College Dublin, Dublin D2, Ireland, (6) Department of Chemistry, The Technical University of Denmark, Lyngby, Denmark, (7) Centre for Synthesis and Chemical Biology, School of Chemistry & Chemical Biology, University College Dublin, Dublin D4, Ireland

The recent revelation that certain serotonin reuptake transporter (SERT) targeting ligands may act as pro-apoptotic agents in the treatment of cancer adds greatly to their diverse potential pharmacological application. 4-Methylthioamphetamine (4-MTA) is a potent inhibitory ligand for SERT. In this study, a novel library of structurally diverse 4-MTA analogues was synthesised, with or without N-alkyl and/or C-α methyl or ethyl groups, and their potential SERT-dependent antiproliferative activity was assessed. A number of these novel SERT-targeting agents displayed potential anti-cancer effects, with EC50 values in the low micromolar range. Computational analyses were carried out to determine any possible relationship between SERT activity and pro-apoptotic activity on several cell lines. Using in silico 'Target-Fishing' techniques, we propose possible mechanisms of action for these compounds.

CINF 86

Merging and growing fragments interactively

Marcus Gastreich, marcus.gastreich@biosolveit.de and Christian Lemmen, BioSolveIT GmbH, An der Ziegelei 79, 53757 Sankt Augustin, Germany, Fax: +49 2241 2525 525

Fragments experience a buzz these days: Upon detection of fragment binders in protein active sites, the general strategy is to merge, grow, or link them to enhance binding of a resulting 'composite' ligand.

Computational chemistry ideally supports this workflow with sensible proposals for synthesis and modifications. However, on the computational side, the complications are the lack of time and the quality of checks for synthetic accessibility of the proposals. Our tool ReCore was extended to identify linker motifs from extremely large fragment libraries which not only connect fragment binders in their experimentally observed positions, but also comply with the binding motifs using pharmacophores and other features. ReCore is fast enough to provide instant feedback to the user, thereby enabling interactive query refinement. Moreover, the algorithm favors synthetically feasible solutions through the setup of its search libraries and in forming the resulting composite ligands. Validations across different targets are reported.

Computers in Chemistry - Abstracts


COMP 8

Detection, assignment, and analysis of multiple scaffolds for medicinal chemistry project databases

Alex M. Clark, aclark@chemcomp.com, Research & Development, Chemical Computing Group, Inc, 1010 Sherbrooke St West, Suite 910, Montreal, QC H3A2R7, Canada
Analysis of structure-activity data for lead optimization often involves simultaneously classifying several series of analogous compounds according to scaffolds and R-group substituents. We have developed new algorithms for detection and analysis of multiple common scaffolds, and an interactive web-based report for examining the relationship between structure and activity.

The method for scaffold analysis advances the state of the art in the following ways:

• The scaffold detection method finds multiple related common scaffolds, which will be aligned to each other in order to estimate common orientation

• The assignment of scaffolds to molecules takes into account degeneracy, such as is the case for symmetrical scaffolds, in order to minimize the resulting R-group diversity

• If partial information about the scaffolds is already available, this can be used to influence or override the automated methods
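
The idea of resolving scaffold degeneracy so that R-group diversity is minimized can be sketched as follows. This is not the authors' algorithm, which the abstract does not describe in detail; it is a simple greedy heuristic under stated assumptions: each molecule is already reduced to a tuple of substituents at the scaffold's attachment points, and the scaffold's symmetry is supplied as a list of index permutations. All names and the example data are hypothetical.

```python
def assign_r_groups(molecules, symmetry_perms):
    """Greedy R-group assignment for a symmetrical scaffold.

    molecules: list of tuples, the substituents found at each attachment point.
    symmetry_perms: list of index tuples, each a symmetry-equivalent numbering
        of the attachment points (the identity permutation included).
    For every molecule, the equivalent numbering that introduces the fewest
    substituents not yet seen at each R position is chosen (first wins on ties).
    """
    n_positions = len(symmetry_perms[0])
    seen = [set() for _ in range(n_positions)]  # substituents seen per R position
    assignments = []
    for substituents in molecules:
        best_cost, best_candidate = None, None
        for perm in symmetry_perms:
            candidate = tuple(substituents[i] for i in perm)
            # Cost = number of positions that would gain a new substituent.
            cost = sum(1 for pos, s in enumerate(candidate) if s not in seen[pos])
            if best_cost is None or cost < best_cost:
                best_cost, best_candidate = cost, candidate
        for pos, s in enumerate(best_candidate):
            seen[pos].add(s)
        assignments.append(best_candidate)
    return assignments


# A scaffold with two symmetry-equivalent attachment points.
perms = [(0, 1), (1, 0)]
mols = [("Cl", "H"), ("H", "Cl"), ("H", "Br")]
table = assign_r_groups(mols, perms)
```

With this toy input, the second molecule is flipped so that its chlorine lands at the same R position as in the first, which keeps the second position at a single substituent. A greedy pass depends on input order; a production method would presumably optimize over the whole series.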

The results of this analysis are used to create a report, in which:

• Molecules are rendered in 2D showing aligned scaffolds and implied R-groups

• Tools for structure-activity analysis include correlation tables, activity estimation, fragment analysis, property graphs and navigation of similarity space

• The report uses standard cross-platform HTML/JavaScript features which can be rendered by all modern browsers


COMP 9

Online chemical modeling environment: models

Iurii Sushko, Sergii Novotarskyi, Anil Kumar Pandey, Robert Körner, and Igor V. Tetko, itetko@vcclab.org. Helmholtz Zentrum Muenchen German Research Center for Environmental Health, Institute of Bioinformatics and Systems Biology, Ingolstaedter Landstrasse 1, Neuherberg, D-85764, Germany
The modeling framework is being developed to complement the Wiki-style database of chemical structures available at http://qspr.eu (see also our presentation at CINF). Its main goal is to provide a flexible and expandable calculation environment that allows a user to create and manipulate QSAR and QSPR models on-line. The modeling framework is integrated with the database web interface, which allows easy transfer of database data to the models. The web interface of the modeling environment aims to give Web users an easy means of creating high-quality prediction models and estimating their prediction accuracy and applicability domain. The developed models can be published on the Web and accessed by other users to predict new molecules on-line. This tool aims to establish a new paradigm for structure-activity relationship knowledgebases, making QSAR/QSPR models active, user-contributed, and easily accessible for benchmarking, general use, and educational purposes.

COMP 10

In silico profiling based on Aureus Global Pharmacology Space Knowledgebase

François Petitet, francois.petitet@aureus-pharma.com, Aureus Pharma, 174 Quai de Jemmapes, Paris, 75010, France
In past years, Aureus Pharma scientists have assembled from the structure-activity relationship literature a considerable amount of pharmacological data, integrated into a unique knowledge management system. In Aureus Global Pharmacology Space (GPS), more than 500 000 chemical structures are linked to over 2 million quantitative biological activities for major therapeutic drug targets such as GPCRs, kinases, ion channels, proteases, and nuclear receptors. Mining this GPS helps reveal potentially interesting polypharmacology compounds and rapidly generate in silico drug profiles based on chemical and biological annotations. Considering typical medicinal chemistry scaffolds such as phenothiazines, butyrophenones, benzodiazepines, dihydropyridines, and others, we analyzed the target activity profiles available for the corresponding ligands described in the GPS platform. For most of these structures we identified active representative compounds in several target protein classes. Using a newly developed application named AurPROFILER, and thanks to our highly structured data schema and normalization of biological activities, the target profiles obtained are easily visualized, analyzed, and reported. Several other examples of in silico generated profiles, used to build hypotheses on drug action mechanisms and to assess off-target risk, will demonstrate the power of in silico profiling based on a strongly structured pharmacological knowledge database.

COMP 11

BIDATA: An SAR Knowledgebase for data retrieval and new compound suggestions

Scott Oloff, Research Chemistry Systems, Boehringer-Ingelheim Pharmaceuticals Inc, 900 Ridgebury Road, PO Box 368, Ridgefield, CT 06877
Having a thorough understanding of the published SAR, internal data, and available IP is an absolute necessity in the pharmaceutical industry. Many of these data sources, however, are scattered across multiple applications and in different formats, making it difficult to interpret the data in the same context. This presentation will discuss approaches and technologies we have used to incorporate commercial SAR DBs with our own internal DB. We will also discuss how this SAR is compared with patent DBs to identify available IP space.

COMP 12

Using knowledgebases of structure-activity-data, receptor-site and protein structural similarity to generate new matter ideas

Steven M Muskal, smuskal@eidogen-sertanty.com, Eidogen-Sertanty, Inc, 3460 Marron St #103-475, Oceanside, CA 92056
For several years, researchers have leveraged SAR, protein sequence and structural similarity in numerous ways, including but not limited to target hypothesis, target prioritization, ligand design, and lead optimization.

Strong synergies can be realized when coupling a large body of ligand-based structure-activity content with the growing body of target-based structural information. For example, given over 55,000 publicly available apo- and co-complex protein structures, very reliable models can be proliferated within and across many species. With this expanded structural view of the proteome, larger than expected conservation of receptor site-similarity can be identified and leveraged. We show how an automated design of novel matter by LigandCross or ligand hybridization using receptor-site similarity can be a very productive workflow.


COMP 23

2'-F-2'-C-Methyl nucleosides for the treatment of HCV: From discovery to the clinic

Michael J. Sofia, Pharmasset Inc, 303-A College Road East, Princeton, NJ 08540
Hepatitis C is a global health problem with over 170 million individuals infected with the hepatitis C virus (HCV). Infection with HCV has been shown to lead to chronic liver disease, cirrhosis, and eventually hepatocellular carcinoma. Currently, the standard of care is a combination of interferon-alpha and ribavirin; however, this regimen has limited effectiveness and is associated with debilitating side effects. The search for direct-acting antiviral agents has led to the discovery of R7128, a nucleoside prodrug that inhibits the HCV NS5B polymerase. R7128 has demonstrated exceptional potency and safety in the clinic in genotype 1, 2, and 3 patients. In addition, PSI-7851, a nucleotide prodrug, showed increased liver exposure of the active triphosphate metabolite in laboratory animals and has also entered clinical evaluation. The discovery and current state of development of these two agents will be presented.

COMP 24

Modeling binding modes of HIV integrase inhibitors

Xiaowu Chen1, S. Swaminathan2, and James M. Chen, James.Chen@gilead.com2. (1) Dept. of Structural Chemistry, Gilead Sciences, Inc, 333 Lakeside Drive, Foster City, CA 94404, (2) Department of Structural Chemistry, Gilead Sciences, Inc, Foster City, CA 94404
Although significant progress has been made in HIV integrase inhibitor drug discovery, as demonstrated by FDA approval of Merck's raltegravir, there is still very limited understanding of inhibitor binding modes due to the lack of relevant crystal structures. In order to gain insight into the mechanism of inhibition and aid drug discovery efforts, we have constructed an active site model of HIV-1 integrase complexed to both viral DNA and inhibitor. Our model suggests a common binding mode for potent integrase inhibitors that involves interactions with an induced active site hydrophobic pocket, formed upon viral DNA binding. In addition, based on analysis of a large number of nucleotidyltransferase, substrate, and Mg complex structures, we hypothesized that potent integrase inhibitors interact with only one of the two bound active site Mg cations. To further validate the model, we made specific compounds and mutated key residues predicted to play an important role in inhibitor binding. These predictions were subsequently confirmed experimentally.

COMP 25

Identifying novel anthrax toxin lethal factor inhibitors via topomeric searching and docking/scoring

Elizabeth A. Amin, eamin@umn.edu1, Ting-Lan Chiu, tlchiu@umn.edu1, Derek Hook, hookx017@umn.edu2, Michael A. Walters, walte294@umn.edu2, and Satish Patil, pati0037@umn.edu1. (1) Department of Medicinal Chemistry, University of Minnesota, 717 Delaware St SE, Minneapolis, MN 55414, (2) Institute for Therapeutics Discovery and Development, University of Minnesota, 717 Delaware St SE, Minneapolis, MN 55414-2959
Anthrax is an acute infectious disease caused by the spore-forming, Gram-positive, rod-shaped bacterium Bacillus anthracis. The lethal factor (LF) enzyme is a zinc metalloenzyme secreted by B. anthracis as part of a tripartite exotoxin and is chiefly responsible for anthrax-related cytotoxicity. As LF can remain in the system long after antibiotics have eradicated B. anthracis from the body, the preferred therapeutic modality is the administration of antibiotics together with an effective LF inhibitor. Such inhibitors must not only bind strongly to the receptor but must also possess excellent ADMET profiles. Although LF has attracted much attention as a drug target, few published inhibitors have demonstrated activity in cell-based assays, and no LF inhibitor is currently available as a therapeutic or preventive agent. Here we present a novel virtual screening protocol which, together with experimental high-throughput screening, was able to identify nine new non-hydroxamic acid small molecules functioning as LF inhibitors with low micromolar-level inhibition against that target. A key topomeric searching component of this protocol was able to prioritize twenty-two thousand compounds from an initial dataset of approximately thirty-five million non-redundant structures. Compounds identified by this method were subsequently subjected to docking and scoring and to drug-like (ADME-related) filtering protocols. Among the nine new hits, none of which was previously identified as an LF inhibitor, seven demonstrated experimental activity against LF at less than 50 micromolar. Three of the top hits that exhibited single-digit IC50 values may potentially serve as scaffolds for lead optimization. Each of these three hits demonstrates a different zinc-binding mechanism predicted by docking and scoring; future work is planned to experimentally assess the predicted binding modes by means of X-ray crystallography.

COMP 26

Fragment-based molecular docking in inhibitor discovery against CTX-M class A β-lactamase

Yu Chen, chen@blur.compbio.ucsf.edu, Department of Pharmaceutical Chemistry, UCSF, 1700 4th ST, RM#501, QB3 Building, San Francisco, CA 94158-2330 and Brian Shoichet, shoichet@cgl.ucsf.edu, Department of Pharmaceutical Chemistry, University of California, San Francisco, 1700 4th Street, QB3 Building, Room 508D, San Francisco, CA 94143.
Fragment screens have successfully identified new scaffolds in drug discovery, often with relatively high hit rates (5%) using small screening libraries (1,000-10,000 compounds). This raises two questions: would other interesting chemotypes be found were one to screen all commercially available fragments (>300,000), and does the success rate imply low specificity of fragments? We used molecular docking to screen large libraries of fragments against CTX-M beta-lactamase, one of the most common extended-spectrum beta-lactamases in many regions of the world and also a challenging target for inhibitor discovery. Ten mM-range inhibitors were identified from the 69 compounds tested. The docking poses corresponded closely to the crystallographic structures subsequently determined. Intriguingly, these initial low affinity hits showed little specificity between CTX-M and an unrelated beta-lactamase, AmpC, which is unusual among beta-lactamase inhibitors. This is consistent with the idea that the high hit rates among fragments correlate to a low initial specificity. As the inhibitors were progressed, both specificity and affinity rose together, leading to the first micromolar-range non-covalent inhibitors against a class A beta-lactamase.

COMP 27

Design and optimization of novel peptide deformylase inhibitors as new antibacterial agents

Kelly M. Aubart, Kelly.M.Aubart@gsk.com1, Andrew B. Benowitz1, Xiangmin Liao1, Joseph M. Karpinski1, Jinhwa Lee2, Jason Dreabit1, Yuhong Fang1, Andrew Knox1, Stephanie Kelly1, Nino Campobasso3, Chaya Duraiswami3, Kate J. Smith3, Maxwell Cummings4, Jacques Briand3, Swarupa Kulkarni5, Thomas F. Lewandowski6, Peter DeMarsh6, Rimma Zonis6, Lynn McCloskey6, Stephen Rittenhouse6, Siegfried B. Christensen7, Magdalena Zalacain6, and Martha Head3. (1) Medicinal Chemistry, Infectious Diseases CEDD, GlaxoSmithKline, 1250 South Collegeville Road, Collegeville, PA 19426, (2) Green Cross Corp, 303 Bojeong-dong, Giheung-gu, Yongin, 446-770, South Korea, (3) Molecular Discovery Research, GlaxoSmithKline, 1250 South Collegeville Road, Collegeville, PA 19426, (4) 3-Dimensional Pharmaceuticals, 665 Stockton Drive, Exton, PA 19341, (5) Oncology Business Unit, Novartis, Florham Park, NJ 07932, (6) Microbiology, Infectious Diseases CEDD, GlaxoSmithKline, 1250 S.Collegeville Road, Collegeville, PA 19426, (7) Virtual Proof of Concept Discovery Performance Unit, GlaxoSmithKline, 709 Swedeland Road, King of Prussia, PA 19406
Polypeptide Deformylase (PDF) is a metalloenzyme that has garnered much attention within the pharmaceutical industry as a promising target for the development of novel antibacterial agents. This enzyme catalyzes the removal of a formyl group from the N-terminal methionine of newly synthesized bacterial proteins, a deformylation process that is essential for bacterial survival. PDF is a relatively small protein (20-25 kD) that has proven to be amenable to X-ray crystallography studies. We have capitalized on this readily available structural information to design multiple series of novel non-peptidic inhibitors. The design and successful optimization of these PDF inhibitors will be discussed.

COMP 28

De novo design of novel polypeptide deformylase (PDF) inhibitor templates with broad spectrum antibacterial activity

Chaya Duraiswami, Chaya.2.Duraiswami@gsk.com1, Robert A Daines2, Nino Campobasso3, Mythili Vimal4, Israil Pendrak2, Magdalena Zalacain5, and Kelly M. Aubart, Kelly.M.Aubart@gsk.com6. (1) Computational and Structural Sciences, GlaxoSmithKline Pharmaceuticals, 1250 South Collegeville Road, UP-1110, Collegeville, PA 19426, (2) Dept. of Chemistry, GlaxoSmithKline, 1250 South Collegeville Road, UP-1110, Collegeville, PA 19426, (3) Molecular Discovery Research, GlaxoSmithKline, 1250 South Collegeville Road, Collegeville, PA 19426, (4) Department of Discovery Medicinal Chemistry, GlaxoSmithKline, Harlow, United Kingdom, (5) Microbiology, Infectious Diseases CEDD, GlaxoSmithKline, 1250 S.Collegeville Road, Collegeville, PA 19426, (6) Medicinal Chemistry, Infectious Diseases CEDD, GlaxoSmithKline, 1250 South Collegeville Road, Collegeville, PA 19426
A broad-spectrum antimicrobial target must be conserved across all pathogens of interest within a therapeutic product profile, essential for bacterial growth, and either absent, substantially different or non-essential in humans. PDF meets all of these criteria and is one of the most promising unexploited bacterial targets in the search for new antibiotics with a novel mode of action. PDF (EC 3.5.1.88) is a metalloprotease that removes the N-formyl group of the polypeptides as they emerge from the ribosome during or immediately after completion of the elongation process.

Structure-based design studies, in conjunction with de novo design studies using Allegrow, were employed to find novel backup templates with broad-spectrum activity for the PDF program. The results from these studies will be presented.


COMP 29

Discovery of novel small-molecule inhibitors of P. falciparum using the hybrid structure based method

Sandhya Kortagere, sandhya.kortagere@drexelmed.edu1, JM Morrisey1, J Bosch2, KD Laroiya1, T Daly1, WJ Welsh3, E Fan2, W Hol2, P Sinnis4, I Ejigiri4, LW Bergman1, and AB Vaidya1. (1) Department of Microbiology and Immunology, Drexel University College of Medicine, 2900 Queen Lane, Philadelphia, PA 19129, (2) University of Washington, Department of Biochemistry and Biological Structure, Seattle, WA, (3) Department of Pharmacology, University of Medicine and Dentistry of New Jersey-Robert Wood Johnson Medical School, Piscataway, NJ, (4) Department of Parasitology, New York University School of Medicine, New York, NY
A key component of host cell invasion by Apicomplexan parasites is the interaction between the carboxy-terminal tail of myosin A (MyoA) and the myosin tail interacting protein (MTIP). Based on the co-crystal structure of P. knowlesi MTIP and a MyoA tail peptide, and using a Hybrid Structure Based virtual screening approach, a series of small molecules was identified as having the potential to inhibit MyoA-MTIP interactions. Of the initial 15 compounds tested, a pyrazole urea compound inhibited P. falciparum growth with an EC50 of ~250 nM. Screening of an additional 51 compounds belonging to the same chemical class identified eight compounds with an EC50 of ~300 nM and one with an IC50 of ~50 nM. Interestingly, the compounds appear to act at several stages of the parasite life cycle to block growth and development. Thermal melting studies of MTIP in the presence and absence of the compounds show that many of the compounds bind and stabilize MTIP. The pyrazole urea compounds identified in this study could be effective antimalarials, since they competitively inhibit a key protein-protein interaction between MTIP and MyoA responsible for the gliding motility and invasive features of the malarial parasite.

COMP 37

Structure activity relationship analysis using PubChem

Evan Bolton, bolton@ncbi.nlm.nih.gov, National Center for Biotechnology Information, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894
PubChem is a free, online public information resource from the National Center for Biotechnology Information (NCBI). The system provides information on the biological properties and activities of chemical substances, linking together results from different sources on the basis of chemical structure and/or chemical structure similarity. With over 500 different targets, 1,400 bioassays, and 45,000,000 activity data points, PubChem is a significant source of publicly available bioactivity data. Unlike many available SAR Knowledgebases, PubChem contains screening data of both actives and inactives. Available tools allow one to dynamically create structure activity relationships based on structure similarity (2D or 3D), target similarity, and target profile. The use and utility of these tools will be discussed.

COMP 38

GOSTAR: GVK BIO online structure activity relationship database: Data and its utility

Jagarlapudi Sarma, sarma@gvkbio.com, Informatics, GVK Biosciences Private Limited, S-1, Phase-1, Industrial Technocrats Estate, Balanagar, Hyderabad, 500 037, India
GVK BIO is well known for the development of knowledge databases of chemical entities (~4 million compounds) with structure-activity relationships. Information relating chemical structure, biological target, in vitro and in vivo assays for efficacy/pharmacodynamics, and clinical, pharmacokinetic, and toxicity data is integrated in different databases, with the source information drawn from a variety of journal articles and patents for a variety of target families. Many pharmaceutical companies have been using these databases for different applications and/or modeling studies. Recently, GVK BIO has integrated all its individual databases into a single database, GOSTAR, which has a very good web-based UI for different types of online queries. In integrating the individual databases into one data model, all the data were standardized, and the necessary taxonomy and ontology were used to handle the integrated data. Any query will extract data from all databases, whether they cover the discovery, development, or marketed drug space. Further, one can analyze the retrieved molecules for any off-target activity as well as other indications. A number of descriptors can be generated using the tools available online, and the data can be analyzed for various models. Tools have been developed to study and visualize the chemical, biological, and therapeutic indication space as well as company-related information. Further tools were developed to filter the data based on chemical, pharmacological, or toxicity filters and to support the drug discovery research process. We will discuss some case studies on the usefulness of the database in drug discovery.

COMP 39

ChEMBL: Large-scale mapping of medicinal chemistry and pharmacology data to genomes

John P Overington, jpo@ebi.ac.uk, Team Leader, Chemogenomics and ChEMBL Databases, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
Although the majority of effective therapeutics are small molecules, there is relatively little readily accessible public domain data mapping drugs to their molecular targets. When one considers clinical trial stage or discovery stage data, the situation deteriorates further. However, this type of data is essential for Chemical Biology experiments and is crucial for informed target selection in drug discovery. To address this issue, we have built a series of large-scale databases, known as ChEMBL, that map small molecule structures to their target genes and also to their functional effects. These databases also capture a large amount of human and model organism pharmacological data, from systems often used in pre-clinical validation and safety pharmacology testing. A variety of applications of these databases in the areas of target prioritisation, lead discovery, lead optimisation, and drug repurposing will be described.

COMP 40

Pharmacoinformatics on very large annotated ligand databases

Rashmi Jain, rjain@evolvus.com and Aniket Ausekar, aniket@evolvus.com. Evolvus Group, 88, Shukrawar Peth, Pune, 411002, India
Ligand data excerpted as a repository of past knowledge were prioritized. Accordingly, very large annotated ligand databases with more than 2 million ligands, containing manual annotations of chemical and pharmacological data from journals and patents, were mined in an effort to identify de novo candidates in a virtual screen against selected, previously validated targets. Clustering on controlled data points against all major therapeutically relevant target families (GPCR, ion channel, protease, transporter, kinase, nuclear hormone receptor) was performed, and challenging compounds were designed. The designed compounds represented a unique set of chemistry and were synthesized for further validation.

COMP 53

Some observations on the quality of 3D QSAR data sets

Ryszard J. Czerminski, ryszard.czerminski@astrazeneca.com, AstraZeneca Pharmaceuticals LP, 35 Gatehouse Drive, Waltham, MA 02451, C Eyermann, Joe.Eyermann@astrazeneca.com, Infection Discovery, Cancer and Infection Research Area, AstraZeneca, R&D Boston Inc, 35 Gatehouse Drive, Waltham, MA 02451, and John I. Manchester, John.Manchester@astrazeneca.com, Infection Discovery, AstraZeneca Pharmaceuticals LP, 35 Gatehouse Drive, Waltham, MA 02451.
Oxazolidinones are a novel class of antibiotics. However, off-target activity has limited the number of agents in this class that have appeared on the market. One such activity is inhibition of monoamine oxidase A (MAO-A). We present a 3D QSAR study of MAO-A inhibition by a new set of about a hundred oxazolidinones using the recently introduced Simple Atom-Type Mapping Following Alignment (SAMFA) method. In SAMFA, traditional molecular field-based descriptors are replaced with force-field-like atom types at the atomic centers giving rise to those fields. Although this approach reduces the number of descriptors to the number of atoms for each molecule in a given data set, we show that for nine data sets, including the steroid benchmark, there is no difference between SAMFA and Comparative Molecular Field Analysis (CoMFA). In fact, in many cases SAMFA descriptors can be further simplified to represent only whether certain atomic positions are occupied among aligned sets of molecules, without significantly affecting q2. We propose that this observation stems from artifacts that arise from incomplete sampling of the biologically relevant chemical space within those data sets. Two diagnostic approaches for characterizing this sort of undersampling will be presented and used to demonstrate that the oxazolidinone data set is less susceptible to artifact.
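
The q2 statistic mentioned above is the cross-validated analogue of R2. The abstract does not state which cross-validation scheme was used, so the sketch below assumes the leave-one-out form commonly quoted for CoMFA-type models, q2 = 1 - PRESS / SS_total, and substitutes a one-descriptor least-squares line for the PLS model that a real 3D QSAR study would use. Function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out q2 = 1 - PRESS / SS_total (assumed definition).

    PRESS sums the squared errors of predictions made for each compound by a
    model trained on all the other compounds; SS_total is the squared
    deviation of the observed activities from their overall mean.
    """
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Hypothetical descriptor values and activities for four compounds.
q2_noisy = loo_q2([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 7.0, 8.0])
```

Unlike the fitted R2, q2 can be negative when the held-out predictions are worse than simply predicting the mean activity, which is why it is used as a guard against overfitting.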

COMP 54

AutoGrow: A novel algorithm for protein inhibitor design

Jacob D. Durrant, jdurrant@ucsd.edu, Biomedical Sciences Program, UCSD, 9500 Gilman Drive #0685, La Jolla, CA 92093-0685, Rommie E Amaro, ramaro@mccammon.ucsd.edu, Department of Chemistry & Biochemistry, University of California San Diego, 9500 Gilman Drive, 4206 Urey Hall - MC 0365, La Jolla, CA 92093-0365, and J Andrew McCammon, jmccammon@ucsd.edu, Howard Hughes Medical Institute, Department of Chemistry and Biochemistry and Department of Pharmacology, Center for Theoretical Biological Physics, University of California at San Diego, 9500 Gilman Drive, Mail Code 0365, La Jolla, CA 92093-0365.
Trypanosoma brucei (T. brucei) is an infectious agent for which drug development has been largely neglected. T. brucei is endemic to Africa, where it can infect the central nervous system in humans and cause African sleeping sickness. One potential T. brucei drug target is RNA editing ligase 1 (TbREL1), a critical component of a unique mitochondrial mRNA-editing complex known as the editosome. TbREL1 is an excellent drug target because it is essential for T. brucei survival and has no close human homologues.

AutoGrow, a new program that combines the strengths of fragment-based growing, docking, and evolutionary algorithms, is used to add interacting moieties to NSC16209, a known TbREL1 inhibitor. Careful analysis of the top AutoGrow-generated ligands suggests that they bind TbREL1 in ways similar to ATP, the natural TbREL1 substrate. The compounds presented here may serve as valuable starting points for future drug-design efforts in the fight against Human African Trypanosomiasis.


COMP 55

Computational models of the action of protegrin antimicrobial peptides: Transient ion diffusion and osmotic swelling

 

Dan Bolintineanu, boli0073@umn.edu1, Ehsan Hazrati, Allison A. Langham, langham@dtc.umn.edu1, Robert I. Lehrer2, H. Ted Davis1, and Yiannis N. Kaznessis, yiannis@cems.umn.edu1. (1) Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455, (2) Department of Medicine, UCLA, Los Angeles
Protegrins are a class of highly effective antimicrobial peptides, believed to act primarily by permeabilizing the bacterial cell membrane. We have conducted molecular dynamics simulations of the membrane-embedded pore structure formed by protegrin. We have then used structures extracted from these simulations as input to a continuum electrodiffusion model, in order to quantify the electrical conductance characteristics of such pores, and obtained good agreement with previously published experimental data. Finally, we have modeled the effects of multiple pores on an entire cell, using data obtained from the molecular and continuum electrodiffusion models. We have been able to estimate the number of pores required to reproduce the experimentally measured potassium release rate from an E. coli cell, as well as quantify the effects of ion exchange processes on osmotic swelling of cells. Combined with experimental data, these models provide a comprehensive picture of the permeabilizing mechanism of protegrin antimicrobial peptides.

COMP 56

Design of new antibacterial drugs: Computational approaches that take advantage of the rapid generation of multiple co-crystal structures

 

John Finn, jfinn@triusrx.com, Trius Therapeutics, Inc, 6310 Nancy Ridge Dr, Suite 101, San Diego, CA 92121
New classes of antibacterial drugs with novel mechanisms of action are needed to combat bacterial resistance. To meet this challenge, we focus on novel (or underexploited) antibacterial targets and utilize structure-based drug design techniques. We have built a structural biology platform that generates multiple (2-6) structures per week per program. This structural data provides information needed to design compounds with spectrum, selectivity, antibacterial activity and drug properties. This talk will provide a critical overview of the role of computational chemistry in structure-centric antibacterial discovery programs. We will describe our computational chemistry experiences, including:

• Lead Discovery: virtual screening and de novo design

• Lead Optimization: LUDI evolution of leads, ligand docking and scoring, design of compounds with spectrum and selectivity, dealing with enzyme flexibility and design of drug-like properties


COMP 57

Heme oxygenase as antimicrobial target: Results from computer-aided drug design and experiment

 

Pedro E. M. Lopes, lopes@outerbanks.umaryland.edu, Angela Wilks, awilks@rx.umaryland.edu, and Alexander D. MacKerell Jr., amackere@rx.umaryland.edu. Department of Pharmaceutical Sciences, University of Maryland, 20 Penn St., Baltimore, MD 21201
A variety of life-threatening diseases including meningitis, pneumonia, cholera and dysentery are caused by Gram-negative pathogens. They have developed sophisticated mechanisms for iron acquisition, which is important for their proliferation and infectivity. In addition to iron acquisition, many of these pathogens can also utilize heme as an iron source. The final step of iron acquisition from heme is oxidative cleavage by a heme oxygenase (HO). We hypothesize that HO may provide a potential target for drug development. In this work, we apply computer-aided drug design (CADD) virtual screening techniques to identify small molecules inhibiting Neisseria meningitidis HO. Several of the compounds were found to have KD values in the micromolar range for Neisseria meningitidis HO and Pseudomonas aeruginosa HO. Moreover, data from simple host-pathogen models indicate that such compounds have antimicrobial activity.

COMP 58

Methodologies for efficient knowledge-based antibody homology modeling

 

Johannes Maier, jmaier@chemcomp.com, Chemical Computing Group, Inc, 1010 Sherbrooke Street West, Suite 910, Montreal, QC H3A 2R7, Canada
Antibodies are globular proteins composed of two heterodimers with each set containing a heavy chain (VH) and light chain (VL). The binding to an antigen is in most antibodies facilitated by six loops, three originating from the VL domain, termed L1, L2 and L3, and three from the VH domain, termed H1, H2 and H3. Due to their modular composition and high target specificity antibodies have become increasingly attractive for use as drugs. Antibody Homology Modeling techniques have often been applied in generating therapeutically more effective antibodies. Here, we demonstrate a collection of procedures as well as an interface to meet the demands of effective antibody homology modeling. The application has flexible components allowing the integration of various work-flows associated with this specific form of modeling. The routines account for the particular structural composition of antibodies when searching for template candidates and building models. A knowledge-based approach is applied with an underlying database of antibody structures originating from the Protein Data Bank (PDB), clustered by class, species, subclass and framework sequence identity. A specially designed loop grafting routine allows for generation of xenogeneic antibody models.

Take-Home Message:

• Fast and efficient generation of antibody models

• Integration of various work-flows associated with antibody homology modeling

• Accounting for the structural composition of antibodies when searching for candidates and model building

• CDR Loop grafting


COMP 59

Prediction of drug resistance using all-atom molecular simulations

 

Robert C. Rizzo, rizzorc@gmail.com, Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794-3600
Robust prediction of protein-ligand binding and drug resistance remains a difficult and challenging problem despite great strides made in both rigorous and more approximate free energy calculation methods. In this talk, we present our experiences using all-atom molecular dynamics followed by post-processing methods for estimation of binding free energies with application to the drug targets neuraminidase and epidermal growth factor receptor. Results of our optimization efforts, to improve virtual screening procedures using the program DOCK, will also be presented which focus on development of efficient protocols for reproduction of crystallographically observed binding poses using rigid, fixed anchor, and flexible ligand docking for a wide variety of targets.

COMP 67

Linking genomic knowledge to natural products and drugs

 

Minoru Kanehisa, kanehisa@kuicr.kyoto-u.ac.jp, Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, 611-0011, Japan
The large-scale datasets generated by genome sequencing and other high-throughput experimental technologies are the basis for understanding life as a molecular system and for developing medical, pharmaceutical, and other practical applications. The key to linking such large-scale datasets to practical values lies in bioinformatics technologies, not only in terms of computational methods, but also in terms of knowledge bases. In the KEGG database resource (http://www.genome.jp/kegg/) we organize our knowledge on higher-level systemic functions in computable forms, such as metabolism in KEGG pathway maps and therapeutic category of drugs in BRITE functional hierarchies. This enables bioinformatics analysis of genomic and molecular-level data to infer higher-level functions through the process of pathway mapping and BRITE mapping. A variant of this approach is to infer chemical structures of endogenous molecules that can be synthesized in a given organism, knowing the enzyme repertoire in the genome and the biosynthetic pathways, together with possible biological activities. I will report on our strategy to analyze the chemical architecture of natural products derived from enzymatic reactions (and enzyme genes) and the chemical architecture of marketed drugs derived from human-made organic reactions in the history of drug development.

COMP 68

Metabolic liability and SAR analyses derived from bioactivity databases

 

Russ Hillard, russ.hillard@symyx.com, Product Marketing, Symyx Technologies inc, 2440 Camino Ramon, San Ramon, CA 94583
SAR analyses conducted on libraries taken from bioactivity databases can yield insight into the dependence of therapeutic activity (and/or adverse side effects) on variations in chemical structure. Often such studies make the assumption that administered compounds are, in fact, the active chemical agents. Metabolic transformations following administration but prior to key biochemical processes involved in observed activity can produce significant structural modifications in the actual bioactive entities. Ideally, then, SAR analyses should include examination of known metabolic outcomes for compounds under investigation. Mining this information from available electronic collections of known biotransformations and correlating it to SAR data will be discussed.

COMP 69

Discovery and data mining using the NCBI BioSystems database, a centralized repository linking small molecules to their biological function

 

Lewis Geer, lewisg@ncbi.nlm.nih.gov, National Center for Biotechnology Information, Bldg. 38A, Room 5S512, 8600 Rockville Pike, Bethesda, MD 20894
The NCBI BioSystems database contains biological relationships between the small molecule records found in PubChem and the gene and protein records found in GenBank. These relationships directly link the structure of small molecules to their biological function. By centralizing and standardizing these records and then linking them to multiple NCBI databases like PubMed and PubChem BioAssay, the BioSystems database is intended to be a convenient and extensive resource for fundamental structure-function information and associated annotations.

COMP 70

SAR studies using ChemBioBase, a knowledgebase on Target centric small molecules

 

Sooriya Kumar, sooriya_kumar@jubilantbiosys.com, Jubilant Biosys, # 96, 2nd Stage Industrial Suburb, Yeshwantpur, Bangalore, 560 022, India
Scientists involved in the drug discovery process require a broad range of information to assist their decision making. To help in this task, they have access to large databases built in-house as well as those provided by various vendors. In addition, they refer to the vast amount of scattered information available in patent and journal literature, and they look for solutions that help manage the data deluge. Given this, Jubilant has developed a comprehensive set of target-centric ligand databases, ChemBioBase, which provide useful and important complementary information on small molecules that exhibit activity against targets in a particular family. These databases cover a wide range of druggable targets including kinases, proteases, GPCRs, ion channels and nuclear hormone receptors. Such thematic databases help researchers survey the available knowledge in a given field and carry out several virtual screening tasks. ChemBioBase allows researchers to perform structure-activity relationship (SAR) studies for molecules tested against a particular target under a given assay condition across publications. Manually drawn chemical structures from ChemBioBase are used for clustering of molecules with respect to a defined scaffold or activity and for creating chemical libraries. The utility of these databases for SAR, along with their content and coverage in terms of chemistry/biology space, will be discussed.

COMP 146

Visualization of cyclic and multibranched molecules with VMD

 

Simon Cross, hodgestar@gmail.com1, John E. Stone, johns@ks.uiuc.edu2, James E. Gain1, and Michelle M. Kuttel, mkuttel@cs.uct.ac.za1. (1) Department of Computer Science, University of Cape Town, Private Bag X3, Rondebosch, Cape Town, 7701, South Africa, (2) Beckman Institute, University of Illinois at Urbana-Champaign, 405 N. Mathews Ave., Urbana, IL 61801
We have added two new visualization algorithms, termed PaperChain and Twister, to the Visual Molecular Dynamics (VMD) package. These algorithms produce visualizations of complex cyclic and multi-branched molecular structures. PaperChain highlights each ring in a molecular structure with a polygon, which is coloured according to the ring pucker. Twister traces the glycosidic backbone with a ribbon that twists according to the relative orientation of successive sugar residues. Combination of these novel algorithms with the large set of visualizations already available in VMD allows for unprecedented flexibility in the level of detail displayed for glycoproteins, as well as other cyclic structures. We highlight the efficacy of these algorithms with selected illustrative examples, clearly demonstrating the value of the new visualizations, not only for structure validation, but for facilitating insights into molecular structure and mechanism.


COMP 147

PoseView: 2D Visualization of protein-ligand complexes

 

Katrin Stierand, stierand@zbh.uni-hamburg.de, Center for Bioinformatics, University of Hamburg, Bundesstr. 43, Hamburg, 20146, Germany and Matthias Rarey, Center for Bioinformatics (ZBH), University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany.
Although computer-aided molecular design and virtual screening software tools improve continuously, manual investigation of the resulting complexes remains a control task in modelling. In contrast to 3D visualizations, 2D plots convey their information at a short glance and are therefore more appropriate for scanning through large datasets.

We present a new version of PoseView,[1,2] a computational method for the automatic generation of two-dimensional protein-ligand complex diagrams. The layout is computed considering hydrophilic, hydrophobic and metal contacts between ligand and receptor. While the ligand and the protein residues forming hydrophilic interactions with the ligand are drawn according to chemical structure diagram conventions, the hydrophobic contacts are visualized by means of splines around the ligand and the appropriate residue labels. PoseView is based on a combinatorial layout optimization strategy which solves parts of the problem non-heuristically. The computation is performed in a sequential manner: an initial ligand structure diagram is created and subsequently modified in order to find a non-intersecting arrangement of interaction lines. Next, the initial placement of each hydrophilic interacting amino acid is computed. During placement, collisions are resolved by a branch & bound algorithm selecting an optimal relative arrangement of all amino acids and the ligand. Finally, the remaining components of the complex diagram are placed based on an underlying arrangement grid.

For validation, PoseView was applied to the protein-ligand complexes contained in the Brookhaven PDB database. Advantages and limitations of the approach will be discussed by means of representative test cases.

For examples see www.zbh.uni-hamburg.de/poseview

Literature:

1. Stierand, K., Maaß, P., Rarey, M. (2006). Molecular Complexes at a Glance: Automated Generation of Two-Dimensional Complex Diagrams. Bioinformatics, 22, 1710-1716.

2. Stierand, K., Rarey, M. (2007). From Modeling to Medicinal Chemistry: Automatic Generation of Two-Dimensional Complex Diagrams. ChemMedChem, 2, 6, 853-860.


COMP 148

A general interface to quantum chemistry simulations in VMD

 

Jan Saam, saam@ks.uiuc.edu1, John E. Stone, johns@ks.uiuc.edu2, Axel Kohlmeyer, akohlmey@cmm.chem.upenn.edu3, and Klaus Schulten, kschulte@ks.uiuc.edu1. (1) Beckman Institute, University of Illinois at Urbana-Champaign, 405 N. Mathews Ave., Urbana, IL 61801, (2) Beckman Institute, University of Illinois at Urbana-Champaign, 405 N. Mathews Ave., Urbana, IL 61801, (3) Center for Molecular Modeling, Chemistry Department, University of Pennsylvania, 231 South 34th Street, Philadelphia, PA 19104
We describe our efforts in supporting quantum chemistry data in the VMD software package. VMD has long been used for visualization and analysis of classical molecular dynamics simulations, but representation of results from quantum chemistry software was limited to coordinates or precomputed orbital grids (e.g. cube files). Recent advances in the use of multi-core processors and massively parallel graphics processors provided an opportunity for truly interactive dynamic trajectory visualization of orbitals, the molecular electrostatic potential, etc. In combination with VMD's other powerful graphics capabilities, this lays a foundation for new visualization paradigms for quantum chemistry data appealing to the chemist's intuition. Further, arbitrary postprocessing and analysis steps can be applied interactively or through scripting. New extensions to the VMD plugin interfaces allow the easy import of various data from a wide variety of quantum chemistry packages into VMD. Additional plugins assist in generating input for quantum chemical calculations.

COMP 149

Boltzmann 3D simulations for visualizing molecular motion in the classroom and laboratory

 

Randall B. Shirts, randy_shirts@byu.edu, Department of Chemistry and Biochemistry, Brigham Young University, C100 Benson Building, Provo, UT 84602
In addition to visualization of chemical structures, computers can also help in visualizing molecular motion. In particular, the distribution of molecular velocities is an essential concept in understanding gas laws, rates of diffusion and effusion, rates of evaporation, rates of chemical reaction, and the nature of equilibrium. Boltzmann 3D is a free Java application available at http://people.chem.byu.edu/rbshirts/research/boltzmann_3d that performs real-time simulation of hard spheres for classroom demonstrations or hands-on interactive laboratories from high school chemistry to graduate statistical mechanics. I will demonstrate the capabilities of this freeware including new modules for doing isothermal or adiabatic expansions/compressions and for kinetics and equilibrium.
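The distribution of molecular velocities mentioned above can be illustrated with a few lines of code. The following is a minimal sketch, not taken from Boltzmann 3D: it assumes an ideal gas, draws each Cartesian velocity component from a Gaussian of width sqrt(kT/m), and compares the sampled mean speed with the analytic Maxwell-Boltzmann value sqrt(8kT/(pi m)). The function names and the choice of N2 at 300 K are illustrative assumptions.

```python
import math
import random

K_B = 1.380649e-23          # Boltzmann constant, J/K
N_A = 6.02214076e23         # Avogadro constant, 1/mol


def sample_speeds(mass_kg, temperature_k, n, seed=0):
    """Draw n molecular speeds (m/s) for an ideal gas at the given temperature.

    Each velocity component is Gaussian with standard deviation sqrt(kT/m);
    the resulting speeds follow the Maxwell-Boltzmann distribution.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(K_B * temperature_k / mass_kg)
    speeds = []
    for _ in range(n):
        vx = rng.gauss(0.0, sigma)
        vy = rng.gauss(0.0, sigma)
        vz = rng.gauss(0.0, sigma)
        speeds.append(math.sqrt(vx * vx + vy * vy + vz * vz))
    return speeds


def mean_speed_theory(mass_kg, temperature_k):
    """Analytic mean speed of the Maxwell-Boltzmann distribution."""
    return math.sqrt(8.0 * K_B * temperature_k / (math.pi * mass_kg))


# Illustrative case (an assumption, not from the abstract): N2 at 300 K.
M_N2 = 28.0134e-3 / N_A     # mass of one N2 molecule, kg
speeds = sample_speeds(M_N2, 300.0, 20000)
print("sampled mean speed (m/s):", sum(speeds) / len(speeds))
print("analytic mean speed (m/s):", mean_speed_theory(M_N2, 300.0))
```

A histogram of `speeds` gives the familiar skewed curve; raising the temperature or lowering the mass shifts it to higher speeds.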
 

COMP 150

Visualization of molecular orbitals and the related electron densities

 

Maciej Haranczyk, mharanczyk@lbl.gov1, Gunther Weber1, and Maciej S J Gutowski, m.gutowski@hw.ac.uk2. (1) Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Mail Stop 50F-1650, Berkeley, CA 94720, (2) Chemistry-School of Engineering and Physical Sciences, Heriot-Watt University, William H Perkin Building, Edinburgh EH14 4AS, United Kingdom
When plotting different molecular orbitals and the related electron densities with consistent contour values, one can create illusions about the relative extension of charge distributions. We have recently suggested that the comparison is not biased when plots reproduce the same fraction of the total charge. We developed an algorithm and software that facilitate this type of visualization. This presentation will illustrate the application of our tools in the analysis of molecular orbitals, the related electron densities, and the total electron densities of molecules. In addition, we will present approaches that can be useful in the analysis of electron density fields but have not yet been implemented in mainstream visualization packages. An example of such approaches is field topology analysis using contour tree representations.

This work is supported by the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
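The idea of choosing a contour value so that the plotted surface encloses a fixed fraction of the total charge can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' algorithm or software: it assumes the density is sampled on a uniform grid (so every cell has the same volume), sorts the samples from highest to lowest, and returns the density value at which the accumulated charge first reaches the requested fraction. The function name `isovalue_for_fraction` is hypothetical.

```python
def isovalue_for_fraction(densities, fraction, cell_volume=1.0):
    """Return the contour value whose isosurface encloses `fraction` of the charge.

    densities   : flat sequence of non-negative density samples on a uniform grid
                  (for an orbital, the squared amplitude at each grid point)
    fraction    : desired fraction of the total charge, 0 < fraction <= 1
    cell_volume : volume of one grid cell (assumed identical for all cells)
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    values = sorted(densities, reverse=True)
    total = sum(values) * cell_volume
    if total <= 0.0:
        raise ValueError("density must integrate to a positive charge")
    target = fraction * total
    running = 0.0
    for rho in values:
        # Accumulate charge from the densest cells outward.
        running += rho * cell_volume
        if running >= target:
            return rho
    return values[-1]


# Two toy "densities" with the same total charge but different spatial extent.
compact = [8.0, 1.0, 1.0]
diffuse = [4.0, 3.0, 3.0]
print(isovalue_for_fraction(compact, 0.5), isovalue_for_fraction(diffuse, 0.5))
```

Plotting each orbital at its own returned value, instead of at one shared contour value, is what keeps the enclosed charge fraction equal across plots.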


COMP 151

Molekel: A program for the visualization of quantum chemistry data

 

Ugo Varetto, uvaretto@cscs.ch, Maria G. Giuffreda, mgg@cscs.ch, and Yun Jang, jangy@cscs.ch. Swiss National Supercomputing Centre - CSCS, Galleria 2 - Via Cantonale, Manno, 6928, Switzerland
Molekel is a multi-platform open-source molecular visualization program that can display 3-D models of chemical structures as well as the results of quantum chemistry computations. The presentation gives an overview of the main program features with a focus on the visualization and analysis of data read from the output of popular quantum chemistry packages such as ADF, Gaussian and GAMESS. The final part of the presentation covers the new hardware-accelerated visualization techniques available in future versions of Molekel which are used to enhance depth perception of 3-D structures and achieve very fast and high-quality display of electron density and molecular orbitals.


COMP 152

WebMO: Web-based, state-of-the-art, and cost effective computational chemistry

 

William F. Polik, Department of Chemistry, Hope College, 35 E. 12th Street, Holland, MI 49423 and Jordan R. Schmidt, Department of Chemistry, University of Wisconsin - Madison, Madison, WI 53706.
WebMO is a web-based interface to modern computational chemistry programs (GAMESS, Gaussian, Molpro, MOPAC, NWChem, PQS, Q-Chem). Using just a web browser, users can draw 3-D structures, run calculations, and visualize results. WebMO is simple enough for novice users (reasonable defaults are provided; results are presented graphically) but flexible enough for experts (full access to input and output files is provided; templates allow customization of calculation types). WebMO is ideal for teaching at the undergraduate and graduate levels, for research students learning and using computational chemistry, and for creating input files and visualizing computed results.

Division of the History of Chemistry - Abstracts


HIST 1
Austin M. Patterson: Words About Words and his contributions to nomenclature

John B Sharkey, jsharkey@pace.edu, Department of Chemistry and Physical Sciences, Pace University, Pace Plaza, New York, NY 10038
Early in his long and distinguished career (1876-1956), Austin Patterson recognized the significant, ever-growing need for a better language of chemistry, and he went to work to help in providing such a language. He virtually devoted his life to the development of chemical nomenclature and to fostering good usage. He became widely recognized as the world's leading authority in this field. Perhaps the culmination of his life's work, undertaken during the last five years of his life, was the writing of a regular column in Chemical and Engineering News. This column, entitled “Words about Words” in the beginning and later just labeled “Nomenclature,” was widely read. In 1957, the ACS published a reproduction of his columns. According to E. J. Crane, who wrote the Preface, “This book is produced for use, but it is also offered for memory and for inspiration.” This paper will highlight Patterson's contributions to the field of nomenclature and review some of his more interesting columns.

HIST 2
Metaphorical matter: The language of alchemy

Anke Timmermann, Chemical Heritage Foundation, 315 Chestnut Street, Philadelphia, PA 19106
Alchemists wrote down their experiments, theories and observations in their own way long before the concepts of atoms and molecules, chemical formulae and reactions were articulated in the images and formulae familiar to us today. The practice and theory of alchemy were rooted in ancient traditions which had come to the Western world from Egypt, ancient Greece and the Islamic countries. It was believed that the convoluted language of alchemical writings could only be deciphered by initiated alchemists. Altogether, the language of alchemy provides a rather intriguing combination of alchemical information and symbolic expression. How, then, was it possible for alchemists to communicate practical and theoretical knowledge? This talk will discuss alchemy and its symbols with the help of examples (word and image) from the rare book collections in the Othmer Library.

HIST 3
Méthode de Nomenclature Chimique revisited

Carmen J. Giunta, giunta@lemoyne.edu, Department of Chemistry and Physics, Le Moyne College, 1419 Salt Springs Rd, Syracuse, NY 13214-1399
The Méthode de Nomenclature Chimique, published in 1787, provided the basis for the systematic nomenclature of binary inorganic compounds still in common use more than two centuries later. The presentation will examine the component parts of this publication, particularly Lavoisier's memoir that advocated reforming and perfecting chemical nomenclature, Guyton de Morveau's memoir on developing the principles of the proposed systematic nomenclature, and glossaries of chemical names old and new.

HIST 4
Documenting the history of chemical nomenclature and symbolism

William B. Jensen, jensenwb@email.uc.edu, Department of Chemistry, University of Cincinnati, ML 172, Cincinnati, OH 45221-0172
The talk will review attempts by past chemical historians to document the history of chemical nomenclature and symbolism, ranging from coverage in standard histories of chemistry, such as those by Kopp and by Ihde, to specialized monographs, such as those by Caven and Cranston and by Crosland.

HIST 5
Systematizing chemical nomenclature: IUPAC's Red Book and Blue Book

Roger A. Egolf, rae4@psu.edu, Pennsylvania State University, Lehigh Valley Campus, 8380 Mohr Lane, Fogelsville, PA 18051-9999
One of the original purposes of the International Union of Pure and Applied Chemistry at its foundation in 1919 was the unification of chemical nomenclature. Commissions of IUPAC published reports suggesting standardized nomenclature over many years, but it was not until 1955 that tentative rules for inorganic and organic nomenclature were published in Comptes Rendus. These rules were ratified at the 19th IUPAC Conference in 1957, then published as Nomenclature of Inorganic Chemistry – 1957, better known as the Red Book; and Nomenclature of Organic Chemistry – 1957, Section A Hydrocarbons, and Section B Fundamental Heterocyclic Systems, better known as the Blue Book. This paper will discuss the process by which these rules were agreed upon and published.

HIST 6
What's in a name?

Natalie Foster, nf00@lehigh.edu, Department of Chemistry, Lehigh University, 6 East Packer Ave, Bethlehem, PA 18015
“Organic Chemistry: The Name Game” is a good-humored text that explores the origins of contemporary terms in organic chemistry. The foreword to this little gem of a book reminds us that in science, just as in literature, “language does not serve mankind only for communication any more than food serves only for nourishment.” This paper presents a selection of the stories behind the trivial names that are part of the language of organic compounds and chemical concepts. This excursion through the origin of names coined with reference to animals (felicene), architectural elements (peristylane), musical instruments (fidecene), food (sandwich compounds), and even head-coverings (diademane) illuminates the human side of chemistry and highlights the strong links between words and pictures (names and shapes) that describe how chemists view the world.

HIST 7
mmCIF: A computer language for the representation of macromolecular structure

Julie B. Ealy, jbe10@psu.edu, Department of Chemistry, Pennsylvania State University, 8380 Mohr Lane, Academic Building, Fogelsville, PA 18051
The language of the macromolecular crystallographic information file will be described as presented in: Bourne, P. E., Berman, H. M., McMahon, B., Watenpaugh, K. D., Westbrook, J. D., and Fitzgerald, P. M. D. Methods in Enzymology, 1997, 277, 571-590. The language was developed to extend the Crystallographic Information File (CIF) data representation that is used to describe molecular structure. Entries from the Protein Data Bank will be used to demonstrate various aspects of the language visually.

HIST 8
Putting it on the line: The Wiswesser line-formula notation system (WLN)

James J. Bohning, jjba@lehigh.edu and Ned Heindel. Department of Chemistry, Lehigh University, 6 E. Packer Ave, Bethlehem, PA 18015
The effort to reduce chemical structures of any complexity to a single line of letters, numbers and symbols began in the eighteenth century, but did not receive serious attention until the early computer age when in 1949 the IUPAC Commission on Codification, Ciphering, and Punched Card Techniques invited designers to submit their proposals for an internationally suitable notation system. Although IUPAC selected a system developed by G. M. Dyson, it was the WLN that won the most users, primarily through the determined efforts of its founder, William J. Wiswesser, who outlined the principles of the WLN in his 1954 monograph "A Line-Formula Chemical Notation." As Wiswesser explained, the WLN never enjoyed any IUPAC recognition, and had no other official approval. It earned user support “simply because it solved various information-managing needs with less cost and confusion than other internationally recognized alternatives.”

HIST 9
CAS REGISTRY: Its history and principles

Roger J. Schenck, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202
The CAS REGISTRY(SM) is the master collection of disclosed chemical substance information, with more than 45 million organic and inorganic molecules. This talk will focus on the nature of the CAS REGISTRY® Number as a unique identifier, and the principles and criteria for substances being added to the CAS REGISTRY. Examples will be given illustrating the breadth and depth of the CAS REGISTRY.
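One concrete property of the Registry Number as an identifier is its built-in check digit. The sketch below is illustrative and not part of the talk: it assumes the commonly documented format of two to seven digits, a hyphen, two digits, a hyphen, and one check digit, where the check digit is the sum of the preceding digits, each weighted by its position counted from the right, taken modulo 10. The function names are hypothetical.

```python
import re

# Assumed format: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
_CAS_PATTERN = re.compile(r"([0-9]{2,7})-([0-9]{2})-([0-9])")


def cas_check_digit(body_digits):
    """Compute the check digit for the digits preceding it (hyphens removed).

    The rightmost digit has weight 1, the next weight 2, and so on;
    the check digit is the weighted sum modulo 10.
    """
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body_digits), start=1)
    )
    return total % 10


def is_valid_cas_rn(cas_rn):
    """Return True if the string is well formed and its check digit matches."""
    match = _CAS_PATTERN.fullmatch(cas_rn)
    if match is None:
        return False
    first, second, check = match.groups()
    return cas_check_digit(first + second) == int(check)


# Water (7732-18-5) passes; altering the last digit makes the check fail.
print(is_valid_cas_rn("7732-18-5"), is_valid_cas_rn("7732-18-4"))
```

The check digit catches single-digit typing errors but says nothing about whether a number has actually been assigned to a substance.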

Division of Small Chemical Business - Abstracts


SCHB 5
Navigating social networking and collaboration tools

Christine Brennan Schmidt, c_schmidt@acs.org, Web Strategy & Operations, American Chemical Society, 1155 Sixteenth Street, NW, Washington, DC 20036
The popularity of social networking and online collaboration tools is growing rapidly. The use of these tools, including LinkedIn, Yammer, Plaxo, and CollectiveX, in the professional world is becoming more common. Even the purely networking sites such as Facebook and Twitter are finding business use. Among these tools is the ACS Network, released in 2008. Learn more about various existing social networking and collaboration tools, the features they have, their differences and overlaps in functionality, and the activities they support. Hear about the current and future offerings of the ACS Network.

SCHB 6
Growing your chemical business? Let SciFinder be part of the process!

Marsha J. Davenport, mdavenport@cas.org, Chemical Abstracts Service, 12525 Plantation Drive, Brandywine, MD 20613
SciFinder, Chemical Abstracts Service's computerized literature searching system, is now web-based! With SciFinder you can explore one single source for scientific information in journal and patent literature from around the world with new browser-based searching capabilities. This presentation will review the changing face of SciFinder, with the goal of familiarizing attendees with its new functionality.

SCHB 7
Finding gold: Using internet resources to help make good business decisions

Anne Caputo, Special Libraries Association, 1025 Connecticut Avenue, NW, Suite 1103, Washington, DC 20036
The Internet provides access to a myriad of business resources useful to small businesses. Business directories, catalogs of specialized suppliers, market and competitive intelligence sources, and global business opportunities are the special resources used daily by information professionals in specialized libraries and information centers. Learn about the sources and methods used by these skilled professionals which translate into tools and opportunities for small businesses. Practical tools and search techniques feature access to Internet-based sources offering the greatest value to those wishing to maximize the potential of web content for business development and management.

SCHB 8
ChemSpider: Building a knowledge-based community for chemists using social and data networking technologies

A. Williams, antony.williams@chemspider.com, ChemZoo Inc, 904 Tamaras Circle, Wake Forest, NC 27587
In less than 2 years, ChemSpider has become one of the primary online resources for chemists, providing access to an unsurpassed aggregate of free-access knowledge and data. ChemSpider was developed with the intention of providing a structure-centric community for chemists that would be enhanced by data depositions, curations and annotations by the community. The system presently hosts over 21.5 million chemical compounds from over 200 data sources. Working with a network of advisors, collaborators and data providers, ChemSpider has created a unique resource of integrated information for chemists. These efforts have enabled us to support the curation of the Wikipedia chemistry pages, the production of a community-supported Open Access chemistry journal, and the provision of web services integrated with spectrometer systems distributed around the world. This talk will provide an overview of how ChemSpider utilized social and data networking to create a community for chemistry.

SCHB 9
The "design approach" to creating effective websites

Mark D. Carpenter, M_Carpenter@acs.org, Web Strategy and Operations, American Chemical Society, Washington, DC 20036
Designing and maintaining a customer-friendly web site is crucial to any small business's success, as nearly all customers rely on the Internet to find information about the companies they do business with and the products and services they buy. This presentation will explore how ACS uses customer feedback to build user-centric web experiences that enable members to get information quickly and easily. Other examples of best practices for building successful and engaging web sites will be presented, so that small and growing businesses can increase their exposure on the crowded Internet.

SCHB 10
Effective use of the Internet to improve market share, drive sales, and increase customer loyalty

Aaron R. Warner, awarner@idtdna.com, Integrated DNA Technologies, Inc, 1710 Commercial Park Rd., Coralville, IA 52241
This presentation will reveal unorthodox marketing methods that have been used to gain market share; the principles and technologies that have helped streamline a complex design and ordering process; and the systems that have been assembled to calculate the return on investment for the marketing and ordering tools. IDT has employed a unique mix of custom applications and third-party tools to exceed the expectations of customers before, during, and after orders are placed. Topics presented will include IDT's no-charge bioinformatics offerings, the use of search engines, external integration methods, and internal software development. For each, discussion will focus on what has worked, what has not, and reasons why. IDT is a custom manufacturer of synthetic oligonucleotides and genes for the research and diagnostic markets, with over 80,000 active customers worldwide, and accepts more than 85% of its orders through its website.