#238 - Abstracts

ACS National Meeting
August 16-20, 2009
Washington, DC

 1 U.S. EPA computational toxicology programs: Central role of chemical-annotation efforts and molecular databases
Ann M. Richard1, richard.ann@epa.gov, Maritja A. Wolf2, wolf.marti@epa.gov, ClarLynda R. Williams-Devane3, williams.clarlynda@epa.gov, and Richard Judson1, judson.richard@epa.gov. (1) National Center for Computational Toxicology, U.S. EPA, Research Triangle Park, NC 27711, (2) Lockheed Martin, Contractor to the U.S. EPA, Research Triangle Park, NC 27711, (3) National Health & Environmental Effects Research Lab, U.S. EPA, Research Triangle Park, NC 27711

EPA's National Center for Computational Toxicology is engaged in high-profile research efforts to improve the ability to more efficiently and effectively prioritize and screen thousands of environmental chemicals for potential toxicity. A central component of these efforts involves the construction and integration of top-level chemically indexed, structure-searchable databases of historical toxicology data, including: 1) high-quality structure-annotated toxicity data files from public resources (DSSTox); 2) a relational database of detailed toxicology studies from EPA regulatory programs (ToxRef DB); and 3) a publicly available relational database broadly spanning chemical data resources pertaining to environmental toxicology on the Internet (ACToR). Challenges of chemical structure annotation and indexing of public resources broadly pertaining to environmental toxicology will be described, highlighting DSSTox, ACToR, and recent efforts to annotate and publish chemical (treatment)-experiment index files for the primary public microarray database repositories, GEO and ArrayExpress. This abstract does not necessarily reflect EPA policy.

 2 Linking public and commercial chemical data: ChemSpider and SureChem
Nicko Goncharoff, nicko.goncharoff@surecheminc.com, SureChem, Inc, 2255 Van Ness Avenue, Suite 101, San Francisco, CA 94109

The scientific community is calling for better integration of public and commercial databases, particularly in chemistry. This presentation discusses the linking of ChemSpider, the Open Access internet chemistry database, with SureChem, a proprietary online structure-searchable patent database. Topics will include a technical review of the integration, use of chemical identifiers to ensure consistent searches across both databases, the end user search experience and benefits to the scientific community.
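
As a rough illustration of how a shared chemical identifier can keep searches consistent across two databases, the sketch below joins two small record sets on a common key. The record layouts, field names, patent labels, and the choice of InChIKey as the shared identifier are assumptions made for this example; they are not details of the ChemSpider or SureChem integration.

```python
from collections import defaultdict

# Hypothetical records: field names and patent labels are illustrative only.
public_records = [
    {"inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "name": "aspirin"},
    {"inchikey": "RZVAJINKPMORJF-UHFFFAOYSA-N", "name": "acetaminophen"},
]
patent_records = [
    {"inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "patent": "PATENT-A"},
    {"inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "patent": "PATENT-B"},
]


def link_by_identifier(compounds, patents, key="inchikey"):
    """Map each compound name to the patents that share its identifier."""
    index = defaultdict(list)
    for record in patents:
        index[record[key]].append(record["patent"])
    # .get() avoids creating empty entries for compounds with no patent hits.
    return {c["name"]: index.get(c[key], []) for c in compounds}


links = link_by_identifier(public_records, patent_records)
print(links)
```

Keying on a structure-derived identifier rather than a compound name means that synonyms and trade names in either source resolve to the same entry.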

 3 Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources
A J Williams, tony@chemspider.com, ChemZoo, 904 Tamaras Circle, Wake Forest, NC 27587

The extraction of chemical entities from documents such as patents and publications has been pursued for a number of years. We wish to report on ChemMantis, an integrated system for chemistry-based entity extraction and document mark-up enabling access to the rich resource of online chemistry known as ChemSpider. We will discuss the development of the platform from its inception as a series of dictionaries to the integration of an entity extraction algorithm and its expansion to a public deposition and publishing platform for chemistry. Chemistry articles can now be deposited, marked up and exposed to the public, in many cases within a few minutes, making it an ideal platform for communicating research and providing integrated access to data sources including PubChem, ChEBI, Wikipedia and Entrez.

 4 Turning mining inside out
Colin R Batchelor, batchelorc@rsc.org, Royal Society of Chemistry, Thomas Graham House, Milton Road, Cambridge CB4 0WF, United Kingdom

The Royal Society of Chemistry now has several years of experience in identifying chemistry through text mining and combining this with editorial QA and publishing standards to enhance our publications. We are using our award-winning project RSC Prospect to show some of the benefits of applying new standards such as ontologies and the InChI to our journal articles. By applying them specifically to our areas of chemical science publishing, we have added a layer of semantic enrichment to articles that enables them to be found and linked to other sources, whether other publications, databases or reference information. The movement to extend the results of traditional text mining, done on a limited internal set of documents, to the wider scientific information world through the application of standards will be hugely significant for the publication and use of scientific information in the years to come.

 5 Chemreader: A tool for extracting chemical structure information from digital raster images
Jungkap Park, jungkap@umich.edu, Michigan Alliance for Cheminformatic Exploration, University of Michigan, Department of Mechanical Engineering, 3211 EECS, 2350 Hayward Street, Ann Arbor, MI 48109, Kazu Saitou, kazu@umich.edu, Michigan Alliance for Cheminformatic Exploration, University of Michigan, Department of Mechanical Engineering, 3211 EECS, 2350 Hayward Street, Ann Arbor, MI 48109, Kerby Shedden, kshedden@umich.edu, Michigan Alliance for Cheminformatic Exploration, University of Michigan, Department of Statistics, 461 West Hall, 1085 S University, Ann Arbor, MI 48109, and Gus R. Rosania, grosania@umich.edu, Michigan Alliance for Cheminformatic Exploration, University of Michigan College of Pharmacy, Department of Pharmaceutical Sciences, 428 Church Street, Ann Arbor, MI 48109

Annotation of virtual libraries of small molecules ultimately involves linking entries in a database to relevant patents and research articles. Chemreader is a machine vision tool to automate conversion of chemical diagrams in analogue images into standard chemical file formats. Chemreader builds on advances in chemical object character recognition made over the past fifteen years. To facilitate database annotation, algorithms for recognizing lines and letters representing bonds and atoms in chemical structure diagrams are run independently and in sequence, so that input parameters can be tailored to a desired chemical database annotation scheme. A chemical spell-checker functionality can be used to automatically assess errors in Chemreader's output. Furthermore, pre-processing filters can be used to eliminate sub-standard (i.e., low-resolution or noisy) images from the input. For a database annotation task, Chemreader can be adjusted to a user-defined level of accuracy to optimize the relevance or number of useful links.

 6 Exploiting a hidden treasure: Automated chemical entity recognition in Chemisches Zentralblatt
Valentina Eigner-Pitto, ve@infochem.de, Heinz Saller, and Peter Loew, InfoChem GmbH, Landsberger Strasse 408, Munich 81241, Germany

The German publication Chemisches Zentralblatt, which began in 1830, was the first chemistry abstract collection in history and contains 140 years of research progress in chemistry and chemical knowledge. Modern scanning and OCR software technology was utilized to make the entire content of this unique reference work available for full-text retrieval, but a solution offering chemical structure search seemed to be unfeasible, as this work is written in German, the original document quality is not consistent, and numerous obsolete compound names occur. This talk describes our approach to identifying and extracting chemical compounds automatically from the text and converting them into a structure database. The process is based on the systematic training and enhancement of the OCR, annotation and name-to-structure processes using specifically developed German dictionaries. A web-based prototype application has been implemented, providing structure, substructure and similarity search with the hits linked back directly to the original pages of Chemisches Zentralblatt.

 7 NIH public access policy
Neil M. Thakur, Special Assistant to the NIH Deputy Director for Extramural Research, National Institutes of Health, One Center Drive, Building One - Room 140, Bethesda, MD 20892-0152, Fax: 301-402-3469

This presentation will provide an overview of the NIH Public Access Policy and compliance strategies for NIH authors and awardees. It will describe key policy details, methods to submit papers in compliance with the Policy, and methods to document compliance with the Policy. Time will be allotted for audience comments and questions.

8 Three revolutions
George O. Strawn, Office of Information and Resource Management, National Science Foundation, Arlington, VA 22230

The digital computer has spawned many important developments, one of which, electronic communication of scholarly information, may be maturing at this time. This talk will compare electronic communication of scholarship with two other computer-related disruptions, review recent developments in this area, and speculate on its importance for the future of science and engineering. In addition to this historical perspective, NSF policies and activities will also be described.

9 STM publishers and author rights
Eric S. Slater, e_slater@acs.org, Publications Division, Copyright Office, American Chemical Society, 1155 Sixteenth Street, NW, Washington, DC 20036, Fax: 202-776-8112

This session will feature discussion of author rights and the ways that ACS and other STM publishers are addressing this broad issue. Many publishers that require transfer of copyright grant a number of rights back to authors and their employers; however, there is a perception that publishers do not do this. In reality, publishers are more “generous” than is perceived, and they achieve this by carefully attempting to balance their own rights as copyright holders with the rights of authors and the user community at large.

10 Perils of parallel publishing systems: The ramp-up of institutional and subject-matter repositories and potential impact on journal subscriptions
Mark Seeley, Legal Department, Elsevier, 30 Corporate Drive - Suite 400, Burlington, MA 01803, Fax: 781-313-4814

The copying and sharing of journal articles by authors has been accepted by journal publishers, either expressly (journal publishing agreements which permit sharing/posting of some versions of articles) or implicitly (by lack of enforcement or objection) for many years. Generally journal publishers accept that scholars need to quickly share their work with colleagues and researchers in the field, as part of the “informal” communication systems, and the view has been that the risk to traditional subscription and “pay by the drink” document delivery businesses from such informal sharing is low to moderate. The development of major pre-print servers such as arXiv.org, subject repositories run by funding agencies such as NIH (the Public Access database in PMC), and institutional repositories run on a systematic basis (with mandates such as those proposed at Harvard, MIT, Southampton and others), raises the question about whether such repositories and the versions of articles posted on such sites, are “good enough” to serve as compelling substitutes for journal subscriptions and individual article sales.

 11 Trends in rights management: A copyright clearance perspective
Edward Colleran, Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, Fax: 978-646-8600

Trends in the Use of Copyrighted Works: A Copyright Clearance Center Perspective.

12 Online chemical modeling environment: database
Sergii Novotarskyi1, Iurii Sushko1, Robert Körner1, Anil Kumar Pandey1, and Igor V. Tetko2, itetko@vcclab.org. (1) Helmholtz Zentrum Muenchen German Research Center for Environmental Health, Institute of Bioinformatics and Systems Biology, Ingolstaedter Landstrasse 1, Neuherberg D-85764, Germany, (2) Helmholtz Zentrum Muenchen German Research Center for Environmental Health, Institute of Bioinformatics and Systems Biology, Ingolstaedter Landstrasse 1, Neuherberg 85764, Germany

The main goal of our database http://qspr.eu is to collect, store and manipulate chemical data for use in model development (see our presentation at COMP). Its main features, which distinguish it from other available databases, include the following: 1) The database is open and is based on Wiki-style principles. We encourage users to submit data and to correct inaccurate submitted data. 2) The database is aimed at collecting high-quality data. To achieve this, we require users to submit references to the article in which the data were published. The reference may include the article name, journal name, date of publication, page number, line number, etc. 3) Since compound properties may vary depending on the conditions under which they were measured, we store the measurement conditions with the data to provide users with more accurate information about each data point. Examples of the use of the database within national and EU projects will be presented.
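
The second and third features above can be pictured as a record type that carries its literature reference and measurement conditions with each value. The following is only a minimal sketch: the class names, field names, and example values are hypothetical placeholders and do not reflect the actual qspr.eu schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Reference:
    """Where a value was published (all fields are illustrative)."""
    article: str
    journal: str
    year: int
    page: Optional[int] = None
    line: Optional[int] = None


@dataclass
class DataPoint:
    """One measured property value with its source and conditions."""
    compound: str
    property_name: str
    value: float
    unit: str
    reference: Optional[Reference] = None
    conditions: dict = field(default_factory=dict)


def validate(point: DataPoint) -> DataPoint:
    """Reject data points that do not cite a published source."""
    if point.reference is None or not point.reference.article:
        raise ValueError("a literature reference is required")
    return point


# Placeholder values, not real measurements.
example = validate(
    DataPoint(
        compound="example compound",
        property_name="solubility",
        value=1.0,
        unit="g/L",
        reference=Reference("Example article", "Example journal", 2009, page=1),
        conditions={"temperature_C": 25, "solvent": "water"},
    )
)
print(example.conditions["temperature_C"])
```

Storing the conditions alongside the value lets two measurements of the same property at different temperatures coexist instead of being treated as conflicting entries.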

13 Public molecular databases: How can their value be increased by generation of additional data in silico?
Vladimir V. Poroikov1, vladimir.poroikov@ibmc.msk.ru, Dmitry Filimonov1, dmitry.filimonov@ibmc.msk.ru, and Marc C. Nicklaus2, mn1@helix.nih.gov. (1) Russian Academy of Medical Science, Institute of Biomedical Chemistry, Pogodinskaya Str., 10, Moscow 119121, Russia, Fax: 007-499-245-0857, (2) Laboratory of Medicinal Chemistry, National Cancer Institute, National Institutes of Health, Frederick, MD 21702

Public molecular databases (PubChem, ChemIDplus, NCI, DSSTox, ChemBank, PDSP, libraries of commercially available samples, etc.) contain data on compounds' identifiers, structural formulae, physicochemical characteristics, biological activities, etc. In addition to experimentally determined properties, various calculated data (log P, number of H-donors and acceptors, drug-likeness, biological activity spectra, a.o.) are also included in some databases. Since for the majority of compounds in public molecular databases numerous experimentally determined parameters are unknown, the question arises: Could one generate the missing data in silico to obtain the total data profile for each molecule? Due to the continued progress in both accuracy of computational methods and performance of computer techniques, this task looks quite realistic for the near future. However, the possibility of transforming such calculated data into useful information and thence into true knowledge depends not only on the accuracy of the data itself but also on the ability of the end users to perceive these data. Thus, the increase in value of calculated data could be achieved through the development of intelligent computational-informational systems, which should combine reliable computational methods with explanatory scenarios suitable for solving practical tasks. We discuss the possibilities and limitations of creating such systems.

14 Chemical space management of large libraries for new active small molecules selection for prostate cancer treatment.
Andrew V. Scorenko, avs@iihr.ru, Computational Chemistry, Chemical Diversity Research Institute, Rabochaya St. 2-a, Khimki, Moscow Region 141400, Russia, Fax: +7-495-626-9780, Andrei A. Gakh, gakhaa@ornl.gov, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6242, Andrey V. Sosnov, sva@iihr.ru, Chemical Diversity Research Institute, 2a Rabochaya St., Khimki, Moscow Region 114401, Russia, and Mikhail Yu. Krasavin, MedChem, Chemical Diversity Research Institute, Rabochaya St. 2-a, Khimki, Moscow Region 141400, Russia

The identification of primary hit compounds in large databases (over a million compounds) and their further optimization into high-content series in early drug discovery are described in this work. We continuously monitor new biotargets and active compounds related to cancer treatment worldwide, update our bank of compounds, and use them for effective drug development and chemical space management. We use available approaches for effective selection of new small molecules for prostate cancer treatment. The more effective biotargets were generally selected, classified and studied. Based on this knowledge, we developed computational algorithms including bioisosteric transformation rules, comprehensive chemical space filters, ligand-based model search, etc. The selected set was classified by chemotype. The final selection was performed by ORNL specialists. As a result of this work, about 5,000 compounds were recommended for high-throughput screening. About 300 compounds were identified as active and went on to further optimization. This research was supported by the Global IPP program through the International Science and Technology Center (ISTC). Oak Ridge National Laboratory is managed and operated by UT-Battelle, LLC, under U.S. Department of Energy contract DE-AC05-00OR22725. This paper is a contribution from the Discovery Chemistry Project.

15 Crowdsourcing nonaqueous solubility and synthesis using Open Notebook Science
Jean-Claude Bradley1, bradlejc@drexel.edu, Khalid Mirza1, Rajarshi Guha2, rguha@indiana.edu, Andrew Lang3, gameshoncho@hotmail.com, and A. Williams4, antony.williams@chemspider.com. (1) Department of Chemistry, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, (2) School of Informatics, Indiana University, 1130 Eigenmann Hall, 1900 E 10th Street, Bloomington, IN 47406, (3) Department of Mathematics, Oral Roberts University, 7777 S. Lewis Ave, Tulsa, OK 74171, (4) ChemZoo Inc, 904 Tamaras Circle, Wake Forest, NC 27587

The use of Open Notebook Science to collect and make publicly available the solubility measurements of aldehydes, primary amines and carboxylic acids will be described. This involves the real time sharing of all experiments and associated raw data by a community of collaborators who are geographically distributed and may have never communicated using channels other than this project. Monthly cash prizes were awarded to participating students by means of the ONS Challenge Submeta Awards. The laboratory notebook pages are recorded on a public wiki and the solubility measurements, including relevant calculations, are stored in public Google Spreadsheets. A combination of ChemSpider, the GoogleDoc visualization API and web services is used to enable flexible searching and display of desired subsets of the data. The utility of the project will be illustrated by exploring optimal solvent selection for a Ugi reaction.

16 ChemXSeer: A cyberinfrastructure for environmental chemical kinetics
Karl T. Mueller1, ktm2@psu.edu, William J. Brouwer1, wjb19@psu.edu, C. Lee Giles2, clg20@psu.edu, Prasenjit Mitra2, pum10@psu.edu, and Carl Lagoze3, lagoze@cs.cornell.edu. (1) Department of Chemistry, Penn State University, 104 Chemistry Building, University Park, PA 16802, Fax: (814) 863-8403, (2) College of Information Sciences and Technology, Penn State University, 104 Information Sciences and Technology Building, University Park, PA 16802, (3) Computing and Information Science, Cornell University, 301 College Avenue, Ithaca, NY 14850

A main goal of interdisciplinary geochemistry and environmental chemistry research at Penn State has been the integration of experimental, analytical, and simulation results from the molecular to the field scales. Such an undertaking requires the synthesis of large amounts of data, especially those data related to measurements and modeling of chemical kinetics. We will report here on our development of the ChemXSeer architecture as a portal for academic researchers in the area of environmental chemical kinetics, which integrates the scientific literature with experimental, analytical and simulation datasets. ChemXSeer (chemxseer.ist.psu.edu) offers unique aspects of search not yet present in other scientific search services: for example, we will demonstrate tools for the extraction of tables, figures, equations and formulae from scientific documents. Included in ChemXSeer are searchable databases of computational results on molecular and surface structures using a number of methods (DFT, molecular dynamics, Monte Carlo, etc.). Future directions will include the deployment of oreChem, the chemistry domain implementation of the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) Project, a developing standardized, interoperable, and machine-readable methodology to express information about compound information objects on the web. oreChem will provide a model to aggregate documents, data, and metadata in chemistry.

17 WITHDRAWN: Mining a large reaction database with name reaction patterns
Matthew A. Kayala1, mkayala@ics.uci.edu, Qian-Nan Hu1, qhu@uci.edu, Jonathan H. Chen1, chenjh@uci.edu, James S. Nowick2, jsnowick@uci.edu, and Pierre Baldi1, pfbaldi@uci.edu. (1) Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, Irvine, CA 92697, (2) Department of Chemistry, University of California, Irvine, 4126 Natural Sciences 1, Irvine, CA 92697-2025

Over the past several years, comprehensive data sets of small chemical compounds, such as our own ChemDB (http://cdb.ics.uci.edu), have been made publicly available for statistical analysis and data mining purposes. However, access to reaction data resources is comparatively restricted. With data largely unavailable, how to approach knowledge discovery in reaction databases is an open question. One potential method for data mining is to classify reactions using pattern matching rules. We present initial results on mining 2,000,000+ well-annotated reactions from a database of published reactions (SPRESI). Here, we have hand-composed 500+ SMIRKS language patterns to cover 306 common 'Name Reactions'. The rules provide a broad classification of the database into a small number of classes based on net structural changes. To facilitate future research, a tool to classify reactions using the patterns has been made available as part of ChemDB.

18 Reliable reactions and stable structures
Jonathan M Goodman, J.M.Goodman@ch.cam.ac.uk, Unilever Centre for Molecular Science Informatics, Cambridge University, Department of Chemistry, Lensfield Road, Cambridge CB2 1EW, United Kingdom, Fax: +44 1223 336362

Databases are not useful unless they are reliable. Public databases have the potential to be more reliable than ones that are protected by carefully designed license conditions, because it may be possible to check the data and publish the results without the constraint of restrictive intellectual property agreements. I will describe ways in which we look at the reliability of databases and try to improve the quality of chemical information. The small on-line molecular databases available on our website have been used to test the utility of InChI-based descriptions of molecules and how they can be effectively searched and retrieved.

19 Copyright and student theses: Challenges of the modern world
Kevin P. Gable, kevin.gable@oregonstate.edu, Department of Chemistry, Oregon State University, 153 Gilbert Hall, Corvallis, OR 97331-4003, Fax: 541-737-2062

Migration of publishing and distribution to electronic media has a number of important implications for preparation and publication of student theses. This presentation will frame the challenges from the perspective of the academic interest in training the student.

20 Implementing an open access policy at Trinity University
Steven M. Bachrach, sbachrach@trinity.edu, Department of Chemistry, Trinity University, 1 Trinity Place, San Antonio, TX 78212, Fax: 210-999-7569, Jorge Gonzalez, Department of Economics, Trinity University, 1 Trinity Place, San Antonio, TX 78212, and Diane Graves, Coates Library, Trinity University, 1 Trinity Place, San Antonio, TX 78212

Ever-increasing costs of journals, particularly in the STM fields, have caused a decreasing ability of scientists and laymen to access needed information. The Open Access initiatives were developed in part to create universal access to the publications. Following on the lead of the Faculty of the School of Liberal Arts and Sciences at Harvard University, the talk presents how Trinity University is maneuvering to adopt a similar policy. This policy will require Trinity faculty to assign limited copyrights to the University. This limited set of rights includes the ability to distribute the articles at no cost, principally through an institutional repository. Efforts to coordinate adoption of this policy at other undergraduate institutions will be discussed.

21 Copyrights, contracts, and the common good: Making noncongressional law, and making it work for us
Sherwin Siy, Global Knowledge Initiative, Public Knowledge, 1875 Connecticut Avenue, NW, Suite 650, Washington, DC 20009, Fax: 202-986-2539

Most often, copyright affects us not through the operation of the law directly, but in contracts and licenses that leverage the powers that those laws grant to authors. In evaluating whether those licenses are good policy, we need to examine their effects on the two entities that copyright law exists to benefit: authors and the public. Copyright law confers a benefit on authors by granting them limited monopoly rights. This benefit, though an integral part of copyright, is a means to a larger end—a benefit to the public by incentivizing the creation of new works that the public can access. Contracts and licensing agreements that manipulate these rights can have positive or negative policy effects—all without having any effect on the underlying law. The NIH open access policy works as just such an agreement, and can be evaluated in comparison with other contracts and licenses—like free/open source licenses or end user license agreements.

22 SPARC: The Scholarly Publishing and Academic Resources Coalition
Heather Joseph, SPARC, Association of Research Libraries (ARL), 21 Dupont Circle, Suite 800, Washington, DC 20036

SPARC, the Scholarly Publishing and Academic Resources Coalition

23 Science Commons: A project of Creative Commons
Michael W. Carroll1, Thinh Nguyen2, and John Wilbanks2. (1) Washington College of Law, American University, 4801 Massachusetts Ave., N.W, Washington, DC 20016, Fax: 202-730-4756, (2) Science Commons, Creative Commons, 171 Second Street, Suite 300, San Francisco, CA 94105

Science Commons.

24 An integrated approach in the search of GABA aminotransferase inhibitors
Savita Bhutoria, savita_rs@iicb.res.in and Nanda Ghoshal, nghoshal@iicb.res.in, Structural Biology and Bioinformatics Division, Indian Institute of Chemical Biology, Jadavpur, Kolkata, India

Gamma-aminobutyric acid (GABA) is the inhibitory neurotransmitter in the mammalian central nervous system. The major pathway for its degradation involves the pyridoxal phosphate (PLP)-dependent enzyme GABA aminotransferase (GABA-AT). Designing GABA-AT inhibitors is a nontrivial task, first because the enzyme active site is very small and second because the inhibitor must first react with PLP to inactivate the enzyme. Inhibitors can attack the enzyme reversibly or irreversibly, depending on whether the inhibitor binds only to PLP or to both PLP and the protein. The solution applied here combined multiple approaches for designing new inhibitors. Using a virtual library created from LUDI-based fragments, substructures and subsequent isosteric group replacement, molecules were screened with structure-guided multiple pharmacophores having the reversible and irreversible attacking functionalities. A set of similarity assessment methods and clustering was employed to recommend compounds for screening in a prospective docking experiment. Because the inhibitors should first react with PLP and then with the enzyme, a strategy was used in which hits were analyzed and validated by their tendency to react with PLP and form a ternary complex. The selected hits were further evaluated and prioritized using QSAR analysis, which took into account the shape of the molecule and other important electronic and structural attributes. Thus, a combined virtual screening and QSAR methodology is used here to target the GABA-AT enzyme, reversibly and irreversibly. The new actives had different underlying chemical architectures from the known inhibitors, a result indicative of successful scaffold hopping.

25 WITHDRAWN: Descriptor importance of HIV-1 protease crystal structures for QSAR using random forest
Gene M. Ko1, gko@sciences.sdsu.edu, A. Srinivas Reddy2, asvreddy@gmail.com, Sunil Kumar3, skumar@mail.sdsu.edu, and Rajni Garg1, rgarg@mail.sdsu.edu. (1) Computational Science Research Center, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182-1245, (2) Electrical and Computer Engineering Department, San Diego State University, 5500 Campanile Drive, C/O Sunil Kumar, San Diego, CA 92182-1309, (3) Electrical and Computer Engineering Department, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182-1309

Random forest (RF) is a machine learning classifier that comprises a collection of unpruned classification trees generated by using bootstrap samples of the data with random feature selection. Unlike many other machine learning techniques, RF has the advantage of determining the importance of all the variables in the dataset. The crystal structures of 62 HIV-1 protease binding pockets complexed with one of the nine FDA-approved protease inhibitors deposited in the Protein Data Bank were studied. A quantitative understanding of the nature of the binding pockets would help guide the design of novel inhibitors for HIV-1 protease. Descriptors were computed for the binding pocket of each crystal structure, yielding 462 constitutional, topological, geometric, electrostatic, and quantum mechanical descriptors that can be used for deriving the quantitative structure-activity relationship (QSAR). The optimal number of trees (ntree) using the default sampling parameter (mtry) of 21 was determined to be 334, with an out-of-bag error of 45.2%. Adjusting the mtry parameter using 334 trees consistently produced the same highly ranked descriptors in the top-ranked group of features, which confirms the stability of the classifier trees. The top-ranked descriptors will be used to derive a QSAR model for bioactivity prediction.
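
The two sampling ingredients named above, bootstrap samples and random feature selection, can be sketched with the standard library alone. The abstract does not say which software was used; the sketch assumes the common classification default of mtry = floor(sqrt(p)), which for 462 descriptors gives the 21 reported above. The function names and the random seed are hypothetical, and no tree is actually grown here.

```python
import math
import random


def default_mtry(n_features):
    """Assumed default for classification forests: floor(sqrt(p))."""
    return math.floor(math.sqrt(n_features))


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; the rest are out-of-bag."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)          # arbitrary seed for reproducibility
n_structures, n_descriptors = 62, 462

mtry = default_mtry(n_descriptors)
in_bag, out_of_bag = bootstrap_split(n_structures, rng)

# At each split, a tree would consider only a random subset of mtry descriptors.
candidate_descriptors = rng.sample(range(n_descriptors), mtry)

print("mtry:", mtry)
print("in-bag draws:", len(in_bag), "out-of-bag structures:", len(out_of_bag))
```

The out-of-bag structures for each tree are never seen by that tree during training, which is why the out-of-bag error can serve as an internal estimate of classification error without a separate test set.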

26 Finding renewable energy materials one screensaver at a time
Roel S. Sánchez-Carrera1, rsanchez@fas.harvard.edu, Leslie Vogt2, lvogt@fas.harvard.edu, Roberto Olivares-Amaya2, olivares@fas.harvard.edu, and Alán Aspuru-Guzik2, aspuru@chemistry.harvard.edu. (1) Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138, (2) Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford St, Cambridge, MA 02138

Renewable energy technologies rely on materials with the ability to efficiently harness and transport energy from renewable sources. Recent advances in the field of computational chemistry have brought us closer to an accurate prediction of the photovoltaic properties of a given molecular material even before experimental synthesis. However, scanning the vast chemical space on a single computer is a difficult proposition. Working together with IBM's World Community Grid effort, we developed a screensaver (http://cleanenergy.harvard.edu), which allows individual users anywhere in the world to contribute their idle computer time to perform electronic structure calculations on combinatorial molecular libraries derived from fused aromatic molecules. The deployment of such a worldwide distributed computational engine has the potential to quickly find novel materials for the next generation of solar cells. The preliminary results of our combinatorial strategy will be presented. The preparation of a publicly available database of molecular structures and calculated properties will also be discussed.

27 WITHDRAWN: Mining a large reaction database with name reaction patterns
Matthew A. Kayala1, mkayala@ics.uci.edu, Qian-Nan Hu1, qhu@uci.edu, Jonathan H. Chen1, chenjh@uci.edu, James S. Nowick2, jsnowick@uci.edu, and Pierre Baldi1, pfbaldi@uci.edu. (1) Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, Irvine, CA 92697, (2) Department of Chemistry, University of California, Irvine, 4126 Natural Sciences 1, Irvine, CA 92697-2025

Over the past several years, comprehensive data sets of small chemical compounds, such as our own ChemDB (http://cdb.ics.uci.edu), have been made publicly available for statistical analysis and data mining purposes. However, access to reaction data resources is comparatively restricted. With data largely unavailable, how to approach knowledge discovery in reaction databases is an open question. One potential method for data mining is to classify reactions using pattern matching rules. We present initial results on mining 2,000,000+ well-annotated reactions from a database of published reactions (SPRESI). Here, we have hand-composed 500+ SMIRKS language patterns to cover 306 common 'Name Reactions'. The rules provide a broad classification of the database into a small number of classes based on net structural changes. To facilitate future research, a tool to classify reactions using the patterns has been made available as part of ChemDB.

28 Predicting metabolic transformation by cytochrome P450 main isoforms
Maayan Elias1, maayan.elias@mail.huji.ac.il, David Marcus2, david.marcus1@mail.huji.ac.il, and Amiram Goldblum2, amiram@vms.huji.ac.il. (1) Department of Medicinal Chemistry, School of Pharmacy, The Hebrew University of Jerusalem, Jerusalem 91120, Israel, (2) Department of Medicinal Chemistry, Hebrew University of Jerusalem, Grass Center for Drug Design and Synthesis, and Sudarsky Center for Computational Biology, Jerusalem 91120, Israel

Cytochrome P450 is a heme-containing protein superfamily, responsible for most of the metabolic transformations taking place in the human body. Several isoforms are responsible for most xenobiotic transformations in the liver. Iterative Stochastic Elimination (ISE) was used to build classification models for predicting substrates and inhibitors of the isoforms 3A4, 2D6, 1A2 and 2C9. We constructed curated databases of substrates and inhibitors from compounds published in the literature. Models used molecular properties (2D descriptors) that were picked by ISE, which optimizes the huge combinatorial problem of choosing a small subset of properties and their ranges from a large set of descriptors. ISE models may be applied to molecular databases of any size and used to score each molecule's fitness to a specific model. We applied the models of P450 isoforms to search for new substrates and inhibitors and constructed a library of molecules that have high probabilities of being substrates or inhibitors of these isoenzymes. Isoform selectivity was studied by providing a matrix containing cross information from individual models. With this tool we can predict the metabolic potentials of investigated compounds and use these models as a screening tool for molecules in the drug discovery pipeline.

29 Sphericity and oblate-prolate indices: 3D shape descriptors for fast shape comparison
Sunghwan Kim, kimsungh@ncbi.nlm.nih.gov, Evan Bolton, bolton@ncbi.nlm.nih.gov, and Stephen H. Bryant, bryant@ncbi.nlm.nih.gov, National Center for Biotechnology Information, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894

If molecules are structurally similar to each other, they are likely to have similar biological and physicochemical properties. This so-called "similarity principle" is an important concept in rational drug design, which is applied to find potential drug candidates at the initial stage of drug discovery and development. Although the shape-Tanimoto (ST) value between two molecules is considered to be an accurate measure for 3-dimensional (3D) molecular shape comparison, the ST value computation is not fast enough to screen a huge molecular library, which typically contains billions of conformers. In the present study, the shape quadrupole moments of a molecule were used to devise two 3D shape descriptors (the sphericity index and the oblate-prolate index) that allow a simple and fast shape comparison between molecules.

30 Federated search. An in-depth introduction
Abe Lederman, abe@deepwebtech.com, Deep Web Technologies, 301 N. Guadalupe Street, Ste. 201, Santa Fe, NM 87501

What is federated search? How do its users benefit? How does the technology unleash access to high quality scientific content hidden in the "deep web" and why is it that Google and the other "web crawlers" don't find much of this content? This comprehensive introduction to federated search will answer these questions and more. Topics include how the technology works and also the finer points of how quality of results, user interface, and results delivery matter. The important concepts of federated search will be reinforced through chemistry research via a demonstration using Deep Web Technologies' recently released science search portal, ScienceResearch.com.

31 Scitopia.org: Case study on using federated search to enable science and engineering research
Naveen K. Maddali, n.maddali@ieee.org, Product Management and Business Development, IEEE, 59 Woodhill St, Somerset, NJ 08873

Scitopia.org is a federated search portal that searches the websites of engineering and science society publishers. It was designed to provide a single location for researchers to retrieve high quality research articles. Scitopia was created and is managed by 20+ societies, whose primary motivations to create the partnership were to promote the value of society literature and to bring new traffic into their society libraries. To date, Scitopia has had success, but there have been some challenges and obstacles. Integrating the content of 20+ diverse societies, competing in a highly competitive market and working with limited resources are some of the challenges that have been presented to the partnership. But in the end, has Scitopia been able to address the needs of the research community and has the federated search model worked? These questions will be evaluated.

32 SeerSuite for distributed indexing, federated search, and meta search
C. Lee Giles, giles@ist.psu.edu, Information Sciences and Technology, Pennsylvania State University, 101 Information Sciences and Technology Building, University Park, PA 16802, Fax: 814-865-7882, Prasenjit Mitra, Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, and Karl Mueller, Chemistry, Pennsylvania State University, University Park, PA 16802

Scalability has always been a challenge for search and retrieval systems. As more information becomes available, the indexing and retrieval mechanisms needed to provide accurate and precise coverage of the objects and stores become more complex and less reliable. An example of this can be seen in the existing search engines on the World Wide Web. Scalability for information retrieval with search has been addressed by various methods, prominent among which is distributed indexing. This approach is in contrast to federated search, where the goal is to assemble multiple indices covering different data sources and make them accessible through a single interface. Federated search improves both the performance and the scope of the information presented. While it has been argued that metasearch is a form of federated search since both span multiple data sources, a distinction can be made. In federated search the data sources may not overlap, and, therefore, coverage of the indices may not necessarily overlap. While different approaches exist to address the scalability issue, the tasks performed are consistent: query assembly and transformation, communication with multiple indices, extracting information from query results, and mapping and merging these results into ordered lists in user-defined formats. These approaches to scalability and distribution are not mutually exclusive; for example, a federated search can occur over multiple distributed indices. To address many issues in academic document search and indexing, we have developed many unique search tools which we call SeerSuite. These tools have allowed us to build a collection of academic search services (CiteSeerX, ChemXSeer, ArchSeer) which offer some of the largest publicly available collections of scientific literature and data on the web plus many unique metadata extraction features. We propose incorporating the SeerSuite services to give access to niche scalable search through a scalable federated search interface that has unique trainable metadata.
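The task sequence listed in this abstract (query transformation, communication with multiple indices, extraction of results, and merging into one ordered list) can be illustrated with a minimal Python sketch. This is not SeerSuite code: the function name `federated_search`, the toy indices, and the choice to normalize each source's scores by its own top score are all assumptions made for illustration.

```python
def federated_search(query, sources):
    """Query every source and merge the hits into one ranked list.

    sources maps a source name to a (translate, search) pair:
    translate rewrites the user query into that source's syntax, and
    search returns a list of (doc_id, raw_score) hits.
    """
    merged = {}
    for name, (translate, search) in sources.items():
        hits = search(translate(query))
        # Raw scores are not comparable across sources, so scale each
        # source's scores by its own best score (an assumed heuristic).
        top = max((score for _, score in hits), default=0.0)
        for doc_id, score in hits:
            norm = score / top if top else 0.0
            best = merged.get(doc_id)
            if best is None or norm > best[0]:
                merged[doc_id] = (norm, name)
    # Order by descending normalized score, then by document id.
    return sorted(
        ((doc_id, norm, name) for doc_id, (norm, name) in merged.items()),
        key=lambda hit: (-hit[1], hit[0]),
    )


# Two hypothetical indices with different query conventions and score scales.
index_a = {"zeolite": [("doc1", 10.0), ("doc2", 5.0)]}
index_b = {"ZEOLITE": [("doc2", 0.9), ("doc3", 0.3)]}
sources = {
    "a": (str.lower, lambda q: index_a.get(q, [])),
    "b": (str.upper, lambda q: index_b.get(q, [])),
}
results = federated_search("Zeolite", sources)
```

Here `doc2` is returned by both toy indices and is kept once, with the better of its two normalized scores. In a strictly federated setting with non-overlapping sources, that de-duplication step would simply never trigger.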

33 Delivering content to end users at their point of need means going beyond federated search
Brian P. Cannan, brian_cannan@oclc.org, Licensed Content Portfolio, OCLC, Inc, 6565 Kilgour Place, Dublin, OH 43017, Fax: 614-718-7073, Matthew Goldner, matthew_goldner@oclc.org, End User Services, OCLC, Inc, 6565 Kilgour Place, Dublin, OH 43017, and Mindy Pozenel, WorldCat Discovery Services, OCLC, Inc, 6565 Kilgour Place, Dublin, OH 43017

In a world where users only want to view the content that most directly relates to their research requirements, OCLC has undertaken a major initiative to better connect library audiences globally with the content libraries license for them. Beginning in April 2007, over 60 million article citations have been added to WorldCat from NLM, ERIC, GPO, Elsevier, the British Library and the ArticleFirst® databases. This action reflects the need to connect users with the content licensed for them where users are working: web destinations such as search engines, Facebook, Google Book Search, Google Scholar or Yahoo! Search, as well as their library. Continuing this effort to enrich the search experience of WorldCat users and improve the discoverability of this authoritative content means addressing the challenge of increasing its visibility to these users where they are working, while protecting the Intellectual Property Rights of content providers.

34 21st century library: The preferred starting point for serious research?
Helle Lauridsen, helle.lauridsen@serialssolutions.com, SerialsSolutions, Discovery Services, ProQuest, Kastedvej 37, 8200 Aarhus, Denmark

The move from print library to e-library in the past 10 years is causing huge and rapid changes in the access to information. Increasingly complicated web pages have been built by both publishers and libraries in order to showcase the virtual cornucopia in the best possible way. But do they work? Can researchers and students find their way to the best possible resource instead of just the best known or the most convenient? Why is it that Google is barging ahead as the preferred starting point for search when research clearly shows that users do know that the library has the most reliable and trustworthy resources? This talk will investigate some of the attempts that have been made to solve this problem and discuss the latest solution.

35 Using federated search to improve your ROI and boost research capabilities
Stephen R. DiStasio III, stephen.distasio@serialssolutions.com, Product Management- Resource Discovery, Serials Solutions/ProQuest, 501 N. 34th Street, Suite 200, Seattle, WA 98103

Federated search exists today to serve a basic need: allowing the search of many resources from a single search box, saving time and effort. However, by solving one problem federated search has created another: how is a researcher supposed to navigate through thousands of results from many resources with disparate areas of expertise? How many types of results will a researcher see if they search for "Magellan" in 300 different resources? "Magellan" means a lot of things... This session will explore methods of results handling that will help your patrons dig out from underneath the "avalanche" of results and find the information they need with minimal clicks. We will also discuss how federated search enables the "discoverability" of the expensive subscription resources in your library and ensures that you see a return on investment for those subscription dollars.

36 100 Years of Houben-Weyl and Science of Synthesis: Why you should care
Thomas Krimmer, thomas.krimmer@thieme.de, Thieme Chemistry, Georg Thieme Verlag, Ruedigerstrasse 14, Stuttgart 70469, Germany, Fax: +49-711-8931777

Today's researchers are overwhelmed by the myriad of synthetic methods available. Their personal experience as practicing chemists usually covers only a few narrow fields. This dilemma cannot be solved by studying the journal literature alone. To fully assess the utility of a published method for lab use, ideally one must personally try and test a method. This is what Theodor Weyl wrote in the preface to the first edition of ‘Weyl's Methods in Organic Chemistry’ in 1909. 100 years later the medium of publication has changed, but the basic problem remains the same. To benefit from the wealth of chemical information resources available, you need to understand them. This talk will discuss the chemical information landscape using the 100 year history of Houben-Weyl and Science of Synthesis as a thread to provide a clearer picture of what is out there and what it is good for in chemical information.

37 Ninety editions and still going strong: The CRC Handbook of Chemistry and Physics
Fiona Macdonald, Fiona.macdonald@taylorandfrancis.com, Taylor and Francis/CRC Press, 6000 Broken Sound Parkway NW, Boca Raton, FL 33411, Fax: 561-998-2559

Publishing the 90th edition of the CRC Handbook of Chemistry and Physics is a true milestone in the history of CRC Press. Since its first publication in 1913 – as a 116-page pocket-sized book priced at $2 – the Handbook has developed into a 2800-page tome that no longer fits anyone's pocket but still finds a place on every scientist's bookshelf. This journey, and other milestones in the 96-year history of the book, will be discussed, and along the way we will take a look at the Editors who shaped the book over the years.

38 Chemical handbooks in the electronic age: Assuring data quality
David R. Lide, drlide@post.harvard.edu, CRC Press, Editor, 13901 Riding Loop Dr, Gaithersburg, MD 20878, Fax: 301-738-7147

While compilations of data in the form of printed handbooks have served chemists for almost two centuries, computer technology has produced major changes in data access in the last 20 years. Recent trends and future expectations in data dissemination will be discussed. Particular emphasis will be given to the questions of quality control that are raised by the ease of posting chemical data on the Internet and the highly effective search tools for retrieving the data. A case will be made for the continued utility of concise, carefully documented handbooks as data sources.

39 CAS databases: Where do we get all that chemistry?
Roger J. Schenck and Rebecca Kopelman, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202, Fax: 614-461-7140

Over the years, CAS has published many handbooks, from the CA Index Guide™ to the Registry Number Handbook. Concurrent with the waning of such printed products at CAS, more and more chemical information, beyond the traditional sources such as journals and patents, has been included in the CAS databases. This presentation will focus on the types of handbook information being added to the various CAS databases and the challenges in harmonizing these eclectic collections. Examples of how CAS has added its own handbook information to its electronic collections as well as examples of adding other handbook information, from both printed and electronic sources, to the CAS databases will be showcased. In conclusion, the future of chemical handbooks at CAS will be explored.

40 Seeking solutions with federated search tools
Grace Baysinger, graceb@stanford.edu, Swain Library of Chemistry and Chemical Engineering, Stanford University Libraries, 364 Lomita Drive, Organic Chemistry Building, Stanford, CA 94305-5081, Fax: 650-725-2274

Multidisciplinary approaches are needed to address increasingly complex problems. Federated and multidatabase search tools reduce barriers and enable users to discover information resources outside the boundaries of traditional academic fields. While technologies used to provide federated search services continue to evolve and improve at a rapid rate, outstanding issues remain. This presentation will highlight tools and services being used by the Stanford Libraries to support interdisciplinary collaboration and learning on campus.

41 Fedora: A network overlay approach to federated searching
Leah R. Solla, lrm1@cornell.edu, Physical Sciences Library, Cornell University, 293 Clark Hall, Ithaca, NY 14853-2501, Fax: 607-255-5288

Fedora (Flexible Extensible Digital Object Repository Architecture) is a very flexible framework for aggregating, organizing, and making use of a mix of metadata and content records. It has been used by the National Science Digital Library (NSDL) project to aggregate metadata records from over a hundred OAI-PMH providers in order to provide a central search service over that metadata (and a limited amount of crawled text from the resources themselves) with search results that link out to the original web-based resources. Search exposure is critical as increasing numbers of repositories become available and cyber-research expands across traditional disciplines. The need for standards-based search solutions that can flexibly aggregate and combine information about resources from multiple repositories and other information sources is becoming increasingly evident. This talk will give an overview of the current status of using Fedora-based network overlays to search across repositories.

42 Fee-based abstracting and indexing services vs. free federated searching
Valerie K. Tucci, vtucci@tcnj.edu, Library, The College of New Jersey, 2000 Pennington Road, Ewing, NJ 08628, Fax: 609-637-5177

Library budgets are facing drastic cuts given the current economic crisis. Unfortunately, the fee-based abstracting and indexing services are now in the spotlight and in many cases, on the chopping block. These A&I services were considered sacred and essential in the seventies and the birth of online searching only strengthened their position. However, in the intervening forty years a new paradigm has evolved. Open access and free federated search services such as Google Scholar, Scitopia, Scirus and CiteSeer are now making librarians question the need for fee-based services. This presentation will examine the changing landscape for secondary A&I services and explore the possibility that these services may indeed follow the downward spiral that newspapers are on today.

43 Application of the Modular Chemical Descriptor Language (MCDL) methodology to SAR and QSAR in prostate cancer chemotherapy
Michael N. Burnett and Andrei A. Gakh, gakhaa@ornl.gov, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6242

In the Modular Chemical Descriptor Language (MCDL), the atomic composition of a molecule is specified with structural fragments, each consisting of a nonterminal atom and all terminal atoms attached to it. For example, the MCDL composition module of 2-bromobutane is CBrH;CHH;2CHHH, which shows there are three different structural fragments. In a study of how MCDL structure fragments containing halogen versus hydrogen might contribute to biological activity, a Free-Wilson analysis was performed on 200-300 literature examples of compounds studied for potential use in cancer chemotherapy. The results of this study will be presented and compared with SAR and QSAR results taken from the literature on the effects of halogenated fragments. This research was supported by the Global IPP program. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, under contract DE-AC05-00OR22725 for the U.S. Department of Energy. This paper is a contribution from the Discovery Chemistry Project.
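The 2-bromobutane example in this abstract can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the MCDL reference implementation: the function name `composition_module` is hypothetical, a terminal atom is taken to be any atom with exactly one neighbor, and fragments are ordered by plain lexicographic sorting, which happens to match the example but is not claimed to be the full MCDL ordering rule.

```python
from collections import Counter


def composition_module(atoms, bonds):
    """Build an MCDL-style composition module string.

    atoms is a list of element symbols, and bonds is a list of (i, j)
    index pairs into that list. Molecules with no nonterminal atom
    (for example diatomics) are outside the scope of this sketch.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # Assumption: an atom with exactly one neighbor is terminal.
    terminal = {i for i, nbrs in neighbors.items() if len(nbrs) == 1}
    fragments = Counter()
    for i, nbrs in neighbors.items():
        if i in terminal:
            continue
        attached = sorted(atoms[j] for j in nbrs if j in terminal)
        fragments[atoms[i] + "".join(attached)] += 1
    # Assumption: lexicographic fragment order; repeats get a count prefix.
    parts = []
    for frag in sorted(fragments):
        count = fragments[frag]
        parts.append(frag if count == 1 else f"{count}{frag}")
    return ";".join(parts)


# 2-bromobutane, CH3-CHBr-CH2-CH3, with explicit hydrogens.
atoms = ["C", "C", "C", "C", "Br"] + ["H"] * 9
bonds = [
    (0, 1), (1, 2), (2, 3), (1, 4),   # carbon chain and C-Br
    (0, 5), (0, 6), (0, 7),           # CH3
    (1, 8),                           # CH
    (2, 9), (2, 10),                  # CH2
    (3, 11), (3, 12), (3, 13),        # CH3
]
module = composition_module(atoms, bonds)
print(module)  # CBrH;CHH;2CHHH
```

The two methyl groups collapse into the single counted fragment `2CHHH`, which is why the module lists three different structural fragments for a four-carbon chain.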

44 3D QSPR for general use: Structure standardization
George D. Purvis III1, gpurvis@us.fujitsu.com, David T. Stanton2, stanton.dt@pg.com, William D Laidig3, and John D. Shaffer3. (1) Biosciences Group, Fujitsu, 15455 NW Greenbrier Pkwy, Suite 125, Beaverton, OR 97006, (2) Procter & Gamble, Miami Valley Innovation Center, 11810 East Miami River Road, Cincinnati, OH 45252, (3) Modeling and Simulations Group, Procter & Gamble, Miami Valley Innovation Center, 11810 E. Miami River Road, Cincinnati, OH 45253

Quantitative structure-property relationship (QSPR) models are increasingly used to estimate properties of chemicals and to screen them for new product applications. Chemists who are not expert in modeling often use these models. Consequently, the models must be robust. In particular, predictions must be insensitive to structure entry. Ideally, the same prediction is produced whether the QSPR model is given a structure in linear notation (e.g. SMILES), connection table format, or any of a number of 3D conformations, regardless of the order of atom entry. Arguably, in the hands of experts, the best predictions and mechanistic interpretations are produced when the most structural information is available, such as a fully optimized 3D conformation or an ensemble of conformations. However, 3D models come at the risk of sensitivity to structural input, not only for conformations of the same structure, but also for the variability of conformations among similar structures. Here we address the question, "Can 3D based QSPR models be robust enough for general use by non-experts or do the advantages of more unambiguous structure information of topological methods offset their possible lower accuracy?"

45 Chemical space network topology through atom typing
N. Sukumar1, nagams@rpi.edu, Mike Krein2, kreinm2@rpi.edu, and Curt M. Breneman2, brenec@rpi.edu. (1) Department of Chemistry and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute / RECCR Center, 110 8th St., Troy, NY 12180-3590, Fax: 518-276-4887, (2) Department of Chemistry / RECCR Center, Rensselaer Polytechnic Institute, 110-8th Street, Center for Biotechnology and Interdisciplinary Studies, Troy, NY 12180

The topological characteristics of chemical spaces and structure-activity landscapes set upper bounds to the predictivity of models constructed within these spaces. Here we analyzed the PubChem and ZINC databases (about 19 million and 2.5 million molecules, respectively) and the topological characteristics of the resulting networks. These are defined independent of biological activity, with nodes (molecules) within a preset level of 2-D similarity being connected by edges. Pairwise “Atomtyper distances” (the number of atoms in one molecule that are different from any atom in the other, to within a specified level of similarity) and “alchemical distances” (the number of atoms that have to be added, deleted or substituted to “transmute” one molecule into another) between molecules were determined, with pairs randomly sampled until the network characteristics converged. We also study the degree distributions of various subspaces at different similarity thresholds and the effects of employing other standard similarity measures.
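The network construction described in this abstract (molecules as nodes, an edge wherever two molecules fall within a preset similarity level, then a degree distribution) can be sketched in a few lines of Python. The sketch makes several assumptions: fingerprints are represented as sets of "on" bit indices, the similarity measure is the Tanimoto coefficient (one standard choice, not necessarily the one used by the authors), all pairs are enumerated instead of randomly sampled, and the four toy fingerprints are invented.

```python
from collections import Counter
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_network(fingerprints, threshold):
    """Return the edges joining molecules whose similarity meets the threshold."""
    edges = set()
    for (i, fa), (j, fb) in combinations(fingerprints.items(), 2):
        if tanimoto(fa, fb) >= threshold:
            edges.add((i, j))
    return edges


def degree_distribution(nodes, edges):
    """Map each degree value to the number of nodes having that degree."""
    degree = {node: 0 for node in nodes}
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return Counter(degree.values())


# Toy fingerprints (hypothetical bit sets, not taken from PubChem or ZINC).
fingerprints = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 5},
    "C": {1, 2, 3, 4, 5},
    "D": {7, 8, 9},
}
edges = similarity_network(fingerprints, threshold=0.7)
distribution = degree_distribution(fingerprints, edges)
```

Exhaustive pair enumeration scales quadratically with the number of molecules, which is impractical for databases of millions of structures and is consistent with the abstract's use of randomly sampled pairs until the network characteristics converge.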

46 Screening databases of hypothetical porous materials
Maciej Haranczyk1, mharanczyk@lbl.gov, Kevin Theisen2, Bei Liu2, and Berend Smit3, Berend-Smit@Berkeley.edu. (1) Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Mail Stop 50F-1650, Berkeley, CA 94720, Fax: 510-486-5812, (2) Chemical Engineering, UC Berkeley, Berkeley, CA 94720, (3) Chemical Engineering, UC Berkeley, 101B Gilman #1462, Berkeley, CA 94720-1462

Porous materials, e.g. zeolites, have many applications in the chemical industry. The number of possible zeolite structures has been estimated to be larger than 2.5 million. Databases of hypothetical zeolite structures are being developed and they could in principle be screened for zeolites of any desired property. The current state-of-the-art molecular simulations allow for accurate prediction of zeolite properties, but the computational cost of such calculations prohibits their application in the characterization of the entire database of hypothetical structures, which would be required to perform brute-force screening for novel structures with useful properties. Our work focuses on the development of an efficient screening technique that requires such expensive characterization only for a carefully selected and statistically relevant subset of a database. Then, the database is screened employing the similarity principle. The developed screening technique, structural descriptors and similarity measures will be presented. This work is supported by the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

47 Data mining cluster analyses of zeolite crystals
Mohammed Lach-hab1, mlachhab@gmail.com, Shujiang Yang1, syangf@gmu.edu, Iosif Vaisman2, ivaisman@gmu.edu, and Estela Blaisten-Barojas3, blaisten@gmu.edu. (1) Computational Materials Science Center, George Mason University, 4400 University Dr., MSN 6A2, Fairfax, VA 22030, (2) Department of Bioinformatics and Computational Biology, George Mason University, 10900 University Dr, Manassas, VA 20110, (3) Computational Materials Science Center, George Mason University, 4400 University Dr, MS 6A2, Fairfax, VA 22030

Computationally predicted inorganic solid materials are becoming increasingly available. Zeolite crystals are one example. Mining such a source of information is challenging since the hypothetical compounds lack a crystallographic description. Unsupervised classification of these compounds is useful for the material designer. In this work we train an unsupervised clustering model for classifying zeolites into four superclasses sharing common structural properties. The clustering algorithm is based on probabilistic expectation maximization and is trained on a set of 1400 zeolite crystals from the Inorganic Crystal Structure Database. A thorough feature importance analysis is carried out, resulting in two groups of features allowing classifications with up to 97% accuracy. (Work supported under the National Science Foundation grant CHE-0626111. ICSD data are courtesy of the National Institute of Standards and Technology).

48 Knowledge acquisition from reaction database for metabolic profiling
Lothar Terfloth1, terfloth@molecular-networks.com, Thomas Kleinöder1, Jörg Marusczyk1, Christof H. Schwab2, schwab@molecular-networks.com, and Johann Gasteiger2, gasteiger@molecular-networks.com. (1) Molecular Networks GmbH, Henkestrasse 91, D-91052 Erlangen, Germany, (2) Molecular Networks GmbH, Henkestrasse 91, Erlangen D-91052, Germany

In the drug discovery process multiple, partly competing objectives have to be optimized in order to come up with a new lead structure. The identification of a potent and selective compound is not sufficient; a lead compound should also possess a favourable pharmacokinetic profile. Many papers in the field of the in silico prediction of ADMET (absorption, distribution, metabolism, elimination, toxicity) properties have been published. In comparison to the number of models which are available for the prediction of absorption, it seems that less attention has been devoted to the modeling of metabolism. This paper focuses on the knowledge acquisition from reaction databases and its application to the metabolic profiling of drugs. The performance of metabolite prediction is investigated on an external validation data set of drugs and their metabolites which are reported in the literature. The merit of considering the intrinsic reactivity of the substrates, estimated by physico-chemical descriptors, will be presented.

49 Crystal structure information aids drug discovery and development
Frank H. Allen, allen@ccdc.cam.ac.uk, Cambridge Crystallographic Data Centre (CCDC), 12 Union Road, Cambridge CB2 1EZ, United Kingdom, Fax: 44-1223-336-033

Experimental observations of conformational preferences and intermolecular interactions in small-molecule crystal structures have been of fundamental importance in computational drug discovery since the early 1980s. The Cambridge Structural Database (CSD) now contains almost half a million crystal structures, and is used to generate two searchable knowledge-based libraries: Mogul, containing substructure-based distributions and statistics derived from more than 20 million bond lengths, angles and torsions, and IsoStar, containing over 20,000 scatterplots of non-bonded interactions between chemical functional groups. The talk will summarise applications of structural knowledge in chemistry, drug discovery and, more recently, in drug development and formulation. The inter-relationship between experimental information and computational results will also be discussed.

50 Creating data resources for biology: Lessons from the PDB and the PSI SGKB
Helen M. Berman, Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 610 Taylor Road, Piscataway, NJ 08854

Many issues need to be considered when building resources that enable a variety of scientific communities. One is the necessity of a scalable infrastructure that can handle vast amounts and different types of data. This infrastructure must also be extensible to handle new and evolving technologies. Another concern is how to solicit and incorporate the needs and wants of a variety of user communities. Two global resources for science – the Protein Data Bank (PDB) and the Protein Structure Initiative Structural Genomics Knowledgebase (PSI SGKB) – will be presented. The PDB has been the archive for the three-dimensional coordinates for experimentally-determined biological structures for the last 30 years. The PSI SGKB, launched in 2007, expands upon this information by integrating available structural, experimental, biological, and modeling data for all protein sequences. Today, both resources are used by researchers and students in a variety of disciplines who are studying these biological macromolecules and their relationships to sequence, function, and disease.

51 Community Structure-Activity Resource: Data repository to improve docking and scoring
James B. Dunbar Jr., jbdunbar@umich.edu, College of Pharmacy Department of Medicinal Chemistry, University of Michigan, 428 Church Street, Ann Arbor, MI 48109

The Community Structure-Activity Resource is a center at the University of Michigan funded by the National Institutes of Health, specifically the National Institute of General Medical Sciences. The function of this center is to collect, curate, and disseminate data sets of crystal structures, biological binding affinities, and thermodynamic data to aid in the refinement of docking and scoring methodologies. These data sets are to come from in-house projects at the University of Michigan, other academic labs, and, most importantly, from industrial (large pharma) sources. Part of our remit is to fill in the gaps with synthesis, crystallography and biology targeted to augment, as best we can, what is currently available in terms of the full range of properties, binding affinities and other relevant characteristics involved in docking and scoring. This presentation will detail what we have done so far and what we plan for the future.

52 Bio-Activity databases
Albert J. Leo, aleo@biobyte.com, and Alka Kurup, akurup60@gmail.com, BioByte Corp, 201 W. 4th St. #204, Claremont, CA 91711

Following the Hansch-Fujita research that established the usefulness of a hydrophobic parameter in biological QSARs, the early databases constructed and compiled at Pomona College concentrated on collecting measured partition coefficients of as many solutes in as many solvent pairs as possible. Octanol/water, oil/water, and ether/water proved to be the solvent pairs most useful in the biological field, but other pairs soon found applications in fields such as ore enrichment for rare earths and uranium. The QSAR database of Hammett-type free energy equations, explaining bio-activity in quantitative terms, using hydrophobic, steric and electronic parameters, was at first kept separate from the properties database (Masterfile) of log Ps and pKas. The merger of the two, allowing for parameter ‘range searching’ and automatic loading, has been accomplished, which makes user training much easier. The final product was called BioLoom to emphasize the need to "weave into fabric" the current "meteoric shower of facts."

53 Trust … but verify! On the importance of experimental data curation prior to building (Q)SAR models
Alexander Tropsha, alex_tropsha@unc.edu, Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina, CB # 7360, Beard Hall, School of Pharmacy, Chapel Hill, NC 27599-7360, Fax: 919-966-0204, Eugene Muratov, 00dqsar@ukr.net, Laboratory of Molecular Modeling, School of Pharmacy, The University of North Carolina at Chapel Hill, CAMPUS BOX 7360, Chapel Hill, NC 27599, and Denis Fourches, fourches@email.unc.edu, Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina, Beard Hall, Chapel Hill, NC 27599

Molecular modelers are always at the mercy of the primary data providers. We argue and illustrate with examples that the quality of data predefines the accuracy and predictive power of models irrespective of the rigor and thoroughness used in building (Q)SAR models. The primary data may contain errors in chemical structures, in values of the biological data, and in associations between structure and bioassay results; frequently, there are duplicates. We show that many publicly available datasets, including those recently used for QSAR competitions, contain erroneous information that is sometimes sufficient to undermine the virtue of the competition. We further show that the data errors significantly, if not dramatically, influence the accuracy of the resulting models. Conversely, we demonstrate that rigorously built (Q)SAR models can help identify and correct gaps and possible errors in primary datasets. Finally, we propose simple protocols for primary data analysis and curation.
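
As an illustration of the kind of duplicate check such curation involves, the following is a minimal sketch only and not the authors' protocol. It assumes each record is a pair of a canonical structure identifier and a measured activity; the function name `curate`, the identifiers, and the `tolerance` threshold are all hypothetical.

```python
from collections import defaultdict

def curate(records, tolerance=0.5):
    """Merge duplicate structures and flag those with conflicting activities.

    records   -- iterable of (structure_id, activity) pairs; the identifier is
                 assumed to be already canonical (hypothetical input format)
    tolerance -- largest spread of activity values still treated as agreement
                 (an arbitrary illustrative threshold)
    """
    groups = defaultdict(list)
    for structure_id, activity in records:
        # Trim stray whitespace so trivially different keys collapse together.
        groups[structure_id.strip()].append(activity)

    consistent, conflicting = {}, {}
    for structure_id, values in groups.items():
        if max(values) - min(values) <= tolerance:
            # Duplicates agree: keep one averaged value.
            consistent[structure_id] = sum(values) / len(values)
        else:
            # Duplicates disagree: set aside for manual inspection.
            conflicting[structure_id] = values
    return consistent, conflicting
```

Setting conflicting duplicates aside, rather than averaging them silently, keeps possible data errors visible to the modeler.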

54 Learning from a drug guru: Part of a new wave of cheminformatic analysis
Kent D. Stewart, kent.d.stewart@abbott.com, Global Pharmaceutical Research & Development, Abbott Laboratories, 100 Abbott Park Road, Abbott Park, IL 60064, Fax: 847-937-2625

DRUG GURU™ (Drug Generation Using Rules) is a computer program that applies medicinal chemistry “rules-of-thumb” to an input structure to design new analogs [K. D. Stewart et al., Bioorg. Med. Chem. 14, 7011-7022, 2006]. This presentation will review the basics of the Drug Guru program and compare and contrast results with related software programs BROOD™ (OpenEye), BIOSTER™ (Accelrys) and EMIL™ (CompuDrug). This work will be placed into the context of current research in “Compound Pairs Analysis” that is under active investigation in many research groups.

55 Hyperparametric modeling, i.e. modeling
Anthony Nicholls, OpenEye Scientific Software, Inc, 9d Bisbee Court, Santa Fe, NM 87508

The field of molecular modeling, from quantum mechanics to QSAR, is beset with parameters. Is this a problem? Even good physical theory requires a parameter or two, but forty, a hundred, a thousand? This talk will propose that the entire field of molecular modeling is over-parameterized, hyper-parameterized even, because methods of assessing a judicious number of constraints are either unknown, ignored or improperly applied. The consequences are profound but not irredeemable.

56 Template-constrained topomer CoMFA
Richard D. Cramer, cramer@tripos.com, Tripos Inc, 1699 South Hanley Road, St. Louis, MO 63144, Fax: 314-647-9241

While topomer CoMFA (using topomer poses in 3D-QSAR) is showing remarkable promise (1), in particular by providing an R-group virtual screening to lead optimization projects (2), there remains a concern, if only when depicting the results, about the physicochemical improbability of many topomer poses. Our belief is that the success of topomer CoMFA results from the self-consistency of topomer poses, and therefore that other rigorously self-consistent pose generation methodology should also succeed. Constrained topomer CoMFA allows the user to provide template conformations. Under this circumstance, wherever there is a suitable mapping of 2D structures between a template conformation and the fragment whose topomer is being generated, the template coordinates are copied directly to the topomer, with the topomer rules used as usual elsewhere. In this context, “suitable mappings” must begin at the fragment root, include exact matches of (heavy) atom and bond types to the maximum possible extent, and then finish whenever heavy atom topologies no longer conform. Results from using such constrained topomer poses in the 25 published topomer CoMFA models and 50 R-group searches will be presented.

1) Cramer, R. D. Topomer CoMFA: A Design Methodology for Rapid Lead Optimization, J. Med. Chem., 2003, 46, 374-389.

2) Cramer, R. D.; Cruz, P.; Stahl, G.; Curtiss, W. C.; Campbell, B.; Masek, B. B.; Soltanshahi, F. Virtual Screening for R-groups, including Predicted pIC50 Contributions, within Large Structural Databases, Using Topomer CoMFA. J. Chem. Inf. Model., 2008, 48, 2180-2196.

57 Power to the people: Integrating data and analysis in one easy application
Derek A. Debe, Discovery Informatics, Abbott Laboratories, Mailstop AP10/R42T, 100 Abbott Park Rd., Abbott Park, IL 60064, Fax: 847-937-2625

This talk will discuss the successful development and deployment of a Drug Discovery data integration and analysis platform at Abbott Laboratories. This application serves as the data analysis centerpiece for Abbott's Discovery chemists and biologists. Specific use case examples will be presented, including functionality useful for Hit-to-Lead analysis and Lead Optimization efforts. Attendees will gain an understanding of 1) the successful deployment of a very well-received data integration and analysis platform to our research scientists and 2) how vendor-supplied software tools can be integrated to produce a successful centerpiece application for small molecule discovery research.

58 A probabilistic approach to compound subset selection for virtual and high-throughput screening
Philip Hajduk, Advanced Technology Division, Abbott Laboratories, AP10 LL, 100 Abbott Park Rd, Abbott Park, IL 60064, Fax: 847-937-2625

A probabilistic approach to compound subset selection is described using a belief theory framework for chemical similarity and bioactivity. The approach outperforms conventional methods of subset selection and enables a quantitative assessment of the risk of missing bioactives when testing only a subset of the available compounds. Applications of this approach in assessing chemical diversity in various compound collections will be described.

59 Application of belief theory to similarity data fusion for use in analog searching and lead hopping
Steven Muchmore, R4DG, Abbott Laboratories, 100 Abbott Park Rd., Abbott Park, IL 60064

Computational approaches to detecting chemical similarity have been developed using diverse strategies that strive to capture the features of molecules that are salient to some activity. Methods have been developed that exploit both 2D and 3D descriptions of molecules, and it has long been recognized that different measures of similarity will give rise to different rankings of a collection of molecules to the query. The use of the similarity measures to find effective substitutions for a known active molecule is commonly undertaken, and this technique has been referred to as “lead hopping”. One difficulty in effective lead hopping is in combining results from different measures of similarity in a meaningful and productive way. This work presents a probabilistic approach, which attempts to reconcile different lead hopping techniques by establishing a common framework for comparison.

60 Ligand-based drug discovery in an era of structure-based drug discovery
Yvonne C. Martin, yvonnecmartin@comcast.net, 2230 Chestnut St., Waukegan, IL 60087, Fax: 847-937-2625

With increasing numbers and types of 3D ligand-macromolecule structures becoming available every year, it is time to ask whether ligand-based methods are obsolete when one has a structure on which to base a design. This presentation will present observations that suggest that careful analysis of ligand structure-activity relationships provides independent information that contributes to the discovery of ligands with the desired profile of potency, novelty, selectivity, etc.

61 One search, many answers: Bringing together results from multiple databases through the DiscoveryGate platform
Carmen I Nitsche, Carmen.Nitsche@symyx.com, Vice President Content, Symyx Technologies, 254 Rockhill Drive, San Antonio, TX 78209

Despite technological advances, chemists are still faced with having to learn a myriad of online search systems from which they are trying to retrieve pertinent chemical information. In this paper we will review various approaches employed on the DiscoveryGate platform to bring together over a dozen different commercial and no-fee databases across various vendors. In particular we will discuss the Compound Index, as a means of retrieving related information. We will also explore how newly developed technologies based on web services readily bring together information from varied sources, and deliver the sought information into standard search/browse applications, into customer-built applications, and directly into scientific workflow applications.

62 Federated search in commercial and noncommercial structure and reaction databases: A flexible approach
Valentina Eigner-Pitto, ve@infochem.de and Josef Eiblmaier, InfoChem GmbH, Landsberger Strasse 408, Munich 81241, Germany

Chemically relevant databases often are located either in the company's intranet or the internet. The approach described here provides access to multiple structure and reaction databases, commercial and noncommercial. User access is provided via an intuitive, easy-to-use web interface. The search can be conducted as structure, reaction, or factual data search. A challenge faced in the implementation of a federated structure search is the heterogeneity of the different data sources as regards the technical interface and the content and format of the results. Moreover, the query must be translated into the specific query language of each of the foreign target systems. Our approach connects to any database that provides either an Oracle cartridge or a web service and that can handle a structure query in MDL Molfile, SMILES or ROSDAL format. The meta search engine utilizes database-specific connectors that use one of multiple protocols such as SOAP, SRU, SQL*Net/Net8. The query is sent to the distributed search services. Results in different native formats are collected, consolidated and presented to the user. In a results overview, hits are grouped in different contexts such as “Structures”, “Reactions” or “Documents”, which gives the user the possibility to view the hit in the desired context. The hit lists themselves provide hyperlinks for direct access to the original display or document page.
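
The dispatch-and-consolidate pattern described above can be sketched roughly as follows. This is an illustrative outline only, not InfoChem's implementation: the `federated_search` function, the connector callables, and the hit dictionary keys (`context`, `link`, `source`) are hypothetical, and real connectors would translate the query and speak SOAP, SRU, or SQL*Net to each source.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Hit = Dict[str, str]                    # assumed shape: {"context": ..., "link": ...}
Connector = Callable[[str], List[Hit]]  # takes a structure query, returns hits

def federated_search(query: str,
                     connectors: Dict[str, Connector]) -> Dict[str, List[Hit]]:
    """Send one query to every source and group the hits by context."""
    grouped: Dict[str, List[Hit]] = defaultdict(list)
    for source, search in connectors.items():
        try:
            hits = search(query)        # each connector hides its own protocol
        except Exception:
            continue                    # an unreachable source must not abort the search
        for hit in hits:
            # Record where the hit came from so the user can follow it back.
            grouped[hit["context"]].append({**hit, "source": source})
    return dict(grouped)
```

Keeping each connector behind the same call signature is what lets a new database be added without changing the consolidation step.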

63 Oops and downs of resolving InChIs for the chemistry community
A J Williams, tony@chemspider.com, ChemZoo, 904 Tamaras Circle, Wake Forest, NC 27587

The InChI resolver was rolled out to the community in March 2009 with the purpose of providing a centralized resource for chemists to resolve InChIs (International Chemical Identifiers). This presentation will provide an overview of the development of the underlying technologies associated with the InChI resolver, and how the resolver is being used, integrated and enhanced to provide additional value to the chemistry community. We will discuss present limitations to application of the resolver for providing access to databases and chemistry information distributed across the internet and define our vision for enhancing interconnectivity across Open databases using the InChI resolver as the glue.

64 BioMart: Federating public and proprietary data
Arek Kasprzyk, arek.kasprzyk@oicr.on.ca, Bioinformatics and Biocomputing, Ontario Institute for Cancer Research, 101 College Street, suite 800, Toronto, ON M5G 0A3, Canada

BioMart is an open source data management system focused on 'data mining'-like searches of complex descriptive data. The power of the system comes from integrated querying of data sources regardless of their geographical locations through a single web interface. BioMart Central Portal (www.biomart.org) offers a one-stop shop solution for access to over 20 biological databases distributed in multiple locations. BioMart's capabilities are extended by integration with several widely used software packages such as BioConductor, DAS, Galaxy, and Cytoscape. The system also supports programmatic access through a Perl API as well as RESTful and SOAP-oriented web services. Recently, BioMart has been adapted as a data management platform for the International Cancer Genome Consortium (ICGC). The BioMart-based ICGC portal will provide unified access to new generation sequencing data from 50,000 genomes distributed among different cancer research institutes around the world. Additional data sources from the public domain will be federated in order to add more annotations to the data generated by the ICGC. BioMart can easily be adapted as an in-house data management solution. Furthermore, once deployed it will facilitate federation with publicly available data sources, thus bringing the wealth of public domain data into integrated querying of proprietary data.

65 Half a million crystal structures in the CSD: A unique teaching resource in 3D structural chemistry
Frank H. Allen, allen@ccdc.cam.ac.uk, Cambridge Crystallographic Data Centre (CCDC), 12 Union Road, Cambridge CB2 1EZ, United Kingdom, Fax: 44-1223-336-033

Crystallography is the method of choice for characterising chemical structures. The Cambridge Structural Database (CSD) now contains data for half a million crystal structures, representing a massive library of 3D chemical information that can be interrogated and displayed using state of the art software. Apart from its well known research applications in pharmaceutical and structural chemistry, the CSD System provides chemistry teachers with a unique opportunity to incorporate all aspects of 3D chemistry into their courses. These include, inter alia, molecular dimensions, the conformations and stereochemistry of cyclic and acyclic moieties, reaction pathways, and the geometrical and directional aspects of hydrogen bonds and other non-bonded interactions.

66 Bond lengths, crystal structure determinations, and research in the undergraduate classroom
Guy Crundwell, CrundwellG@mail.ccsu.edu, Department of Chemistry, Central Connecticut State University and STaRBURSTT CyberDiffraction Consortium, 1615 Stanley St., New Britain, CT 06050, Neil M. Glagovich, glagovichn@ccsu.edu, Department of Chemistry, Central Connecticut State University, 1615 Stanley Street, PO Box 4010, New Britain, CT 06050, and Barry L Westcott, westcottb@ccsu.edu, Department of Chemistry, Central Connecticut State University, New Britain, CT 06050

At CCSU, the Cambridge Structural Database (CSD) is used in undergraduate research, inorganic laboratory, and in our special topics course in crystallography. When encountering topics for the first time in textbooks, students often find hand-picked data aimed to illustrate fundamental topics in structure and bonding. However, the raw data mined from the CSD challenges students to think more critically about these fundamental topics of bonding and molecular structure since the data does not present itself as neatly as a vetted table in a textbook. The use of the CSD allows a professor to test students' grasp of previously learned material, to highlight to students the limitations in methods of data collection, and to work with students to gain the ability to synthesize broader applications and connections between bonding and structure.

67 CRYSTMET: Inorganic crystal structures in chemical education and materials design
J Rodgers, jrodgers@innovativematerials.com, Innovative Materials Technologies Inc, 12B Charles Bagot Street, Gatineau, QC J8X4E1, Canada

CRYSTMET is a database of crystal structures of compounds that do not contain a C-H bond – metals, alloys, minerals and other inorganic compounds. CRYSTMET contains 130,000 crystal structure entries classified according to structure type and other criteria. Software for searching and structure visualisation is supplied with the database. CRYSTMET is a rich source of examples of both common and uncommon structure types in inorganic and materials chemistry that are essential in chemical education at various levels. The talk will describe the database and its information content, and also indicate how the accumulated data is being used in modern materials design.

68 Using the Cambridge Structural Database to explore concepts of symmetry
Dean H. Johnston, djohnston@otterbein.edu, Department of Chemistry and Biochemistry, Otterbein College, Westerville, OH 43081, Fax: (614) 823-1968

The Cambridge Structural Database provides a rich and virtually unlimited source of example molecules for teaching concepts of symmetry. Various exercises have been developed for use in basic and advanced undergraduate courses in Inorganic Chemistry. In one exercise, students used the CSDSymmetry database along with the Cambridge Structural Database to identify molecules with interesting point group symmetry and then presented their findings to the other students in the class. Several of these examples have been incorporated into an online symmetry gallery that includes an interactive display of the full set of symmetry elements and operations for each molecule.

69 Teaching crystallography in physical chemistry
Virginia B. Pett, pett@wooster.edu, Department of Chemistry, The College of Wooster, 943 College Mall, Wooster, OH 44691, Fax: 330-263-2386

In a computer-based laboratory session, physical chemistry students visualized the packing diagram of a crystal structure. They examined both "real space" (the unit cell of the crystal) and "reciprocal space" (the diffraction pattern). The students calculated the density of the crystal, measured bond lengths and bond angles to compare the experimental measurements with valence bond ideas of hybridization, and found hydrogen bonds in the crystal. In an advanced physical chemistry topics course the students accessed the Cambridge Structural Database to visualize the three-dimensional crystal structure of molecules and to investigate symmetry, packing, and intermolecular interactions in the solid state. Each project was organized so that the students made discoveries, drew conclusions, and presented their results in writing. They examined an organic bicyclic ring structure to investigate ring symmetry and ring pucker; they were challenged to find an example of unusual hydrogen bonding in the packing diagram of another organic molecule.
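
The density calculation mentioned above can be illustrated with the standard relation ρ = Z·M / (N_A·V), where Z is the number of formula units per unit cell, M the molar mass, and V the cell volume. The sketch below is a generic example, not the course's actual exercise; the function names are hypothetical and the rock-salt cell parameters in the usage note are approximate textbook values, not data from the abstract.

```python
import math

AVOGADRO = 6.02214076e23  # formula units per mole

def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit-cell volume in cubic angstroms from edges (angstroms) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    # General triclinic formula; reduces to a*b*c when all angles are 90 degrees.
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)

def crystal_density(z, molar_mass, volume_A3):
    """Density in g/cm^3 from Z, molar mass (g/mol) and cell volume (cubic angstroms)."""
    volume_cm3 = volume_A3 * 1e-24  # 1 cubic angstrom = 1e-24 cm^3
    return z * molar_mass / (AVOGADRO * volume_cm3)
```

For example, `crystal_density(4, 58.44, cell_volume(5.64, 5.64, 5.64))` evaluates a cubic cell with four NaCl formula units and gives a value close to 2.16 g/cm^3.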

70 Teaching molecular structure using Jmol
Robert M. Hanson, hansonr@stolaf.edu, Department of Chemistry, St. Olaf College, 1520 St. Olaf Avenue, Northfield, MN 55057

In this presentation the current principal programmer and project director of the Jmol molecular visualization applet will illustrate recent advances in Jmol that are particularly relevant to crystal structure visualization.

71 Modeling and simulation in biochemistry: A guide for users and consumers of crystallographic information
Katherine A. Kantardjieff, kkantardjieff@fullerton.edu, Department of Chemistry and Biochemistry, California State University Fullerton, 800 N. State College Blvd., Fullerton, CA 92834-6866, Fax: 734-939-4225

A biomolecular crystal structure is a hypothesis based upon model agreement with the diffraction data. Models validated by established criteria present an opportune starting point for additional computation that may provide further insights into biochemical function and mechanism, as well as successfully guide drug discovery efforts, including target selection, synthesis, and design modification to optimize binding affinity and pharmacokinetic properties. Crystal structures provide the basis for comparative protein structure modeling which, by matching accuracy with intended use, may be used for virtual screening, defining antibody epitopes, protein engineering, rational mutagenesis, molecular replacement phasing, and fitting low resolution electron density. Given a structure, molecular dynamics or QM/MM approaches may further elucidate catalytic mechanism and contribute meaningfully to inhibitor design. As we shall see in this presentation, exploiting biomolecular crystal structure in modeling and simulation can be quite powerful in addressing a research problem or learning about fundamental chemistry. However, caveat emptor.

72 Education and certification of patent information professionals in Europe
Bob Stembridge, Bob.Stembridge@thomsonreuters.com, Customer Relations, Thomson Scientific, 77 Hatton Garden, EC1N 8JS London, United Kingdom

The work of the patent information professional is central to the patent system, from identifying the prior art necessary to establish the patentability of an invention, through determining freedom to operate within a given territory, to helping to detect infringement of IP rights and providing support for proceedings against alleged infringers. But how does one learn the skills involved and, perhaps more importantly, how can an individual demonstrate that they possess the necessary knowledge and experience to conduct patent information work competently and reliably? Although University courses exist which include modules for IP education, these are a scant basis on which to equip the student with the wide range of knowledge about search systems and languages, databases, patent systems, claims construction etc. required to be considered a competent professional. In practice, this knowledge has traditionally been acquired and accumulated through experience and learning “on the job”. In today's fast-moving environment, there is a need for trained professionals ready to step up to the plate straight out of training. This presentation will describe initiatives in Europe to formalize the education of tomorrow's patent information professionals and put in place a system to assess and certify both existing and aspiring patent information professionals to assure the necessary quality required for the future health of the patent system.

73 PERI Patent Information Course
Edlyn S. Simmons, edlyns@earthlink.net, Simmons Patent Information Service LLC, 5528 Brewer Rd., Mason, OH 45040, Fax: 513-398-3660

In 1989, the Patent Committee of the Pharmaceutical Manufacturers Association's Information Management Subsection introduced a course on the fundamentals of patent law and patent information resources. The course was developed because existing patent search training covered only database content and search techniques, while training in basic patent law and principles was left to informal interactions with mentors and colleagues. The course continues to be presented by PERI, filling the training gap for new patent searchers in the 21st century. This presentation will review the content of the course.

74 Law librarianship
Renate Chancellor, School of Library Science, Catholic University of America, 620 Michigan Avenue, NE, 246 Marist Hall, Washington, DC 20064

Intellectual property education in law librarianship.

75 USPTO: Education of the inventor community
John Calvert, Supervisory Patent Examiner, USPTO, Alexandria, VA 22313-1450

The USPTO has offered education and assistance to individual inventors for many years. Recently, the concept of the individual inventor has progressed from the mom and pop garage inventor to small business inventors and university inventors. From this change has come a need to educate a growing number of individuals. With the increased need for education and limited resources, the USPTO has begun to provide many education opportunities using the vast resources of the electronic age. The USPTO now provides online chats, video links on its website, educational sessions from various web sources and live webcasts of inventor conferences.

76 Copyright basics
Eric S. Slater, e_slater@acs.org, Publications Division, Copyright Office, American Chemical Society, 1155 Sixteenth Street, NW, Washington, DC 20036, Fax: 202-776-8112

This session will feature a general discussion of basic United States Copyright Law, including, but not limited to, such topics as subject matter of copyright, exclusive rights of copyright, duration of copyright and application of copyright law to new technology and methods of distribution. Additionally, the speaker will discuss different “movements” (e.g., Open Access, Creative Commons, etc.) and how these have affected copyright law and practices of publishers. Finally, the session will conclude with a primer on the permissions process, and why it is important to be aware of copyright when using material posted on the Internet.

77 Recent developments in intellectual property
Hans Sauer, Biotechnology Industry Organization, Washington, DC 20024, and Pamela J. Scott, Pamela.J.Scott@pfizer.com, Legal Division, Pfizer, Inc, Eastern Point Road, MS 8260-1611, Groton, CT 06340

Overview of intellectual property education.

78 Using crystallographic databases in the ACA summer course in small molecule crystallography
John C. Woolcock, woolcock@iup.edu, Department of Chemistry, Indiana University of Pennsylvania, 239C Weyandt Hall, Indiana, PA 15705

The American Crystallographic Association (ACA) Summer Course is a ten-day intensive program that teaches both single-crystal and powder diffraction. Participants are encouraged to bring their own samples for structure determination and during the course they have access to both the Cambridge Structural Database (CSD) and the Powder Diffraction File (PDF). This presentation will focus on the ways the CSD and the PDF are incorporated into the lecture and lab components of the ACA course. The previous knowledge that participants have about crystallographic databases and how they use them to support structure determination in the course will also be examined.

79 Conceptualizing reaction mechanisms using crystallographic data
Kraig A. Wheeler, kawheeler@eiu.edu, Department of Chemistry, Eastern Illinois University, 600 N Lincoln Avenue, Charleston, IL 61920

Classroom discussions of organic reaction mechanisms offer students useful opportunities to explore the intimate details of reaction processes. The advantage of having students study chemical reactions from a mechanistic view rather than pattern recognition (memory recall) is obvious; students with a fundamental understanding of mechanisms are more able to predict reaction outcomes and can transfer prior mechanistic insight to new reaction schemes. In general, attention to such course material is limited to 2-D drawings and arrow-pushing exercises. Since the Cambridge Structural Database contains a wealth of structural information that has served to support existing reaction theories and unravel mechanistic details, this resource should also provide a valuable teaching tool. Well-placed discussions that combine the advantages of crystallographic data and traditional approaches help students gain a more lucid grasp of this material. This presentation will highlight several examples of the application of crystallographic data to reaction mechanisms in the organic classroom.

80 An interactive online teaching subset of the Cambridge Structural Database
Gary M Battle, battle@ccdc.cam.ac.uk, Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, United Kingdom

The Cambridge Structural Database (CSD) serves as the worldwide repository of small-molecule crystal structure data. As such, this unique database of approaching one-half million molecules is a crucially important resource for chemical education. However, despite the obvious benefits of using experimentally measured 3D structures, this powerful resource is underutilised in undergraduate teaching. This talk will illustrate how, through use of a free interactive online teaching subset of the CSD, crystal structure information can be made readily accessible to students. A range of associated teaching exercises will also be discussed. Together, these online resources aimed at non-crystallographers can be used to enhance learning across the chemistry curriculum.

81 Teaching with the Cambridge Structural Database from general chemistry to advanced inorganic chemistry
Stephen A. Koch, stephen.koch@sunysb.edu, Department of Chemistry, State University of New York at Stony Brook, Stony Brook, NY 11794-3400, Fax: 631-632-7960

Stony Brook has had a site license for many years for the Windows version of the Cambridge Structural Database. This has enabled the author to use the CSD in his classes including Honors General Chemistry, Organic Lab, and Advanced Inorganic Chemistry as well as Graduate Inorganic Chemistry. General Chemistry students can easily learn how to use this research level database and use it to explore the diverse structural chemistry of molecular inorganic compounds. I also use the CSD to introduce 2nd and 3rd year chemistry majors to the concept of structure and substructure searching before teaching them to use more expensive, seat limited databases. The Web version of the CSD will make it much easier for undergraduate students to use the CSD in their classes.

82 Utilization of the Cambridge Structural Database system in the undergraduate chemistry curriculum
Gregory M. Ferrence, ferrence@ilstu.edu, Department of Chemistry, Illinois State University, Campus Box 4160, Normal, IL 61790-4160

Spatial ability, the ability to manipulate 3-D objects in our heads, has a relationship to undergraduate chemists' performance. Commonly, weakness in this area impedes their progress. Technological advances both in e-learning tools and information availability have greatly enhanced the chemical education community's ability to address this skill set. The Cambridge Structural Database System includes an atomic coordinate database for nearly half a million compounds. The information available is fundamentally 3-D in nature and may be easily rendered as visual graphics using CSDS programs. Academic access to the CSD has been available since the 1960s; however, it was primarily used as a tool for crystallographers. During the past decade, the tools used to extract and manipulate information from the CSD have evolved into a powerful, user-friendly suite of software (the CSDS) which industrial sector researchers in the life sciences have come to regard as invaluable. In contrast, in academia this powerful set of research and teaching tools is still regarded predominantly as useful to “hard-core” crystallographers alone. Through a Discovery Corps Senior Fellowship (DCF) from the National Science Foundation, the speaker has been facilitating site license access to the CSDS for more than 30 Primarily Undergraduate Institutions and providing workshops at these institutions to help and encourage faculty to integrate use of the CSDS into their undergraduate chemistry curriculum. This talk will discuss the overall DCF project and illustrate examples of how CSDS information can be used to enhance chemistry learning throughout organic, analytical, physical, biochemical, inorganic, and other areas of chemistry.

83 Using the Cambridge Structural Database as a resource for undergraduate research and teaching
Barbara A. Reisner, reisneba@jmu.edu, Department of Chemistry and Biochemistry, James Madison University, MSC 4501, Harrisonburg, VA 22807

The Cambridge Structural Database (CSD) has become an important resource for undergraduate research and teaching in the Department of Chemistry and Biochemistry at James Madison University, a Primarily Undergraduate Institution (PUI). This presentation will focus on the role that the CSD plays in early research experiences in inorganic chemistry both through the Integrated Inorganic/Organic Laboratory and in undergraduate research. CSD-centered learning objects (small instructional units) that have been developed for implementation in the sophomore- and senior-level inorganic chemistry lecture courses during the 2009-2010 academic year will be presented. Finally, the role of the Virtual Inorganic Pedagogical Electronic Resource (VIPEr, http://www.ionicviper.org) as a platform for dissemination and as a mechanism for obtaining community feedback will be discussed.

84 Evaluation of FTrees in terms of scaffold hopping on different targets by retrospective and prospective virtual screening
Róbert Kiss, r.kiss@richter.hu, Gedeon Richter Plc, Gyömrői út 30-32, Budapest H-1103, Hungary, and György M. Keserű, gy.keseru@richter.hu, Gedeon Richter Plc, P.O.Box 27, Budapest H-1475, Hungary

FTrees has been reported as a useful approach for finding novel hits by using information about known actives. FTrees can be classified as a mixed 2D/3D approach. It uses a molecular descriptor (the Feature Tree) that is a reduced graph representation of molecules containing connectivity information and pharmacophore features only. Several publications have suggested that FTrees is efficient in scaffold hopping. Our group evaluated the efficiency of FTrees on four different targets in comparison with simple 2D fingerprint similarity searching. The evaluation was carried out by analyzing the highest enrichment factors, the speed, and the diversity of the active compounds discovered. The influence of the query reference compound was also investigated. We also conducted prospective virtual screens and subsequent pharmacological evaluation of the virtual hits.
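The two quantities named above, the enrichment factor and 2D fingerprint similarity, can be illustrated with a minimal sketch. The abstract does not state which definitions the authors used; the code below assumes the common enrichment factor (hit rate in the top-ranked fraction divided by the hit rate in the whole library) and the Tanimoto coefficient on sets of "on" fingerprint bits. The function names and the toy data are hypothetical.

```python
def enrichment_factor(ranked_is_active, fraction):
    """Enrichment factor for the top `fraction` of a ranked screening list.

    ranked_is_active: booleans ordered from best to worst score,
                      True where the compound is a known active.
    Assumed definition: (actives in top / size of top) / (actives / total).
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, round(n_total * fraction))
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Toy ranking: 10 compounds, 2 actives, both ranked in the top 20%.
ranking = [True, True] + [False] * 8
ef_top20 = enrichment_factor(ranking, 0.2)
similarity = tanimoto({1, 2, 3}, {2, 3, 4})
```

An enrichment factor above 1 means the ranking places actives in the top fraction more often than random selection would.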

85 Anticancer activity of SERT binding sulfur-substituted α-alkyl phenethylamines
Andrew JS. Knox1, andrew.knox@tcd.ie, Suzanne Cloonan2, cloonans@tcd.ie, John J Keating3, jj.keating@ucc.ie, Stephen G Butler4, butlersg@tcd.ie, Georgia Golfis5, ggolfis@tcd.ie, Anne M Jørgensen6, anmj@lundbeck.com, Gunther H. Peters6, Dilip Rai7, dilip.rai@ucd.ie, David G. Lloyd1, david.lloyd@tcd.ie, D Clive Williams2, and Mary J Meegan4. (1) Molecular Design Group, School of Biochemistry and Immunology, Trinity College Dublin, College Green, Dublin, Dublin D2, Ireland, Fax: 353-676-2400, (2) School of Biochemistry and Immunology, Trinity College Dublin, Dublin D2, Ireland, (3) School of Pharmacy and Department of Chemistry, University College Cork, Cork, Ireland, (4) School of Pharmacy and Pharmaceutical Sciences, Trinity College Dublin, Dublin D2, Ireland, (5) Molecular Design Group, School of Biochemistry and Immunology, Trinity College Dublin, Dublin D2, Ireland, (6) Department of Chemistry, The Technical University of Denmark, Lyngby, Denmark, (7) Centre for Synthesis and Chemical Biology, School of Chemistry & Chemical Biology, University College Dublin, Dublin D4, Ireland

The recent revelation that certain serotonin reuptake transporter (SERT) targeting ligands may act as pro-apoptotic agents in the treatment of cancer adds greatly to their diverse potential pharmacological application. 4-Methylthioamphetamine (4-MTA) is a potent inhibitory ligand for SERT. In this study, a novel library of structurally diverse 4-MTA analogues was synthesised with or without N-alkyl and/or C-α methyl or ethyl groups, and their potential SERT-dependent antiproliferative activity was assessed. A number of these novel SERT-targeting agents displayed potential anti-cancer effects, with EC50 values in the low micromolar range. Computational analyses were carried out to determine any possible relationship between SERT activity and pro-apoptotic activity on several cell lines. Using in silico 'Target-Fishing' techniques, we propose possible mechanisms of action for these compounds.

86 Merging and growing fragments interactively
Marcus Gastreich, marcus.gastreich@biosolveit.de and Christian Lemmen, BioSolveIT GmbH, An der Ziegelei 79, 53757 Sankt Augustin, Germany, Fax: +49 2241 2525 525

Fragments are generating a buzz these days: upon detection of fragment binders in protein active sites, the general strategy is to merge, grow, or link them to enhance the binding of the resulting 'composite' ligand.

Computational chemistry ideally supports this workflow with sensible proposals for synthesis and modification. On the computational side, however, the complications are limited time and the quality of checks for the synthetic accessibility of the proposals. Our tool ReCore was extended to identify linker motifs from extremely large fragment libraries that not only connect fragment binders in their experimentally observed positions but also comply with the binding motifs using pharmacophores and other features. ReCore is fast enough to provide instant feedback to the user, thereby enabling interactive query refinement. The algorithm moreover favors synthetically feasible solutions through the setup of its search libraries and in the way it forms the resulting composite ligands. Validations across different targets are reported.

87 Operating in chemical spaces: Novel methods for lead identification and library design
Matthias Rarey, Center for Bioinformatics (ZBH), University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany, and J. Robert Fischer, Center for Bioinformatics (ZBH), University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany

Computational methods for lead identification and library design are frequently applied in cheminformatics. When structures are to be designed de novo, chemical fragment spaces, which result either from retrosynthetic breakdown of (drug-like) compounds or from analysis and combinatorial synthesis protocols, substantially improve the drug-likeness and synthetic accessibility of the resulting compounds. Several cheminformatics tasks that are traditionally defined over individual molecules can also be formulated for chemical fragment spaces. Since enumerating all individual molecules of such a space is prohibitive, the challenge is to find computational methods that solve modeling tasks without this step. In this talk, we give an overview of several methods developed for chemical fragment spaces. Among others, computational approaches to fragment space modeling and searching with various search criteria (property-, structure-, and similarity-based) are presented. Recently, we developed a novel method for deriving focused libraries from fragment spaces. Besides multiple physico-chemical property ranges, the similarity and dissimilarity to query molecules as well as the internal diversity can be considered. Some practical examples will demonstrate the capabilities as well as the limitations of these approaches.

88 Chemical information from single crystal neutron crystallography
Xiaoping Wang, wangx@ornl.gov and Christina M. Hoffmann, Neutron Scattering Science Division, Oak Ridge National Laboratory, P.O. Box 2008, MS-6475, Oak Ridge, TN 37831, Fax: 865-574-6080

The important contributions of neutron crystallography to the study of hydrogen bonding and metal-hydride complexes will be presented. The ability to locate hydrogen atoms accurately by neutron diffraction has helped in understanding key intermediate chemicals and pathways in catalytic processes. Cases where single crystal neutron crystallography has played important roles in understanding chemical and hydrogen bonding, including textbook examples such as agostic interactions, will be discussed. Although single crystal neutron crystallography is a powerful technique for accurate structure analysis, its potential has not been fully realized because of the large crystal size it requires. However, this will change with the revolutionary TOPAZ neutron single crystal diffractometer at Oak Ridge National Laboratory, which allows data collection on samples of submillimeter size, similar to those used for X-rays. This research is supported by UT Battelle, LLC under Contract No. DE-AC05-00OR22725 for the U.S. Department of Energy, Office of Science.

89 Systematics of toroidal carbon nanotubes and high-genus fullerenes
Chern Chuang, r96223127@ntu.edu.tw and Bih Yaw Jin, byjin@ntu.edu.tw, Department of Chemistry, National Taiwan University, No. 1 Roosevelt Road, Sec 4, Taipei 10617, Taiwan

We develop a generalized classification scheme for toroidal carbon nanotubes (TCNTs) and high-genus fullerenes (HGFs) containing both pentagons and heptagons. We show that a particular class of TCNTs with n-fold rotational symmetry and well-defined latitude coordinates can be uniquely characterized by a set of four indices, and that each of the indices can be linked to the relative arrangement of pentagons and heptagons in the corresponding torus. Chiral isomers or the corresponding helical derivatives, HCNTs, can also be readily derived either by introducing a chiral vector or by dissecting a distorted TCNT along a certain longitude. Moreover, we show that a family of HGFs can be decomposed into identical "neck" structures derived from the inner rim of TCNTs. By replacing the faces of a uniform polyhedron with these necks, an HGF polyhedron corresponding to the vertex configuration of the polyhedron can be obtained.
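The balance between pentagons and heptagons that such a classification works with follows from Euler's formula. The sketch below is not the authors' four-index scheme; it only computes the standard counting constraint, assuming a closed surface of genus g tiled by pentagons, hexagons, and heptagons with every atom three-coordinate, for which N5 - N7 = 12(1 - g). The function names are hypothetical.

```python
def euler_characteristic(vertices, edges, faces):
    """Euler characteristic V - E + F of a closed polyhedral surface."""
    return vertices - edges + faces


def pentagon_minus_heptagon(genus):
    """N5 - N7 for a trivalent cage of given genus built from 5-, 6-, 7-gons.

    Derivation: V - E + F = 2 - 2g, 3V = 2E, and 5*N5 + 6*N6 + 7*N7 = 2E
    together give N5 - N7 = 12 * (1 - g).
    """
    if genus < 0:
        raise ValueError("genus must be non-negative")
    return 12 * (1 - genus)


# C60 (genus 0): 60 vertices, 90 edges, 12 pentagons + 20 hexagons.
chi_c60 = euler_characteristic(60, 90, 32)
sphere_balance = pentagon_minus_heptagon(0)   # 12 more pentagons than heptagons
torus_balance = pentagon_minus_heptagon(1)    # equal numbers on a torus
genus2_balance = pentagon_minus_heptagon(2)   # 12 more heptagons than pentagons
```

For a torus the two counts must be equal, which is why a TCNT needs heptagons to compensate for its pentagons; each additional handle requires twelve more heptagons than pentagons.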