Symposium Abstracts

ACS Boston, Fall 2010 - CINF & Related Abstracts

CHED Abstracts
CINF Abstracts
COMP Abstracts
SCHB Abstracts

Please note: abstracts are provided "as is" by the American Chemical Society.


Chemical Information Division


1 - Semantic envelopment of cheminformatics resources with SADI

Leonid L Chepelev, Egon Willighagen, Michel Dumontier. Department of Biology, School of Computer Science, and Institute of Biochemistry, Carleton University, Ottawa, Ontario, Canada; Department of Pharmaceutical Sciences, Uppsala University, Uppsala, Sweden

The distribution of computational resources as web services and their execution as workflows have enabled facile computation and data integration for bio- and cheminformatics. The Semantic Automated Discovery and Integration (SADI) framework addresses many shortcomings of similar frameworks, such as SSWAP and BioMoby, while allowing for more efficient semantic envelopment of computational chemistry services, resource discovery, and automated workflow organization. In this work, we apply the CHEMINF ontology and Chemical Entity Semantic Specification and demonstrate the usability of the SADI framework in solving common cheminformatics problems starting from RDF-based chemical entity representations. Our eventual goal is to convert all of the functions and functionalities of the Chemistry Development Kit (CDK) into distinct SADI services. This would enable the formulation of all cheminformatics problems currently addressed by CDK as SPARQL queries, returning meaningful RDF output that can then be easily integrated with existing RDF-based knowledgebases or used for further processing.


2 - RESTful RDF web services for predictive toxicology

Dr. Nina Jeliazkova PhD. Ideaconsult Ltd., Sofia, Bulgaria

The Open Source Predictive Toxicology Framework, developed by partners of the EC FP7 OpenTox project, aims to provide unified access to toxicity data and predictive models, as well as validation procedures. This is achieved by i) an information model based on a common OWL-DL ontology; ii) flexibility through linking with related ontologies; iii) availability of data and algorithms via a standardized REST web services interface, where every compound, data set or predictive method has a unique web address, used to retrieve its RDF representation or to initiate the calculations. The OpenTox framework allows building user-friendly applications for toxicological experts or model developers, or direct access by an application programming interface for development, integration and validation of new algorithms. The work presented describes the experience of building RESTful web services, based on RDF representation of resources, to incorporate diverse IT solutions into a distributed and interoperable system.
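
As a rough illustration of the addressing scheme described above, the sketch below builds (but does not send) an HTTP request for the RDF representation of a single resource. The base URL, the path layout and the helper name `rdf_request` are hypothetical, and selecting the representation through an `Accept: application/rdf+xml` header is an assumption; none of these details are stated in the abstract.

```python
from urllib.request import Request

# Hypothetical service root; not an actual OpenTox endpoint.
BASE_URL = "https://example.org/opentox"


def rdf_request(resource_type: str, resource_id: str) -> Request:
    """Build a GET request asking for the RDF/XML form of one resource."""
    url = f"{BASE_URL}/{resource_type}/{resource_id}"
    # Assumption: the service picks the representation from the Accept header.
    return Request(url, headers={"Accept": "application/rdf+xml"}, method="GET")


req = rdf_request("compound", "42")
print(req.get_method(), req.full_url)
```

The point of the sketch is only that each compound, data set or model is identified by its own address, so a client needs nothing beyond that address and a requested format.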


3 - Linking the resource description framework to cheminformatics and proteochemometrics

Dr. Egon L. Willighagen, Prof. Jarl E.S. Wikberg. Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden

Semantic web technologies are finding their way into the life sciences. Ontologies and semantic markup have already been used for more than a decade in the molecular sciences, but have not found widespread use yet. The semantic web technology Resource Description Framework (RDF) and related methods prove to be sufficiently versatile to change that situation.

The work presented here focuses on linking RDF approaches to existing molecular chemometrics fields, including cheminformatics, QSAR modeling and proteochemometrics. Applications are presented that link RDF technologies to methods from statistics and cheminformatics, including data aggregation, visualization, chemical identification, and property prediction. They demonstrate how this can be done using various existing RDF standards and cheminformatics libraries. For example, we show how IC50 and Ki values are modeled for a number of biological targets using data from the ChEMBL database.

We have shown that existing RDF standards can suitably be integrated into existing molecular chemometrics methods. Platforms that unite these technologies, like Bioclipse, make this even simpler and more transparent. Being able to create and share workflows that integrate data aggregation and analysis (visual and statistical) is beneficial to interoperability and reproducibility. The current work shows that RDF approaches are sufficiently powerful to support molecular chemometrics workflows.


4 - Chemical e-Science Information Cloud (ChemCloud): A semantic web based eScience

Prof. Dr. Adrian Paschke PhD, Stephan Heineke. FIZ Chemie, Berlin, Germany; Department of Mathematics and Computer Science, FU Berlin, Berlin, Germany

Our Chemical e-Science Information Cloud (ChemCloud), a Semantic Web based eScience infrastructure, integrates and automates a multitude of databases, tools and services in the domains of chemistry, pharmacy and biochemistry available at the Fachinformationszentrum Chemie (FIZ Chemie), at the Freie Universitaet Berlin (FUB), and on the public Web. Based on the approach of the W3C Linked Open Data initiative and the W3C Semantic Web technologies for ontologies and rules, it semantically links and integrates knowledge from our W3C HCLS knowledge base hosted at the FUB; our multi-domain knowledge base DBpedia (Deutschland) implemented at FUB, which is extracted from Wikipedia (De) and provides a public semantic resource for chemistry; and our well-established databases at FIZ Chemie, such as ChemInform for organic reaction data; InfoTherm, the leading source for thermophysical data; Chemisches Zentralblatt, the complete chemistry knowledge from 1830 to 1969; and ChemgaPedia, the largest and most frequented e-Learning platform for chemistry and related sciences in the German language.


5 - Use of semantic web services to access small molecule ligand database

Anay P Tamhankar, Aniket S Ausekar. Software Solutions Group, Evolvus, Pune, Maharashtra, India

Resource Description Framework (RDF) and a set of associated technologies such as OWL and SPARQL, which form the W3C's semantic web technology stack, are renewing interest in semantic chemistry. Semantic Web Services not only specify syntactic interoperability but also specify and enforce the semantic constraints of messages being transmitted and objects being accessed.

The Liceptor database is a small molecule ligand database consisting of approximately 4 million compounds. The database schema consists of fields such as molecular properties (2D structure, molecular weight, molecular formula, etc.), molecular descriptors (H-donors, H-acceptors, logP, logD, number of rotatable bonds, etc.) and pharmacological properties (bioassays, receptors, enzymes, parameters, animal models, therapeutic indications, etc.). Pharmaceutical and biotechnology companies use this database to mine chemical space for internal research, to prioritize QSAR and pharmacophore studies, for synthetic chemistry endeavors and for advancing hit-to-lead patterns.

The database records are available in multiple formats (relational database, XML, RDfile, etc.) and are also available online through an interactive web application (HTML format).

The soon-to-be-released version of the database includes access using semantic web services. The ontology is expressed in OWL, and RDF defines the overall framework. Typical consumers of the data using this access mechanism are expected to be third-party tool vendors and data aggregators.

Use of semantic web services allows the schema to evolve over time without explicitly communicating each change or requiring all data consumers to be changed.


6 - Usage metrics: Tools for evaluating science monograph collections

Asst Univ Librarian Michelle M Foss, Dr. Vernon Kisling, Ms. Stephanie Haas. Department of Marston Science Library, University of Florida, Gainesville, FL, United States

As academic libraries are increasingly supported by a matrix of database functions, the use of data mining and visualization techniques offers significant potential for future collection development based on quantifiable data. While data collection techniques are not standardized and results may be skewed because of granularity problems or faulty algorithms, useful baseline data can be extracted and broad trends identified. The purpose of the study is to provide an initial assessment of data associated with the science monograph collection at the Marston Science Library (MSL), University of Florida. The sciences fall within the major Library of Congress Classification schedules of Q, S, and T, excluding TN, TR, TT, and R. The overall strategy of this project is to analyze audience-based circulation patterns, e-book usage, purchases, and interlibrary loan statistics from the academic year July 1, 2008 to June 30, 2009. Such analyses provide an evidence-based framework for future collection decisions.


7 - Happily ever after or not: E-book collection usage analysis and assessment at USC

Norah Xiao. University of Southern California, United States

With more and more e-book collections being launched by publishers, the USC Science and Engineering Library began acquiring e-book collections in late 2008, and one of the first and biggest collections acquired is Springer e-books. Now, after two years, are users satisfied with this e-book collection? Are they accessing and using it? As with any other e-collection, how well have we, librarians and staff, been coping with this collection in collection development (e.g. e-book packages from other publishers), access services (e.g. interlibrary loan, off-campus access, e-book technical issues), outreach (e.g. e-book market strategies), and information literacy?

This presentation will give an overview of our assessment of this e-book collection after two years. What have we learned from the usage data? And by analyzing the data, how did we, and how can we, improve our services to users? It is hoped that our experience can present a proactive implementation plan for others considering comprehensive digital migration of their content, with the goal of not only better coping with the current economic environment, but also of spurring development, innovation, and efficiency in the long run.


8 - From Chemical Abstracts to SciFinder: Transitioning to SciFinder and assessing customer usage

Susan Makar, Stacy Bruss. National Institute of Standards and Technology, United States

The Research Library of the National Institute of Standards and Technology (NIST) monitors SciFinder usage to ensure customers have ready access to the database and to determine who uses it. Usage statistics played a critical role in determining whether to increase the number of seats and which heavy users should help pay for those additional seats. While most NIST researchers were very excited to acquire access to this product, many, who were well acquainted with using the print version of Chemical Abstracts, needed to learn best techniques for searching and browsing the chemistry literature using SciFinder. Transitioning from the printed Chemical Abstracts to SciFinder posed significant challenges to one research project. This presentation will describe how the NIST Research Library used SciFinder usage statistics to make collection development decisions and how library staff worked with NIST researchers to successfully transition from the printed Chemical Abstracts to SciFinder.


9 - Using Web of Knowledge to identify publishing and citation patterns of campus researchers at the University of Arkansas

Lutishoor Salisbury, Jeremy S. Smith. University of Arkansas, United States

This presentation will provide information on a project undertaken at the University of Arkansas in Fayetteville to study publications by campus researchers, with an emphasis on the STEM disciplines (agricultural sciences, physical sciences, biological sciences, engineering, mathematics, etc.) at the macro level for a three-year period. The overall objective of the study was (1) to provide an overview of the productivity of faculty and researchers in the various departments, which could be used in allocating resources for collection development, and (2) to provide evidence-based data on periodical use to assist with collection decisions and to identify collection strengths at the university level. We used the Web of Knowledge database (Science Citation Index, Social Science Citation Index and Arts and Humanities Citation Index) to identify the periodical literature in which our researchers published and the periodicals that they cite in their publications, and to perform several analyses, including determining the extent to which our researchers are publishing in and citing periodicals from the Elsevier, Wiley and IEEE journal packages. A methodology for extracting citations from Web of Knowledge into an Excel spreadsheet will also be presented. The strengths and weaknesses of the Web of Knowledge for this study will also be highlighted.


10 - Don't forget the qualitative: Including focus groups in the collection assessment process

Susan Shepherd, Teri M. Vogel. University of California San Diego, United States

To complement our ongoing quantitative collection evaluations based on cost and usage data, the UC San Diego Science & Engineering Library conducted a series of focus groups with graduate students and faculty in our core departments. Our objective was to learn more about how they use the collection for research and teaching, so that we could make more informed decisions about collection management, as well as how best to deploy our staff resources for increased promotion, outreach and instruction. Participants were asked about the resources they use, how they use them, and what gaps they perceived. We also probed their familiarity with the top licensed resources in their fields.

In this presentation we will discuss our focus group methods, results and the next steps we have taken in this assessment, including a follow-up survey to the same departments to obtain more quantitative information about usage of the collection.


11 - Strategies for the identification and generation of informative compound sets

Michael S Lajiness. Computer Aided Drug Discovery, Eli Lilly & Company, Indianapolis, IN, United States

Mounting pressures in drug discovery research dictate more efficient methods of picking the winners: molecules that actually have a chance to be the drugs of the future. Clearly, these methods need to navigate a highly multi-dimensional landscape. It is also clear that hard filters should never be used and that a more continuous treatment or prioritization has clear advantages. Further, structural diversity needs to be considered in order for the best structural ideas to be found most efficiently. In addition, history and external sources of information also must be examined. This presentation will describe some of the methods, techniques, and strategies that have been employed by the author over the past 25 years of work in cheminformatics and that attempt to identify compounds that are likely to provide the most useful information so that one might discover solid leads more rapidly.


12 - Public-domain data resources at the European Bioinformatics Institute and their use in drug discovery

Christoph Steinbeck. European Bioinformatics Institute, EMBL Outstation - Hinxton, Hinxton, Cambridge, United Kingdom

Small molecules are of increasing interest for bioinformatics in areas such as metabolomics and drug discovery. The recent release of large open chemistry databases into the public domain calls for flexible, open toolkits to process them. These databases and tools will, for the first time, create opportunities for academia and third-world countries to perform state-of-the-art open drug discovery and translational research - endeavors so far a domain of the pharmaceutical industry. This talk will describe a couple of relevant data resources at the European Bioinformatics Institute and will also outline our research on and development of toolkits such as the Chemistry Development Kit and CDK-Taverna to support the exploitation of these data sources.


13 - Decision making in the face of complicated drug discovery data using the Novartis system for virtual medicinal chemistry (FOCUS)

Donovan Chin. Global Discovery Chemistry, Novartis Institutes for BioMedical Research, Cambridge, MA, United States

This talk will describe some of the broad concepts that led to the development of the Novartis software system for data analysis & virtual medicinal chemistry (FOCUS). The system, which is routinely used globally, is designed to present the scientist with an accessible interface that permits iterative hypothesis testing of many possible chemical candidates while accounting for undesirable ADMET properties. Some of the key principles are to present the data in a way that reflects stored knowledge and facilitates the decision about what compound to make next. We will highlight some of these concepts in applications spanning the range from target identification to drug optimization.


14 - Integrating chemical and biological data: Insights from 10 years of VERDI

Susan Roberts, W. Patrick Walters, Ryan McLoughlin, Philppe Gabriel, Jonathan Willis, Trevor Kramer. Vertex Pharmaceuticals, Cambridge, MA, United States

VERDI is a software system, originally developed in 2000 at Vertex Pharmaceuticals, for integrating chemical and biological data and delivering this information to drug discovery teams. In addition to traditional table views, VERDI incorporated a number of modules designed to enable scientists to understand relationships between chemical structure and biological data. Over the last 10 years, VERDI has been the primary data access tool for hundreds of scientists at multiple sites around the world. A retrospective evaluation of VERDI has provided us with a number of 'lessons-learned', which come from a multitude of revisions, improvements and new feature additions. Some of these lessons, which are being used as the basis for development of the next generation of data analysis and visualization tools at Vertex, will be presented and discussed in detail.


15 - Collaborative database and computational models for tuberculosis drug discovery decision making

Dr. Sean Ekins PhD, Dr Justin Bradford PhD, Krishna Dole, Anna Spektor, Kellan Gregory, David Blondeau, Dr Moses Hohman PhD, Dr Barry A Bunin. Collaborative Drug Discovery, Burlingame, CA, United States; Collaborations in Chemistry, Jenkintown, PA, United States; Department of Pharmaceutical Sciences, University of Maryland, Baltimore, MD, United States; Department of Pharmacology, Robert Wood Johnson Medical School, University of Medicine & Dentistry of New Jersey, Piscataway, NJ, United States

Drug discovery is being re-shaped by large-scale collaborations that connect individual researchers using collaborative computational approaches and crowdsourcing. Future drug discovery decisions will ultimately still be made based on massive multidimensional datasets. As an example, the search for molecules with activity against Mycobacterium tuberculosis (Mtb) is employing many approaches in collaborating national and international laboratories. We have developed a database (CDD TB) to capture public and private Mtb data while enabling data mining and collaborations with other researchers. We have also used the public data along with several computational approaches, including Bayesian classification models for 220,463 molecules, and tested them with external molecules, enabling the discrimination of active or inactive substructures from other datasets in CDD TB. The combination of the database, dataset analysis, and computational models provides new insights into molecular properties and features that are determinants of whole cell activity, allowing prioritization and decision making around molecules.
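
The abstract does not describe how its Bayesian classification models were built. Purely as a generic illustration of the idea, the sketch below is a minimal Bernoulli naive Bayes classifier with Laplace smoothing over binary substructure features. The feature names, the toy training set and the function names are all invented for this example and do not come from CDD TB.

```python
import math
from collections import Counter


def train(samples):
    """samples: list of (set_of_features, label); label True = active, False = inactive."""
    class_counts = Counter(label for _, label in samples)
    feature_counts = {True: Counter(), False: Counter()}
    vocabulary = set()
    for features, label in samples:
        feature_counts[label].update(features)
        vocabulary |= features
    return class_counts, feature_counts, vocabulary


def log_odds_active(model, features):
    """Log-odds that a molecule is active; a positive value favors 'active'.

    Features outside the training vocabulary are ignored. Both classes must
    be present in the training data.
    """
    class_counts, feature_counts, vocabulary = model
    total = sum(class_counts.values())
    score = 0.0
    for label, sign in ((True, 1.0), (False, -1.0)):
        n = class_counts[label]
        log_p = math.log(n / total)
        for f in vocabulary:
            # Laplace-smoothed probability that feature f is present in this class.
            p = (feature_counts[label][f] + 1) / (n + 2)
            log_p += math.log(p if f in features else 1.0 - p)
        score += sign * log_p
    return score


# Invented toy data, for illustration only.
toy = [
    ({"nitro", "aromatic_ring"}, True),
    ({"nitro", "amide"}, True),
    ({"aromatic_ring", "ester"}, False),
    ({"ester", "amide"}, False),
]
model = train(toy)
print(log_odds_active(model, {"nitro"}) > 0)
```

In this toy setting a feature seen mostly in active molecules pushes the score up and one seen mostly in inactive molecules pushes it down, which is the sense in which such a model can discriminate active from inactive substructures.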


16 - Data driven life sciences: The Pyramids meet the Tower of Babel

Dr. Rajarshi Guha. Department of Informatics, NIH Chemical Genomics Center, Rockville, MD, United States

A characteristic feature of modern life science research is the fact that it has become data intensive. As a result we are faced with datasets of massive size and wide variety in terms of the type of data. Examples include massive datasets from next generation sequencing to more complex datasets of chemical structure and activity from high-throughput small molecule screens. In this talk I will discuss some aspects of how one can handle datasets of such size and variability. I will consider examples from computational science and distributed services that allow us to easily and cheaply handle massive datasets to integration approaches that attempt to merge data from multiple sources to obtain a systems level view of the biological effects of small molecules. In all cases, the focus will be data generated from and for small molecule studies.


17 - Design principles for diversity-oriented synthesis: Facilitating downstream discovery with upfront design

Lisa Marcaurelle. Chemical Biology Platform, Broad Institute, Cambridge, MA, United States

To expand the diversity of our screening collection to access a broad range of biological targets, we aspire to produce libraries of small molecules that combine the structural complexity of natural products and the efficiency of high-throughput processes. Moreover, we aim to synthesize the complete matrix of stereoisomers for all library members. We reason that this unique collection will enable the rapid development of stereo-structure/activity relationships (SSAR) upon biological testing, providing valuable information for the prioritization and optimization of hit compounds. Although our library products may be distinct compared to traditional compound collections, we are faced with fundamental questions relevant to library design: How do you prioritize scaffolds for synthesis? How do you select products with desirable physicochemical properties? In designing DOS libraries we employ a number of cheminformatic methods to tackle such issues and select compounds for synthesis/screening. An overview of our design criteria and decision-making process will be presented.


18 - Overview: Data-intensive drug design

John H Van Drie. R&D, Van Drie Research, Andover, MA, United States

How do we best make med chem decisions in the face of a lot of data? This is an issue that confronts us at many stages of the drug discovery process: screening, hit-to-lead, early lead optimization, and late-stage lead optimization. In this session, speakers representing each of these stages will describe how they have successfully tackled these issues, emphasizing general principles over specific computational tools. Our brains can conveniently handle only about 7 things at a time, and most traditional med chem decision-making processes reflect that. Already when the number of molecules being considered is in the range of dozens, things get tricky; when that number is in the thousands to hundreds of thousands, one must re-orient one's perspective.


19 - Data-driven development: How ACS Publications uses data to enhance products and services, and respond to customer needs

Melissa Blaney, Sara Rouhi. ACS Publications, United States

As the scholarly publishing landscape continues to rapidly transform in unprecedented ways, publishers and libraries have had to quickly pivot to accommodate the changing preferences that users have for accessing, collecting, and consuming digital information. ACS Publications has used a data-driven approach to handle these changing customer and end-user needs. Everything from our ACS Mobile iPhone application to our transition from print to online Web products has been shaped by this approach. This presentation will address the role of data in developing new products, enhancing our web presence, and responding to user behavior on the ACS Web Editions Platform.


20 - Objective collections evaluation using statistics at the MIT Libraries

Mathew Willmott, Erja Kajosalo. Engineering & Science Libraries, Massachusetts Institute of Technology, United States

Recent budget pressures have forced many libraries to reevaluate their collections and substantially cut back on their subscription spending. The task of evaluating a large collection of subscription-based materials, however, is a difficult one. Journals from different subject areas are used differently, and journals from different publishers have their usage measured differently.  Evaluating each individual journal subscription separately would be a monumental task bordering on infeasibility. This paper will discuss the approach taken by the MIT Engineering and Science Libraries in the spring of 2009 and 2010 to evaluate their journal collections, specifically for Springer, Elsevier, and Wiley-Blackwell, the three journal publishers with which these libraries hold the most subscriptions. Discussion will include the gathering and analysis of usage data, publication data, and citation data, as well as the process by which these data were combined to create an objective ranking for each journal. These objective rankings were not final decisions; librarians with subject expertise then evaluated the lower-ranked journals to determine if they were appropriate choices for cancellation, often taking into consideration many additional factors.  However, these objective evaluations helped librarians to more efficiently use their time by indicating which journals may be strong candidates for cancellation, and they helped department liaisons to defend final cancellation choices to a very data-driven faculty. The end result was a more efficient cancellation process as well as a more comprehensive understanding of the library's journal collections.
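
The abstract does not say how the usage, publication and citation data were combined. As one possible sketch, the code below scales each metric to the range 0 to 1 and averages the three with equal weights to produce a ranking. The normalization, the equal weights, the metric names and the journal figures are all assumptions made for illustration, not the method used at MIT.

```python
def min_max(values):
    """Scale a list of numbers to the 0..1 range (all 0.0 if they are equal)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_journals(journals, metrics=("downloads", "local_papers", "local_citations")):
    """Return (title, score) pairs sorted from highest to lowest composite score."""
    columns = [min_max([j[m] for j in journals]) for m in metrics]
    scored = []
    for i, journal in enumerate(journals):
        # Equal weights are an assumption; a real evaluation may weight differently.
        score = sum(col[i] for col in columns) / len(metrics)
        scored.append((journal["title"], score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented example data, for illustration only.
journals = [
    {"title": "Journal A", "downloads": 1200, "local_papers": 10, "local_citations": 300},
    {"title": "Journal B", "downloads": 150, "local_papers": 1, "local_citations": 20},
    {"title": "Journal C", "downloads": 600, "local_papers": 4, "local_citations": 90},
]
for title, score in rank_journals(journals):
    print(f"{title}: {score:.2f}")
```

In keeping with the process described above, titles at the bottom of such a list would be candidates for review by subject librarians rather than automatic cancellations.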


21 - Getting the biggest bang for your buck: Methods and strategies for managing journal collections

Grace Baysinger. Stanford University, United States

Chemistry journals have the highest average cost per title of all subject areas. Library collection budgets have not kept pace with price increases and funds to acquire new titles are scarce. Signing big deals for journals has limited flexibility in adapting to changes. These factors have made acquiring journals to support programmatic needs more of a challenge than ever before. This presentation will cover methods, strategies, and tools that can be used to help assess how resources are allocated when developing and managing journal collections.


22 - Taking a collection down to its elements: Using various assessment techniques to revitalize a library

Leah Solla. Cornell University, 283 Clark Hall, Ithaca, NY, United States

What are the elements of a research literature collection in the physical sciences? How are they being used and what roles are they playing in research, teaching and learning? Who is using them: students, faculty, related disciplines? These are the questions that drove the extensive analyses conducted on the print and electronic literature collections in the Physical Sciences Library at Cornell University in preparation for transitioning the service model from a print-based facility to electronic collections and services. General trends indicated the usage of the collection had been well over 90% electronic for years and the acquisition of books and journals in print had been reduced to minimal levels under budget pressures. But there were significant gaps in the electronic holdings and there remained a small but very active core of the print collection; both warranted further study to enable us to provide the best possible access to these crucial materials in the new service model. The library management system was mined for a variety of data points and complemented with external data sources and user input to build the transition map for the physical sciences literature collections.


23 - Predicting specific inhibition of cyclophilins A and B using docking, growing, and free energy perturbation calculations

Somisetti V Sambasivarao, Orlando Acevedo. Department of Chemistry and Biochemistry, Auburn University, Auburn, AL, United States

Cyclophilins (Cyp) belong to the enzyme class of peptidyl-prolyl isomerases which catalyze the cis-trans conversion of prolyl bonds in peptides and proteins. Twenty human Cyp isoenzymes have been reported and many are excellent targets for the inhibition of hepatitis C virus replication and multiple inflammatory diseases and cancers. Given the complete conservation of all active site residues between many of the enzymes, i.e., CypA, CypB, CypC and CypD, a better understanding of how to specifically inhibit individual targets could potentially reduce reported side effects in current treatments. Docking and growing programs have been used to construct protein-ligand complexes for a variety of reported selective inhibitors, including acylurea and aryl 1-indanylketone derivatives. Free-energy perturbation/Monte Carlo (FEP/MC) calculations have been utilized to quantitatively reproduce the free energies of binding for the inhibitors in multiple Cyp active sites in order to elucidate the origin of the specificity for the compounds.


24 - Using aggregative web services for drug discovery

Dr. Qian Zhu PhD, Dr. Michael S. Lajiness PhD, Dr. David J. Wild PhD. School of Informatics and Computing, Indiana University, Bloomington, IN, United States

Recent years have seen a huge increase in the amount of publicly-available information pertinent to drug discovery, including online databases of compound and bioassay information; scholarly publications linking compounds with genes, targets and diseases; and predictive models that can suggest new links between compounds, genes, targets and diseases. However, there is a distinct lack of data mining tools available to harness this information, and in particular to look for information across multiple sources. At Indiana University we are developing an aggregative web service framework to solve this kind of problem. It offers a new approach to data mining that crosses information source types to look at the "big picture" and to identify corroborating or conflicting information from models, assays, databases and publications.


25 - Semantifying polymer science using ontologies

Dr. Edward O. Cannon PhD, Dr. Adams Nico, Prof. Peter Murray-Rust. Department of Chemistry, Unilever Centre for Molecular Science Informatics, University of Cambridge, Cambridge, Cambridgeshire, United Kingdom

Ontologies are graph-based, formal representations of information in a domain. Currently, there is considerable interest in ontologies for biology and medicine, though little effort has been concentrated in the field of chemistry, let alone polymer science. We have developed a number of ontologies for polymer science, covering properties, measurement techniques and measurement conditions, using the Web Ontology Language. These ontologies will help facilitate the standardization of data exchange formats in polymer science by providing a common domain of knowledge. The properties ontology contains over 150 properties and has been integrated with the measurement techniques and conditions ontology, to give information on how a property is measured and under what conditions. The ontologies will be of use to polymer scientists wishing to reach a consensus in this area of knowledge. The ontologies also have the advantage that they can be integrated into software applications to leverage this knowledge.

horizontal rule

26 - Toxicity reference database (ToxRefDB) to develop predictive toxicity models and prioritize compounds for future toxicity testing

Hao Tang, Hao Zhu PhD, Liying Zhang, Alexander Sedykh PhD, Ann Richard PhD, Ivan Rusyn MD, PhD, Prof. Alexander Tropsha PhD. Division of Medicinal Chemistry and Natural Products, School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Chapel Hill, NC, United States; Department of Environmental Sciences and Engineering, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States

EPA's ToxCast program aims to use in vitro assays to predict chemical hazards and prioritize chemicals for toxicity testing. We employed the predictive QSAR workflow to develop computational toxicity models for ToxCast compounds with historical animal testing results available from ToxRefDB. To ensure model stability and robustness, multiple classifiers and 5-fold external cross-validation were applied. Results show that for three of the 78 toxicity endpoints, including one chronic and two reproductive endpoints, the Correct Classification Rate for external validation datasets was above 0.6 for all types of QSAR models. Our studies suggest that it is feasible to develop QSAR models for some endpoints, which could be further augmented by in vitro assay measures. The validated toxicity models were used for virtual screening of 50,000 chemicals compiled for the REACH program. The compounds predicted as toxic could be regarded as candidates for future toxicity testing. Abstract does not reflect EPA policy.
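The validation scheme described above can be sketched in a few lines. This is a minimal illustration only, not the authors' workflow: it assumes that the Correct Classification Rate is the mean of sensitivity and specificity (the abstract does not define it), and the function names `correct_classification_rate` and `five_fold_splits` are hypothetical.

```python
import random


def correct_classification_rate(y_true, y_pred):
    """Assumed definition: CCR = 0.5 * (sensitivity + specificity), labels 1 = toxic, 0 = non-toxic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


def five_fold_splits(n_items, seed=0):
    """Yield (train_idx, test_idx) pairs so that each item is held out exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::5] for i in range(5)]
    for k in range(5):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```

In such a scheme, a model would be fitted on each `train` subset and scored on the matching held-out `test` subset, and an endpoint would be retained only if the held-out CCR clears the chosen threshold (0.6 in the abstract).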

horizontal rule

27 - OrbDB: A database of molecular orbital interactions

Matthew A. Kayala, Chloe A. Azencott, Dr. Jonathan H. Chen PhD, Prof. Pierre F. Baldi PhD. Department of Computer Science, University of California - Irvine, Irvine, CA, United States

The ability to anticipate the course of a reaction is essential to the practice of chemistry. This aptitude relies on the understanding of elementary mechanistic steps, which can be described as the interaction of filled and unfilled molecular orbitals. Here, we create a database of mechanistic steps from previous work on a rule-based expert system (ReactionExplorer). We derive 21,000 priority ordered favorable elementary steps for 7800 distinct reactants or intermediates. All other filled to unfilled molecular orbital interactions yield 106 million unfavorable elementary steps. To predict the course of reactions, one must recover the relative priority of these elementary steps. Initial cross-validated results for a neural network on several stratified samples indicate we are able to retrieve this ordering with a precision of 98.9%. The quality of our database makes it an invaluable resource for the prediction of elementary reactions, and therefore of full chemical processes.

horizontal rule

28 - Novel approach to drug discovery integrating chemogenomics and QSAR modeling: Applications to anti-Alzheimer's agents

Rima Hajjo, Dr. Simon Wang PhD, Prof. Bryan L. Roth MD, PhD, Prof. Alexander Tropsha PhD. Department of Medicinal Chemistry and Natural Products, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States

Chemogenomics is an emerging interdisciplinary field relating the receptorome-wide biological screening to functional or clinical effects of chemicals. We have developed a novel chemogenomics approach combining QSAR modeling, virtual screening (VS), and gene expression profiling for drug discovery. Gene signatures for Alzheimer's disease (AD) were used to query the Connectivity Map (cmap) to identify potential anti-AD agents. Concurrently, QSAR models were developed for the serotonin, dopamine, muscarinic and sigma receptor families implicated in AD. The models were used for VS of the World Drug Index database to identify putative ligands. Twelve common hits from QSAR/VS and cmap studies were subjected to parallel binding assays against a panel of GPCRs. All compounds were found to bind to at least one receptor with binding affinities between 1.7 and 9000 nM. Thus, our approach afforded novel experimentally confirmed GPCR ligands that may be implicated as putative treatments for AD.

horizontal rule

29 - Cheminformatics improvements by combining semantic web technologies, cheminformatical representations, and chemometrics for statistical modeling and pattern recognition

Dr. Egon L. Willighagen. Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Uppland, Sweden

My research focuses on the methods needed for large-scale molecular property prediction, using semantic web, cheminformatics, and chemometrics methods. Originally starting with a Dictionary on Organic Chemistry website, research was started to find methods to accurately disseminate molecular knowledge, resulting in participation in Open Source cheminformatics projects, including Jmol, JChemPaint, and the Chemical Markup Language project, and an oral presentation at the "2000 Chemistry & Internet" conference. In that year, the applicant founded together with the Jmol and JChemPaint project leaders the Chemistry Development Kit (CDK), which is now a highly cited Open Source cheminformatics toolkit. Between 2001 and 2006 the applicant continued research in the area of data analysis with a PhD thesis on the "Representation of Molecules and Molecular Systems in Data Analysis and Modeling" with Prof. dr L.M.C. Buydens at the Analytical Chemistry Department at the Radboud University Nijmegen. The thesis studies the interaction of representation and the statistics and shows how tightly these need to match. Topics of the thesis include: a critical analysis of the use of proton and carbon NMR in QSAR; the use of Open Source, Open Data, and Open standards in interoperability in cheminformatics; the clustering of crystal structures using a novel similarity measure; and, the use of new supervised self-organizing maps in pattern recognition in crystallography. Part of the research was performed in the group of dr P. Murray-Rust at Cambridge University. Later research focused on the use of semantic technologies to reduce error in the aggregation and exchange of molecular data. Recent work applies developed technologies to cheminformatics in general and QSAR and metabolite identification in particular, with dr C. Steinbeck at Cologne University in Germany, and with dr R. van Ham at Wageningen University within the Netherlands Metabolomics Center. 
The applicant recently joined the development team of the award-winning cheminformatics-platform Bioclipse in Uppsala with Prof. J. Wikberg in Sweden, to continue his research in improving interoperability and reproducibility in cheminformatics and pharmaceutical bioinformatics and proteochemometrics in particular. This implies continued CDK development, development of semantic methods in computational chemistry, and making these technologies accessible to the non-programming chemist by supporting the development of cheminformatics in bench-chemist-oriented platforms such as Bioclipse and Taverna.

horizontal rule

30 - Prediction of consistent water networks in uncomplexed protein binding sites based on knowledge-based potentials

Michael Betz, Gerd Neudert, Gerhard Klebe. Pharmaceutical Chemistry, Philipps-University Marburg, Marburg, Germany

Within the active site of a protein water fulfills a variety of different roles. Solvation of hydrophilic parts stabilizes a distinct protein conformation, whereas desolvation upon ligand binding may lead to a gain of entropy. In an overwhelming number of cases, water molecules mediate interactions between protein and the bound ligand. Therefore, a reliable prediction of water molecules participating in ligand binding is essential for docking and scoring, and is necessary to develop strategies in ligand design. We require some reasonable estimates about the free energy contributions of water to binding.
Useful parameters for such estimations are the total number of displaceable water molecules and the probabilities for their displacement upon ligand binding. These parameters depend on specific interactions with the protein and other water molecules, and thus the positions of individual water molecules.

The high flexibility of water networks makes it difficult to observe distinct water molecules at well defined positions in structure determinations. Thus, experimentally observed positions of water molecules have to be assessed critically, bearing in mind that they represent an average picture of a highly dynamic equilibrium ensemble. Moreover, there are many structures with inconsistent and incomplete water networks.

To address these deficiencies we developed a tool that predicts possible configurations of complete water networks in binding pockets in a consistent way. It is based on the well established knowledge-based potentials implemented into DrugScore, which also allow for a reasonable differentiation between "conserved" and "displaceable" water molecules. The potentials used were derived specifically for water positions as observed in small molecule crystal structures in the CSD.
To account for the flexibility and high intercorrelation we apply a clique-based approach, resulting in water networks maximizing the total DrugScore.
To incorporate as much known information as possible about a given target, we also allow the inclusion of constraints defined by experimentally observed water positions.
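As a rough illustration of a clique-based selection, the sketch below enumerates maximal cliques of a compatibility graph (Bron-Kerbosch without pivoting) and keeps the one with the highest summed score. It is not the authors' implementation: the site names, the toy scores, and the convention that scores are positive with higher being better are all assumptions made for the example, and `best_scoring_clique` is a hypothetical name.

```python
def best_scoring_clique(scores, compatible):
    """Return (total, members) for the maximal clique with the highest summed score.

    scores: dict mapping candidate site -> positive toy score (higher is better here)
    compatible: dict mapping site -> set of sites that may be occupied together with it
    """
    best = (float("-inf"), frozenset())

    def expand(clique, candidates, excluded):
        nonlocal best
        if not candidates and not excluded:
            # clique cannot be extended any further: it is maximal
            total = sum(scores[s] for s in clique)
            if total > best[0]:
                best = (total, frozenset(clique))
            return
        for site in list(candidates):
            expand(clique | {site},
                   candidates & compatible[site],
                   excluded & compatible[site])
            candidates = candidates - {site}
            excluded = excluded | {site}

    expand(set(), set(scores), set())
    return best
```

With positive scores the best clique is always a maximal one, which is why enumerating maximal cliques suffices in this toy setting; constraints from observed water positions could be modelled by pre-seeding `clique` with those sites.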

Our tool provides a useful starting point whenever a possible configuration of water molecules needs to be estimated in an uncomplexed protein, and suggests their spatial positions and their classification with respect to some kind of affinity prediction.

In first tests we were able to get classifications and positional predictions which are in good agreement with crystallographically observed water molecules with remarkably small deviations.

horizontal rule

31 - Functional binders for non-specific binding: Evaluation of virtual screening methods for the elucidation of novel transthyretin amyloid inhibitors

Carlos J.V. Simões, Trishna Mukherjee, Prof. Richard M. Jackson PhD, Prof. Rui M.M. Brito PhD. Department of Chemistry, Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Institute of Molecular and Cellular Biology, University of Leeds, Leeds, West Yorkshire, United Kingdom

Inhibition of fibril formation by stabilization of the native form of transthyretin (TTR) is a viable approach for the treatment of Familial Amyloid Polyneuropathy that has been gaining momentum in the field of amyloid research. Herein, we present a benchmark of five virtual screening strategies to identify novel TTR stabilizers: (1) 2D similarity searches with chemical hashed fingerprints, pharmacophore fingerprints and UNITY fingerprints, (2) 3D searches based on shape, chemical and electrostatic similarity, (3) LigMatch, a ligand-based method employing multiple templates, (4) 3D-pharmacophore searches, and (5) docking to consensus X-ray crystal structures. By combining the best-performing VS protocols, a small subset of molecules was selected from a tailored library of 2.3 million compounds and identified as representative of multiple series of potential leads. According to our predictions, the retrieved molecules present better solubility, halogen fraction and binding affinity for both TTR pockets than the stabilizers discovered to date.
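For readers unfamiliar with strategy (1), a 2D fingerprint similarity search can be sketched as follows. The example assumes the Tanimoto coefficient as the similarity measure, which is a common choice but is not stated in the abstract, and it represents each hashed fingerprint simply as the set of its on-bit positions; the compound names and bit values are invented for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as collections of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_by_similarity(query_fp, library):
    """Rank library entries (name -> fingerprint) by decreasing similarity to the query."""
    scored = [(tanimoto(query_fp, fp), name) for name, fp in library.items()]
    # sort by similarity (descending), then by name so ties are ordered deterministically
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Hypothetical usage: the query would be the fingerprint of a known TTR stabilizer.
hits = rank_by_similarity({1, 2, 3}, {"cpd_a": {1, 2, 3}, "cpd_b": {7, 8}, "cpd_c": {2, 3, 4}})
```

The top of the ranked list is what a similarity-based screen would pass on to the more expensive 3D and docking stages.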

horizontal rule

32 - Using the oreChem Experiments ontology: Planning and enacting chemistry

Prof Jeremy G Frey, Mark I Borkum, Prof Carl Lagoze, Dr. Simon J Coles. School of Chemistry, University of Southampton, Southampton, Hants, United Kingdom; Department of Information Science, Cornell University, Ithaca, NY, United States

This paper presents the oreChem Experiments Ontology, an extensible model that describes the formulation and enactment of scientific methods (referred to as “plans”), designed to enable new models of research and facilitate the dissemination of scientific data on the Semantic Web. Currently, a high level of domain-specific knowledge is required to identify and resolve the implicit links that exist between digital artefacts, constituting a significant barrier-to-entry for third parties that wish to discover and reuse published data. The oreChem ontology radically simplifies and clarifies the problem of representing an experiment to facilitate the discovery and re-use of the data in the correct context. We describe the main parts of the ontology and detail the enhancements made to the Southampton eCrystals repository to enable the publication of oreChem metadata.

horizontal rule

33 - CHEMINF: Community-developed ontology of chemical information and algorithms

Leonid L Chepelev, Janna Hastings, Egon Willighagen, Nico Adams, Christoph Steinbeck, Peter Murray-Rust, Michel Dumontier. Department of Biology, School of Computer Science, and Institute of Biochemistry, Carleton University, Ottawa, Ontario, Canada; Chemoinformatics and Metabolism Team, European Bioinformatics Institute, Cambridge, United Kingdom; Department of Pharmaceutical Sciences, Uppsala University, Uppsala, Sweden; Department of Chemistry, Unilever Centre for Molecular Informatics, University of Cambridge, Cambridge, United Kingdom

In order to truly convert RDF-encoded chemical information into knowledge and break out of domain- and vendor-specific data silos, reliable chemical ontologies are necessary. To date, no standard ontology that addresses all chemical information representation and service integration needs has emerged from previously proposed ontologies, ironically threatening yet another “Tower of Babel” event in cheminformatics. To avoid resultant substantial ontology mapping costs, we hereby propose CHEMINF, a community-developed modular and unified ontology for chemical graphs, qualities, descriptors, algorithms, implementations, and data representations/formalisms. Further, CHEMINF is aligned with ontologies developed within the OBO Foundry effort, such as the Information Artifact Ontology. We present the application of CHEMINF to efficiently integrate two RDF-based chemical knowledgebases with different representation structures and aims, but common classes and properties from CHEMINF. Finally, we discuss the steps taken to ensure applicability of this ontology in the semantic envelopment of computational chemistry resources, algorithms, and their output.

horizontal rule

34 - Chemical entity semantic specification: Knowledge representation for efficient semantic cheminformatics and facile data integration

Leonid L Chepelev, Michel Dumontier. Department of Biology, School of Computer Science, and Institute of Biochemistry, Carleton University, Ottawa, Ontario, Canada

Though the nature of RDF implies the ability to interoperate and integrate diverse knowledgebases, designing adequate and efficient RDF-based representations of knowledge concerning chemical entities is non-trivial. We hereby describe Chemical Entity Semantic Specification (CHESS), which captures chemical descriptors, molecular connectivity, functional composition, and geometric structure of chemical entities and their components. CHESS also handles multiple data sources and multiple conformers for molecules, as well as reactions and interactions. We demonstrate the generation of a chemical knowledgebase from disparate data sources, using which we conduct an analysis of the implications of design choices taken in CHESS on the efficiency of solutions for some classical cheminformatics problems, including molecular similarity searching and subgraph detection. We do this through automated conversion of SMILES-encoded query fragments into SPARQL queries and DL-Safe rules. Finally, we discuss approaches to identification of potential reaction participants and class members in chemical entity knowledgebases represented with CHESS.

horizontal rule

35 - Semantic assistant for lipidomics researchers

Alexandre Kouznetsov, Rene Witte, Christopher J.O. Baker. Department of Computer Science and Applied Statistics, University of New Brunswick, Saint John, New Brunswick, Canada; Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada

Lipid nomenclature has yet to become a robust research tool for lipidomics or lipid research in general. This is in part because no rigorous structure-based definitions for membership of specific lipid classes have existed. Recent work on the OWL-DL Lipid Ontology with defined axioms for class membership has provided new opportunities to revisit the lipid nomenclature issue [1], [2]. Also necessary is a framework for sharing these axioms with scientists during scientific discourse and the drafting of publications. To achieve this we introduce here a new paradigm for lipidomics researchers in which a client-side application tags raw text about lipids with information, such as canonical name or relevant functional groups, that is derived from the ontology and delivered using web services. Our approach includes the following core components: (i) Semantic Assistant Framework [6]; (ii) Lipid ontology [4]; (iii) Ontological NLP methodology; (iv) Ontology Axiom-extractor for the GATE framework. The Semantic Assistant Framework is a service-oriented architecture used to enhance existing end-user clients, such as OpenOffice Writer, with online lipidomics text analysis capabilities provided as a set of web services. The Ontological NLP methodology links lipid named entities occurring in a document opened on the client side with existing ontologies on the server side. The Ontology Axiom-extractor annotates each named entity with canonical name, class name and related class axioms, providing annotation for documents on the client side. The proposed system is scalable and extensible, allowing researchers to easily customize the information to be delivered as annotations depending on the availability of chemical ontologies with defined axioms linked to canonical names for chemical entities.

[1] Baker CJO, Low HS, Kanagasabai R, and Wenk MR, (2010) Lipid Ontologies, 3rd Interdisciplinary Ontology Conference, Tokyo, Japan, February 27-28, 2010

[2] Low HS, Baker CJO, Garcia A and Wenk M., (2009), OWL-DL Ontology for Classification of Lipids, International Conference on Biomedical Ontology, Buffalo, New York, July 24-26

[3] Witte R., Gitzinger T., (2008), A General Architecture for Connecting NLP Frameworks and Desktop Clients Using Web Services, 13th International Conference on Applications of Natural Language to Information Systems

[4] Lipid Ontology available at

horizontal rule

36 - ChemicalTagger: A tool for semantic text-mining in chemistry

Dr Lezan Hawizy, Dave M Jessop, Professor Peter Murray-Rust. The Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom

The primary method for scientific communication is in the form of published scientific articles and theses and the use of natural language combined with domain-specific terminology. As such, they contain unstructured data.

Given the unquestionable usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. Using chemical synthesis procedures as an exemplar, we present ChemicalTagger. ChemicalTagger is a tool that combines chemical entity recognisers such as OSCAR with tokenisers, part-of-speech taggers and shallow parsing tools to produce a formal structure of reactions.

This extracted data can then be expressed in RDF. This allows for the generation of highly informative visualisations, such as visual document summaries, and for structured querying; further enrichment can be provided by linking with domain-specific ontologies.

horizontal rule

37 - From canonical numbering to the analysis of enzyme-catalyzed reactions: 32 years of publishing in JCIM (JCICS)

Prof. Johann Gasteiger. Computer-Chemie-Centrum, University of Erlangen-Nuremberg, Erlangen, Germany; Molecular Networks GmbH, Erlangen, Germany

In 1972 we embarked on the development of a program for computer-assisted synthesis design which eventually led to the present system THERESA. Along the way many fundamental problems had to be solved such as the unique representation of chemical structures published in 1977. This work laid the foundation for building the Beilstein database. Methods had to be developed for the computer representation of chemical reactions which formed the basis for constructing the ChemInform reaction database. Recent work has concentrated on the analysis of biochemical reactions, the prediction of metabolism and the risk assessment of chemicals.

horizontal rule

38 - Fifteen years of JCICS

Dr. George W Milne. NCI, NIH (Retd), Williamsburg, VA, United States

During the period 1989-2004 when I was Editor of the Journal of Chemical Information and Computer Sciences (JCICS), the predecessor of the Journal of Chemical Information and Modeling (JCIM), many papers appeared addressing contemporary problems in computational chemistry.

Some of these problems were completely settled and significant progress was made with others. A third group, in spite of numerous publications, defied attempts at resolution and remain to this day as challenges to computational chemists.

As JCIM, aka JCICS, aka J. Chem. Doc., embarks upon its second 50 years, the progress recorded during the 1990s and the advances in computer hardware and software are reviewed. With a longer perspective, the impact of computers on chemistry is considered.

horizontal rule

39 - Fifteen years in chemical informatics: Lessons from the past, ideas for the future

Dimitris Agrafiotis PhD. Pharmaceutical Research & Development, Johnson & Johnson, Spring House, Pennsylvania, United States

A unique aspect of chemical informatics is that it has been heavily influenced and shaped by the needs of the pharmaceutical industry. As this industry undergoes a profound transformation, so will the field itself. In this talk, we reflect on the experiences of the past and explore the possibilities we see for the future. These possibilities lie on the convergence of chemistry, biology, and information technology, and will require thinking and working across scientific and organizational boundaries in a way that has never been previously possible.

horizontal rule

40 - Applications of wavelets in virtual screening

Prof Val Gillet PhD, Mr Richard Martin, Dr Eleanor Gardiner, Dr Stefan Senger. Department of Information Studies, University of Sheffield, Sheffield, United Kingdom; Computational and Structural Chemistry, GlaxoSmithKline, Stevenage, Hertfordshire, United Kingdom

The interactions which a small molecule can make with a receptor can be modelled using three-dimensional molecular fields, such as GRID fields; however, the cumbersome nature of these fields makes their storage and comparison computationally expensive. Wavelets are a family of multiresolution signal analysis functions which have become widely used in data compression. We have applied the non-standard wavelet transform to generate low-resolution approximations (wavelet thumbnails) of finely sampled GRID fields, without loss of information. We demonstrate various applications of wavelet thumbnails including the development of an alignment method to enable the comparison of the wavelet representations of GRID fields in arbitrary orientation.
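The idea of a wavelet thumbnail can be illustrated with a deliberately simplified sketch: a one-dimensional, unnormalised Haar decomposition, in which each level replaces pairs of samples by their average (the approximation) and half-difference (the detail). This is an assumption-laden toy, not the non-standard 3D transform used by the authors, and the function names are hypothetical. It does show why no information is lost as long as the detail coefficients are retained alongside the thumbnail.

```python
def haar_step(signal):
    """One Haar level: pairwise averages (approximation) and half-differences (detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the finer signal exactly from one level of approximation and detail."""
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([a + d, a - d])
    return signal


def thumbnail(signal, levels):
    """Keep only the approximation after the requested number of Haar levels."""
    approx = list(signal)
    for _ in range(levels):
        approx, _detail = haar_step(approx)
    return approx
```

For a sampled field along one axis such as `[4.0, 2.0, 5.0, 7.0]`, one level gives the half-length thumbnail `[3.0, 6.0]`; comparing thumbnails instead of full fields is what reduces storage and comparison cost.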

horizontal rule

41 - Privileged substructures revisited: Target community-selective scaffolds

Jürgen Bajorath. Department of Life Science Informatics, University of Bonn, Germany

Molecular scaffolds that preferentially bind to a given target family, so-called “privileged” substructures, have long been of high interest in drug discovery. Many privileged substructures have been proposed, in particular, for G protein coupled receptors and protein kinases. However, the existence of truly privileged structural motifs has remained controversial. Frequency-based analysis has shown that many scaffolds thought to be target class-specific also occur in compounds active against other types of targets. In order to explore scaffold selectivity on a large scale, we have carried out a systematic survey of publicly available compound data and defined target communities on the basis of ligand-target networks. The analysis was based on compound potency data and target pair potency-derived selectivity. More than 200 hierarchical scaffolds were identified, each represented by at least five compounds, which exclusively bound to targets within one of ca. 20 target communities. By contrast, currently available compound data is too sparsely distributed to assign target-specific scaffolds. Most scaffolds that exclusively bind to a single target within a community are only represented by one or two compounds in public domain databases. However, characteristic selectivity patterns are found to evolve around community-selective scaffolds that can be explored to guide the design of target-selective compounds.

horizontal rule

42 - Automated retrosynthetic analysis: An old flame rekindled

Prof Peter Johnson PhD, Anthony P Cook, James Law, Mahdi Mirzazadeh, Dr Aniko Simon PhD. School of Chemistry, University of Leeds, Leeds, United Kingdom; Simbiosys Inc, Toronto, Ontario, Canada

The last century saw truly innovative research aimed at the creation of systems for computer aided organic synthesis design (CAOSD). However, such systems have not achieved significant user acceptance, perhaps because they required manual creation of reaction knowledge bases, a time consuming task which requires considerable synthetic chemistry expertise. More recent systems like ARChem [1] circumvent this problem by automated abstraction of transformation rules from very large databases of specific examples of reactions. ARChem is still a work in progress and specific problems which are being addressed include:

a) identification of precise structural characteristics of each reaction, often requiring knowledge of reaction mechanism;
b) treatment of interfering functional groups;
c) minimising the combinatorial explosion inherent in automated multistep retrosynthesis;
d) treatment of the results of extensive recent research into enantioselective and stereoselective reactions.

[1] Law et al., J. Chem. Inf. Model., 2009, 49 (3), pp 593-602

horizontal rule

43 - Dietary supplements: Free evidence-based resources for the cautious consumer

MLS Brian Erb. McGoogan Library of Medicine, University of Nebraska Medical Center, Omaha, NE, United States

Vitamin, mineral and dietary supplements are a 70 billion dollar industry. With marginal FDA regulation, it can be difficult to evaluate the health claims of a given product. How can the skeptical consumer distinguish a promising nutritional supplement from a substance that lacks the evidence to back its nutritional claims? This short presentation will highlight some evidence-based Internet sources that will help the consumer navigate the dietary supplement minefield. These sources will not only help the consumer separate bogus claims from research supported evidence, but also help the consumer make informed nutritional decisions regarding which supplements might be a relevant and useful part of their healthy diet and lifestyle. The resources to be explored have been collected in a UNMC libguide at for ease of navigation and dissemination.

horizontal rule

44 - What lessons learned can we generalize from evaluation and usability of a health website designed for lower literacy consumers?

Mary J Moore PhD, Randolph G. Bias PhD. Department of Health Informatics, University of Miami Miller School of Medicine, Miami, FL, United States; Department of Information, University of Texas at Austin, Austin, Texas, United States

Objectives: Researchers conducted multifaceted usability testing and evaluation of a website designed for use by those with lower computer literacy and lower health literacy. Methods included heuristic evaluation by a usability engineer, remote usability testing and face-to-face testing. Results: Standard usability testing methods required modification, including interpreters, increased flexibility for time on task, presence of a trusted intermediary, and accommodation for family members who accompanied participants. Participants suggested website redesign, including simplified language, engaging and relevant graphics, culturally relevant examples, and clear navigation. Conclusions: User-centered design was especially important for this audience. Some lessons learned from this experience are echoed in usability and evaluation of commercial sites designed for similar audiences, and may be generalizable.

horizontal rule

45 - National Library of Medicine resources for consumer health information

Michelle Eberle. National Network of Libraries of Medicine - New England, Shrewsbury, MA, United States

Come learn about free, high quality web resources for consumer health information from the National Library of Medicine. We will cover MedlinePlus, a resource for health information for the public. The presenter will take you on a guided tour of and other specialized web resources for consumer health information including the Drug Information Portal, DailyMed and the Dietary Supplements Labels Database. The program will wrap up with a brief introduction to You will leave this program equipped with expertise to find, critically appraise, and use online health information more effectively.

horizontal rule

46 - Better prescription for information: Dietary supplements online

Gail Y. Hendler MLS. Hirsh Health Sciences Library, Tufts University, Boston, MA, United States

Dietary supplements are becoming staples in the health regimens of a growing number of consumers worldwide. According to the most recent National Health and Nutrition Examination Survey, 52 percent of adults in the United States reported taking a nutraceutical in the past month. Consumers turn to these products believing they are safe and effective because they are “all natural.” Supplementing knowledge about the benefits and the potential risks associated with nutraceutical use requires information resources that are authoritative, accurate and readable to a large and general audience. This presentation will provide recommendations for locating high-quality, freely available online resources that today's consumers need to support decision-making. Featured resources will include books, databases and websites that discuss the pros and cons and provide the evidence for better use of dietary supplements, herbs and functional foods.

horizontal rule

47 - Overview of the linking open drug data task

Eric Prud'hommeaux, Egon Willighagen, Susie Stephens. W3C/MIT, Cambridge, MA, United States; Uppsala University, Uppsala, Sweden; Johnson and Johnson, United States

There is much interesting information about drugs that is available on the Web. Data sources range from medicinal chemistry results, to the impacts of drugs on gene expression, through to the results of drugs in clinical trials.

Linking Open Drug Data (LODD) is a task within the W3C's Health Care Life Sciences Interest Group. LODD has surveyed publicly available data sets about drugs, created Linked Data representations of the data sets and interlinked them together, and identified interesting scientific and business questions that can be answered once the data sets are connected. The task also actively explores best practices for exposing data in a Linked Data representation.

The figure below shows part of the data sets that have been published and interlinked by the task so far.

The LODD data sets are represented in dark gray, while light gray represents other Linked Data from the life sciences, and white indicates data sets from different domains. Collectively, the LODD data sets consist of over 8 million RDF triples, which are interlinked by more than 370,000 RDF links. This presentation will introduce the LODD task and show examples of recent.

horizontal rule

48 - Control, monitoring, analysis and dissemination of laboratory physical chemistry experiments using semantic web and broker technologies

Prof Jeremy G Frey, Stephen Wilson. School of Chemistry, University of Southampton, Southampton, Hants, United Kingdom

A suite of software was developed to control and monitor experimental and environmental data and used to probe the air/water interface using Second Harmonic Generation. A centralised message broker enabled a common communication protocol between all objects in the system: experimental apparatus, data loggers, storage solutions and displays. The data and context are captured and represented in ways compatible with the Semantic Web. Experimental plans and the enactment are described using the oreChem experiments ontology; this provides the means to capture the metadata associated with the experimental process and the resulting data. Environmental data was stored in the Open Geospatial Consortium Sensor Observation Service (SOS). The SOS is part of the Sensor Web Enablement architecture; this describes a number of interoperable interfaces and metadata encodings for integrating sensor webs into the cloud. A mashup web interface was produced to link all these sources of information from a single point.

horizontal rule

49 - Semantic analysis of chemical patents

Dave M Jessop, Dr Lezan Hawizy, Prof. Peter Murray-Rust, Professor Robert C Glen. The Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom

Chemical patents are a rich source of technical and scientific information. They include meta-data, such as bibliographic information, as well as scientific data relating to reactions and synthesis experiments. However, they are lengthy, largely unstructured and rich in technical terminology, such that analysing them takes a significant amount of human effort. This makes them an ideal candidate for 'semantification'. As a demonstration, an RDF triplestore of chemical patents is created. The patents, provided by the European Patent Office, are in an XML format. Document segmentation is used initially to extract the relevant information, mainly bibliographic information and experimental paragraphs. The experimental paragraphs are then processed using Natural Language Processing tools to extract the various components of the chemical reaction; roles, such as reactant, product or solvent, are then assigned. This extracted information is then converted into RDF and stored in a triplestore, where it can be queried and visualised and where basic inferences can be made. The ultimate goal of this semantic representation is to make data available and re-usable by the scientific community.
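
The final conversion step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the namespace, the predicate names (hasReactant and so on) and the example reaction are hypothetical, and the output is a simplified N-Triples-style string with no escaping of special characters, not the vocabulary or tooling used by the authors.

```python
# Hypothetical namespace; the authors' actual vocabulary is not given in the abstract.
EX = "http://example.org/patent/"


def reaction_triples(reaction_id, roles):
    """Turn {role: [compound names]} into simplified N-Triples-style lines."""
    subject = f"<{EX}reaction/{reaction_id}>"
    lines = []
    for role, compounds in roles.items():
        for name in compounds:
            # One triple per (reaction, role, compound) assignment.
            lines.append(f'{subject} <{EX}has{role.capitalize()}> "{name}" .')
    return lines


# Made-up output of the role-assignment step for one experimental paragraph.
extracted = {
    "reactant": ["benzaldehyde"],
    "solvent": ["ethanol"],
    "product": ["benzyl alcohol"],
}

for line in reaction_triples("r1", extracted):
    print(line)
```

Lines of this kind could then be loaded into any triplestore and queried by role, for example to list every reaction in which a given solvent appears.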

horizontal rule

50 - Data mining and querying of integrated chemical and biological information using Chem2Bio2RDF

Dr David J Wild, Bin Chen, Dr Ying Ding, Xiao Dong, Huijun Wang, Dazhi Jiao, Dr Qian Zhu, Madhuvanti Sankaranarayanan. School of Informatics and Computing, Indiana University, Bloomington, IN, United States; School of Library and Information Science, Indiana University, Bloomington, IN, United States

We have recently developed a freely-available resource called Chem2Bio2RDF that consists of chemical, biological and chemogenomic datasets in a consistent RDF framework, along with SPARQL querying tools that have been extended to allow chemical structure and similarity searching. Chem2Bio2RDF allows integrated querying that crosses chemical and biological information including compounds, publications, drugs, genes, diseases, pathways and side-effects. It has been used for a variety of applications including investigation of compound polypharmacology, linking drug side-effects to pathways, and identifying potential multi-target pathway inhibitors. In the work reported here, we describe a new set of tools and methods that we have developed for querying and data mining in Chem2Bio2RDF, including: Linked Path Generation (a method for automatically identifying paths between datasets and generating SPARQL queries from these paths); an ontology for integrated chemical and biological information; a Cytoscape plugin that allows dynamic querying and network visualization of query results; and a facet-based browser for browsing results.

horizontal rule

51 - Mining and visualizing chemical compound-specific chemical-gene/disease/pathway/literature relationships

Dr. Qian Zhu, Prajakta Purohit, Jong Youl Choi, Seung-Hee Bae, Dr. Judy Qiu, Prof. Ying Ding, Prof. David Wild. School of Informatics and Computing, Indiana University, Bloomington, IN, United States; School of Library & Information Science, Indiana University, Bloomington, IN, United States; Department of Computer Science, Indiana University, Bloomington, IN, United States

In common with most scientific disciplines, there has in the last few years been a huge increase in the amount of publicly-available and proprietary information pertinent to drug discovery, owing to a variety of factors including improvements in experimental technologies. So the big challenge for us is how we can use all of this information together in an intelligent way, in an integrative fashion.

We are developing an application to mine and visualize relationships between chemicals and genes, diseases, pathways and literature. It aims to help answer the question “what else should I know about this compound?” from a medicinal chemistry perspective, based on the full picture of a chemical. For the mining part, we have already developed an aggregating web service, named WENDI, which calls multiple individual (atomic) web services covering a diversity of compound-related data sources, predictive models and self-developed algorithms, and aggregates the results from these services in XML. For visualization, we take two approaches. First, we created an RDF reasoner that converts the XML from WENDI to RDF, finds inferred relationships based on the RDF, ranks the evidence with a focus on chemical-disease relationships, and presents all of the evidence using the SWP faceted browser based on Longwell, which combines the flexibility of the RDF data model with faceted browsing to let users explore complex RDF triples in a user-friendly and meaningful manner. Second, we place all relationships from WENDI into a chemical space consisting of 60M PubChem compounds, then cluster and highlight particular chemical compounds with specific attributes, such as gene, disease, pathway or literature associations, using PubChemBrowse, a customized visualization tool for cheminformatics research that provides a novel 3D data point browser, displays complex properties of massive data on commodity clients, and supports fast interaction with an external property database via a semantic web interface.

horizontal rule

52 - What makes polyphenols good antioxidants? Alton Brown, you should take notes...

Emilio Xavier Esposito PhD. The Chem21 Group, Inc, Lake Forest, Illinois, United States

The dominant physical features of antioxidants are phenols; polyphenols, according to Alton Brown. The proposed antioxidant-tyrosinase mechanism, based on a series of experimentally determined mushroom tyrosinase structures, provides insight into the molecular interactions that drive the reaction. While the enzyme structures illustrate the important molecular interactions for tyrosinase inhibition, they do not always facilitate the understanding of what makes a good inhibitor or the mechanism of the reaction. Using an antioxidant (tyrosinase inhibitor) dataset of 626 compounds (from the linear discriminant analysis research of Martín et al. Euro J Med Chem 42 p1370-1381, 2007) we constructed binary QSAR models to indicate the important antioxidant molecular features. Exploring models constructed from molecular descriptors based on fingerprints (MACCS keys), traditional molecular descriptors (2D and 2½D), VolSurf-like molecular descriptors (3D) and molecular dynamics (4D-Fingerprints), the relationship between polyphenols' biologically relevant molecular features - as determined by each set of descriptors - and their antioxidant abilities will be discussed.

horizontal rule

53 - Engineering and 3D protein-ligand interaction scaling of 2D fingerprints

Jürgen Bajorath. Department of Life Science Informatics, University of Bonn, Bonn, Germany

Different concepts are introduced to further refine and advance molecular descriptors for SAR analysis. Fingerprints have long been among preferred descriptors for similarity searching and SAR studies. Standard fingerprints typically have a constant bit string format and are used as individual database search tools. However, by applying “engineering” techniques such as “bit silencing”, fingerprint reduction, and “recombination”, standard fingerprints can be tuned in a compound class-directed manner and converted into size-reduced versions with higher search performance. It is also possible to combine preferred bit segments from fingerprints of distinct design and generate “hybrids” that exceed the search performance of their parental fingerprints. Furthermore, effective 2D fingerprint representations can be generated from strongly interacting parts of ligands in complex crystal structures. These “interacting fragment” fingerprints focus search calculations on pharmacophore elements without the need to encode interactions directly. Moreover, 3D protein-ligand interaction information can implicitly be taken into account in 2D similarity searching through fingerprint scaling techniques that emphasize characteristic bit patterns.
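
As a rough illustration of the fingerprint “engineering” idea, the sketch below computes a Tanimoto coefficient on fingerprints represented as sets of on-bit positions and applies a simple “bit silencing” mask before comparing them. The set representation, the function names and the choice of silenced bits are assumptions made for illustration only; they are not the procedure used in this work.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen in this sketch for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def silence(fp, silenced_bits):
    """Return a copy of the fingerprint with the given bit positions switched off."""
    return fp - silenced_bits


# Hypothetical toy fingerprints (the bit positions are made up).
query = {1, 4, 7, 9}
candidate = {1, 4, 8, 9, 12}
masked = {8, 12}  # bits assumed, for illustration, to be uninformative for this class

print(tanimoto(query, candidate))
print(tanimoto(silence(query, masked), silence(candidate, masked)))
```

In this toy case, silencing the two masked bits raises the similarity of the pair from 0.5 to 0.75, which is the kind of class-directed effect the abstract refers to.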

horizontal rule

54 - In silico binary QSAR models based on 4D-fingerprints and MOE descriptors for prediction of hERG blockage

Prof. Y. Jane Tseng PhD. Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan Republic of China

Blockage of the human ether-a-go-go related gene (hERG) potassium ion channel is a major factor related to cardiotoxicity. Hence, drugs binding to this channel have become an important biological endpoint in side effects screening. We have collected all available biologically active hERG compounds from the hERG literature for a total of 250 structurally diverse compounds. This data set was used to construct a set of two-state hERG QSAR models. The descriptor pool used to construct the models consisted of 4D-fingerprints generated from the thermodynamic distribution of conformer states available to a molecule, 204 traditional 2D descriptors and 76 3D VolSurf-like descriptors computed using the Molecular Operating Environment (MOE) software. One model is a continuous partial least squares (PLS) QSAR hERG binding model. Another related model is an optimized binary QSAR model that classifies compounds as active or inactive. This binary model achieves 91% accuracy over a large range of molecular diversity spanning the training set. An external test set was constructed from the condensed PubChem bioassay database containing 816 compounds and successfully used to validate the binary model. The binary QSAR model permits a structural interpretation of possible sources for hERG activity. In particular, the presence of a polar negative group at a distance of 6 to 8 Å from a hydrogen bond donor in a compound is predicted to be a quite structure-specific pharmacophore that increases hERG blockage. Since a data set of high chemical diversity was used to construct the binary model, it is applicable for performing general virtual hERG screening.

horizontal rule

55 - Telling the good from the bad and the ugly: The challenge of evaluating pharmacophore model performance

Robert D. Clark PhD. Simulations Plus, Inc., Lancaster, California, United States

Pharmacophore models are useful when they provide qualitative insight into the interactions between ligands and their target macromolecules, and therefore are more akin in many ways to molecular simulations than to quantitative structure activity relationships (QSARs) based on the partition of activity across a set of molecular descriptors. When the performance of a pharmacophore model is assessed quantitatively, it is usually in terms of its ability to recover known ligands or, less often, in terms of how well it distinguishes ligands from non-ligands. This status as a classification technique also sets it apart from more numerical QSAR methods, in part because of fundamental differences in what being "good" means. Carefully defining what "good" classification is, however, can make creative combination with other techniques a productive way to capture the value of their intrinsic complementarity.

horizontal rule

56 - Creative application of ligand-based methods to solve structure-based problems: Using QSAR approaches to learn from protein crystal

Prof. Curt M Breneman, Dr. Sourav Das, Dr. Matt Sundling, Mr. Mike Krein, Prof. Steven Cramer, Prof. Kristin P Bennett, Dr. Charles Bergeron, Mr. Jed Zaretzki. Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, NY, United States; Department of Chemical and Biological Engineering, Rensselaer Polytechnic Institute, Troy, NY, United States; Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY, United States

In practice, there is no inherent disconnect between the descriptor-based cheminformatics methods commonly used for predicting small molecule properties and those that can be used to understand and predict protein behaviors. Examples of such connections include the development of predictive models of protein/stationary phase binding in HIC and ion-exchange chromatography, protein/ligand binding mode characterization through PROLICSS analysis of crystal structures, and the use of PESD binding site signatures for pose scoring and predicting off-target drug interactions. In all of these cases, models were created using descriptors based on protein electronic and structural features and modern machine learning methods that include model validation tools and domain of applicability assessment metrics.

horizontal rule

57 - Computer-aided drug discovery

Prof. William L Jorgensen. Department of Chemistry, Yale University, New Haven, CT, United States

Drug development is being pursued through computer-aided structure-based design. For de novo lead generation, the BOMB program builds combinatorial libraries in a protein binding site using a selected core and substituents, and QikProp is applied to filter all designed molecules to ensure that they have drug-like properties. Monte Carlo/free-energy perturbation simulations are then executed to refine the predictions for the best scoring leads including ca. 1000 explicit water molecules and extensive sampling for the protein and ligand. FEP calculations for optimization of substituents on an aromatic ring and for choice of heterocycles are now common. Alternatively, docking with Glide is performed with the large databases of purchasable compounds to provide leads, which are then optimized via the FEP-guided route. Successful application has been achieved for HIV reverse transcriptase, FGFR1 kinase, and macrophage migration inhibitory factor (MIF); micromolar leads have been rapidly advanced to extraordinarily potent inhibitors.

horizontal rule

58 - Structure-based discovery and QSAR methods: A marriage of convenience

Jose S Duca. Novartis, Cambridge, MA, United States

The art of building predictive models of the relationships between structural descriptors and molecular properties has been historically important to drug design. In recent years there has been an extraordinary amount of experimental data available from processes designed to accelerate drug discovery in pharma, from high throughput screening and automation applied to library design and synthesis to chemogenomics and microarray analysis. QSAR methods are one of the many tools to predict affinity-related, physicochemical, pharmacokinetic and toxicological properties through analyzing and extracting information from molecular databases and HTS campaigns.

This presentation will cover case studies in which QSAR and Structure-Based Drug Design (SBDD) have worked in concert during the discovery process of pre-clinical candidates. The importance of incorporating time-dependent sampling to improve the quality of the nD-QSAR models (n=3,4) will also be discussed and compared to simplified low dimensional QSAR models. For those cases where structural information cannot be readily available an extension of these methodologies will be discussed in relation to ligand-based approaches.

horizontal rule

59 - Extending the QSAR Paradigm using molecular modeling and simulation

Professor Anton J Hopfinger Ph.D. College of Pharmacy, MSC 09 5360, University of New Mexico, Albuquerque, NM, United States; Computational Chemistry, The Chem21 Group, Inc., Lake Forest, IL, United States

QSAR analysis and molecular modeling/simulation methods are often complementary, and when combined in a study yield results greater than the sum of their parts. Modeling and simulation offer the ability to design custom, information-rich trial descriptors for a QSAR analysis. In turn, QSAR analysis is able to discern which of the custom descriptors most fully relate to the behavior of an endpoint of interest. One useful set of custom QSAR descriptors from modeling and simulation for describing ligand-receptor interactions is the grid cell occupancy descriptors, GCODs, of 4D-QSAR analysis. These descriptors characterize the relative spatial occupancy of all the atoms of a molecule over the set of conformations available to the molecule when in a particular environment. GCODs permit the construction of a 4D-QSAR equation for virtual screening, as well as a spatial pharmacophore of the 4D-QSAR equation for exploring mechanistic insight. Applications that can particularly benefit from combining QSAR analysis and modeling/simulation tools are those in which a model chemical system is needed to determine the sought-after property. One such application is the transport of molecules through biological compartments, an integral part of many ADMET properties. The reliable estimation of eye irritation is greatly enhanced by simulating the transport of test solutes through membrane bilayers, and using extracted properties from the simulation trajectories as custom descriptors to build eye irritation QSAR models. These key descriptors of the QSAR models, in turn, also permit the investigator to probe and postulate detailed molecular mechanisms of action.

horizontal rule

60 - Overview of activity landscapes and activity cliffs: Prospects and problems

Prof Gerald M Maggiora. Department of Pharmacology & Toxicology, University of Arizona College of Pharmacy, Tucson, AZ, United States; BIO5 Institute, University of Arizona, Tucson, AZ, United States; Translational Genomics Research Institute, Phoenix, AZ, United States

Substantial growth in the size and diversity of compound collections and the capability to subject them to an increasing variety of different high-throughput assays manifests the need for a more systematic and global view of structure-activity relationships. The concepts of chemical space and molecular similarity, which are now well known to the drug-research community, provide a suitable framework for developing such a view. Augmenting a chemical space with activity data from various assays generates a set of activity landscapes, one for each assay. The topography of these landscapes contains important information on the structure-activity relationships of compounds that inhabit the chemical space. Activity cliffs, which arise when similar compounds possess widely different activities, are a particularly informative feature of activity landscapes with respect to SAR. The talk will present an overview of activity landscapes and cliffs and will describe some of the prospects and problems associated with these important concepts.

horizontal rule

61 - Exploring and exploiting the potential of structure-activity

Dr Gerald M Maggiora PhD, Michael S Lajiness. Department of Pharmacology & Toxicology, University of Arizona College of Pharmacy, Tucson, Arizona, United States; Scientific Informatics, Eli Lilly & Co, Indianapolis, IN, United States

It's well known that small structural changes sometimes result in large changes in activity. There have been some recent efforts to identify such changes but little in regards to defining which structural changes are most informative or even real. Also, the missing value problem often obfuscates the ability to detect relevant patterns if in fact they exist. This presentation will present several ideas and applications for exploring and exploiting Structure-Activity Cliffs. In addition, various visualizations and approaches to communicate the information contained in these "cliffs" will be shared. Examples will be drawn from PubChem.

horizontal rule

62 - What makes a good structure activity landscape? Network metrics and structure representations as a way of exploring activity landscapes

Dr. Rajarshi Guha. Department of Informatics, NIH Chemical Genomics Center, Rockville, MD, United States

The representation of SAR data in the form of landscapes and the identification of activity cliffs in such landscapes is well known. A number of approaches have been described for identifying activity cliffs, including several network-based methods such as the SALI approach (JCIM, 2008, 48, 646-658). While a network representation of an SAR landscape moves away from the intuitive idea of rolling hills and steep gorges, it allows us to apply a variety of quantitative analyses. In this talk I will first examine some of the properties of SALI networks using various measures of network structures and attempt to correlate these features with features of the SAR data. While most examples are from relatively small datasets, I will highlight some examples from larger datasets from high-throughput screens. While such data can be noisy and contain artifacts, I will examine whether the underlying network structure can shed light on specific molecules that may be worth following up. The second focus of the talk will look at the effect of structure representations on the smoothness of the landscape and how one can derive ideas from the SALI characterization to suggest good or bad landscapes.

horizontal rule

63 - Consensus model of activity landscapes and consensus activity cliffs

Jose L Medina-Franco, Karina Martinez-Mayorga, Fabian Lopez-Vallejo. Torrey Pines Institute for Molecular Studies, Port St Lucie, FL, United States

Characterization of activity landscapes is a valuable tool in lead optimization, virtual screening and computational modeling of active compounds. As such, understanding the activity landscape and early detection of activity cliffs [Maggiora, G. M. J. Chem. Inf. Model. 2006, 46, 1535] can be crucial to the success of computational models. Similarly, characterizing the activity landscape will be critical in future ligand-based virtual screening campaigns. However, the chemical space and activity landscape are influenced by the particular representation used and certain representations may lead to apparent activity cliffs. A strategy to address this problem is to consider multiple molecular representations in order to derive a consensus model for the activity landscape and in particular identify consensus activity cliffs [Medina-Franco, J. L. et al. J. Chem. Inf. Model. 2009, 49, 477]. The current approach can be extended to identify consensus selectivity cliffs.
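
One simple way to picture a consensus over representations is sketched below: a compound pair counts as a cliff under one representation if it is similar yet differs strongly in activity, and as a consensus cliff only if every representation agrees. The thresholds, the unanimity rule and all data values are assumptions chosen for illustration; they are not taken from the cited papers.

```python
from itertools import combinations


def cliffs(activity, similarity, sim_min=0.8, act_gap=2.0):
    """Pairs that are similar (>= sim_min) yet differ in activity by >= act_gap."""
    found = set()
    for a, b in combinations(sorted(activity), 2):
        if similarity[(a, b)] >= sim_min and abs(activity[a] - activity[b]) >= act_gap:
            found.add((a, b))
    return found


def consensus_cliffs(activity, similarities, **thresholds):
    """Pairs flagged as cliffs under every supplied representation."""
    per_repr = [cliffs(activity, sim, **thresholds) for sim in similarities.values()]
    return set.intersection(*per_repr) if per_repr else set()


# Hypothetical activities (e.g. pIC50) and pairwise similarities under two representations.
activity = {"A": 8.5, "B": 5.9, "C": 6.1}
similarities = {
    "fingerprint_1": {("A", "B"): 0.91, ("A", "C"): 0.85, ("B", "C"): 0.88},
    "fingerprint_2": {("A", "B"): 0.86, ("A", "C"): 0.55, ("B", "C"): 0.90},
}

print(consensus_cliffs(activity, similarities))
```

Here the pair (A, C) looks like a cliff under the first representation only, so it is treated as an apparent cliff, while (A, B) survives as a consensus cliff.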

horizontal rule

64 - R-Cliffs: Activity cliffs within a single analog series

Dimitris Agrafiotis PhD. Pharmaceutical Research & Development, Johnson & Johnson, Spring House, Pennsylvania, United States

The concept of activity cliffs has gained popularity as a means to identify and understand discontinuous SAR, i.e., regions of SAR where minor changes in structure have unpredictably large effects on biological activity. To the best of our knowledge, activity cliffs have been invariably evaluated using global measures of molecular similarity that do not take into account the presence of finer substructure among a series of related analogs. In this talk, we look at activity cliffs within a congeneric series, by decomposing them into R-groups and analyzing how activity is affected by changes in a single variation site. The analysis is greatly enhanced by R-group-aware visualization tools such as the SAR maps, which have been enhanced to specifically highlight such discontinuities.

horizontal rule

65 - Chemical structure representation in the DuPont Chemical Information Management Solutions database: Challenges posed by complex materials in a diversified science company

Dr. Mark A Andrews, Dr. Edward S. Wilks. CR&D, Information & Computing Technologies, DuPont, Wilmington, DE, United States

This talk will describe the novel ways we have developed to represent precisely the structures of the diverse chemical materials of interest to DuPont. These range from simple organics and inorganics to polymers, mixtures, formulations, multi-layer films, composites, and even devices and incompletely defined substances. Part of the solution involves evaluating trade-offs, which may be situation dependent, between details captured in the structure vs. details captured at the sample history level, e.g., ratios of components, polymer molecular weights and microstructures, and the existence of “fairy dust” components. An important aspect of the solution involves ensuring robust structure standardization and duplicate checking for complex and ill-defined substances. We believe that our needs and solutions have challenged and inspired a number of chemical software vendors to provide significant upgrades to the functionalities of their drawing packages and database cartridges.

horizontal rule

66 - From deposition to application: Technologies for storing and exploiting crystal structure data

Dr Colin R Groom, Dr Jason Cole, Dr Simon Bowden, Dr Tjelvar Olsson. Cambridge Crystallographic Data Centre, United Kingdom

In December 2009 the Cambridge Crystallographic Data Centre (CCDC) archived the 500,000th small-molecule crystal structure to the Cambridge Structural Database (CSD). The passing of this milestone highlights the rate of growth of the CSD in recent years and the continuing challenges this represents in terms of information storage and exchange.

This talk will describe the development of a number of tools for the processing, validation, and storage of crystal structure data. Recent developments that will help this growing body of structural knowledge be exploited in a range of applications, and the provision of additional services that can assist the scientific community, will also be illustrated.

horizontal rule

67 - Recent IUPAC recommendations for chemical structure representation: An overview

Mr. Jonathan Brecher. CambridgeSoft Corporation, Cambridge, MA, United States

Accurate and unambiguous depiction of chemical information is a key step in communicating that information. Such depiction is equally important whether the intended audience is a human chemist (as in a journal article or patent) or a computer (as in a chemical registration system). Recent IUPAC publications provide chemists a practical guide for producing chemical structure diagrams that accurately convey the author's intended meaning. A summary of those recommendations will be presented. As part of that summary, common pitfalls in producing chemical structure diagrams will be discussed. Solutions to those pitfalls will also be described, with an emphasis on solutions that are simple, straightforward, and accessible to the majority of practicing chemists.

horizontal rule

68 - Orbital development kit

Dr. Egon L. Willighagen. Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden

Understanding properties of molecular structures requires a computer representation, and quantum mechanical and chemical graph representations have been used abundantly. Both have found their own areas of application in chemistry, and their fields are best described as theoretical chemistry and cheminformatics, respectively. The Orbital Development Kit (ODK) positions itself in between these two representations, though closest to chemical graph theory, and addresses shortcomings of the latter. In particular, it replaces coloring of the nodes and edges in the chemical graph with explicit atom hybridization and bond order, making the representation more precise in how it represents geometrical features of the molecule. The ODK does so by replacing the atom as a single node in the chemical graph by a central atomic core surrounded by valence orbitals, possibly hybridized. Using this approach, the definition of an atom type is reformulated as a core element with a particular and well-defined set of identifiable orbitals with an implied, though relative, geometrical orientation. Bonding is now the connection of two orbitals, and a lone pair becomes a single orbital, and is therefore directional too. This approach means that the classical double bond in ethene is now represented by one sigma bonding between two sp2 orbitals of the two carbons, and one bonding of their two pz orbitals. This ODK representation also leaves room for representations beyond the chemical graph, such as proposed by Dietz in 1995: more than two orbitals can be combined into a set to represent delocalization. The presentation will present the ODK data model, serialization and deserialization into a Resource Description Framework-based file format, and a bridge to the Chemistry Development Kit, for visualization and molecular property calculation.
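
The ethene example above can be made concrete with a small data-structure sketch. The class and method names below are hypothetical and written only for this illustration; they are not the ODK data model or API. The point is that bonds connect orbitals rather than atoms, and that a bond may in principle join more than two orbitals.

```python
from dataclasses import dataclass, field


@dataclass
class Orbital:
    atom: str  # label of the owning atomic core, e.g. "C1"
    kind: str  # e.g. "sp2", "pz", "s"


@dataclass
class Molecule:
    orbitals: list = field(default_factory=list)
    bonds: list = field(default_factory=list)  # each bond: (label, [orbital indices])

    def add_orbital(self, atom, kind):
        self.orbitals.append(Orbital(atom, kind))
        return len(self.orbitals) - 1

    def bond(self, label, *orbital_ids):
        # Two ids give an ordinary bond; more could describe delocalization.
        self.bonds.append((label, list(orbital_ids)))


ethene = Molecule()
carbons = {}
for c in ("C1", "C2"):
    sp2 = [ethene.add_orbital(c, "sp2") for _ in range(3)]
    pz = ethene.add_orbital(c, "pz")
    carbons[c] = (sp2, pz)

# C=C: one sigma bond between sp2 orbitals plus one bond between the pz orbitals.
ethene.bond("sigma", carbons["C1"][0][0], carbons["C2"][0][0])
ethene.bond("pi", carbons["C1"][1], carbons["C2"][1])

# C-H: each remaining sp2 orbital bonds to a hydrogen s orbital.
h_count = 0
for c in ("C1", "C2"):
    for sp2_id in carbons[c][0][1:]:
        h_count += 1
        h = ethene.add_orbital(f"H{h_count}", "s")
        ethene.bond("sigma", sp2_id, h)

print(len(ethene.orbitals), len(ethene.bonds))
```

A lone pair would appear in such a model as an orbital that takes part in no bond, which keeps it directional in the same way as a bonding orbital.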

horizontal rule

69 - Line notations as unique identifiers

Krisztina Boda PhD. OpenEye Scientific Software, Santa Fe, New Mexico, United States

A wide variety of structure representation formats have been devised to encode molecular information in order to register, store and manipulate molecules in silico.

One class of these formats, called line notations, is designed to express molecules as compact, unambiguous strings that can be used as unique identifiers for compound registration eliminating the computationally more expensive graph matching.

The presentation will provide an overview of popular line notations, such as canonical SMILES, isomeric SMILES, and InChI, discussing their merits and shortcomings with regard to using them as robust, lossless unique identifiers.

We will present results of testing a variety of line notations on a diverse set of 10M compounds generated by combining organic and inorganic vendor databases. We will also examine the information loss of various molecular normalization procedures with regard to line notation generation.

horizontal rule

70 - Analysis of activity landscapes, activity cliffs, and selectivity cliffs

Jürgen Bajorath. Department of Life Science Informatics, University of Bonn, Germany

The concept of activity landscapes (ALs) is of fundamental importance for the exploration of structure-activity relationships (SARs). ALs are best rationalized as biological activity hypersurfaces in chemical space. When reduced to three dimensions, ALs display characteristic topologies that determine the SAR behavior of compound sets. Prominent features of ALs are activity cliffs that are formed by structurally similar compounds having large potency differences, giving rise to SAR discontinuity. ALs and activity cliffs can be analyzed in different ways including similarity-potency diagrams, approximate three-dimensional landscape representations, or molecular networks integrating compound similarity and potency information. Annotated similarity-based compound networks that incorporate results of numerical SAR analysis functions, termed Network-like Similarity Graphs (NSGs) are designed to explore relationships between global and local SAR features in compound data sets of any source. For collections of analogs, substitution patterns that introduce activity cliffs are identified in Combinatorial Analog Graphs (CAGs) that make it also possible to study additive and non-additive effects of compound modifications. Activity cliffs identified in CAGs can frequently be rationalized on the basis of complex crystal structures. When studying multi-target SARs using the NSG framework, the concept of activity cliffs can be extended to selectivity cliffs, i.e. similar compounds having significant differences in target selectivity.

horizontal rule

71 - Using Activity Cliff Information in structure-based design approaches

Birte Seebeck, Markus Wagener, Prof. Dr. Matthias Rarey. Center for Bioinformatics (ZBH), University of Hamburg, Hamburg, Germany; Molecular Design and Informatics, MSD, Oss, The Netherlands

Activity cliffs are often the pitfall of QSAR modeling techniques, but at the same time they exhibit key features of a SAR. Based on the principles of the structure-activity landscape index (SALI) [1], here we present an approach to use the valuable information of activity cliffs in a structure-based design scenario, analyzing key interactions between protein-ligand complexes in activity cliff events. We visualize those interaction “hot spots” directly in the active site of target proteins. In addition, we use the activity cliff information to derive target-specific scoring models and pharmacophoric hypotheses, which are validated in enrichment experiments on independent external test sets. The results show an improved enrichment in comparison to the standard score for various protein targets.

1. Guha R. and Van Drie J.H., J. Chem. Inf. Model., 2008, 48, 646-658.

horizontal rule

72 - Exploring activity cliffs using large scale semantic analysis of PubChem

Dr David J Wild, Bin Chen, Qian Zhu. School of Informatics and Computing, Indiana University, Bloomington, IN, United States

Identification of Activity Cliffs, defined as the ratio of the difference in activity of two compounds to their “distance” of separation in a given chemical space [1], has been established as important in the creation of robust quantitative structure-activity relationship models. Previously, a method, SALI, for identifying and visualizing these activity cliffs was developed at Indiana University, and applied successfully to several established QSAR datasets [2]. In the work reported here, we have extended this work in two ways. First, we have used structure and activity data from the public PubChem BioAssay dataset to evaluate the method on a much larger scale, and second, we have integrated it with a project called Chem2Bio2RDF to look not just for activity cliffs based on reported assay values, but also on computationally established relationships between compounds and genes and diseases. We thus propose an extended application of SALI which can be used in a systems chemical biology and chemogenomic context.

[1] J. Chem. Inf. Model., 2006, 46 (4), p 1535
[2] J. Chem. Inf. Model., 2008, 48 (3), pp 646-658
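
As a minimal sketch of the pairwise ratio described in this abstract, the Python below computes the index as |A_i - A_j| / (1 - sim(i, j)). The Tanimoto similarity on fingerprint bit sets, the function names, and the toy data are illustrative assumptions, not the authors' implementation or the PubChem workflow.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 1.0


def sali(activity_a, activity_b, similarity):
    """Activity difference divided by structural distance (1 - similarity)."""
    distance = 1.0 - similarity
    if distance == 0.0:
        return float("inf")  # identical fingerprints: the ratio is unbounded
    return abs(activity_a - activity_b) / distance


def sali_pairs(compounds):
    """Return {(name_i, name_j): SALI} for every compound pair.

    compounds maps a name to (fingerprint bit set, activity value).
    """
    result = {}
    for (n1, (fp1, a1)), (n2, (fp2, a2)) in combinations(compounds.items(), 2):
        result[(n1, n2)] = sali(a1, a2, tanimoto(fp1, fp2))
    return result


# Hypothetical toy data: fingerprints as bit positions, activities on a log scale.
toy = {
    "A": ({1, 2, 3, 4}, 8.0),
    "B": ({1, 2, 3, 5}, 5.0),  # similar to A but far less potent: a cliff
    "C": ({7, 8, 9}, 5.5),     # dissimilar to both
}
scores = sali_pairs(toy)
steepest_pair = max(scores, key=scores.get)
```

In this toy set the A/B pair gets the largest value, because a large activity difference is divided by a small structural distance.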

horizontal rule

73 - Quantifying the usefulness of a model of a structure-activity relationship: The SALI Curve Integral

John H Van Drie, Rajarshi Guha. R&D, Van Drie Research LLC, Andover, MA, United States; Chemical Genomics Center, NIH, Bethesda, MD, United States

In 2008, in two papers Guha and Van Drie introduced the notion of structure-activity landscape index (SALI) curves as a way to assess a model and a modeling protocol, applied to structure-activity relationships. The starting point is to study a structure-activity relationship pairwise, based on the notion of "activity cliffs"--pairs of molecules that are structurally similar but have large differences in activity. The basic idea behind the “SALI Curve” is to tally how many of these pairwise orderings a model is able to predict. Empirically, testing these SALI curves against a variety of models, ranging over structure-based and non-structure-based models, the utility of a model seems to correspond to characteristics of these curves. In particular, the integral of these curves, denoted as SCI and being a number ranging from -1.0 to 1.0, approaches a value of 1.0 for two literature models, which are both known to be prospectively useful.
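
The tallying idea can be sketched as follows. This is one plausible reading of the abstract, not the published definition: for each threshold X (a fraction of the largest SALI value) the sketch keeps the pairs at or above the threshold, scores them as (correctly ordered - incorrectly ordered) / kept, and then integrates the curve with the trapezoid rule. The normalisation, the threshold grid, and all names are assumptions.

```python
def sali_curve(pairs, n_thresholds=101):
    """Return [(X, S(X))] for X evenly spaced in [0, 1].

    pairs: list of (sali_value, observed_diff, predicted_diff), where the
    diffs are activity(i) - activity(j) as measured and as modelled.
    Assumes all SALI values are finite.
    """
    max_sali = max(s for s, _, _ in pairs)
    curve = []
    for k in range(n_thresholds):
        x = k / (n_thresholds - 1)
        kept = [(o, p) for s, o, p in pairs if s >= x * max_sali]
        correct = sum(1 for o, p in kept if o * p > 0)  # same sign: ordering predicted
        wrong = sum(1 for o, p in kept if o * p < 0)    # opposite sign: ordering inverted
        curve.append((x, (correct - wrong) / len(kept)))
    return curve


def sali_curve_integral(curve):
    """Trapezoid-rule integral of the curve over X in [0, 1]."""
    total = 0.0
    for (x0, s0), (x1, s1) in zip(curve, curve[1:]):
        total += 0.5 * (s0 + s1) * (x1 - x0)
    return total


# Hypothetical pairs: a model that orders every pair correctly, and one that inverts them all.
good_model = [(7.5, 3.0, 2.0), (2.5, 2.5, 1.0), (0.5, -0.5, -0.2)]
bad_model = [(s, o, -p) for s, o, p in good_model]
sci_good = sali_curve_integral(sali_curve(good_model))
sci_bad = sali_curve_integral(sali_curve(bad_model))
```

Under these assumptions a model that orders every pair correctly gives an integral near 1.0 and one that inverts every pair gives a value near -1.0, matching the range quoted in the abstract.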

horizontal rule

74 - Status of the InChI and InChIKey algorithms

Dr. Stephen Heller. CBRD, MS - 8320, NIST, Gaithersburg, MD, United States

The Open Source chemical structure representation standard, the IUPAC InChI/InChIKey project, has evolved considerably in the past two years. The project is now being supported and widely used by virtually all major publishers of chemical journals, databases, and structure drawing and related software. This usage of the InChI/InChIKey in their products enables them to link information between their products and other (fee-free and fee-based) chemical information available on the world wide web via the Internet.

These organizations are now providing for a stable and financially viable structure to the project. This is enabling the world-wide chemistry community to expand its use of the InChI knowing that this freely available Open Source algorithm will be widely accepted and used as a mainstream standard. The mission of the Trust is quite simple and limited; its sole purpose is to create and support administratively and financially a scientifically robust and comprehensive InChI algorithm and related standards and protocols.

This presentation will describe the current technical state of the InChI and InChIKey algorithms.

horizontal rule

75 - Self-contained sequence representation (SCSR): Bridging the gap between bioinformatics and

Dr Keith T Taylor, Dr William L Chen, Brad D Christie, Joe L Durant, David L Grier, Burt A Leland, Jim G Nourse. Symyx Technologies Inc, San Ramon, CA, United States

In this paper we will discuss the benefits and disadvantages of the current approaches for storing biological sequence information.

We have developed a hybrid representation that uses the compactness of the sequence, together with the detail of chemical connectivity information for modified regions. It represents standard residues with substructure. All instances of the same residue are represented by a single template. This hybrid approach is compact and scalable.

We have developed a converter that takes a UniProt format file, extracts the sequence information, and derives the modifications, producing an SCSR record. The SCSR is encoded as a molfile and registered into a Symyx Direct database. Duplicate checking, exact matching (with and without the modifications), molecular weight calculation, and substructure searching are all available with these structures.

We are using this representation for peptides, oligonucleotides, and we are now extending it to oligosaccharides. Non-natural residues can be included in an SCSR.

horizontal rule

76 - Representation of Markush structures: From molecules toward patents

Szabolcs Csepregi, Nóra Máté, Róbert Wágner, Tamás Csizmazia, Szilárd Dóránt, Erika Bíró, Tim Dudgeon, Ali Baharev, Ferenc Csizmadia. ChemAxon Ltd., Budapest, Hungary

Cheminformatics systems usually focus primarily on handling specific molecules and reactions. However, Markush structures are also indispensable in various areas, like combinatorial library design or chemical patent applications for the description of compound classes.

The presentation will discuss how an existing molecule drawing tool (Marvin) and chemical database engine (JChem Base/Cartridge) are extended to handle generic features (R-group definitions, atom and bond lists, link nodes and larger repeating units, position and homology variation). Markush structures can be drawn and visualized in the Marvin sketcher and viewer, registered in JChem databases and their library space is searchable without the enumeration of library members. Different enumeration methods allow the analysis of Markush structures and their enumerated libraries. These methods include full, partial and random enumerations as well as calculation of the library size. Furthermore, unique visualization techniques will be demonstrated on real-life examples that illustrate the relationship between Markush structures and the chemical structures contained in their libraries (involving substructures and enumerated structures).

Special attention will be given to file formats and how they were extended to hold generic features.

horizontal rule

77 - CSRML: A new markup language definition for chemical substructure representation

Dr. Christof H. Schwab, Dr. Bruno Bienfait, Dr. Johann Gasteiger, Dr. Thomas Kleinoeder, Dr. Joerg Marucszyk, Dr. Oliver Sacher, Dr. Aleksey Tarkhov, Dr. Lothar Terfloth, Dr. Chihae Yang. Molecular Networks GmbH, Erlangen, Bavaria, Germany; Altamira LLC, Columbus, Ohio, United States

Although chemical subgraphs or substructures are quite popular and have long been used in chemoinformatics, the existing and well-established standards still have some limitations. In general, these standards are suited even to complex substructure queries, but they show some insufficiencies, e.g., for the inclusion of physicochemical properties or annotation of meta information. In addition, the existing standards are not fully interconvertible and specify no validation techniques to check the semantic correctness of a query definition. This paper proposes an approach for the representation of chemical subgraphs that aims to overcome the limitations of existing standards. The approach presents a well-structured, XML-based standard specification, the Chemical Subgraph Representation Markup Language (CSRML), that supports a flexible annotation mechanism of meta information and properties at each level of a substructure as well as user-defined extensions. Furthermore, the specification foresees a mandatory inclusion and use of test cases. In addition, it can be used as an exchange format.

horizontal rule

78 - Prediction of solvent physical properties using the hierarchical clustering method

Dr. Todd M Martin, Dr. Douglas M Young. National Risk Management Research Laboratory, Environmental Protection Agency, Cincinnati, OH, United States

Recently a QSAR (Quantitative Structure Activity Relationship) method, the hierarchical clustering method, was developed to estimate acute toxicity values for large, diverse datasets. This methodology has now been applied to estimate solvent physical properties including surface tension and the normal boiling point. The hierarchical clustering method divides a chemical dataset into a series of clusters containing similar compounds (in terms of their 2D molecular descriptors). Multilinear regression models are fit to each cluster. The toxicity or property is estimated using the prediction value from several different cluster models. The physical properties are estimated using 2D molecular structure only (i.e. without the use of critical constants). The hierarchical clustering methodology was able to achieve excellent predictions for the external prediction sets. A freely available software tool to estimate toxicity and physical properties has been developed. The software tool is based on the open source Chemistry Development Kit (written in Java).

horizontal rule

79 - Scaffold diversity analysis using scaffold retrieval curves and an entropy-based measure

Jose L Medina-Franco PhD, Karina Martinez-Mayorga, Andreas Bender PhD, Thomas Scior PhD. Torrey Pines Institute for Molecular Studies, Port St. Lucie, FL, United States; Leiden University, Leiden, The Netherlands; Benemerita Universidad Autonoma de Puebla, Puebla, Mexico

Scaffold diversity analysis of compound collections has several applications in medicinal chemistry and drug discovery. Applications include, but are not limited to, library design, compound acquisition and assessment of structure-activity relationships. The scaffold diversity is commonly measured based on frequency counts. Scaffold retrieval curves are also employed. Further information can be obtained by considering the specific distribution of the molecules in those scaffolds. To this end, we present an entropy-based information metric to assess the scaffold diversity of compound databases [Medina-Franco, J. L. et al. QSAR Comb. Sci. 2009, 28, 1551]. The entropy-based information metric takes into account the frequency distribution of the different scaffolds and is a complementary measure of scaffold diversity enabling a more comprehensive analysis.
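
To illustrate how an entropy measure reflects the distribution of molecules over scaffolds, the sketch below computes the Shannon entropy of scaffold frequencies and a version scaled by its maximum, log2 of the number of distinct scaffolds. The scaling and the function names are assumptions made for illustration; the cited paper should be consulted for the exact metric.

```python
import math
from collections import Counter


def shannon_entropy(scaffold_ids):
    """Shannon entropy (bits) of the scaffold frequency distribution."""
    counts = Counter(scaffold_ids)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def scaled_shannon_entropy(scaffold_ids):
    """Entropy divided by its maximum, log2(number of distinct scaffolds).

    1.0 means molecules are spread evenly over scaffolds; values near 0
    mean most molecules share a single scaffold.
    """
    n_scaffolds = len(set(scaffold_ids))
    if n_scaffolds <= 1:
        return 0.0
    return shannon_entropy(scaffold_ids) / math.log2(n_scaffolds)


# Hypothetical libraries: one scaffold identifier per molecule.
even_library = ["s1", "s2", "s3", "s4"] * 25
skewed_library = ["s1"] * 97 + ["s2", "s3", "s4"]
even_score = scaled_shannon_entropy(even_library)
skewed_score = scaled_shannon_entropy(skewed_library)
```

Both toy libraries contain 100 molecules and four scaffolds, so a plain scaffold count cannot tell them apart, whereas the scaled entropy separates the evenly populated library from the one dominated by a single scaffold.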

horizontal rule

80 - Nonsubjective clustering scheme for multiconformer databases

Dr. Austin B. Yongye, Dr. Andreas Bender, Dr. Karina Martinez-Mayorga. Torrey Pines Institute for Molecular Studies, Port St Lucie, FL, United States; Medicinal Chemistry Division and Pharma-IT Platform, Leiden/Amsterdam Center for Drug Research, Leiden University, Leiden, The Netherlands

Representing the 3D-structures of ligands in virtual screenings via multi-conformer ensembles can be computationally intensive, especially for compounds with a large number of rotatable bonds. While clustering and RMSD filtering methods are employed in existing conformer generators, the novelty of this work is the inclusion of a non-subjective clustering scheme. This algorithm simultaneously optimizes the number and the average spread of the clusters. Using this method, 10 times fewer conformers per compound were obtained on average, and the reduced ensembles performed as well as OMEGA. Furthermore, we propose thresholds for root-mean-square filtering depending on the number of rotors in a compound: 0.8, 1.0 and 1.4 for structures with low (1-4), medium (5-9) and high (10-15) numbers of rotatable bonds, respectively. The protocol employed is general and can be applied to reduce the number of conformers in multi-conformer compound collections and alleviate the complexity of downstream data processing in virtual screening experiments.
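
The rotor-dependent thresholds can be written as a small lookup, sketched below together with a simple greedy filter. The greedy filter is a generic illustration of root-mean-square filtering, not the non-subjective clustering scheme of the abstract. The `rmsd` callable is supplied by the caller, the function names are hypothetical, and the abstract does not state the unit of the thresholds.

```python
def rms_threshold(n_rotatable_bonds):
    """Return the threshold quoted in the abstract for a given rotor count."""
    if 1 <= n_rotatable_bonds <= 4:
        return 0.8
    if 5 <= n_rotatable_bonds <= 9:
        return 1.0
    if 10 <= n_rotatable_bonds <= 15:
        return 1.4
    # The abstract only covers 1-15 rotors; this sketch refuses to extrapolate.
    raise ValueError("rotor count outside the 1-15 range covered by the abstract")


def filter_conformers(conformers, n_rotatable_bonds, rmsd):
    """Greedy filter: keep a conformer only if it lies at least the
    threshold away from every conformer kept so far."""
    threshold = rms_threshold(n_rotatable_bonds)
    kept = []
    for conformer in conformers:
        if all(rmsd(conformer, other) >= threshold for other in kept):
            kept.append(conformer)
    return kept


# Toy stand-in: "conformers" are points on a line and the distance is |a - b|.
toy_conformers = [0.0, 0.5, 1.0, 2.0, 2.5]
kept = filter_conformers(toy_conformers, 3, lambda a, b: abs(a - b))
```

With three rotors the threshold is 0.8, so the toy points at 0.5 and 2.5 are dropped as too close to an already kept point.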

horizontal rule

81 - Finding drug discovery "rules of thumb" with bump hunting

Mr. Tatsunori Hashimoto, Dr. Matthew Segall PhD. Department of Statistics, Harvard University, Cambridge, MA, United States; Optibrium, Cambridge, United Kingdom

Rules-of-thumb for evaluating potential drug molecules, such as Lipinski's Rule of Five, are commonly used because they are easy to understand and translate into practice. These rules have traditionally been constructed by observation or by following simple statistical analysis. However, application of these techniques to QSAR models or early screening data often ignores the underlying statistical structure. Conversely, when machine learning algorithms are used to classify 'drug-like' molecules, they often result in black-box classifiers that cannot be modified to suit a particular target drug profile. We propose a novel hybrid approach to constructing rules-of-thumb from existing data to match a given target product profile for any therapeutic objective. These rules are easily interpretable and can be rapidly modified to reflect expert opinions before application.

horizontal rule

82 - Machine learning in discovery research: Polypharmacology predictions as a use case

Nikil Wale PhD, Kevin McConnell PhD, Eric M Gifford PhD. Computational Sciences Center of Emphasis, Pfizer Inc, Groton, CT, United States

In this talk I will lay out the increasing role of machine learning technology in discovery research at Pfizer. Specifically, I will talk about how algorithms and methods inspired by (Machine) Learning Theory are playing an increasing role in in-silico predictive technologies in pharmaceutical research. These methods will be put in the context of other popular methods based on the classical statistics based approaches and overlap and contrast will be discussed. I will use poly-pharmacology predictions as an important use case to demonstrate the power of large scale machine learning methods for such application. In particular, prospective validation of these methods will be emphasized and discussed.

horizontal rule

83 - Interpretable correlation descriptors for quantitative structure-activity relationships

Prof. Jonathan D. Hirst. School of Chemistry, University of Nottingham, Nottingham, Nottinghamshire, United Kingdom

Highly predictive Topological Maximum Cross Correlation (TMACC) descriptors for the derivation of quantitative structure-activity relationships (QSARs) are presented, based on the widely used autocorrelation method. They require neither the calculation of three-dimensional conformations, nor an alignment of structures. Open source software for generating the TMACC descriptors is freely available from our website. We illustrate the interpretability of the TMACC descriptors, through the analysis of the QSARs of inhibitors of angiotensin converting enzyme (ACE) and dihydrofolate reductase. In the case of the ACE inhibitors, the TMACC interpretation shows features specific to C-domain inhibition, which have not been explicitly identified in previous QSAR studies.

horizontal rule

84 - Chemistry in your hand: Using mobile devices to access public chemistry compound data

Dr Antony J Williams PhD, Valery Tkachenko. ChemSpider, Royal Society of Chemistry, Wake Forest, North Carolina, United States

Mobile devices allowing browsing of the internet to access chemistry related data come in many forms: phones, music players and, increasingly, “tablets” and “pads”. With the permanently online connectivity of these mobile devices, the browser now being the default environment for much of our computer-based interactions, and the increasing availability of rich datasets online, these offerings mesh together to provide chemists with the capabilities to query and search for chemistry in ways that were the stuff of science fiction only a few years ago. Using the ChemSpider platform as a foundation, and with the intention of continuing to enable the community to access Chemistry, we have delivered mobile chemistry applications to search across over 20 million compounds sourced from over 300 data sources to retrieve data including properties, spectra and links to patents and publications. This presentation will discuss Mobile ChemSpider and the challenges of delivering such a tool.

horizontal rule

85 - Feature analysis of ToxCast™ compounds

Patra Volarath, Stephen Little, Chihae Yang, Matt Martin, David Reif, Ann Richard. National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, NC, United States; Center for Food Safety and Nutrition, U.S. Food and Drug Administration, Bethesda, MD, United States

ToxCast™ was initiated by the US Environmental Protection Agency (EPA) to prioritize environmental chemicals for toxicity testing. Phase I generated data for 309 unique chemicals, mostly pesticide actives, that span diverse chemical feature/property space, as determined by quantum mechanical, feature-/QSAR-based, and ADME-based descriptors. Results in over 450 high-throughput screening assays were generated for the chemicals. Deriving associations across such a structurally diverse and information-rich dataset is challenging. Approaches to determine relationships between the bioassay data and chemistry-/biology-informed structural features, and methods to meaningfully represent this knowledge are being developed. We initially focus on the Phase I data set. Successful approaches will be applied to the much larger chemical libraries in ToxCast Phase II and Tox21 projects (the latter to screen approximately 10,000 chemicals). These approaches will be used to develop data mining approaches to inform toxicity testing and risk assessment modelling. This abstract does not reflect EPA or FDA policy.

horizontal rule

86 - Extracting information from the IUPAC Green Book

Prof Jeremy G Frey, Mark I Borkum. School of Chemistry, University of Southampton, Southampton, Hants, United Kingdom

The IUPAC manual of Symbols and Terminology for Physicochemical Quantities and Units (the Green Book) was first published in 1969. One of the fundamental principles of the IUPAC Green Book is the reuse of existing symbols and terminology, in order to enable the accurate exchange of information and data. Accordingly, there is a need for the IUPAC Green Book to be repurposed as a machine-processable resource. This paper reports an experiment where we define a syntax for the subject index of the IUPAC Green Book in the Parsing Expression Grammar (PEG) formalism. We repurpose the resulting Abstract Syntax Tree (AST) as the primary data source for a Ruby on Rails application and Simple Knowledge Organization System (SKOS) concept scheme. We demonstrate a metric that gives prominence to the most significant terms and pages in the subject index, and reflect upon the usefulness and relevance of the information obtained.

horizontal rule

87 - Biologics and biosimilars: One and the same?

Roger Schenck. Chemical Abstracts Service, Columbus, OH, United States

Biopharmaceuticals (or biologics) and generic follow-on biosimilars currently account for more than 10% of the revenue in the pharmaceutical market. As patent protection for first generation biotherapeutics begins to expire, follow-on biosimilars have begun to appear. This presentation will provide insights on how the CAS databases handle biologics and biosimilars, how these substances are treated differently in patents, and how biosimilars are viewed by different patenting authorities. What the CAS databases reveal about trends in biopharmaceutical research and development will be discussed along with specific examples.

horizontal rule

88 - Intelligent mining of drug information resources

Rashmi Jain, Anay Tamhankar, Aniket Ausekar, Yuthika Dixit. Evolvus Group, Pune, India

A fundamental aspect of any research is to understand and keep track of progress made by peer groups in terms of scientific discoveries. Research Conferences form a definitive source of this information. Annually, thousands of papers are presented in such conferences for any given disease vertical from a Therapeutic, Biological, Pharmacological, or Clinical perspective. At first glance, the problem of finding relevant conference proceedings of interest and then organizing the information into a format which is easily analyzed, stored and efficiently retrieved seems to be difficult and chaotic, as there are no patterns by which a process can be defined; furthermore, conference presentations are highly fragmented and non-standardized.

A hybrid approach, in which Machine Learning based text-extraction software is coupled with assisted expert annotation by human editors, comes to the rescue. An in-house Machine Learning software system is used in the first stage, wherein the conference proceedings are classified based on keywords, segmented and converted into a standardized format.

The software then uses a proprietary, heuristic based, learning algorithm to extract relevant data from the segments. Since it is well known that any automated approach cannot be 100% accurate, in this step the software is assisted by a team of expert human editors who analyze the extracted and segmented data and perform necessary corrections, if any. In the third step, the software then pushes each segment to a team of expert human editors who analyze the segment, extract information relevant to the area of research, and store the information in our internal databases.

horizontal rule

89 - Cheminformatics semantic grid for neglected diseases

Paul J Kowalczyk PhD. Department of Computational Chemistry, SCYNEXIS, Durham, NC, United States

We present a summary of our progress towards establishing a cheminformatics semantic grid for neglected diseases. Our efforts are based on using public data and open-source programs to generate both descriptive and predictive models, which are themselves made publicly available. There are three modes of model access: as web services, via web portals, and as downloads. Models are saved in Predictive Model Markup Language (PMML) format. Information stored for each model includes the training set, test set, descriptors and model tuning parameters. This information is provided so that researchers may determine a model's domain, and its applicability to their data. Examples will be presented for two data sets retrieved from PubChem: enzyme inhibition of dihydroorotate dehydrogenase (AID:1175), and a cytochrome panel assay with activity outcomes (AID:1851).

horizontal rule

90 - Extraction and integration of chemical information from documents

Dr Hugo O Villar, Dr. Juan Betancort, Dr Mark R Hansen. Altoris, Inc., La Jolla, California, United States

Effective chemical research requires that all sources of information be incorporated in the decision making. Here we introduce a tool that saves time when building chemical databases from web information or the chemical literature, including patent information. We discuss some of the challenges faced in automating the identification and extraction of chemicals named in patents, and their conversion into chemical databases that can be mined effectively. The integration of external sources of data can be valuable for research informatics. To that end we have integrated the conversion of IUPAC names with chemical optical character recognition. We show examples where such integration can provide useful competitive information.

horizontal rule

91 - SAR and the role of active-site waters in blood coagulating serine proteases: A thermodynamic analysis of ligand-protein binding

Dr. Noeris K Salam, Dr. Woody Sherman, Dr. Robert Abel. Schrodinger, Inc., San Diego, CA, United States; Schrodinger, Inc., New York, New York, United States

The prevention of blood coagulation is important in treating thromboembolic disorders. Several serine proteases involved in the coagulation cascade are classified as pharmaceutically relevant and are the focus of structure-based drug design campaigns. Here, we investigate the serine proteases thrombin and factors VIIa, Xa, and XIa, using a computational method called WaterMap that describes the thermodynamic properties of the water solvating the active site. We show that the displacement of key waters from specific subpockets (e.g. S1, S2, S3 and S4) of the active site by the ligand is a dominant term governing potency, providing insights into SAR cliffs observed in several compound series. Furthermore, we describe how WaterMap scoring can be supplemented with terms from an MM-GBSA calculation to improve the overall predictive capabilities.

horizontal rule

139 - Configurational entropy and mechanical stress in molecular recognition

Prof. Michael K. Gilson M.D., Ph.D. School of Pharmacy, University of California, San Diego, La Jolla, CA, United States

I will present molecular dynamics simulations consistent with long-ranged entropy effects throughout a protein upon binding a peptide. The results are somewhat preliminary, given the challenge of generating converged simulation results, but are qualitatively consistent with the long-ranged changes in orientational order parameters due to binding, which have been observed in NMR studies of binding.

These apparent long-ranged effects raise questions regarding the mechanisms by which binding affects remote parts of the protein. I will explain why the concept of mechanical stress may be useful in thinking about such long-ranged consequences, and will describe our initial computational studies of stress at the molecular level.

[Image: computed stress tensors as a guest molecule is pulled from its cucurbituril host in a simulated single-molecule pulling experiment.]

horizontal rule

140 - Advancing anthrax toxin countermeasures using topomeric searching and virtual screening methodologies

Prof. Elizabeth A Amin PhD, Dr. Ting-Lan Chiu PhD, Dr. Derek J Hook PhD, Dr. Michael A Walters PhD, Prof. Barry C Finzel PhD, Jonathan Solberg, Satish Patil, Dr. Todd W Geders PhD, Dr. Subhashree Rangarajan PhD, Dr. Rawle Francis PhD, Xia Zhang. Department of Medicinal Chemistry, University of Minnesota, Minneapolis, Minnesota, United States; Institute for Therapeutics Discovery and Development, University of Minnesota, United States; Department of Chemistry, University of Minnesota, United States

One of the most dangerous bioterror agents is the rod-shaped, spore-forming bacterium Bacillus anthracis, which is the causative agent of anthrax. Concentrated anthrax spores have been deployed as biological weapons in the United States and elsewhere, resulting in high mortality rates among those exposed. The lethal factor (LF) enzyme is secreted by the bacillus as part of the anthrax lethal toxin, and is mainly responsible for anthrax-related cytotoxicity. As LF can remain in the system long after antibiotics have eradicated the bacilli, the preferred therapeutic modality would be the administration of antibiotics together with an effective LF inhibitor. To date, however, no LF inhibitor is available as a therapeutic or preventive agent. Here we present an original high-throughput computational protocol that successfully identified five promising novel LF inhibitor scaffolds with low micromolar inhibition against that target, demonstrating a 12.8% experimental hit rate. This protocol incorporated topomeric shape-based searching techniques that were particularly effective in identifying potential new leads. Three of the five new hits exhibited experimental IC50 values less than 100 µM and may potentially serve as scaffolds for lead optimization. Virtual screening simulations predicted that these preliminary hits are likely to engage in critical ligand-receptor interactions with nearby residues in at least two of the three (S1', S1-S2, and S2') subsites in the LF binding area. Notably, it was found that micromolar-level LF inhibition can be attained by compounds with non-hydroxamate zinc-binding groups that exhibit monodentate zinc chelation as long as key hydrophobic interactions with at least two LF subsites are retained.

horizontal rule

141 - Model-free drug-like filters

Dr Oleg Ursu, Dr Cristian G. Bologa, Prof. Tudor I. Oprea MD, PhD. Department of Biochemistry and Molecular Biology, Division of Biocomputing, University of New Mexico School of Medicine, Albuquerque, NM, United States

Extended connectivity descriptors computed by the Morgan algorithm have been used for the classification of various molecular properties. The information content encoded by such descriptors can be used to compute any 2D descriptors [1]. As these atom environments are canonical, we extracted them as molecular substructures (SMARTS) queries. Rooted in the information gain concept, already applied to derive selection rules in decision trees [2], we aimed at a better separation between classes of chemicals such as “drugs” and “non-drugs”. The most discriminating atom environments (having the highest information gain) were selected as model-free drug-like filters. These can be used to evaluate third party chemical libraries to assess drug-likeness.

[1] JL Faulon, DP Visco, RS Pophale. J. Chem. Inf. Comput. Sci. 2003, 43:707-720
[2] JR Quinlan. Machine Learning 1986, 1:81-106
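
The selection criterion can be illustrated with the standard information gain calculation for a binary feature (an atom environment present or absent) against a binary class ("drug" or "non-drug"). The sketch below is generic; the environment labels and data are hypothetical and no real SMARTS patterns or compound sets are implied.

```python
import math


def entropy(n_pos, n_neg):
    """Binary entropy (bits) of a set with n_pos positives and n_neg negatives."""
    total = n_pos + n_neg
    h = 0.0
    for n in (n_pos, n_neg):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h


def information_gain(has_feature, is_drug):
    """Class entropy minus the weighted entropy after splitting on the feature."""
    pairs = list(zip(has_feature, is_drug))

    def class_entropy(subset):
        pos = sum(1 for _, label in subset if label)
        return entropy(pos, len(subset) - pos)

    with_feature = [p for p in pairs if p[0]]
    without_feature = [p for p in pairs if not p[0]]
    conditional = sum(
        len(s) / len(pairs) * class_entropy(s)
        for s in (with_feature, without_feature)
        if s
    )
    return class_entropy(pairs) - conditional


# Hypothetical presence flags for two atom environments over four molecules.
is_drug = [1, 1, 0, 0]
environments = {
    "env_1": [1, 1, 0, 0],  # present only in the drugs
    "env_2": [1, 0, 1, 0],  # present equally often in both classes
}
gains = {name: information_gain(flags, is_drug) for name, flags in environments.items()}
ranked = sorted(gains, key=gains.get, reverse=True)
```

Environments at the top of such a ranking would be the candidates for filters; how many to keep is a choice the abstract does not specify.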

horizontal rule

142 - Chemocentric informatics: Enabling bioactive compound discovery through structural hypothesis fusion

Prof. Alexander Tropsha. School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States

Historically, computational drug discovery studies have relied on limited sources of data such as biological assays of compound libraries tested against single targets with results published in print. Nowadays, the information resources have broadened dramatically including large chemical genomics databases (e.g., ChEMBL, PubChem, PDSP, ToxCast), digital libraries (e.g., PubMed), gene expression profiles (e.g., cmap), and others. I shall describe a chemocentric informatics strategy integrating different information resources and diverse computational methodologies towards discovering novel bioactive compounds. I shall describe the use of digital libraries for establishing new datasets to analyze the relationships between chemical structure and biological activity; highlight the importance of chemical data curation; and illustrate how computational models help spot and correct erroneous data. I will describe a study combining Quantitative Structure Activity Relationship (QSAR) modeling, virtual screening (VS), text mining, and gene expression profiling of chemicals for identifying novel experimentally confirmed high-affinity GPCR ligands as potential anti-Alzheimer drug candidates.

horizontal rule

143 - Computers and drug discovery: From duds to $5B drugs

Prof. Robert C Glen PhD. Department of Chemistry, University of Cambridge, Cambridge, Cambridgeshire, United Kingdom

Despite what you may think, given the investment in industrial scale pharmacology and chemistry, drug discovery is still a cottage industry. Small focussed groups of scientists combine diverse expertise from pharmacology and biology to synthesis and design, wrestling with complex and uncertain data. It is a poorly defined science, with undefined outcomes, often guided by rule-of-thumb, intuition and sheer luck. Bringing the logic of computation to the chaos of biology is very difficult, but every so often we succeed beyond our wildest dreams. Since this is the 50th anniversary of The Journal of Chemical Information and Modeling, I would like to review some of our work on novel algorithms and drug discovery, focussing on GPCRs, over the past twenty years and in particular identify some things that worked, some that didn't and also challenge some views of where modelling and computation should be applied, and where it shouldn't (yet).

horizontal rule

144 - Weighting and fusion methods for similarity-based virtual screening

Prof. Peter Willett, Shereen Arif, Dr John Holliday, Nurul Malim, Christoph Mueller. Information School, University of Sheffield, Sheffield, South Yorkshire, United Kingdom

Recent work in Sheffield on similarity searching has focussed on the use of data fusion and fragment weighting methods to search the MDDR, WOMBAT and MUV databases. Data fusion involves the combination of multiple similarity searches. The overlap between multiple searches is shown to follow a Zipf-like, power law distribution, with very few molecules (or active molecules) common to multiple searches; and a comparison of a large number of different group-fusion algorithms shows that one based on molecules' inverse rank positions is the most effective of those tested. Information about the frequencies with which fragments occur in molecules can be used in two ways to increase search effectiveness (when compared with using just the presence or absence of fragments in molecules): using functions of the frequencies of fragment occurrences in individual molecules, and using inverse functions of the frequency of fragment occurrences in the database as a whole.
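
Group fusion by inverse rank can be sketched as follows. The abstract does not give the exact formula, so this sketch assumes the simplest variant: each molecule's fused score is the sum of 1/rank over all searches in which it appears, and a molecule missing from a ranking contributes nothing. The function name and the toy rankings are illustrative.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings):
    """Fuse several ranked hit lists into one ranking.

    rankings -- list of lists of molecule identifiers, best hit first
    Returns the identifiers sorted by decreasing summed 1/rank;
    ties are broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for position, mol_id in enumerate(ranking, start=1):
            scores[mol_id] += 1.0 / position
    return sorted(scores, key=lambda m: (-scores[m], m))

# Three searches, e.g. from three different reference structures.
fused = reciprocal_rank_fusion([
    ["A", "B", "C"],
    ["B", "A", "D"],
    ["B", "C", "A"],
])
```

Here `B` scores 0.5 + 1 + 1 = 2.5 and ends up first, ahead of `A` (about 1.83), `C` (about 0.83) and `D` (about 0.33). Because 1/rank falls off quickly, a molecule ranked first in one search outweighs one ranked mid-list in several.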

horizontal rule

Division of Chemical Education

horizontal rule

10 - Construction of topical faculty learning communities by the Center for Workshops in the Chemical Sciences (CWCS) and the use of Drupal as a development platform

Dr. Cianán B. Russell, Dr. David M. Collard. School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, United States

A new national dissemination initiative of the Center for Workshops in the Chemical Sciences (CWCS) is to develop topical faculty learning communities to further spread the adoption of innovative content and to propagate the use of good pedagogical practice in the teaching of undergraduate chemistry. CWCS has provided 88 workshops in a variety of topical areas, hosting over 1400 participants who have then used the workshop materials in a number of ways to improve undergraduate education. In this new initiative, we wish to engage workshop participants as the foundation of online communities that provide access to databases of curricular materials and pedagogies, together with the shared expertise of the group through discussion boards, blogs, etc. The Drupal platform was used to develop a flexible and adaptable interface. The process of developing this interface, and challenges associated with prototyping, assessing, and modifying our approach to the development will be discussed.

horizontal rule

11 - Ebooks: A culture shift for academic libraries?

Assistant Professor Barbara A. Losoff. Science Library, University of Colorado, Boulder, CO, United States

The decline of print materials in academic libraries is a result of changing technology, cost, and plummeting use by patrons. This mobile, Google/YouTube/Facebook, generation acquires their information online. Images are as important as text. Librarians must ask themselves the question: in what ways are these users transforming the very definition of a book, and how can libraries support this cultural shift to digital content, and does anyone know what the book of the future will resemble?

horizontal rule

12 - Engaging student discussion: The role of a google jockey

Prof. Laura E Pence, Emily R. Greene. Department of Chemistry, University of Hartford, West Hartford, CT, United States

A challenge to the inclusion of real world applications in a course can be the students' lack of mental images to provide context. PowerPoint images are a solution in a structured lecture environment, but in a first year seminar course with an emphasis on discussion, preselected illustrations constrain the dialogue and reflect only the instructor's mental framework.

A powerful alternative solution employed a senior student embracing the role of Google Jockey, whose purpose is to search and display images from the internet as illustration or counterpoint to an ongoing discussion. The replacement of mental images with visual images enhanced the student engagement in the class and allowed the senior to have a vital, if silent, contribution to the dialogue.

horizontal rule

13 - Rip-Mix-Learn (RML): Using Google Docs to create collaborative multimodal class notes

Dr. Lucille A Benedict, Dr. Harry E Pence. Department of Chemistry, University of Southern Maine, Portland, ME, United States; SUNY College at Oneonta, Oneonta, NY, United States

Computers and the internet have become ubiquitous among college students and can be very powerful educational tools when properly incorporated into the course curriculum. The Rip-Mix-Learn (RML) approach applies students' knowledge of surfing the web with course content to create a set of collaborative class notes that incorporate multimodal representations of each concept to make the students more personally invested in the topics. To create these class notes, first-semester general chemistry course students were given a basic set of notes each week in Google Docs focusing on the current course topics. The students' task was to annotate these documents with pictures, videos, or other representations found on the web and then write brief descriptions of how these annotations related to the specific topics. This talk will focus on the implementation, use, and advantages and drawbacks of using this RML approach in a large lecture first-semester general chemistry course.

horizontal rule

14 - Smart phones, smart objects, and chemical education

Prof. Harry E. Pence PhD. Department of Chemistry and Biochemistry, SUNY Oneonta, Oneonta, New York, United States

The mobile phone is already changing the way we communicate, but it is also creating new ways to access information. Companies, like Google, Yelp, and Layar, are building a layer of digital information that can augment the photograph a user takes with his/her smartphone. As 2D bar codes become more popular in this country, these symbols can label an object with a URL which, in turn, can cue a smartphone or personal computer to access a web site. This means that a piece of paper can include the equivalent of a hyperlink that may lead to structural, safety, or other information. What new opportunities open up for chemical educators when smartphones offer not only portable access to a massive library of information but also a quick and convenient way to work with smart objects that are connected to the World Wide Web?

horizontal rule

15 - How community crowdsourcing and social networking is helping to build a quality online resource for chemists

Dr Antony J Williams PhD. ChemSpider, Royal Society of Chemistry, Wake Forest, North Carolina, United States

With an intention to provide a free internet resource of chemistry related data for the community, ChemSpider provides an online database of chemical compounds, reaction syntheses and related data. Members of the community can contribute to the database via the deposition of chemical structures, synthesis procedures and analytical data. Data are also aggregated from many other depositors, at present over 400 data sources. The aggregation of data associated with over 25 million chemical compounds does not come without data quality issues. By engaging the community to curate the data the quality continues to improve on a daily basis. The presentation will provide an overview of our ongoing efforts to expand and curate the database. Using a combination of game-based and recognition systems as well as our dependence on societal giveaway by the community ChemSpider continues its path to become a high quality resource and foundation for the semantic web for chemistry.

horizontal rule

16 - Chemistry of social media

Scott Jensen. American Chemistry Council, Arlington, VA, United States

The rise of Web 2.0 or social media has created a new frontier in communicating with a variety of audiences on issues directly related to chemistry and how it impacts their lives. Blogs, Twitter, Facebook and even YouTube have created new opportunities to disseminate information in a very direct and targeted fashion. At the same time, social media tools can allow for dialogue or a forum for debate.

This presentation will discuss how The American Chemistry Council's Chlorine Chemistry Division has entered this new frontier and utilized Web 2.0 tools to engage and educate a range of audiences from policy makers to the general public regarding chlorine related issues.

horizontal rule

43 - Communicating organic chemistry through the internet: Global learning

Prof. Philip A Janowicz. Department of Chemistry and Biochemistry, California State University - Fullerton, Fullerton, CA, United States

The power of broadband internet has allowed for instant communication across the world, and opportunities for distance education have been greatly enhanced. In the spring of 2009, students from Peking University in Beijing, China, joined in with students from the University of Illinois at Urbana-Champaign in synchronous discussion sessions for organic chemistry. In the fall of 2009, students from Lahore University of Management Sciences in Lahore, Pakistan, joined the synchronous discussions. Experiences during these semesters will be shared along with an outlook for the future.

horizontal rule

44 - Focusing CENtral Science: An overview of C&EN's redesigned blog portal and its usefulness to educators

Editor, C&EN Online Rachel Pepling. Chemical & Engineering News, Washington, DC, United States

In March 2010, Chemical & Engineering News magazine relaunched its blog, C&ENtral Science, as a portal to several content-focused blogs meant for different audiences (and dropped the "&" along the way). This overview will discuss why that decision was made and how the new CENtral Science can be a valuable resource to chemical educators.

horizontal rule

45 - Chemistry blogging: From literature to controversy to community to...

Aaron D. Finke. Department of Chemistry, University of Illinois, Urbana-Champaign, Urbana, IL, United States

This talk will focus on my experiences as the co-author of a popular chemistry blog, Carbon-Based Curiosities. Initially, I started blogging as a means to keep up with the literature by forcing myself to read and summarize papers I enjoyed or found interesting. However, as the blog progressed, the audiences increased, and my interests diverged, I found myself using the chemistry blogosphere as a means to a different end, one in which one's personal creative energies, even those that tended to diverge far from chemistry, could be applied to ideas, problems, and controversies in current chemical research. In this personal account, I will draw from not only my own experiences in blogging, but also from others across the chemical blogosphere, and show how this small community has already made some big waves.

horizontal rule

46 - Blogging: Ego trip, or sound science? Its role in chemical education and research

Prof Henry S Rzepa D. Sc. Chemistry, Imperial College, London, United Kingdom

Blogs evolved as a personal statement by an individual, but in science and chemistry have now emerged as a fascinating new way of reviewing the correctness of previously reviewed traditional published science. I will argue they can be much more. In chemical education, they enable the chemist to communicate their accumulated expertise in an accessible manner to both the educational and as it happens the research communities, and indeed to present new and original science that might otherwise be lost. The speaker has posted more than 50 blogs in a year of activity, and a number of these have also been used to enhance taught courses. Others have morphed into published peer-reviewed articles, in traditional journals. The difference between a publication and a blog will be discussed, as well as how a blog can be enhanced with semantic attributes, harvested and aggregated and archived for the longer term.

horizontal rule

47 - Teaching scientific communication in pharmaceutical bioinformatics education

Dr. Egon Willighagen. Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Uppland, Sweden

Communication is a central part of science. Traditionally, students are educated to access scientific literature, but communication channels are changing. The amount of literature has risen sharply, and not even established researchers can keep up with the number of publications that appear each week. At the same time, new technologies have changed communication as we knew it, and with the introduction of the internet, communicating with anyone around the world has become as easy as communicating with people at the same department. Research has become so specialized, however, that peers at the same university are not always the best judges of one's work, and international communication becomes more and more important.

In my education of students doing a 20 week research project in Pharmaceutical Bioinformatics at Uppsala University, we made the use of social websites a core part of their education. Within their projects, the students (two at the moment) report the work they do via their blog; additionally, taking advantage of the programming side of their work, the results of their experiments (source code) are submitted to a central source code repository. This is quite similar to the use of wikis for describing synthesis experiments in organic chemistry. Additionally, reusable components or examples of how their work can be used are shared via the social website, allowing others to download the protocols the students have developed, comment on them, and rate them.

The students also take part in a journal club, where we discuss related literature. The goal of these meetings is that the student learns to formulate an opinion on the paper, after which we discuss the theories behind the paper in more detail. For each discussed paper, one or two participants write up a dedicated blog post, which we mark up such that social websites like Chemical blogspace and can pick up the discussed literature. is used to share the list of discussed papers using a dedicated hashtag.

By making the literature reviews and their progress in the 20 week project publicly available, the students engage in a scientific discussion with peers. By having parts of their work publicly available in their blog, it is easier for them to discuss issues on more targeted mailing lists for databases and software libraries they use in their own project. Using these social websites helps the students to put their scientific work in perspective, and teaches them to discuss their research with other scientists around the globe.

horizontal rule

48 - Developments in chemistry resources on Wikipedia

Prof. Martin A Walker PhD. Department of Chemistry, SUNY Potsdam, Potsdam, NY, United States

In recent years, Wikipedia has become a standard information source for students and researchers alike, but its open nature tends to undermine its reliability. This presentation will explain how to use this immense resource effectively, and also describe efforts made by the Wikipedia chemistry community to address users' concerns. A collaboration with Chemical Abstracts Service has led to validation of Registry Numbers and structures, while other collaborations with ChemSpider and RSC have also brought improvements, yet much remains to be done. The presentation will close with an overview of work that is planned or under way, indicating the direction of likely future developments.

horizontal rule

49 - ChemEd DL WikiHyperGlossary

Dr. Robert Belford, Dr. Daniel Berleant PhD, Michael Bauer, Dr. John W. Moore PhD, Roger Hall. Department of Chemistry, UALR, Little Rock, AR, United States; Department of Chemistry, University of Wisconsin-Madison, Madison, WI, United States; Department of Information Sciences, UALR, Little Rock, AR, United States; MidSouth BioInformatics Center, UALR, Little Rock, AR, United States

We will present the new editing interface of the wikihyperglossary generating program being developed for ChemEd DL. We will go over the database design, present several databases, including a non-editable one with IUPAC Gold book definitions, along with several editable ones. We will then discuss our experiences in a general chemistry class where students created definitions for terms in their class textbook using textual and multimedia online resources.

horizontal rule

50 - Chempedia Lab: Group meeting on a global scale

Ph. D Richard L Apodaca. Metamolecular, LLC, La Jolla, CA, United States

Online database searches have become the information tool of choice for answering tough experimental chemistry questions. But what if it were possible to answer questions by simply asking the entire experimental chemistry community directly? What would a system that made this possible look like, and how might it work? Chempedia Lab represents our attempt to answer these questions through a fundamentally new approach to online knowledge-gathering. This talk will discuss how traditional databases have failed the experimental chemistry community, and what Chempedia Lab might teach about the chemical information systems of the future.

horizontal rule

Division of Computers in Chemistry

horizontal rule

13 - Tautomerism in chemical information management systems

Wendy A. Warr M.A., D. Phil. Wendy Warr & Associates, Holmes Chapel, Cheshire, United Kingdom

Tautomerism has an impact on many of the processes in a chemical information management system including novelty checking during registration into chemical structure databases; storage of structures; exact and substructure searching in chemical structure databases; and depiction of structures retrieved by a search. For this talk the approaches taken by a great many different software vendors and database producers have been compared. Since it is important to take account of the nature of the database and the process for which it is designed, and the user requirements vary, it is dangerous to lay down the law about what is right and wrong. The comparison is nevertheless of considerable interest.

horizontal rule

14 - Tautomerism in large databases

Dr. Markus Sitzmann, Dr. Wolf-Dietrich Ihlenfeldt, Dr. Marc C Nicklaus. Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, NCI-Frederick, Frederick, MD, United States; Xemistry GmbH, Königstein, Germany

We report on a comprehensive tautomerism analysis of one of the largest currently existing sets of real (i.e. not computer-generated) compounds. We used the Chemical Structure DataBase (CSDB) of the NCI CADD Group, an aggregated collection of over 150 small-molecule databases totaling 103.5 million structure records. Tautomerism was found to be possible for more than 2/3 of the unique structures in CSDB. A total of 680 million tautomers were calculated from the original structure records. Tautomeric overlap within the same individual database (i.e. at least one other entry was present that was really only a different tautomeric representation of the same compound) was found at an average rate of 0.3% of the original structure records, with values as high as nearly 2% for some of the databases in CSDB. Tautomeric overlap across all constituent databases in CSDB was found for nearly 10% of the records in the collection.
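
The within-database overlap rate defined above can be sketched in a few lines. The sketch assumes that each record already carries two precomputed keys: one identifying the structure as drawn and one identifying its tautomer-invariant form. How those keys are generated is not described in the abstract, so `structure_key` and `tautomer_key` are hypothetical inputs, and the function name is illustrative.

```python
from collections import defaultdict

def within_database_overlap(records):
    """Fraction of records per database that have a tautomeric twin.

    records -- iterable of (database, structure_key, tautomer_key)
    A record counts as overlapping if the same database holds another
    record with the same tautomer_key but a different structure_key.
    """
    records = list(records)
    variants = defaultdict(set)
    for db, structure_key, tautomer_key in records:
        variants[(db, tautomer_key)].add(structure_key)

    totals = defaultdict(int)
    overlapping = defaultdict(int)
    for db, structure_key, tautomer_key in records:
        totals[db] += 1
        if len(variants[(db, tautomer_key)]) > 1:
            overlapping[db] += 1
    return {db: overlapping[db] / totals[db] for db in totals}

rates = within_database_overlap([
    ("db1", "s1", "t1"),  # s1 and s2 are two tautomers of one compound
    ("db1", "s2", "t1"),
    ("db1", "s3", "t2"),
    ("db1", "s4", "t3"),
    ("db2", "s1", "t1"),
    ("db2", "s5", "t4"),
])
```

In this toy input two of the four `db1` records are tautomeric twins, so `db1` has a rate of 0.5 and `db2` a rate of 0. Dropping the database from the grouping key would give the cross-database overlap instead.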

horizontal rule

15 - Tautomerism in drug discovery

Bahaa El-Dien M. El-Gendy, Prof. Alan R. Katritzky PhD, Dr. C. Dennis Hall PhD, Bogdan Draghici. Department of Chemistry, University of Florida, Gainesville, Florida, United States; Department of Chemistry, Benha University, Benha, Qalubia, Egypt

The influence of tautomerism on the precise structure of drugs and thus of their potential to interact in biological systems is discussed from thermodynamic and kinetic aspects. The types of tautomerism encountered in the structure of drugs in current use are surveyed together with the effect of pH, solvent polarity, and temperature.

horizontal rule

16 - Quantitative forecasts of biological potency of molecules that can tautomerize

Dr. Yvonne C Martin. Martin Consulting, Waukegan, IL, United States

Whether one is using ligand-based 2D or 3D QSAR or structure-based estimates of potency of molecules, tautomerism needs to be addressed. This talk will highlight insights as to when one needs to consider tautomerism and how it can be included in potency forecasts.

horizontal rule

17 - New questions about tautomerism in cytosine: Quantum chemical and matrix isolation spectroscopic studies

Prof. Geza Fogarasi, Mr Gabor Bazso, Prof Peter G Szalay, Dr. Gyoergy Tarczay. Laboratory of Theoretical Chemistry, Institute of Chemistry, Eotvos University, Budapest, Budapest, Hungary; Laboratory of Molecular Spectroscopy, Institute of Chemistry, Eotvos University, Budapest, Budapest, Hungary

In spite of numerous studies, there is much uncertainty about tautomerism in nucleic acids and specifically cytosine. In the gas phase, form 2 dominates but ΔG may be about 1 kcal/mol for both 1 and the “rare” form 3. Spectroscopic studies “see” them but in much smaller abundance. The UV spectrum is normally assigned to 1. Dimerization may also influence tautomerization.

Fig. 1. Selected isomers and a dimer of cytosine

We present infrared and UV spectroscopic measurements in Ar matrix and discuss them by MP2 and CC quantum chemical calculations, including electronic excitations. Contributions from isomers/tautomers and/or dimers to the spectra are discussed.
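
To give a feel for what a free energy difference of about 1 kcal/mol means for tautomer abundances, the following sketch computes equilibrium Boltzmann populations. The temperature of 298.15 K and the assumption of a simple three-state equilibrium are illustrative choices, not values from the abstract; the function name is likewise hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def boltzmann_populations(delta_g, temperature):
    """Equilibrium populations for states with relative free energies.

    delta_g     -- list of free energies in kcal/mol, relative to any
                   common reference
    temperature -- temperature in kelvin
    """
    weights = [math.exp(-dg / (R_KCAL * temperature)) for dg in delta_g]
    partition = sum(weights)
    return [w / partition for w in weights]

# Forms 1, 2, 3 with form 2 as the reference (assumed values).
populations = boltzmann_populations([1.0, 0.0, 1.0], 298.15)
```

Under these assumptions form 2 would hold about 73% of the population and forms 1 and 3 about 13.5% each, which is the kind of thermodynamic estimate the abstract contrasts with the much smaller abundances seen spectroscopically.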

horizontal rule

55 - FPGA implementation of cheminformatics and computational chemistry algorithms and its cost/performance comparison with GPGPU, cloud computing and SIMD implementations

Dr. Attila Berces PhD, Prof. Bela Feher PhD, Peter Szanto, Imre Pechan, Laszlo Lajko, Zoltan Runyo, Peter Laczko, Janos Lazanyi. Chemistry Logic Kft, Budapest, Hungary; Dept. of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary; evopro Kft, Budapest, Hungary

We have developed binary fingerprint based similarity searching, topological torsional fingerprint based similarity searching, chemical library to library comparison, sphere exclusion and Jarvis-Patrick clustering, peptide mass spectrometry fingerprinting, BLAST prefiltering, and short read mapping in color space on a Silicon Graphics RC100 FPGA card. In addition, we implemented the Autodock docking software on FPGA. We reached 5- to 500-fold acceleration compared to CPU in these implementations. In this presentation the audience will learn what characteristics an algorithm should have to make it worthwhile to implement it on FPGA. We shall also compare the cost/performance characteristics to other alternatives such as cloud computing, GPGPU, and single-instruction-multiple-data (SIMD) optimization.
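
Binary fingerprint similarity searching reduces to bitwise operations and bit counting, which is one reason it maps well onto hardware such as FPGAs. The sketch below shows the arithmetic in plain Python. The abstract does not name a similarity coefficient, so the use of the Tanimoto coefficient is an assumption, as are the function names and the toy fingerprints; returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def popcount(bits):
    """Number of set bits in a non-negative integer."""
    return bin(bits).count("1")

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints stored as integers."""
    union = popcount(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention for two empty fingerprints
    return popcount(fp_a & fp_b) / union

def top_hits(query, library, k):
    """Names of the k library entries most similar to the query.

    library -- dict mapping a molecule name to its fingerprint
    """
    scored = sorted(
        ((tanimoto(query, fp), name) for name, fp in library.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [name for _, name in scored[:k]]

hits = top_hits(0b0011, {"m1": 0b1011, "m2": 0b0100, "m3": 0b0011}, k=2)
```

Each comparison needs only an AND, an OR and two bit counts, with no data-dependent branching apart from the empty-fingerprint check, so many comparisons can be evaluated side by side.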

horizontal rule

56 - Technologies for desktop HPC: Application developer's perspective

Dr. Volodymyr Kindratenko PhD, Guochun Shi. National Center for Supercomputing Applications, University of Illinois, Urbana, IL, United States

In the last few years we have witnessed the emergence of a new computing paradigm: computational accelerators. Most prominent examples of such accelerators include FPGAs, Cell/B.E., and most recently GPUs. While these technologies bring unprecedented computing capabilities to desktop users at a fraction of the cost of a traditional HPC system, their use comes with substantial difficulties due to the need for software reengineering. We survey the landscape of application accelerators for desktop systems and discuss the challenges of re-implementing computational chemistry applications on some of these systems using the Hartree-Fock method and molecular dynamics codes as examples.

horizontal rule

57 - Faster, cheaper, and better science: Molecular modeling on GPUs

John E. Stone. Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL, United States

Over the past ten years graphics processing units (GPUs) have evolved from fixed-function single-purpose devices into highly programmable massively parallel co-processors. State-of-the-art GPUs support double-precision floating point arithmetic and achieve performance levels approaching one trillion floating point arithmetic operations per second. Modern GPUs enable software development in dialects of familiar C, C++, and Fortran languages, and GPU acceleration extensions exist for Python, Matlab, and other popular languages and computing tools. The high performance of GPUs has created opportunities for acceleration of many computationally demanding molecular modeling algorithms that contain significant parallelism.

We will describe how GPUs are currently employed to accelerate some of the most computationally demanding tasks involved in molecular dynamics simulation, visualization, and analysis in our NAMD and VMD software, and give an overview of how GPUs are expected to evolve in the next few years.

horizontal rule

58 - Folding@home: Petaflops on the cheap today, exaflops soon?

Prof. Vijay Pande. Department of Chemistry, Stanford University, Stanford, CA, United States

Over the last 10 years, Folding@home has emerged as a very powerful resource. Today, it has multi-petaflop performance, making it the most powerful supercluster in the world. I will talk about how Folding@home works, both in terms of infrastructure and algorithms, and how one can easily reproduce these sorts of approaches in one's own lab. I will also very briefly touch on recent results from Folding@home to highlight what petascale power can do to dramatically change the nature of what simulations can inform us about systems of interest.

horizontal rule

59 - Protein-ligand docking on the Cell/BE processor with eHiTS Lightning

Zsolt Zsoldos PhD, Orr Ravitz PhD. SimBioSys Inc., Toronto, Ontario, Canada

The eHiTS flexible docking has proven to be among the most accurate pose prediction tools, providing one of the highest enrichment factors based on comparative evaluation studies. The accurate results of eHiTS have been achieved at the price of longer CPU times in the past, but that has changed with the recent port of the algorithm to the Cell/BE processor. The revolutionary hardware that powers RoadRunner (the world's current fastest supercomputer) and is also available in the low cost SONY PS3 game console gives eHiTS a 30-50 fold speedup compared to a single core Intel/AMD processor. The advantages of the Cell/BE platform over other acceleration techniques (FPGA, GPGPU) will be described, along with the challenges faced during the porting effort. A new proximity data structure is introduced that is optimized for SIMD architectures. It allows efficient evaluation of short range pairwise interactions with optimum cache locality.

horizontal rule

60 - Fragment-based druggable hot spot identification in proteins and protein-protein interactions using HPC

Dr. Gwo Yu Chuang, Dr. Ryan Brenke, David R Hall, Dr. Dmitri Beglov, Dr. Dima Kozakov, Dr. Sandor Vajda. Department of Biomedical Engineering, Boston University, Boston, MA, United States

Here we present a highly parallel FFT-based method FTMAP for performing computational fragment mapping. Mapping methods place molecular probes on the surface of proteins in order to identify the most favorable binding positions. Since regions of the protein surface that are major contributors to the binding free energy in drug-protein interactions also bind a variety of small organic molecules, mapping can identify such “hot-spots” and the number of probe molecules bound is a good predictor of druggability. The highly parallel nature of our FFT-based approach allows it to be fully scalable, running efficiently on everything from desktop machines with CUDA enabled graphics adapters to an IBM Blue Gene. The method has been applied to both canonical and protein-protein interaction drug targets, successfully predicting binding hot-spots and target druggability. Our public web server is gaining popularity among academic users and generating significant interest from industry.

horizontal rule

61 - GPUs: What is all the fuss about?

Brian Cole, Bob Tolbert, Anthony Nicholls. OpenEye Scientific Software, Santa Fe, NM, United States

High performance computing hardware is undergoing a revolution. The best way to achieve increasing performance is through highly parallelized architectures like the graphics processing unit. However, the GPU requires a new assessment of algorithm design based on different memory versus time tradeoffs. Good performance is no longer gained by simply reducing the number of operations, but by organizing the interaction of those operations with a complex hierarchy of memory with varying latencies. Understanding the changing programming paradigm is critical both to selecting which algorithms will benefit from the GPU and how to achieve optimal performance. We will discuss design principles used when porting ROCS to the GPU. We will compare performance of a GPU implementation of ROCS to the highly-tuned production CPU implementation. We will show that higher performance can be achieved on the GPU at a significantly reduced cost compared to CPU clusters.

horizontal rule

76 - Water in protein binding sites: Consequences for ligand optimization

Dr. Julien Michel, Dr. Julian Tirado-Rives, James Luccarelli, Prof. William L Jorgensen. Department of Chemistry, Yale University, New Haven, CT, United States

An efficient molecular simulation methodology, JAWS, has been developed to determine the positioning of water molecules in the binding site of a protein or protein-ligand complex. Occupancies and absolute binding free energies of water molecules are computed using a statistical thermodynamics approach. The importance of determining proper water occupancies is illustrated in Monte Carlo/free energy perturbation calculations for ligand series that feature displacement of ordered water molecules in the binding sites of scytalone dehydratase, p38-α MAP kinase, and EGFR kinase. The change in affinity for a ligand modification is found to correlate with the ease of displacement of the ordered water molecule. For accurate results, a complete thermodynamic analysis is needed. It requires identification of the location of water molecules in the protein-ligand interface and evaluation of the free energy changes associated with their removal and with the introduction of the ligand modification. Direct modification of the ligand in free-energy calculations is likely to trap the ordered molecule and provide misleading guidance for lead optimization.

horizontal rule

77 - Efficient method for computing the free energies of active site waters: Application to drug discovery

Jinming Zou, Sia Meshkat, Zenon Konteatis, Anthony Klon, Charles H. Reynolds. Ansaris, Blue Bell, Pennsylvania, United States

Grand canonical Monte Carlo and systematic free energy methods have been reported previously that allow us to rapidly compute protein-fragment interaction energies. The same methodologies can be employed to compute free energies of binding for water. We have used this approach to identify critical waters in a number of therapeutically interesting protein active sites. Knowledge of the location and affinities of these waters can be useful for designing ligands with improved potency.

horizontal rule

78 - Using explicit solvent implicitly

Dr. Christopher J Fennell, Charles W. Kehoe, Prof. Ken A. Dill. Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, United States; Graduate Group in Bioinformatics, University of California, San Francisco, San Francisco, CA, United States

Solvent plays a critical role in biomolecular simulations. Among other roles, it mediates the transfer of small molecules, bridges interactions between ligands and binding sites, and stabilizes protein structures with external hydrophilic groups and buried hydrophobic cores. When solvent is modeled explicitly in simulations, the microscopic interactions can be handled rigorously, but obtaining converged solvation energetics can be time-consuming. Here we describe a process, called Semi-Explicit Assembly, where we precompute the solvation response in simple systems and apply it in complex systems. We show that it is possible to have a detailed/explicit-like treatment of solvation at a computational cost similar to the fastest of implicit solvents.

horizontal rule

79 - Role of solvent in protein-ligand binding

Robert Abel PhD, Noeris Salam PhD, Thijs Beuming PhD, Woody Sherman PhD, Ramy Farid PhD. Schrodinger Inc., New York, NY, United States

Calculation of protein-ligand binding affinities continues to be an active area of research. Many techniques for computing protein-ligand binding affinities have been introduced, ranging from computationally very expensive approaches, such as free energy perturbation (FEP) theory, to more approximate techniques, such as empirically derived scoring functions, which are computationally efficient but lack a clear theoretical basis. Nevertheless, there remains a pressing need for more robust approaches. The recently introduced WaterMap technology, which calculates the locations and displacement free energies of hydration sites in proteins, was developed to bridge the gap between the accuracy of FEP and the computational efficiency of empirically derived scoring functions. In the present work, we apply WaterMap to a number of pharmaceutically relevant targets, and present a generalized approach for accurate prediction of binding affinities that combines solvation terms from WaterMap with other important thermodynamic terms.

horizontal rule

80 - Compute the contribution of protein-pocket solvation to ligand-binding affinity by explicit water simulations

Dr. Ming-Hong Hao, Dr Ingo Muegge. Department of Medicinal Chemistry, Boehringer Ingelheim Pharmaceuticals, Inc, Ridgefield, CT, United States

A significant fraction of ligand-binding free energy in proteins arises from the replacement of water molecules by the ligand in the binding site. Continuum solvation models based on surface areas do not treat the short-range correlations of water molecules well in the highly irregular and heterogeneous protein-binding pocket. We have developed a computational procedure to simulate the density distribution and free energy of water molecules in the ligand-binding pocket of proteins using molecular dynamics (NAMD) with an explicit water model (TIP3P). Our results are comparable with published work (e.g. WaterMap software from Schrodinger Inc.) and show good agreement with crystallographic water molecules observed in the X-ray structures of proteins. In our procedure, the distribution of water molecules in the protein-binding pocket is presented as water density on a 3-dimensional grid, which we find provides an intuitive way to visualize the hydrophobic or polar characteristics of a binding site. The contribution of solvation to ligand-binding free energy is estimated as the difference between the free energy of the water replaced by the ligand in the protein binding site and that of the same water in the bulk solvent. This contribution is added to the direct ligand-protein interactions in scoring the binding affinity of ligands. We investigated the effects of residue mutations in the protein binding site on ligand binding affinity, including the tryptophan mutations (W79F, W92F, W108A and W120A) in the high-affinity streptavidin-biotin complex and the drug-resistant mutants of HIV protease in complex with the inhibitor U-89360E. In these systems, X-ray crystallography showed no significant differences in the given protein-ligand complex structures between the wild type and mutant proteins. Intermolecular interactions between protein and ligand alone do not fully account for the changes in ligand-binding affinity.
The free energy change of solvation in the binding site between wild type and mutants provides a good explanation for the shift in ligand-binding affinity. We also applied the procedure to study the structure-activity relationship of congeneric series of ligands. Our results suggest that binding-pocket solvation is an important factor in understanding the binding affinity of ligands to proteins.
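
The grid representation of water density described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function name, the dictionary-based grid, and the bulk number density of water (about 0.0334 molecules per cubic angstrom at ambient conditions) are assumptions made for illustration. The sketch bins water-oxygen positions from simulation frames into cubic cells and reports each occupied cell's density relative to bulk.

```python
from collections import Counter
from math import floor

# Assumed bulk number density of water, in molecules per cubic angstrom.
BULK_DENSITY = 0.0334

def water_density_grid(frames, origin, spacing):
    """Return {(i, j, k): density relative to bulk} for occupied grid cells.

    frames  -- list of frames; each frame is a list of (x, y, z)
               water-oxygen positions in angstroms
    origin  -- (x, y, z) corner of the grid in angstroms
    spacing -- edge length of one cubic cell in angstroms
    """
    counts = Counter()
    for frame in frames:
        for pos in frame:
            # Map each coordinate to an integer cell index.
            cell = tuple(floor((c - o) / spacing) for c, o in zip(pos, origin))
            counts[cell] += 1
    cell_volume = spacing ** 3
    n_frames = len(frames)
    # Average count per frame, per unit volume, divided by the bulk density.
    return {cell: n / (n_frames * cell_volume * BULK_DENSITY)
            for cell, n in counts.items()}
```

Cells with relative density well above 1 would correspond to localized, ordered water sites, while cells that water rarely visits would have values near 0 or be absent from the dictionary.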

horizontal rule

81 - All-atom explicit-solvent fragment-based drug discovery: SILCS ("Site Identification by Ligand Competitive Saturation") molecular dynamics simulations applied to IL-2

Prof. Olgun Guvench M.D., Ph.D.. Department of Pharmaceutical Sciences, University of New England College of Pharmacy, Portland, ME, United States

Two challenges in computer-aided drug discovery are incorporation of protein flexibility and an accurate description of solvation effects. Fast in silico screening methods typically employ rigid or near-rigid protein conformations and continuum descriptions of solvation, while more physical and accurate explicit-solvent all-atom molecular dynamics or Monte Carlo methods are very computationally demanding. Site Identification by Ligand Competitive Saturation (SILCS) is a recently-developed computationally-efficient fragment-based drug discovery method that employs all-atom explicit-solvent molecular dynamics simulations, essentially soaking the target in a 1 molar bath of hydrophobic fragments to compute 3-D probability maps of hot-spots on the protein surface that preferentially bind hydrophobic fragments or water molecules. Applied to the apo crystal structure of IL-2, SILCS identifies two hydrophobic pockets not present in the apo crystal, but later discovered to exist in complexes with small molecule inhibitors and to bind hydrophobic moieties on these molecules.

horizontal rule

101 - Predicting tautomer preference: Simple rules and unforeseen complexities

Peter W. Kenny PhD, Peter J Taylor. AstraZeneca (retired), Cheadle, United Kingdom

Tautomer ratio depends on phase, so for coherent analysis this must be chosen first. We settle for water as the biological medium, and show inter alia that the gas phase is still more removed from water than even the least polar of organic solvents. We also point out that, while minor tautomers may bind to receptors, this must entail an energetic penalty.
The 'basicity method' is the main source of quantitative data in water but suffers from systematic errors through its inevitable reliance on model compounds. Elimination of these using correction factors not only improves accuracy but has demonstrated structural regularities that have gone unsuspected till now. Their extrapolation leads to plausible predictions amenable to experiment. The effects of benzofusion, and of intramolecular lone pair and dipolar repulsion, exemplify these regularities and will be discussed.

Central to our approach is the realisation that tautomerism takes two forms, 'C-type' and 'N-type,' which depend on different electronic factors. The apparent inconsistencies that result may have helped to inhibit the comprehensive approach to tautomer ratio that is needed, and hopefully their rationalisation will help in its renewal.

horizontal rule

102 - Methods for robust and efficient tautomer enumeration, tautomer searching and tautomer duplicate filtering

József Szegezdi, Zsolt Mohácsi, Tamás Csizmazia, Szilárd Dóránt, Ákos Papp, György Pirok, Szabolcs Csepregi, Ferenc Csizmadia. ChemAxon Ltd., Budapest, Hungary

Tautomerism is an important and difficult problem in cheminformatics, and has gained much attention recently. [1] The presentation will focus on ChemAxon's approaches and algorithms for handling tautomerism.

There are four main topics to cover:

1. The tautomerization calculator plugin [2] is the basis of most methods. It can identify tautomerizable regions, enumerate all or dominant tautomers and predict the distribution of dominant tautomers. Furthermore, it can provide generic and canonical tautomers that are used by the methods discussed. It first identifies possible proton donors and acceptors and finds the tautomerization paths between them. Depending on the desired operation, it then combines the paths into regions (generic tautomer), combinatorially enumerates all possible tautomeric forms (all tautomers), filters and ranks enumerated structures based on pKa and other criteria (dominant tautomers) or canonicalizes using empirical rules (canonical tautomer).

The tautomerization plugin is also used to improve results of other calculations, such as macro pKa and logP.

2. Tautomer duplicate search uses generic tautomers combined with a hash key. This method also allows fast filtering of tautomers in chemical database tables. It will be shown how this method is able to handle tautomeric migration of H isotopes and interactions with stereochemistry.

3. Tautomer substructure search enumerates tautomers of the query, and searches each of them separately. In case of query H constraints (explicit H), the constraint is enforced on the tautomeric region to retrieve only true tautomers.

4. Standardizer is a tool for performing custom and built-in transformations on molecules. It is integrated with the JChem chemical database system, so that database and query structures are automatically transformed by the specified transformations [3]. It will be shown how the canonical tautomer and custom transformations can be used to handle tautomerism. Custom transformations also allow handling of ring-chain tautomerism.


[1] Martin, Y.C.: Let's not forget tautomers. J Comput Aided Mol Des (2009) 23:693-704, DOI 10.1007/s10822-009-9303-2
[2] Szegezdi, J.; Csizmadia, F.: Tautomer generation. pKa based dominance conditions for generating dominant tautomers. American Chemical Society meeting, Aug 19-23rd, 2007
[3] Pirok, G. et al.: Standardizer - Molecular Cosmetics for Chemoinformatics. Drug Discovery Technology, August 7-10th, 2006

horizontal rule

103 - Tautomerization approach for drug-like molecules

Dr. John C. Shelley PhD, Arron P. Sullivan, David Calkins, Dr. Jeremy R. Greenwood PhD. Schrodinger, Inc., Portland, Oregon, United States; Schrodinger, Inc., New York, New York, United States

We outline a pragmatic approach for generating the important protonation states, including tautomers, for drug-like molecules in the context of ligand and structure based virtual screening. The emphasis is on generating those states that have significant populations (which we define to be 0.01 mole fraction or more) in solution. These states also encompass the vast majority of those intuited from the examination of more than 2,500 protein-ligand complexes. The overall technology combines the use of many pre-parameterized tautomeric equilibria with Hammett and Taft calculation estimates of pKa values, which in turn can also be used to generate variations in both protonation states and tautomeric states. The overall approach permits the calculation of the mole fractions for the states generated along with their relative free energies. These free energy estimates have been shown to improve the performance of subsequent studies such as docking with Glide.
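
The relationship between relative free energies and mole fractions mentioned in this abstract can be sketched with standard Boltzmann weighting. This is a minimal illustration, not the authors' implementation: the function names, the temperature default, and the example energies are assumptions. Only the 0.01 mole fraction cutoff comes from the abstract.

```python
from math import exp

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def mole_fractions(rel_free_energies, temperature=298.15):
    """Boltzmann-weighted mole fractions from relative free energies (kcal/mol).

    rel_free_energies -- {state_name: free energy relative to any common reference}
    """
    rt = R_KCAL * temperature
    weights = {state: exp(-g / rt) for state, g in rel_free_energies.items()}
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}

def significant_states(rel_free_energies, cutoff=0.01, temperature=298.15):
    """Keep only states whose mole fraction is at least `cutoff`."""
    fractions = mole_fractions(rel_free_energies, temperature)
    return {state: x for state, x in fractions.items() if x >= cutoff}

# Hypothetical relative free energies for three states of one molecule.
example = {"A": 0.0, "B": 1.0, "C": 4.0}
populated = significant_states(example)  # state "C" falls below the cutoff
```

At room temperature a state about 2.7 kcal/mol above the dominant one sits close to the 0.01 cutoff, which gives a sense of the energy window such a filter implies.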

horizontal rule

104 - Acid/base ionization vs. prototropic tautomerism

Dr. Robert Fraczkiewicz PhD, Dr. Marvin Waldman PhD, Dr. Robert D. Clark PhD, Walter S. Woltosz MS, MAS, Dr. Michael B. Bolger PhD. Life Sciences, Simulations Plus, Inc., Lancaster, CA, United States

The most serious difficulty in computational predictive modeling of tautomerism is the lack of a sufficiently comprehensive database of tautomeric constants. [1] Published data on aqueous protonic ionization is, on the other hand, abundant enough to build successful QSPR models. Moreover, prototropic tautomerism is intimately tied to ionization in more than one way. We present compelling examples of how these ties can be explored to make both qualitative and quantitative predictions regarding tautomers using a truly predictive model of ionization constants. We show a very surprising case where the model refuted the widely accepted tautomeric form of one of the most successful drugs on the market today and how all of these predictions were confirmed beyond any doubt, both experimentally and theoretically. We demonstrate how the complex tautomerism of another very well known drug could be explained and quantified from its predicted ionization patterns. A general theoretical treatment of tautomer and ionization equilibria will be presented as well.

1. Martin, Y. C. J. Comput. Aided Mol. Des. 2009, 23, 693-704.

horizontal rule

105 - Combinatorial-computational-chemoinformatics approach to finding and analyzing low-energy tautomers

Dr. Maciej Haranczyk, Prof. Maciej Gutowski. Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, United States; Chemistry-School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh, United Kingdom

Enumeration of low-energy tautomers of neutral molecules in the gas-phase or typical solvents can be performed by applying available organic chemistry knowledge.

However, in esoteric cases such as charged molecules in uncommon, non-aqueous solvents there is simply not enough available knowledge to make reliable predictions of low energy tautomers. We have been developing an approach to address the latter problem and we successfully applied it to discover the most stable anionic tautomers of nucleic acid bases that might be involved in the process of DNA damage by low-energy electrons. The approach involves three steps: (i) combinatorial generation of a library of tautomers, (ii) energy-based screening of the library using electronic structure methods, and (iii) analysis of the information generated in step (ii). In steps i-iii we employ combinatorial, computational and chemoinformatics techniques, respectively. This presentation summarizes our developments and most interesting methodological aspects of our approach.
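
Steps (i) and (ii) of the approach can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' software: the function names are hypothetical, each candidate site is assumed to carry at most one labile hydrogen, and the energy function is a placeholder standing in for an electronic structure calculation.

```python
from itertools import combinations

def enumerate_tautomers(sites, n_hydrogens):
    """Step (i): every way to place n_hydrogens on distinct candidate sites."""
    return [frozenset(placement) for placement in combinations(sites, n_hydrogens)]

def screen_by_energy(library, energy_fn, window):
    """Step (ii): keep candidates within `window` of the lowest energy.

    Returns (tautomer, relative_energy) pairs sorted by relative energy.
    """
    energies = {taut: energy_fn(taut) for taut in library}
    e_min = min(energies.values())
    kept = [(taut, e - e_min) for taut, e in energies.items() if e - e_min <= window]
    return sorted(kept, key=lambda pair: pair[1])

# Hypothetical site labels and per-site energy penalties (arbitrary units).
SITE_COST = {"N1": 0, "N3": 0, "O2": 5, "O4": 6, "C5": 12}

def toy_energy(tautomer):
    # Placeholder for a quantum chemical energy evaluation.
    return sum(SITE_COST[site] for site in tautomer)

library = enumerate_tautomers(list(SITE_COST), 2)       # 10 candidates
low_energy = screen_by_energy(library, toy_energy, 6)   # 5 candidates survive
```

Because the library grows combinatorially with the number of sites, the screening step is where a cheap energy model followed by a more accurate one would typically be applied.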

horizontal rule

106 - Comparison of pattern-based and algorithm-based approaches to tautomer informatics

Ben Ellingson, Robert Tolbert, A. Geoffrey Skillman. OpenEye Scientific Software, Inc, Santa Fe, NM, United States

Tautomers are an important consideration for cheminformatics and molecular modeling. In cheminformatics, a unique tautomer is stored as the single registration key; it is vital that the unique key can be generated from any tautomer and that all tautomers can be generated from the unique key. The stored tautomer is often chosen for aesthetics or computational ease, but chemical implications such as the loss or gain of aromaticity or stereochemistry through tautomerization must also be addressed. Molecular modelers are often concerned with small ensembles of low energy tautomers. Unfortunately, determining the low energy tautomers is a complex task, for which sub-kcal/mol accuracy remains computationally intensive [1]. Thus, tautomer prediction for large-scale modeling or cheminformatics remains the domain of approximate methods. We will discuss two such approximate methods, pattern-based tautomer recognition and atom-type tautomer recognition. The advantages and disadvantages of these approaches will be examined.

1. Geballe, M. T.; Skillman, A. G.; Nicholls, A.; Guthrie, J. P.; Taylor, P. J. The SAMPL2 Blind Prediction Challenge: Introduction and Overview. Journal of Computer-Aided Molecular Design 2010, 24, XX.

horizontal rule

120 - Community structure-activity resource: Collecting, curating, and generating protein-ligand data to improve docking and scoring

Dr. James B. Dunbar Jr, Prof. Heather A. Carlson. Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Ann Arbor, MI, United States

The Community Structure-Activity Resource (CSAR) is a center at the University of Michigan funded by the National Institute of General Medical Sciences. The function of this center is to collect, curate, and disseminate protein-ligand data sets of crystal structures, biological binding affinities, and thermodynamic data to aid in the refinement of docking and scoring methodologies. These data sets are to come from in-house projects at the University of Michigan, other academic labs, and most importantly from industrial, pharma sources. Part of our remit is to augment the deposited data with synthesis, crystallography, and assays to expand the range of properties, binding affinities, and other relevant characteristics involved in docking and scoring. Here, we present CSAR's capabilities and summarize our current in-house project and potential future targets. We also outline the creation of a dataset (based on the PDB, Binding MOAD, and PDBbind) used in our first community-wide benchmark exercise.

horizontal rule

121 - Results of CSAR's 2010 Benchmark Exercise

Dr. James B. Dunbar, Dr. Richard D. Smith, Prof. Heather A. Carlson. Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Ann Arbor, MI, United States

The goal of CSAR's Benchmark Exercises is not to declare winners and losers! Instead, we combine the results of all participants to provide a wider assessment of the field. Here, we present an analysis of which protein-ligand complexes score poorly across the majority of submissions (“globally bad” complexes) and compare their properties to the set of complexes that score well across the majority of methods (“globally good”). It may be tempting to draw conclusions by simply examining the characteristics of the globally bad set, but those characteristics must be rarely observed in the globally good set to gain true insight. Lastly, each participant was asked to submit a standard method and an alternative approach. Several groups showed that the correlation to experiment was the same for vdw/fit-based scores as for full scoring functions that included electrostatics and hydrogen bonding. To help the field overcome this limitation, CSAR will focus on creating datasets that provide a range of hydrogen-bonding characteristics. The overarching goal of our benchmark exercises is to provide insight into what data is most needed to move our field ahead.

horizontal rule

122 - Scoring performance of eHiTS on the CSAR dataset

Zsolt Zsoldos PhD, Orr Ravitz PhD. SimBioSys Inc., Toronto, Canada

Numerous studies have pointed out the inability of scoring functions to perform uniformly well across all biological systems of interest. Some studies suggest guidelines for choosing the best method for a specific problem, others advocate consensus techniques.

An alternative solution is to tailor the scoring function for the system of interest. eHiTS uses a novel scoring method consisting of statistical knowledge focused on interacting surface points and physical terms combined with an adaptive parameter scheme. During the automated tuning of eHiTS-score, receptor targets are clustered according to the chemical and shape similarity of the active site, and weight sets are optimized for each family.

The performance of eHiTS on the CSAR dataset was evaluated using the default parameters (pre-tuned on other data). In addition, the automatic tuning utility was run on one subset of the CSAR data and tested on the other. Results will be presented from both studies.

horizontal rule

123 - Hydrophobic complementarity: A dominant term in affinity and binding mode prediction

Dr. Leslie A. Kuhn, Matthew E. Tonero. Biochemistry & Molecular Biology, Michigan State University, East Lansing, MI, United States

Empirical scoring functions designed for high-throughput docking, containing linear combinations of terms measuring protein-ligand interactions, were tested for affinity prediction. Scoring functions that best predicted affinity were dominated by hydrophobic or shape complementarity terms. Similarly, a scoring function containing only polar terms compensated for the absence of a hydrophobic term by heavily weighting the polar term that correlated most with hydrophobic complementarity. These results are consistent with Eisenberg & McLachlan's observation that the solvation component of the change in Gibbs free energy upon binding is proportional to the surface area and degree of hydrophobicity of atoms buried in the interface. Scoring functions that perform best at affinity prediction are not necessarily optimal for binding mode prediction, though hydrophobic burial is important in both. In other words, tuning scoring functions only to predict the affinity of good ligands in the correct binding mode can limit their applicability, suggesting a broader approach.

horizontal rule

124 - Docking and scoring for 2010 CSAR benchmark using an improved iterative knowledge-based scoring function with MDock

Sheng-You Huang, Xiaoqin Zou. Department of Physics, Department of Biochemistry, Dalton Cardiovascular Research Center, Informatics Institute, University of Missouri-Columbia, Columbia, MO, United States

Based on a physics-based iterative method (Huang & Zou, J. Comput. Chem., 2006, 27, 1865-75; 1876-82), we have extracted a set of distance-dependent all-atom potentials for protein-ligand interactions (ITScore2.0) using a large training set of 1300 protein-ligand complexes. The iterative method circumvents the long-standing reference state problem in traditional knowledge-based scoring functions. ITScore2.0 has been tested with the 2010 CSAR dataset of 345 diverse protein-ligand complexes, and achieved a correlation coefficient of 0.73 between the calculated binding scores and experimental affinity data, compared to 0.58 for the van der Waals (VDW) scoring function and 0.32 for the force field (FF) scoring function consisting of VDW and electrostatic terms. For rigid-ligand docking, ITScore2.0 achieved a success rate of 86.7% in identifying native binding modes, compared to 80.0% and 64.1% for FF and VDW. For flexible-ligand docking, ITScore2.0 yielded a success rate of 79.7%, compared to 71.0% and 52.8% for FF and VDW. The moderate performance of VDW suggests that VDW alone may serve as a benchmark for evaluation of scoring functions. What we have learned through participating in CSAR scoring will be shared.
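
The two evaluation metrics quoted in this abstract, the correlation coefficient between scores and experimental affinities and the docking success rate, can be computed as in the sketch below. The function names are hypothetical, and the 2.0 angstrom RMSD cutoff for a successful pose is a common convention assumed here, not a value stated in the abstract.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def success_rate(pose_rmsds, cutoff=2.0):
    """Fraction of complexes whose top-ranked pose is within `cutoff`
    angstroms RMSD of the native binding mode (cutoff is an assumption)."""
    hits = sum(1 for r in pose_rmsds if r <= cutoff)
    return hits / len(pose_rmsds)
```

Note that the sign of the correlation depends on the score convention: if lower scores mean tighter binding, scores should be compared with binding free energies rather than with pKd-style affinities, or the sign flipped.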

horizontal rule

145 - Lead Finder in the CSAR scoring challenge

Victor Stroylov MD, Dr Ghermes Chilov, Dr Oleg Stroganov, Fedor Novikov, Val Kulkov MD, MBA. "Molecular Technologies", Ltd, Moscow, Russian Federation; BioMolTech, Corp., Toronto, Ontario, Canada

Lead Finder is a specialized software package for ligand docking, binding energy evaluation and virtual screening. The standard approach to estimating the binding affinities of protein-ligand complexes in the CSAR test set was to use the Lead Finder v.1.1.14 scoring mode, which estimates the free energy of protein-ligand binding at fixed ligand coordinates for each protein-ligand complex. No pre-optimization of either protein or ligand structures was performed.
The improvements in the scoring protocol included corrections of the protein's and ligand's protonation states, positions of functional hydrogen atoms (for proteins only), and local geometry of nitrogen atoms (for ligands only). No other improvements to Lead Finder's standard scoring function have been made.
The RMSD of estimated vs experimentally obtained protein-ligand binding energies was found to be 2.07 kcal/mol and 1.98 kcal/mol for the standard and improved protocols, respectively.
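
For reference, the RMSD between estimated and experimental binding energies is the square root of the mean squared difference. The sketch below shows the calculation, with a hypothetical function name and no connection to Lead Finder's own code.

```python
from math import sqrt

def rmsd(predicted, experimental):
    """Root-mean-square deviation between two equal-length lists of
    binding energies, in the same units as the inputs (e.g. kcal/mol)."""
    if len(predicted) != len(experimental):
        raise ValueError("inputs must have the same length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return sqrt(sum(squared) / len(squared))
```

Because the errors are squared before averaging, a few badly predicted complexes can dominate the RMSD, which is one reason it is often reported alongside a correlation coefficient.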

horizontal rule

146 - Benchmark of solvated interaction energy (SIE) scoring function on the CSAR-2010 dataset

Traian Sulea, Qizhi Cui, Herve Hogues, Christopher R Corbeil, Enrico O Purisima. Biotechnology Research Institute, National Research Council Canada, Montreal, QC, Canada

Solvated interaction energy (SIE) is a first-principles function for predicting absolute binding affinities from force-field non-bonded terms, continuum solvation, and scaling for configurational entropy. Standard SIE parametrization applied to the CSAR dataset with binding interfaces refined by constrained minimization predicted absolute affinities with 2.5 kcal/mol mean-unsigned-error, but with correlation outperformed by buried surface or van der Waals interaction alone. Re-training SIE on CSAR subsets led to increased solute dielectric and reduced electrostatic interactions, stressing the weak signal carried by calculated electrostatics in this heterogeneous dataset. Overestimated complexes implicate highly negatively-charged ligands interacting via metals. Underestimated outliers reveal alternate protonation states that significantly improve SIE predictions. In an upgraded version of the CSAR dataset with reassigned protonation states, 10% of ligands and 20% of proteins are affected. Among other investigated aspects are the sensitivity to the orientation of polar hydrogens, incorporation of MD-generated ensembles, different solvent models and entropy estimates, and ligand strain.

horizontal rule

147 - Protonation states and scoring receptor-ligand poses: It's always the details

Emilio Xavier Esposito PhD. exeResearch LLC, East Lansing, Michigan, United States

The protonation state of the receptor - ligand complex has a large influence over the correct approximation of the binding interactions. Using the CSAR dataset, various methods of assigning the complex's protonation state are used to explore the abilities of several scoring functions with respect to protonation state. In conjunction with the complex's protonation state, the 'standard' protocols employed to prepare a receptor for a docking simulation, along with the post-dock refinement of poses, are explored.

horizontal rule

148 - Role of active-site solvent in protein-ligand binding affinity calculations

Dr. Ye Che, Dr. Veerabahu Shanmugasundaram. Groton Structural Biology, Antibacterials Chemistry/Discovery Technologies, Pfizer PharmaTherapeutics Research & Development, Groton, CT, United States

Accurate methods for computing binding affinities of a small molecule to a protein are needed to speed the discovery and optimization of new medicines. An assessment of six scoring functions commonly applied at Pfizer using the CSAR (Community Structure-Activity Resource) set of protein-ligand complexes will be presented. A current weakness amongst these various scoring functions is the treatment of active-site water molecules. Here, we quantitatively estimate the thermodynamic properties of active-site water molecules and capture the effects of solvent displacement from the protein active site. Water inclusion shows promise in improving current scoring functions and we propose that this could be used more extensively in virtual screening and lead optimization applications.

horizontal rule

149 - Flexible docking using a stochastic rotamer library of ligands

Dr. Feng Ding, Dr. Shuangye Yin, Prof. Nikolay V. Dokholyan. Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States

Uncovering structures of molecular complexes via computational docking is at the heart of many structural modeling efforts and virtual drug screening. Modeling both receptor and ligand flexibility is important in order to capture receptor conformation changes induced by ligand binding, but is a major challenge in computational drug discovery. Many flexible docking approaches model the ligand and receptor flexibility either separately or in a loosely-coupled manner, which captures the conformational changes inefficiently. Here, we propose a truly flexible docking approach, MedusaDock, which models both ligand and receptor flexibility simultaneously using sets of discrete rotamers. We developed an algorithm which allows for the building of the ligand rotamer library “on the fly” during docking simulations. MedusaDock benchmarks demonstrate a rapid sampling efficiency and high prediction accuracy in both self-docking (to the co-crystallized state) and cross-docking (to a state co-crystallized with a different ligand), the latter of which mimics the virtual screening procedure in computational drug discovery. We also perform a virtual-screening test for a flexible protein target, cyclin-dependent kinase 2. We find a significant improvement in virtual screening enrichment when compared to rigid-receptor methods. The high predictive power of MedusaDock comes from several innovations, including the generation of a stochastic rotamer library of ligands, the efficient docking protocol, and the novel ligand pose-ranking method. We expect a broad adoption of these methodologies and the application of MedusaDock in ligand-receptor interaction predictions and drug discovery.

horizontal rule

150 - Cheminformatics meets molecular mechanics: A combined application of knowledge based pose scoring and physical force field-based hit scoring functions improves the accuracy of virtual screening

Jui-Hua Hsieh, Shuangye Yin, Xiang S. Wang, Shubin Liu, Nikolay V. Dokholyan, Alexander Tropsha. University of North Carolina at Chapel Hill, United States

Many scoring functions fail to discriminate between true binders and non-binders (binding decoys), leading to a large number of false positive hits in virtual screening (VS) studies. We have developed a novel binary QSAR-like approach that discriminates geometrical pose decoys from native-like poses for each ligand. We have applied it for filtering (presumed) decoy poses from a library of docked ligand conformations followed by scoring the remaining poses with the MedusaScore physical force field-based scoring. We have demonstrated that this pre-filtering affords a significant improvement of hit rates in virtual screening studies for 5 of the 6 benchmark sets from the Database of Useful Decoys (DUD). Moreover, the top 10 hits in these 5 sets were found to include chemically diverse ligands while yielding high true positive rates (60-100%). We will discuss the methodology as well as the results of applying this approach to CSAR datasets.
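
The two-stage protocol described above (filter presumed decoy poses, then score what remains) can be sketched generically. The function and parameter names are hypothetical; the pose classifier and the scoring function are passed in as callables and stand in for the QSAR-like pose filter and MedusaScore, neither of which is reproduced here. A lower score is assumed to mean stronger predicted binding.

```python
def screen(ligand_poses, is_native_like, score):
    """Rank ligands by their best score among poses that pass the filter.

    ligand_poses   -- {ligand_id: [pose, ...]} from a docking run
    is_native_like -- callable(pose) -> bool, the pose classifier (stage 1)
    score          -- callable(pose) -> float, the scoring function (stage 2)
    """
    ranked = []
    for ligand, poses in ligand_poses.items():
        kept = [pose for pose in poses if is_native_like(pose)]
        if not kept:
            continue  # every pose of this ligand was classified as a decoy
        ranked.append((ligand, min(score(pose) for pose in kept)))
    # Best (lowest) score first.
    return sorted(ranked, key=lambda item: item[1])
```

The design point is that a ligand whose only favorable scores come from decoy-like poses drops out of the ranking entirely, rather than appearing as a false positive near the top.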

horizontal rule

151 - Application of free energy methods to water molecules in protein binding sites

Prof. Jonathan W. Essex D.Phil., Dr Caterina Barillari PhD, Mr Michael Bodnarchuk, Dr Russell Viner PhD. School of Chemistry, University of Southampton, Southampton, Hampshire, United Kingdom; Jealott’s Hill International Research Centre, Syngenta, Bracknell, United Kingdom

Water molecules play a crucial role in mediating the interaction between a ligand and a macromolecular receptor. An understanding of the nature and role of each water molecule in the active site of a protein could improve the efficiency of rational drug design approaches. In this presentation, a range of different simulation methods, including double decoupling with replica exchange thermodynamic integration, Grand-Canonical Monte Carlo, and JAWS, are used to calculate the absolute binding free energies of a number of water molecules in protein-ligand complexes. The relative merits of each of these methods are discussed. In addition, the development of a number of descriptor-based QSAR models for calculating water binding free energies is described, with a view to reducing the need for expensive free energy simulations.

horizontal rule

152 - Which waters are important and how do we identify them?

Dr Simon Bowden, Dr Jason C Cole, Dr Oliver Korb, Dr Tjelvar Olsson, Dr John Liebeschuetz, Dr Colin Groom. Cambridge Crystallographic Data Centre, Cambridge, United Kingdom

The important role waters play in ligand binding, both in terms of thermodynamics and selectivity, is well known, but identifying which waters are important for the success of a docking experiment is still difficult. Given that consideration of waters involved in primary and secondary mediated protein-ligand contacts has been shown to improve success rates in both native docking and virtual screening, experimenters need tools to help them decide which waters are important and which are not even real.

In this talk we will describe tools which may be of use to identify important waters and to highlight dubious waters. Conserved water structures can also be identified which may have an important influence on ligand binding. The effect of this information when applied to molecular docking will be demonstrated.

horizontal rule

153 - Free energies and entropies of water molecules at protein-ligand interfaces

Prof. Steve W Rick PhD, Mr. Hongtao Yu. Chemistry, University of New Orleans, New Orleans, LA, United States

Water molecules are commonly found at the protein-ligand interface. The thermodynamics of these water molecules plays an important role in ligand affinity. In particular, the entropic cost of localizing a water molecule at the binding site can be significant. From the database of crystal structures, it is evident that the local environments of water molecules at the protein-ligand interface can vary considerably. We use molecular dynamics simulations and thermodynamic integration to calculate the free energy, enthalpy, and entropy changes associated with localizing a water molecule at a wide variety of sites at protein-ligand interfaces. Results analyzing how the free energies, enthalpies, and entropies depend on the details of the local environment, including the number of hydrogen bonds and the cavity size, will be presented.
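As a minimal sketch of the thermodynamic integration step mentioned above: the free energy change is estimated by integrating the ensemble average of dU/dλ over the coupling parameter λ, and the entropic term then follows from ΔG = ΔH - TΔS. The function names and the numbers in the usage note are hypothetical illustrations, not the authors' protocol or results, and the sketch assumes the enthalpy change is obtained separately.

```python
def thermodynamic_integration(lambdas, dudl_means):
    """Trapezoidal estimate of Delta G = integral over lambda of <dU/dlambda>.

    lambdas    : increasing coupling-parameter values, typically from 0 to 1
    dudl_means : ensemble-averaged dU/dlambda at each lambda (hypothetical input)
    """
    if len(lambdas) != len(dudl_means):
        raise ValueError("lambdas and dudl_means must have the same length")
    delta_g = 0.0
    for i in range(len(lambdas) - 1):
        width = lambdas[i + 1] - lambdas[i]
        delta_g += 0.5 * (dudl_means[i] + dudl_means[i + 1]) * width
    return delta_g


def entropic_term(delta_g, delta_h):
    """Return -T*Delta S from Delta G = Delta H - T*Delta S."""
    return delta_g - delta_h
```

For example, with made-up averages of -10, -6 and -2 (energy units) at λ = 0, 0.5 and 1, `thermodynamic_integration` returns -6.0; paired with an assumed ΔH of -9.0, `entropic_term` gives +3.0, i.e. an unfavorable entropic cost of localization.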

horizontal rule

154 - Role of water molecules in docking studies of Cytochromes P450

Dr. Chris Oostenbrink. Institute of Molecular Modeling and Simulation, BOKU University, Vienna, Austria; Chemistry and Pharmaceutical Sciences, VU University, Amsterdam, The Netherlands

Active-site water molecules form an important component in biological systems facilitating promiscuous binding, or an increase in specificity and affinity. Taking water molecules into account in computational approaches to drug design or site-of-metabolism prediction is far from straightforward. The effect of including water molecules in molecular docking simulations of metabolic Cytochrome P450 enzymes is investigated, focusing on pose prediction, virtual screening and free energy estimates. The structure and dynamics of water molecules that are present in the active site simultaneously with selected ligands are described. The transferability of hydration sites between different ligands is investigated. The role of water molecules appears to be very dependent on the protein conformation and the substrate, further enhancing the versatility of these metabolic enzymes.

horizontal rule

155 - Modeling explicit waters in docking and scoring

Dr. Niu Huang. National Institute of Biological Sciences, Beijing, Beijing, China

Water molecules play an important role in protein-ligand recognition. However, incorporating explicit waters during docking is challenging in both the sampling and scoring aspects. We explored a method to switch ordered water molecules “on” (retained) and “off” (displaced) during docking screens. This method assumes additivity and scales linearly with the number of waters sampled despite the exponential growth in configurations. We tested this approach for ligand enrichment in screens of a large compound database against 24 DUD targets, exploring up to 8 waters in 256 configurations. Compared to calculations where the water positions were not sampled, enrichment factors increase substantially for 12 of the targets and are largely unaffected for most others. However, in our previous study, the positions of the water molecules were obtained from the x-ray structures, and all waters were treated as equally displaceable without the consideration of the differential energy of water binding. Our recent work in improving the treatment of waters during docking and scoring will be presented.
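The on/off scheme above can be illustrated with a minimal sketch. It assumes, as the abstract states, that water contributions are additive; the per-water score terms and function names are hypothetical and are not taken from the authors' docking code. Under additivity, choosing each water's state independently (linear in the number of waters) finds the same best configuration as scoring all 2^N combinations (256 for 8 waters).

```python
from itertools import product


def best_by_enumeration(on_scores, off_scores):
    """Score every on/off configuration explicitly: 2**N combinations."""
    best = None
    for config in product((True, False), repeat=len(on_scores)):
        total = sum(on_scores[i] if keep else off_scores[i]
                    for i, keep in enumerate(config))
        if best is None or total < best[0]:  # lower score = better
            best = (total, config)
    return best


def best_by_additivity(on_scores, off_scores):
    """Pick each water's state independently: N comparisons.

    Valid only if the total score is a sum of independent per-water terms.
    """
    config = tuple(on <= off for on, off in zip(on_scores, off_scores))
    total = sum(min(on, off) for on, off in zip(on_scores, off_scores))
    return total, config
```

Here `on_scores[i]` is the hypothetical score contribution when water i is retained and `off_scores[i]` the contribution when it is displaced by the ligand. Both functions return the same `(total, config)` pair whenever the additivity assumption holds, which is why the sampling cost can grow linearly despite the exponential number of configurations.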

horizontal rule

156 - Desolvation/resolvation: A revolving door that controls the rates of association/dissociation of protein-ligand complexes? Analysis of PCSK9-EGF-A binding kinetics using WaterMap

Dr. Robert A. Pearlstein Ph.D., Dr. Qi-Ying Hu Ph.D., Dr. Jing Zhou Ph.D., Dr. David Yowe Ph.D., Dr. Julian Levell Ph.D., Bethany Dale, Virendar Kaushik, Dr. Doug Daniels Ph.D., Susan Hanrahan, Dr. Woody Sherman Ph.D., Dr. Robert Abel Ph.D. Novartis Institutes for BioMedical Research, Cambridge, MA, United States; Schrödinger, Inc., New York, NY, United States

We hypothesize that desolvation and resolvation processes can constitute rate-determining steps for protein-ligand association and dissociation, respectively. We tested this hypothesis using proprotein convertase subtilisin-kexin type 9 (PCSK9) bound to the epidermal growth factor-like repeat A (EGF-A) of low density lipoprotein cholesterol receptor (LDL-R). We analyzed and compared predicted desolvation properties of wild-type vs. gain-of-function mutant Asp374Tyr PCSK9 using WaterMap, a new method for calculating preferred locations and thermodynamic properties of water solvating proteins (“hydration sites”). We propose that the fast kon and entropically driven thermodynamics observed for PCSK9-EGF-A binding are due to functional replacement of water occupying stable PCSK9 hydration sites (exchange of water for polar EGF-A groups). We further propose that the relatively fast koff observed for EGF-A unbinding results from limited displacement of unstable water. The slower koff observed for EGF-A and LDL-R unbinding from Asp374Tyr PCSK9 may be due to destabilizing effects of this mutation on PCSK9 hydration sites.

horizontal rule

157 - Biophysics-based library design: Discovery of “non-acid” inhibitors of S1 DHFR

Veerabahu Shanmugasundaram, Kris Borzilleri, Jeanne Chang, Boris Chrunyk, Mark E Flanagan, Seungil Han, Melissa Harris, Brian Lacey, Richard Miller, Parag Sahasrabudhe, Ron Sarver, Holly Soutter, Jane Withka. Groton Structural Biology, Antibacterials Chemistry/Discovery Technologies, Pfizer PharmaTherapeutics Research & Development, Groton, CT, United States; AntiBacterials Chemistry, Pfizer PharmaTherapeutics Research & Development, Groton, CT, United States; AntiBacterials Research Unit, Pfizer PharmaTherapeutics Research & Development, Groton, CT, United States

Methicillin-resistant Staphylococcus aureus (MRSA), the causative agent of many serious nosocomial and community-acquired infections, and other gram-positive organisms can show resistance to trimethoprim (TMP) through mutation of the chromosomal gene or acquisition of an alternative DHFR termed "S1 DHFR". To develop new therapies for health threats such as MRSA, it is important to understand the molecular basis of TMP resistance and use that knowledge to design and develop novel inhibitors that are effective against S1 DHFR. This presentation will highlight and illustrate an effort using a multi-pronged biophysics-based strategy that utilizes NMR, thermodynamic, kinetic, structural, computational and medicinal chemistry information in developing an understanding of the mechanism of resistance in S1 DHFR as well as using this prospectively in drug discovery. Specifically, this presentation will illustrate computational studies using WaterMap (WM) that developed an understanding of a key element of the mechanism of resistance that was supported by a variety of biophysical experiments, and the use of these WM calculations in a prospective fashion in library design.

horizontal rule

170 - Computational evaluation of tautomers and zwitterions of D-amino acid oxidase (DAAO) inhibitors

Scot Mente. Neuroscience Chemistry, Pfizer Global Research and Development, Groton, CT, United States

Quantum mechanical calculations and molecular docking were used to design novel inhibitors of D-amino acid oxidase (DAAO). Using available x-ray structural information and simple tautomer enumeration tools, reasonable docked poses of a set of small ligands have been obtained. Use of these tools has helped lead to the optimization of the novel non-acidic 3-hydroxyquinolin-2(1H)-one series (I), as well as the identification of structurally similar 3-hydroxyquinoline (II) and benzotriazole (III). Despite their small sizes, all three of these molecular scaffolds are capable of adopting multiple tautomeric or zwitterionic states. The ability to accurately predict these states with quantum mechanical methods will be discussed.

horizontal rule

171 - Defining states of ionization and tautomerization of thiamin diphosphate at individual reaction intermediates on enzymes: Enzymes that use a rare tautomeric form

Prof. Frank Jordan PhD, Dr. Natalia S. Nemeria PhD, Mr. Anand Balakrishnan, Mr. Sivakumar Paramasivam, Prof. Tatyana Polenova PhD. Chemistry, Rutgers University, Newark, NJ, United States; Chemistry and Biochemistry, University of Delaware, Newark, DE, United States

The author and coworkers demonstrated on several thiamin diphosphate (ThDP) enzymes that the 1',4'-iminopyrimidine tautomer of ThDP participates at several reaction steps. Hence, ThDP has a dual function: an electrophilic covalent catalyst (a function long accepted) and an acid-base catalyst facilitating the ionization of the weak carbon acid to generate the C2 ylide.
It is proposed that ThDP exists in these forms on enzymes: the N1'-protonated 4-aminopyrimidinium (APH+) in protolytic equilibrium with its three conjugate bases, the canonical 4-aminopyrimidine (AP), its 1',4'-iminopyrimidine (IP) tautomeric form, and the C2 carbanion or ylide (Yl). The first three forms have been observed on multiple enzymes in the absence of substrate. In the presence of substrate and analogs, the IP form has been seen on several enzymes along with the APH+ state. Circular dichroism and solid-state NMR methods are being used for the first time to characterize different species. Supported by NIH-GM-050380 and 5P20RR017716.

horizontal rule

172 - Do tautomers matter in calculating molecular similarity?

Dr. Steven W Muchmore PhD, Isabella Haight, Dr. Scott Brown. Cheminformatics, Abbott Laboratories, Abbott Park, IL, United States

Compounds that have multiple tautomeric forms, which typically account for about 25% of pharmaceutical company corporate collections, present a challenge in cheminformatic analysis. While widely recognized, their manipulations are often ignored in database registration, substructure searching and similarity searching due to incremental increases in computation time and data management. However, clustering and diversity selection, which are based on similarity calculations, could yield erratic results if they include or exclude molecules that happen to be encoded as different tautomers. We enumerated tautomers for a data set of more than 66,000 compound pairs with associated activity against protein targets used in the assessment of similarity programs (Muchmore et al. J. Chem. Inf. Model. 2008, 48, 941). The similarity value for the highest scoring tautomer pair was compared to the original data to determine if its similarity score increased. These tautomer similarity values were also applied to single representation results to determine if tautomer enumeration would yield a better estimate of the probability that two compounds will be equipotent.
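The comparison of the highest scoring tautomer pair against a single-representation score can be sketched as follows. This is an illustration only: it assumes each tautomer has already been enumerated and encoded as a set of fingerprint bits, and it uses the Tanimoto coefficient as a stand-in similarity measure. The abstract does not specify which similarity programs or fingerprints were used, and the function names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints stored as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def max_tautomer_similarity(tautomers_a, tautomers_b):
    """Highest similarity over all tautomer pairs of two compounds.

    tautomers_a, tautomers_b : non-empty lists of fingerprint sets, one per
    enumerated tautomer (enumeration itself is assumed to happen upstream).
    """
    return max(tanimoto(a, b) for a in tautomers_a for b in tautomers_b)
```

With toy fingerprints `[{1, 2, 3, 4}, {1, 2, 5, 6}]` for one compound and `[{1, 2, 5, 7}, {2, 3, 8}]` for the other, the registered (first) forms score 2/6, while the best tautomer pair scores 3/5, showing how the choice of encoded tautomer can shift a similarity value.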

horizontal rule

173 - Automated prediction of tautomeric states in protein-ligand complexes

Sascha Urbaczek, Stefan Bietz, Prof. Dr. Mathias Rarey. Center for Bioinformatics, University of Hamburg, Hamburg, Hamburg, Germany

Hydrogen bonding plays a major role in the stabilization of protein-ligand complexes. Unfortunately, the positions of hydrogen atoms are not resolved in most structures present in the PDB. This makes it particularly hard to predict adequate tautomeric and protonation states for the atoms and groups involved in the binding. To overcome this difficulty, many approaches have been developed to predict the correct protonation of either the ligand or the protein separately using a variety of different methodologies. We present a new method that predicts the tautomeric and protonation states as well as the resulting hydrogen atom positions of both the protein and the ligand simultaneously. The optimization of these states is based on an empirical scoring scheme used also in docking methods. Assuming an optimal hydrogen bonding network, the obtained results indicate that the most stable tautomeric forms in solution do not always correspond to those found in binding modes.

horizontal rule

174 - Predicting relative binding affinities in the CSAR Scoring

Prof. Matthew P Jacobson, Dr. Chakrapani Kalyanaraman. Department of Pharmaceutical Chemistry, UCSF, San Francisco, CA, United States

We have been interested in evaluating whether all-atom force fields combined with implicit solvent models can also be used as a docking scoring function. Our prior experience has suggested that such energy functions can be used for, at best, predicting relative binding affinities to a particular binding site, with the best results being achieved for chemically related compounds, such as congeneric series generated in lead optimization. Thus, although predicting absolute binding affinities is a noble challenge, we have not attempted to do so in the CSAR exercise. Instead, with the assistance of the organizers, we focused on series of compounds bound to the same target. The results using the protein-ligand structures as provided showed essentially no ability to rank order compounds by binding affinity. However, complete energy minimization, and in some cases correcting protonation states, significantly improved the results, to the point where there was some ability to distinguish more potent from less potent compounds, as we have also shown in other work on congeneric series. I will also discuss our attempts to characterize and correct some of the many limitations of this simple scoring scheme.

horizontal rule

175 - Surflex: Docking and scoring on CSAR

Prof. Ajay N Jain PhD. Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, United States

One of the most challenging aspects of structure-based drug design is binding affinity prediction, since it embeds both the pose determination problem as well as requiring accuracy in estimation of energetic contributions where differences on the order of 1 kcal are large enough to matter. Even in the artificial case where a bound ligand/target structure is known, this remains a challenging problem. We present results for the Surflex family of methods for making predictions on the CSAR 2010 benchmark data set. Results will include straight docking-based pose prediction and scoring, tuned scoring approaches through scoring function optimization and protein structure optimization, and ligand-based approaches.

horizontal rule

176 - What we can learn from very large panel docking screens

Kong T Nguyen, John J Irwin, Brian K Shoichet, Michael M Mysinger. Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, United States

Whereas molecular docking is the most practical way to leverage structure for ligand discovery, the method retains important weaknesses. Among the more confounding problems is that docking can work well on one target yet fail completely on the next, and predicting in advance which will succeed or fail is challenging. To investigate the strengths and weaknesses of docking we have assembled a very large panel of experimental information with which to test it. We have used our automated docking program, DOCK Blaster (1), to study the performance of DOCK 3.5.54 against many protein targets for which experimental control information is available (2). We have focused on two of the seven stated goals of the 2010 CSAR Workshop: to provide a baseline assessment of current scoring functions and to document which targets are most difficult. This approach has enabled us to comprehensively test the effect of changes in sampling, scoring and library composition.


1. Irwin, J.J. et al. Automated docking screens: a feasibility study. J Med Chem 52, 5712-20 (2009).

2. Overington, J. ChEMBL. An interview with John Overington, team leader, chemogenomics at the European Bioinformatics Institute Outstation of the European Molecular Biology Laboratory (EMBL-EBI). Interview by Wendy A. Warr. J Comput Aided Mol Des 23, 195-8 (2009).

horizontal rule

177 - Docking and scoring of fragments

Dr Marcel L Verdonk PhD. Astex Therapeutics Ltd, Cambridge, United Kingdom

Through the application of fragment-based drug discovery, Astex have produced >1,400 in-house X-ray crystal structures of fragments and >2,500 structures of lead-like compounds against a range of drug targets. From this wealth of structural data, we have constructed two test sets, each containing ~100 complexes, representing 10 drug targets. In the first test set the ligands are fragments, whereas in the second test set the ligands are lead-like compounds. By applying docking and virtual screening on these sets, we will discuss whether fragments are harder to dock and score than larger compounds, and present our latest experiences on docking and scoring fragments. In addition, we will show how structural data on fragments obtained early on in drug discovery projects can be used to improve docking and scoring during the hit-to-lead phases. Finally, we will show examples of the application of docking and scoring of fragments on actual drug discovery programs.

horizontal rule

217 - Molecular dynamics studies of water-protein interactions

Gerhard Hummer, Jayendran C. Rasaiah, Hao Yin, Guogang Feng. Laboratory of Chemical Physics, National Institutes of Health, Bethesda, MD, United States; Department of Chemistry, University of Maine, Orono, ME, United States

We use molecular dynamics simulations to study the interaction of water with proteins. With the help of a semi-grand canonical formalism, we determine the structure, dynamics, and thermodynamics of water in the protein interior and at buried sites. We find that water filling of weakly polar protein cavities from the solvent is governed by a subtle balance between the loss in bulk hydrogen bond interactions, the gain in strong hydrogen-bond interactions between confined water molecules, weakly attractive interactions between water and the cavity, and the entropic gain from filling a void space. The simulation results will be compared to X-ray crystallography and NMR experiments. The effects of interfacial and cavity water on protein function and ligand binding will be discussed.

horizontal rule

218 - Addressing limitations with the MM-GB/SA scoring procedure using the WaterMap method and free-energy perturbation calculations

Dr. Cristiano R. W. Guimaraes. CVMD Chemistry, PharmaTherapeutics Research and Development, Pfizer, Inc., Groton, Connecticut, United States

The MM-GB/SA scoring technique has become an important computational approach in lead optimization. Despite showing good accuracy, much work is necessary before the method can be applied to rank multiple chemical series. Here, we investigate the poor estimation of protein desolvation provided by GB/SA and the large dynamic range in the MM-GB/SA scoring compared to that of the experimental data. In the former, replacing the GB/SA protein desolvation by the WaterMap free energy liberation of binding-site waters provides the best results. However, the improvement is modest over results obtained with the MM-GB/SA and WaterMap methods individually, apparently due to the high correlation between the free energy liberation and protein-ligand van der Waals interactions. As for the large dynamic range, comparisons between MM-GB/SA and FEP calculations indicate that it has its origin in the lack of dynamical screening of protein-ligand electrostatic interactions and the incomplete description of enthalpy-entropy compensation effects.

horizontal rule

219 - Prediction of potency of protease inhibitors by GBSA simulations with polarizable quantum mechanics-based ligand charges and a hybrid water model

Dr. Debananda Das, Dr. Hiroaki Mitsuya, Dr. Yasuhiro Koh, Yasushi Tojo, Dr. Arun Ghosh. HIV and AIDS Malignancy Branch, National Cancer Institute, Bethesda, MD, United States; Departments of Hematology and Infectious Diseases, Kumamoto University Graduate School of Medical and Pharmaceutical Sciences, Kumamoto, Japan; Departments of Chemistry and Medicinal Chemistry, Purdue University, West Lafayette, Indiana, United States

Reliable and robust prediction of binding affinity for drug molecules continues to be a daunting challenge. We have simulated the binding interactions and free energy of binding of several protease inhibitors (PIs) with wild-type and various mutant proteases by performing GBSA simulations, in which each PI's partial charge was determined by quantum mechanics and the partial charge accounts for the polarization induced by the protease environment. We employed a hybrid solvation model that retains selected explicit water molecules in the protein with surface generalized Born implicit solvent. We examined the correlation of the free energy with antiviral potency of PIs. The free energy showed a strong correlation with experimentally determined anti-HIV-1 potency. The present data suggest that the presence of selected explicit water in protein, and protein polarization-induced quantum charges for the inhibitor, compared to lack of explicit water and a static force field-based charge model, can serve as an improved lead optimization tool, and warrants further exploration.

horizontal rule

220 - Continuum theory and the analysis of active sites

Dr. Anthony Nicholls PhD, Dr. Mike Word. Department of Research and Development, OpenEye Scientific Software, Inc, Santa Fe, NM, United States

Continuum theory for electrostatic free energies at the molecular level was never supposed to work: water is discrete and the very idea of treating its properties as a mean field was considered inappropriate. Yet Poisson-Boltzmann (PB) theory continues to perform as well as, if not better than, explicit water treatments in the estimation of small molecule solvation or macromolecular biophysics. However, it is still assumed PB will fail to correctly describe the physics of the active sites of proteins. As this remains a focus for predictive drug discovery, is this assumption correct? And if it is, can we improve continuum theory by going beyond the mean field limit, i.e. producing a 'virial' expansion of PB? This talk will cover our attempts to date and the physical insight gained.

horizontal rule

221 - Prediction of consistent water networks in uncomplexed protein binding sites based on knowledge-based potentials

Michael Betz, Gerd Neudert, Professor Gerhard Klebe PhD. Institute of Pharmaceutical Chemistry, Philipps-University Marburg, Marburg, Germany

Within the active site of a protein, water fulfills a variety of different roles. Solvation of hydrophilic parts stabilizes a distinct protein conformation, whereas desolvation upon ligand binding may lead to a gain of entropy. In an overwhelming number of cases, water molecules mediate interactions between protein and the bound ligand. Therefore, a reliable prediction of water molecules participating in ligand binding is essential for docking and scoring, and is necessary to develop strategies in ligand design. We require some reasonable estimates about the free energy contributions of water to binding.

Useful parameters for such estimations are the total number of displaceable water molecules and the probabilities for their displacement upon ligand binding. These parameters depend on specific interactions with the protein and other water molecules, and thus the positions of individual water molecules.

The high flexibility of water networks makes it difficult to observe distinct water molecules at well defined positions in structure determinations. Thus, experimentally observed positions of water molecules have to be assessed critically, bearing in mind that they represent an average picture of a highly dynamic equilibrium ensemble. Moreover, there are many structures with inconsistent and incomplete water networks.

To address these deficiencies we developed a tool that predicts possible configurations of complete water networks in binding pockets in a consistent way. It is based on the well established knowledge-based potentials implemented into DrugScore, which also allow for a reasonable differentiation between "conserved" and "displaceable" water molecules. The potentials used were derived specifically for water positions as observed in small molecule crystal structures in the CSD.

To account for the flexibility and high intercorrelation we apply a clique-based approach, resulting in water networks maximizing the total DrugScore.

To incorporate as much known information as possible about a given target, we also allow the inclusion of constraints defined by experimentally observed water positions.

Our tool provides a useful starting point whenever a possible configuration of water molecules needs to be estimated in an uncomplexed protein, and suggests their spatial positions and their classification with respect to some kind of affinity prediction.

In first tests we were able to get classifications and positional predictions which are in good agreement with crystallographically observed water molecules with remarkably small deviations.
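The clique-based selection described in this abstract can be illustrated with a minimal sketch. It assumes candidate water sites have already been scored and that pairs of sites that can coexist are known. The scores, site names, and compatibility rule are hypothetical and do not reproduce DrugScore or the authors' tool. The exhaustive subset search shown here is practical only for a handful of sites; a real implementation would use a dedicated clique algorithm.

```python
from itertools import combinations


def best_water_network(scores, compatible):
    """Return the highest-scoring set of mutually compatible candidate sites.

    scores     : dict mapping site name -> score (higher is better, hypothetical)
    compatible : set of frozenset({site_a, site_b}) pairs that may coexist
    A valid network is a clique in the compatibility graph.
    """
    sites = sorted(scores)
    best_sites, best_score = (), 0.0
    for size in range(1, len(sites) + 1):
        for subset in combinations(sites, size):
            is_clique = all(frozenset(pair) in compatible
                            for pair in combinations(subset, 2))
            if is_clique:
                total = sum(scores[s] for s in subset)
                if total > best_score:
                    best_sites, best_score = subset, total
    return best_sites, best_score
```

For example, with scores `{"W1": 2.0, "W2": 1.5, "W3": 1.0, "W4": 2.5}` and compatible pairs W1-W2, W1-W3, W2-W3 and W3-W4, the best network is W1, W2 and W3 with a total of 4.5, even though W4 has the highest individual score, because W4 cannot coexist with W1 or W2.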

horizontal rule

222 - Explicit-water modeling of a model protein-ligand binding site predicts the non-classical hydrophobic effect

Demetri T. Moustakas PhD, Phil W Snyder PhD, Woody Sherman PhD, Prof. George M Whitesides. Department of Infection, Computational Sciences, AstraZeneca R&D Boston, Waltham, MA, United States; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, United States; Schrödinger, Inc., New York, NY, United States

This work reports a study of the thermodynamics of hydrophobic interactions between human carbonic anhydrase II and a series of structurally analogous heteroaromatic sulfonamides. Isothermal titration calorimetry (ITC) established that increasing the non-polar surface area of the ligands resulted in a large enthalpy-dominated increase in the binding affinity - the so-called non-classical hydrophobic effect. Subsequent X-ray crystallography studies reveal no significant changes in protein-ligand interactions as a function of increasing the ligand non-polar surface area, suggesting that solute-solvent interactions are responsible for the observed thermodynamic effects. Modeling studies using explicit solvent models suggest that the larger ligands alter both the structure and thermodynamic characteristics of water molecules in the binding site, which contributes significantly to the observed non-classical hydrophobic effect.

horizontal rule

223 - New coarse-grained model for water: The importance of electrostatic interactions

Zhe Wu, Prof. Qiang Cui, Prof. Arun Yethiraj. Department of Chemistry, UW Madison, Madison, WI, United States

A new coarse-grained (CG) model for water is developed based on the properties of clusters of four water molecules in atomistic simulations. CG units interact via a soft non-electrostatic interaction. Electrostatic interactions are incorporated via three charged sites with the charges and model topology chosen to reproduce the dipole moment and quadrupole moment tensor of 4-water clusters. The parameters in the model are optimized to reproduce experimental data for the compressibility, density, and permittivity of bulk water, and the surface tension and interface potential for the air-water interface. This big multipole water (BMW) model represents a qualitative improvement over existing CG water models, e.g., it reproduces the dipole potential at the membrane-water interface when compared to experiment, with modest additional computational cost.

horizontal rule

359 - Introduction to cross pharma high performance computing forum

John C Morris MBA, Dr Zheng Yang. Massachusetts Research Business Technology, Pfizer, Cambridge, MA, United States; Department of Computational and Structural Chemistry, GlaxoSmithKline Pharmaceuticals, Collegeville, PA, United States

High Performance Computing (HPC) within the pharmaceutical industry is a growing and critical component of research due to the large scale analytical demands driven by modern research methods and advancements in computational chemistry and bioinformatics methods to model biological systems. HPC has become a necessary capability to facilitate the analysis of the terabytes of scientific data being generated from technologies such as Next Generation Sequencing, modeling complex drug-target interaction, and statistical analysis. To support the industrialization of scientific research, integrated and coordinated HPC information technology tools, methods, and capabilities are needed. The Cross Pharma HPC forum is a group of scientists, engineers, and key stakeholders within the pharmaceutical industry working together to promote best practices, coordinate activities, optimize methods, and leverage experience in the non-competitive areas within HPC. In this talk, the history, current status, and future directions of HPC in the pharmaceutical industry will be discussed.

horizontal rule

360 - Applications and use of cloud computing in the pharmaceutical industry

Dr. Michael D Miller PhD, David M Powers, Gregory Stiegler, Dr Jeremy Martin M PhD. Research Business Technology, Pfizer, Groton, CT, United States; Research and Development IT, Eli Lilly, Indianapolis, Indiana, United States; Scientific Computing, Bristol-Myers Squibb, Princeton, New Jersey, United States; System Support Department, Information Technology, GlaxoSmithKline R&D Ltd, Harlow, Essex, United Kingdom

Technological advances across the sciences have enabled basic drug research with an unprecedented amount of data. As a result, the application of computational methods is becoming an increasingly important approach in drug discovery and development. The need for increased computing capacity has reached the point where today it can become rate limiting. As a result, pharmaceutical companies have begun exploring the use of cloud computing to address these needs. We will present some of the challenges pharmaceutical companies have faced in using cloud resources and the different approaches that have been taken to address them.

horizontal rule

361 - Current trends of high performance computing in Pharma

Dr. Stephen Litster, Dr. Jeremy Martin. NITAS Scientific Computing, Novartis Institutes of BioMedical Research, Cambridge, MA, United States; Department of System Support, Information Technology, GlaxoSmithKline R&D Ltd, Harlow, Essex, United Kingdom

The world of high performance computing (HPC) has evolved quickly, as exemplified by recent developments in hardware (e.g. Intel Nehalem multi-core CPUs with integrated memory controller), software (e.g. NAMD, a highly scalable molecular dynamics program), computing services (e.g. cloud computing), and storage (TB+ scale file systems). Given these recent developments and much lower cost of entry into HPC, Pharma based Scientific Computing groups are beginning to apply traditional HPC techniques to “non-traditional” (e.g. High Content Screening) and emerging areas of research (e.g. Next Generation Sequencing).

We present here a number of case studies highlighting the current trends of HPC in the pharmaceutical industry and its impact on scientific workflows.

horizontal rule

362 - Challenges of HPC and collaboration opportunities in Pharma

Robert Stansfield PhD, MBA, Michael D Miller PhD. R&D Information Solutions, sanofi-aventis U.S., Bridgewater, NJ, United States; Research Business Technologies, Pfizer, Groton, CT, United States

High Performance Computing (HPC) in Pharmaceutical R&D is well established in computational chemistry and computational biology for drug discovery, but is increasingly seeing broader application across research and development. In addition, internal capacity is being supplemented by external “cloud computing”. In consequence, the issues around providing HPC services to in-house scientists in an optimal way for the entire company become more visible and critical. From a technical perspective, HPC requires a holistic view across compute, network, and storage capabilities. From an organizational perspective, effective governance - roles, responsibilities, prioritization and decision making across multiple different groups, operations, and support to end-user scientists - makes all the difference. For these reasons at least, HPC deserves a place in strategic planning. These issues will be explored, as well as the opportunities afforded by pre-competitive collaboration in the Cross-Pharma HPC Forum for identifying best practices.

horizontal rule

376 - Approaches to the treatment of multidrug resistant gram negative infections

Dr. Mark C Noe PhD, Dr. Steven J Brickner PhD, Dr. Thomas Gootz PhD, Michael Huband, Dr. Mark E Flanagan PhD, Dr. John Mueller PhD. Department of Antibacterials Research, Pfizer Global Research and Development, Groton, CT, United States

Each year, over 4.3 million people worldwide contract hospital-based bacterial infections, approximately half of which are caused by Gram negative organisms. The widespread emergence of genes that confer multidrug resistance in these pathogens threatens to undermine the clinical utility of several antibiotic classes, including the fluoroquinolones, cephalosporins, carbapenems and aminoglycosides. Particularly concerning are the extended spectrum beta lactamases, including carbapenemases, which are advancing at an alarming rate and compromise the effectiveness of the most widely used classes to treat Gram negative infections. This talk will review the medical need for new antibacterial agents, some of the challenges associated with discovering new antibiotics, examples of potentially enabling technologies and recent advances in our understanding of privileged targets for antibacterial therapy. An example of one antibacterial drug discovery program will be presented.

horizontal rule

377 - Physicochemical property space of antibiotics

Heinz E Moser PhD. Department of Chemistry, Achaogen, South San Francisco, California, United States

While there have been enormous discovery efforts during the past decades to identify novel classes of antibacterials with clinical utility against Gram-negative pathogens, no first-in-class compounds have been successfully developed for use in humans for roughly half a century, and none is currently in clinical evaluation. Predictably, this lack of success has been met by an increasing prevalence of Gram-negative pathogens causing serious infections in hospitals and critical care settings. Outbreaks caused by multi-drug resistant (MDR) or pan-resistant organisms such as K. pneumoniae have been reported recently and leave physicians with few to no treatment options. This presentation focuses on the physico-chemical property space of antibacterial drugs and how an understanding of this property space can assist in the discovery and lead optimization of antibiotics, in particular that of antibacterial drugs active against Gram-negative bacteria. Specific examples will be presented and discussed in detail.

horizontal rule

378 - Physicochemical properties correlated with Gram-negative antibacterial activity of compounds in the Pfizer corporate library

Jeremy T Starr PhD, Rishi Gupta PhD, Veerabahu Shanmugasundaram PhD. Department of Antibacterials and Discovery Technologies, Pfizer Pharmatherapeutics Research and Development, Groton, CT, United States

Correlation of computed physicochemical properties of Pfizer proprietary compounds with their respective E. coli or P. aeruginosa MICs has led to the identification of a physicochemical fingerprint associated with higher probability of whole cell activity with a cytosolic target and presumed passive cell penetration. A computational tool has been designed to calculate a desirability quotient based on these parameters which demonstrates positive differentiation of higher scoring compound classes.

horizontal rule

379 - Combining lessons from computational design of gram positive antibacterials with datamining to aid the design of novel gram negative antibacterials

Charles J. Eyermann. Infection Discovery, AstraZeneca, Waltham, MA, United States

Our approach to address the emergence of resistant bacterial strains has been to identify new chemotypes with a novel mode of action. A significant effort has been made to develop novel inhibitors against gram positive strains like Methicillin-resistant Staphylococcus aureus (MRSA). These efforts have provided a number of key lessons related to target isozyme specificity and drug safety margins. Work to identify novel MurI inhibitors of H. pylori has also provided insights into the physicochemical properties that impact gram negative antibacterial activity. Combining the lessons learned from the above research efforts with datamining of existing gram negative agents provides a framework to aid in the optimization of novel leads for gram negative antibacterials.

horizontal rule

380 - Targeting gram-negative pathogens: Drug design to improve antibiotics permeation?

Eric Hajjar, Amit Kumar, Paolo Ruggerone, Matteo Ceccarelli PhD. Department of Physics, Universita degli Studi di Cagliari and Sardinian, Monserrato, Italy

Gram-negative bacteria are protected by an outer membrane and, to function, antibiotics have to diffuse passively through outer membrane channels known as porins, such as OmpF in E. coli (Pages, J. M. et al. Nat. Rev. Microbiol. 2008, 6, 893). Bacterial strains can modulate their susceptibility to antibiotics by under-expressing porins or mutating their structures, becoming resistant, in the worst case, to different antibiotic families. These multidrug resistant bacteria are now ubiquitous in both hospitals and the larger community, and the resurgence of tuberculosis provides one ominous example highlighting the risk associated with evolved drug resistance (Cars, O. et al. Brit. Med. J. 2008, 337, 726). Moreover, many pharmaceutical companies have abandoned this field and no truly novel active antibacterial compounds are currently in clinical trials. A major current dilemma for the pharmaceutical industry is whether to develop drugs for new targets or promote those drugs presently on the market (Weiss, D. et al. Nat. Rev. Drug. Discov. 2009, 8, 533), identifying bottlenecks of existing antibiotics to suggest chemical modifications. Following such a strategy, we revealed the complete permeation pathways of β-lactam and fluoroquinolone antibiotics through porins using metadynamics simulations and found that experimental results remarkably confirmed the computational predictions. Further, the simulations showed their potential to overcome experimental limitations and provide microscopic details on the permeation process (Hajjar, E. et al. Biophys. J. 2010, 98, 569; Mahendran K. et al. J. Phys. Chem. B, IN PRESS).

Here we follow the paradigm for selecting antibiotics with better permeation properties using computer simulations only. Taking advantage of the atomic level of detail that the simulations provide, we find that the diffusion of ampicillin through OmpF is governed by a subtle balance of interactions with partners in the porin channel: we draw up, for the first time, the complete inventory of the rate-limiting interactions and map them onto both the porin and antibiotic structures. Our methodology, which can be conveniently employed to study other porin/antibiotic pairs, allows identification of the functional groups that govern optimal translocation. Such findings will directly benefit rational antibiotic design by defining, for example, appropriate pharmacophores within high throughput screening strategies.

horizontal rule

381 - Structure-based lead optimization of novel bacterial type II topoisomerase inhibitors

Dr Neil D Pearson, Dr Zheng Yang, Dr Benjamin D Bax, Michael N Gwynn. Department of Antibacterial Chemistry, Infectious Diseases Center of Excellence in Drug Discovery, GlaxoSmithKline Pharmaceuticals, Collegeville, Pennsylvania, United States; Department of Computational and Structural Chemistry, GlaxoSmithKline Pharmaceuticals, Collegeville, Pennsylvania, United States; Department of Antibacterial Microbiology, Infectious Diseases Center of Excellence in Drug Discovery, GlaxoSmithKline Pharmaceuticals, Collegeville, Pennsylvania, United States; Department of Computational and Structural Chemistry, GlaxoSmithKline Pharmaceuticals, Stevenage, Hertfordshire, United Kingdom

The emergence of multi drug resistant Gram negative pathogens is a major concern given the paucity of new therapies in clinical development. GSK has discovered a novel series of inhibitors of both DNA gyrase and topoisomerase IV (NBTIs) with a unique mechanism and no target based cross resistance to established classes of antibacterials including the fluoroquinolones. Optimisation of the Gram positive selective early leads led to new series which afforded good activity versus Gram negative pathogens. GSK subsequently solved the first X-ray structure of an NBTI in complex with S. aureus DNA gyrase and DNA, providing unprecedented knowledge for lead optimization and the design of novel inhibitors. This talk will discuss how the structural information enabled the medicinal chemistry team to design new subunits, and will also illustrate cases where optimization of interactions with the binding site has been well served by traditional medicinal chemistry.

horizontal rule

382 - Fragment-based development of tetrazole inhibitors against class A beta-lactamase

Yu Chen PhD. Department of Molecular Medicine, University of South Florida, Tampa, FL, United States

The production of beta-lactamases is the predominant cause of resistance to beta-lactam antibiotics, such as penicillins, in Gram-negative bacteria. Whereas high-throughput screening has appeared insufficient for the development of new beta-lactamase inhibitors, fragment-based methods provide an effective approach in sampling novel chemical space in antibiotics discovery. We have previously used fragment-based molecular docking to identify mM range tetrazole inhibitors against CTX-M Class A beta-lactamase and to subsequently evolve their affinities to ~10 micromolar. New compounds have now been synthesized using the micromolar-affinity tetrazole scaffold, based on some similarities between this scaffold and beta-lactam antibiotics or on X-ray crystal structures of the inhibitor-bound complexes. Other fragment compounds have also been tested to probe regions of the active site not sampled by existing inhibitors. Combining the fragment-based approach with molecular docking, X-ray crystallography and chemical synthesis, we hope to eventually develop these tetrazole compounds into nM inhibitors.

horizontal rule

419 - Utilizing organic syntheses and microbial iron assimilation processes for the development of new

Prof. Marvin J. Miller. Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN, United States

Pathogenic microbes have rapidly developed resistance to all known antibiotics. To keep ahead in the “microbial war,” extensive interdisciplinary effort is needed. Resistance develops primarily due to overuse of antibiotics, which can result in alteration of microbial permeability, alteration of drug target binding sites, induction of enzymes that destroy antibiotics (e.g., beta-lactamases) and even efflux of antibiotics. A combination of chemical syntheses, microbiological and biochemical studies will demonstrate that the known critical dependence of microbes on iron assimilation for growth and virulence can be exploited for the development of new approaches to antibiotic therapy. Iron recognition and active transport rely on the biosynthesis and use of microbe-selective iron chelating compounds called siderophores.

Our studies demonstrate that siderophores and analogs can be used for:

-Iron transport-mediated drug delivery (“Trojan Horse”).

-Induction of iron limitation (Development of new agents to block microbial iron assimilation).

-Converting microbe-induced chemistry of iron into a process that is lethal to microbes.

horizontal rule

420 - Utilization of bacterial iron transport systems for drug delivery

Dr. Ute Moellmann, Dr. Lothar Heinisch. Department of Molecular and Applied Microbiology, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knoell Institute, Jena, Germany

The outer membrane permeability barrier is an important resistance factor of bacterial pathogens. In combination with other factors like drug inactivating enzymes, target alteration and efflux, it can increase resistance dramatically. A strategy to overcome this membrane mediated resistance is the misuse of bacterial transport systems. The most promising systems are those for iron transport. They are vital for virulence and survival of bacteria in the infected host, where iron depletion is a defense mechanism against invading pathogens. We synthesized biomimetic siderophores as shuttle vectors for active transport of antibiotics through the bacterial membrane. Structure activity relationship studies resulted in ampicillin siderophore conjugates highly active against Pseudomonas aeruginosa and other Gram-negative pathogens, which play a crucial role in destructive lung infections in cystic fibrosis patients and in severe nosocomial infections. The mechanism of action and the in vitro and in vivo efficacy were demonstrated.

horizontal rule

421 - Activity of BAL30072, a novel siderophore sulfactam

Prof. Malcolm G P Page PhD. Basilea Pharmaceutica International Ltd, Basel, Switzerland

BAL30072 is a monocyclic β-lactam antibiotic belonging to the sulfactams. BAL30072 showed potent activity against multidrug-resistant (MDR) Pseudomonas aeruginosa and Acinetobacter spp., including many carbapenem-resistant strains. BAL30072 was bactericidal against both Acinetobacter spp. and P. aeruginosa, even against strains that produced metallo-β-lactamases that conferred resistance to all other β-lactams tested, including aztreonam. It was also active against many species of MDR Enterobacteriaceae, including isolates that had a class A carbapenemase or a metallo-β-lactamase. Unlike other monocyclic β-lactams, BAL30072 was found to trigger spheroplasting and lysis of E. coli, rather than the formation of extensive filaments. The basis for this unusual property is its inhibition of the bifunctional penicillin-binding proteins PBP 1a and PBP 1b in addition to its high affinity for PBP 3, which is the target of monobactams such as aztreonam.

horizontal rule

422 - Targeting bacterial multidrug efflux pumps

Olga Lomovskaya PhD, Scott Hecker PhD. Mpex Pharmaceuticals, San Diego, California, United States

Powerful techniques of modern drug discovery such as comparative genomics, ultra-high-throughput screening, structure-guided drug design and combinatorial chemistry have been used to identify novel targets and optimize novel, preferentially broad-spectrum antibiotics to combat antibiotic resistance. However, despite the fact that these employed targets are broadly conserved in bacteria, no drug candidate advanced using these methods has demonstrated relevant activity against most gram-negative bacteria. Thus, the outlook for new antibiotics appears unchanged from the present, in that, of all approved classes of antibiotics, representatives of only three classes (fluoroquinolones, β-lactams and aminoglycosides) have clinical utility for the treatment of gram-negative bacteria such as Pseudomonas aeruginosa.

Multidrug resistance (MDR) efflux pumps play a prominent and proven role in gram-negative intrinsic resistance. Moreover, these pumps also play a significant role in acquired clinical resistance. Together, these considerations make efflux pumps attractive targets for inhibition in that the resultant efflux pump inhibitor (EPI)/antibiotic combination drug should exhibit increased potency, enhanced spectrum of activity and reduced propensity for acquired resistance. To date, at least one class of broad-spectrum EPI has been extensively characterized. While these efforts indicated a significant potential for developing small molecule inhibitors against efflux pumps, they did not result in a clinically useful compound. Stemming from the continued clinical pressure for novel approaches to combat drug resistant bacterial infections, second-generation programs have been initiated based on a number of recent developments in the field, including structural elucidation of all three individual components of MDR efflux pumps and ligand-based insights into the mechanism-of-action of drug transporters. Building upon previous efforts, these new approaches show early promise to significantly improve the clinical usefulness of currently available and future antibiotics against otherwise recalcitrant gram-negative infections.

horizontal rule

423 - Interaction of β-peptides with membranes

Jagannath Mondal, Dr. Xiao Zhu, Prof Qiang Cui, Prof Arun Yethiraj. Department of Chemistry, UW Madison, Madison, WI, United States

A new class of anti-microbial agents named β-peptides has recently been reported that shows interesting sequence dependent activity and selectivity. In this work we investigate the interaction of these molecules with a model membrane in an effort to obtain physical insight into the mechanism of anti-microbial activity. We investigate the effect of sequence on the adsorption of these β-peptides to a membrane using computer simulations with both implicit and explicit solvent and membrane. Two classes of molecules are investigated: 10-residue oligomers of 14-helical sequences, and four sequences of random co-polymeric β-peptides. The oligomers of interest are two isomers, globally amphiphilic (GA) and non-GA, of two 10-residue 14-helical sequences. The penetration of the molecules into the membrane and the orientation of the molecules at the interface depend strongly on the sequence. We attribute this to the propensity of the β-phenylalanine (βF) residues for membrane penetration. The membrane adsorption studies are consistent with potential of mean force calculations using the same model. Results are similar when the membrane and solvent are treated in an implicit or explicit fashion. For the four sequences of random co-polymeric β-peptides, the extent of free-energy stabilization correlates with how efficiently they segregate the hydrophobic and cationic residues. The simulations are in qualitative accord with experiments on the minimum inhibitory concentration, and suggest simple strategies for the design of candidates for anti-microbial β-peptides.

horizontal rule

424 - Molecular modeling of beta-lactamase inhibitors

Sookhee Nicole Ha, T. Blizzard, H. Chen, S. Kim, J. Wu, K. Young, Y. Park, A. Ogawa, S. Raghoobar, R. Painter, N. Hairston, S. Lee, A. Misura, T. Felcetto, P. Fitzgerald, N. Sharma, Jun Lu, E. Hickey, J. Hermes, M. Hammond. Merck & Co., Inc, Whitehouse Station, New Jersey, United States

Resistance against new antibiotics usually appears within a few years after their marketing. Expression of beta-lactamase is the most common mechanism of resistance to the beta-lactam antibiotics in Gram-negative bacteria. To delay the onset of drug resistance as long as possible, we have developed a beta-lactamase inhibitor for combination therapy. We report our efforts on optimization of bridged monobactam analogs.

horizontal rule

425 - Assembly and function of large Gram-negative bacterial machines studied by molecular simulation integrated with experimental data

Prof. Matteo Dal Peraro. Institute of Bioengineering, Swiss Federal Institute of Technology, EPFL Lausanne, Lausanne, Switzerland

Gram-negative bacteria have evolved several means to attack their hosts and defend themselves from external attacks. Here, we use molecular simulations closely integrated with new experimental data to dissect the structural and dynamic features of the assembly mechanism of three large bacterial machines.

(i) We propose a four-helix model of the transmembrane domain of the E. coli PhoQ two-component system, which is consistent with new experimental cross-linking data and can explain the bacterial response to divalent cations and antimicrobial peptides. (ii) We study, with the aid of site-directed mutagenesis, the role of the pore-forming loop and the C-terminal pro-peptide in the heptamerization of the pore-forming toxin aerolysin from A. hydrophila. Finally, (iii) we model the needle formation and regulation for the type III secretion system from Y. enterocolitica (injectisome) based on fresh genetic and mutagenesis results.

A full understanding of the structural assembly of these bacterial machines can, on the one hand, help unveil their fundamental biological function and, on the other, permit the development of rational strategies to specifically interfere with them for therapeutic intervention.

horizontal rule

426 - Design of potent, broad-spectrum AccC inhibitors

Li Xiao PhD, Cliff Cheng, Gerald W Shipps, Aileen Soriano, Peter Orth, Todd Black. Merck Research Laboratory, Kenilworth, New Jersey, United States; Merck Research Laboratory, Cambridge, Massachusetts, United States

The biotin carboxylase (AccC) is part of the multi-component bacterial acetyl coenzyme-A carboxylase (ACCase) and is essential for pathogen survival. We identified and validated AccC as an antibacterial drug target for our in-house AS/MS screen. An initial hit, 2-(2-chlorobenzylamino)-1-(cyclohexylmethyl)-1H-benzo[d]imidazole-5-carboxamide (1), was identified, and x-ray crystallography and computer modeling were utilized in its optimization. In this presentation we report our biology, chemistry and structure based drug design efforts in discovering a novel series of AccC inhibitors, exemplified by (R)-2-(2-chlorobenzylamino)-1-(2,3-dihydro-1H-inden-1-yl)-1H-imidazo[4,5-b]pyridine-5-carboxamide (2). These inhibitors are potent and selective for bacterial AccC with good cell-based activity against a sensitized strain of E. coli (HS294 E. coli).

horizontal rule

433 - Exploring protein conformational changes with accelerated molecular dynamics in NAMD

Dr. Yi Wang, Prof. J. Andrew McCammon. Chemistry and Biochemistry, Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA, United States

Accelerated molecular dynamics (aMD) enhances conformational space sampling by reducing energy barriers separating different states of a system. Here we present the implementation of aMD in the highly efficient parallel molecular dynamics program NAMD and offer exemplary applications performed on systems up to 60,000 atoms. Our results indicate that while providing significantly enhanced sampling, aMD simulations have only a small overhead in comparison to classical MD simulations. A 10-ns aMD simulation performed on the bacterial enzyme RmlC successfully revealed its transition from apo- to holo- state, which is not observed in a 50-ns classical MD simulation. We demonstrate that aMD can be applied efficiently to explore the conformational changes of complex biomolecules, especially when little is known about their alternative structures and transition reaction coordinates.
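The barrier-reducing idea can be illustrated with a small sketch. The abstract does not state the functional form used; the code below assumes the commonly cited aMD boost potential, in which a bias dV = (E - V)^2 / (alpha + E - V) is added whenever the potential energy V falls below a threshold E, and nothing is added otherwise. The function names, the parameter names `e_threshold` and `alpha`, and the example energies are illustrative assumptions and are not taken from NAMD.

```python
def amd_boost(v, e_threshold, alpha):
    """Boost energy dV for a potential energy v (assumed aMD form).

    Below the threshold the boost is (E - V)^2 / (alpha + E - V);
    at or above the threshold the potential is left unchanged.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if v >= e_threshold:
        return 0.0
    gap = e_threshold - v
    return gap * gap / (alpha + gap)


def amd_modified_potential(v, e_threshold, alpha):
    """Modified potential V* = V + dV sampled in the accelerated run."""
    return v + amd_boost(v, e_threshold, alpha)


if __name__ == "__main__":
    # Hypothetical energies in arbitrary units: wells are raised,
    # regions above the threshold are untouched, so barriers shrink.
    for v in (-100.0, -95.0, -90.0, -80.0):
        print(v, amd_modified_potential(v, e_threshold=-90.0, alpha=10.0))
```

In this form a smaller `alpha` flattens the energy wells more aggressively, while a large `alpha` leaves the landscape close to the original, which is one way to picture the trade-off between enhanced sampling and fidelity to the unbiased surface.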

horizontal rule

434 - Pseudo-chair conformation of carboxyphosphate

Venkata S Pakkala, Steven M Firestine, Jeffrey D Evanseck. Department of Chemistry and Biochemistry, Duquesne University, Pittsburgh, Pennsylvania, United States; Eugene Applebaum College of Pharmacy and Health Sciences, Wayne State University, Detroit, Michigan, United States

For over 40 years, carboxyphosphate has been postulated as a key intermediate in several carboxylase enzymes. Unfortunately, this compound is extremely unstable (t1/2 of 70 ms), thus precluding direct experimental studies. Therefore, we have utilized high level ab initio (MP2 and CCSD(T)), DFT (B3LYP, BB1K, M05-2X, M06-2X and MPW1K) and ONIOM(DFT:AMBER) methods to investigate the structure and energetics of carboxyphosphate in vacuum, in a PCM continuum solvation model and in the active site of N5-CAIR synthetase, an enzyme shown to proceed via the formation of carboxyphosphate. We report here, for the first time, that carboxyphosphate adopts a “pseudo-chair” conformation, and calculations reveal that this conformation is the most stable in vacuum, solvent and the active site. This study has implications in the development of carboxyphosphate analogs as potential inhibitors, in understanding the instability of the compound, and in elucidating the mechanisms of enzymes utilizing this compound.

horizontal rule

435 - Analysis of vibrational spectra of polypeptides in terms of localized vibrations

Dr. Christoph R Jacob, Prof. Markus Reiher. Center for Functional Nanostructures, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany; Laboratorium für Physikalische Chemie, ETH Zurich, Zurich, Switzerland

While nowadays efficient quantum chemical methods allow for the calculation of vibrational spectra of large (bio-)molecules, such calculations also provide a large amount of data. In particular for the vibrational spectra of polypeptides, a large number of close-lying normal modes contribute to each of the experimentally observed bands, which hampers the analysis of the calculated spectra considerably.

Here, we discuss how vibrational spectra obtained from quantum chemical calculations can be analyzed by transforming the calculated normal modes contributing to a certain band in the vibrational spectrum to a set of localized modes [1]. We demonstrate that these localized modes are more appropriate for the analysis of calculated vibrational spectra of polypeptides and proteins than the delocalized normal modes.

We apply this methodology to investigate the influence of the secondary structure on infrared and Raman spectra of polypeptides [2]. As a model system, a polypeptide consisting of twenty (S)-alanine residues in the conformation of an α-helix and of a 3₁₀-helix is considered. In particular, we show how the use of localized modes facilitates the analysis of the positions and of the total intensities of the bands in the vibrational spectra, and how the couplings between localized modes determine the observed band shapes. Finally, this approach is applied to analyze the Raman optical activity (ROA) spectra of these helical polypeptides, which provides a detailed picture of the generation of ROA bands in proteins [3].

[1] Ch. R. Jacob and M. Reiher, J. Chem. Phys. 130 (2009), 084106.
[2] Ch. R. Jacob, S. Luber, M. Reiher, J. Phys. Chem. B 113 (2009), 6558.
[3] Ch. R. Jacob, S. Luber, M. Reiher, Chem. Eur. J. 15 (2009), 13491.

horizontal rule

436 - Conformational coupling between LOV and kinase domains in phototropins: A computational perspective

Dr. Marco Stenta PhD, Prof. Matteo Dal Peraro PhD. Department of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Vaud, Switzerland

Phototropins constitute an important class of plant photoreceptors playing key roles in many physiological responses to light, including phototropism, chloroplast movement and stomata opening. Phototropins feature, along with a serine-threonine kinase domain, two LOV (light-, oxygen- or voltage-regulated) domains, each binding an FMN (flavin mononucleotide). Blue light affects the kinase domain by triggering, in the LOV domain, the formation of a covalent intermediate between the FMN cofactor and a nearby cysteine residue. Although X-ray structures have provided solid ground for mechanistic hypotheses, the molecular details of the inter-domain communication process are still unknown. By using accurate QM/MM (quantum mechanics/molecular mechanics) calculations we investigated the formation/breaking of the FMN/Cys covalent intermediate. We investigated the coupling between the LOV and kinase domains by means of long MD (molecular dynamics) simulations and detailed PES (potential energy surface) explorations (MM level).

Zoltowski, B. D.; Vaccaro, B.; Crane, B. R. Nat Chem Biol 2009, 5, 827-834.

horizontal rule

437 - Conformational sampling of macrocycles through accelerated molecular dynamics simulation

S. Roy Kimura Ph.D. Department of Computer Assisted Drug Design, Bristol Myers Squibb, Wallingford, CT, United States

Macrocyclization is a strategy used in medicinal chemistry to lock a molecule in its bioactive conformation. The resulting decrease in conformational flexibility often leads to higher potencies due to the reduced entropy loss upon binding, and sometimes improved physical chemical properties such as bioavailability. Conformational searches of macrocycles are usually performed by temporary ring opening and Monte Carlo (MC) sampling to overcome the energy barriers between low energy states. However, widely available MC algorithms can only be used in conjunction with simplified continuum solvents such as dielectrics or Generalized Born-related models. In this study, we assess the use of molecular dynamics simulation in explicit solvent with periodic high-temperature pulsing as a method to overcome the characteristic energy barriers of macrocycles. The pros and cons of this methodology versus MC sampling are discussed.

horizontal rule

Small Chemical Businesses Division

horizontal rule

44 - Best practices in scientific computer modeling

Dr. Masha V Petrova. Department of Research, MVP Modeling Solutions, LLC, Springfield, IL, United States

Computer modeling can help research organizations save a lot of money and time, if the modeling program is implemented correctly. Are you sure that your research group is making the most out of computer modeling? Attend this session to learn:

How companies and research groups tend to shoot themselves in the foot when setting up a computer modeling project;

What measures you can take to make sure that you don't spend a lot of time going down the wrong path or purchasing the wrong software;

The best way to take a scientific/engineering problem and translate it into computer modeling terms.

horizontal rule

45 - New wave of computational tools for the leads selection in biomedical industry

Dr. Aurora D. Costache PhD, Prof. Doyle D. Knight PhD, Prof. Joachim Kohn. New Jersey Center for Biomaterials, Rutgers - The State University of New Jersey, Piscataway, NJ, United States; Mechanical and Aerospace Engineering, Rutgers - The State University of New Jersey, 98 Brett Rd, Piscataway, NJ, United States

The high cost and intensive labor of developing new polymeric biomaterials for tissue engineering, drug delivery and other medical applications highlight the need for a change in the discovery process. As large corporations continuously look to cut costs, individual contractors or small businesses that can provide them with lead materials for given biomedical applications are expected to thrive. With this business niche in mind, the New Jersey Center for Biomaterials (NJCBM) created “Biomaterials Store™”, a computational tool specifically designed for development of new biomaterial leads. This integrated database and datamining tool allows the user to create/use large databases of virtual polymer libraries and to apply modeling tools to predict relevant polymer properties and biological responses to biomaterials. Based on the requirements for a specific application, the most promising candidates are selected for synthesis and complete experimental evaluation, thus accelerating the discovery process and cutting costs at the same time.

horizontal rule

46 - Computational modeling of soft condensed matter and biomaterials

Dr. Jayeeta Ghosh. New Jersey Center for Biomaterials, Rutgers, Piscataway, NJ, United States

Computational modeling helps us understand chemistry across length and time scales, from the quantum level to process dynamics.

This presentation will discuss the application of atomistic and mesoscale modeling to soft condensed matter, as well as a combinatorial computational approach to biomaterials invention. The main objective is to show the relevance and importance of detailed molecular modeling versus approximate surrogate modeling.

Molecular modeling of soft condensed matter, including glasses, polymers and lipids, will be discussed in the context of industrial application and drug delivery.

A quantitative structure-property relationship (QSPR) modeling approach for identifying suitable biomaterials from a large combinatorial library of polymers for tissue engineering and biomedical applications can help reduce experimental cost and time and advance business.
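
As a rough illustration of the QSPR screening idea, the sketch below fits a one-descriptor linear model to a few measured polymers and ranks a virtual library by how close each predicted property lies to a target value. It is not the method used by the presenter: the descriptor, the property values, the polymer names and the helper functions `fit_line` and `rank_candidates` are all hypothetical, and a real QSPR model would typically use many descriptors and a validated regression method.

```python
# Minimal QSPR-style screening sketch (hypothetical data and names).
# Assumption: a single numeric descriptor relates linearly to the property.

def fit_line(descriptors, properties):
    """Ordinary least squares fit of property = slope * descriptor + intercept."""
    n = len(descriptors)
    mean_x = sum(descriptors) / n
    mean_y = sum(properties) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptors)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(descriptors, properties))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rank_candidates(library, slope, intercept, target):
    """Return (name, predicted property) pairs, closest to the target first."""
    scored = [(name, slope * d + intercept) for name, d in library.items()]
    return sorted(scored, key=lambda item: abs(item[1] - target))


# Hypothetical training set: (descriptor value, measured property).
TRAINING = [(1.0, 12.0), (2.0, 14.0), (3.0, 16.0), (4.0, 18.0)]
# Hypothetical virtual library: polymer name -> descriptor value.
LIBRARY = {"polymer_A": 0.5, "polymer_B": 2.5, "polymer_C": 5.0}

slope, intercept = fit_line([d for d, _ in TRAINING],
                            [p for _, p in TRAINING])
ranked = rank_candidates(LIBRARY, slope, intercept, target=16.0)

if __name__ == "__main__":
    for name, predicted in ranked:
        print(f"{name}: predicted property {predicted:.1f}")
```

Only the top-ranked candidates would then go forward to synthesis and experimental evaluation, which is where the reduction in cost and time comes from.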

horizontal rule

47 - First-principles computational approach for the characterization and design of novel organic electronic materials

Roel S Sanchez-Carrera PhD, Prof. Alan Aspuru-Guzik. Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, United States

Organic electronics have recently emerged as a technology that will revolutionize the way in which we visualize information, generate energy from renewable resources, and communicate with people around the world. Thus, various international academic laboratories and major chemical companies are actively involved in the fine-tuning and development of the molecular materials used in the field of organic electronic devices.

To highlight the potential of current computational methodologies, in this study we use quantum chemistry calculations and molecular dynamics simulations to investigate the microscopic charge transport parameters of one of the most outstanding candidates, the dinaphtho-thieno-thiophene organic semiconductor. The good agreement found in this work between observed and computed properties stresses the importance of using computational chemistry techniques to identify suitable molecular materials for the emerging field of organic electronics.

horizontal rule

48 - Recent advances in structure-based drug design

Woody Sherman. Schrodinger, New York, NY, United States

Structure-based drug design is an important part of the drug discovery process, and recent methodological advancements, as well as increased computing resources, have resulted in a growing number of success stories. In this presentation, we highlight some of the most promising methods and applications, including the accurate assessment of water free energies, incorporation of protein flexibility into docking algorithms, and structure-based modeling of GPCRs. In addition, we describe the most significant limitations in the existing methods and provide a development roadmap to overcome these limitations.

horizontal rule

49 - Computer simulation of ligand binding to a flexible protein target

Dr Philip W Payne. Consulting, InterBiotics LLC, Sunnyvale, CA, United States

A research-based biotechnology or pharmaceutical business must focus capital and labor on the experiments that will most rapidly discover or refine intended products. Computer simulations are useful adjuncts to an experimental program when they provide structural insights that suggest how a protein or ligand structure should be modified to improve a measured outcome, such as enzymatic rate, receptor activation, or ligand affinity. A successful modeling program means that fewer proteins need to be mutated or fewer ligands synthesized during a product development campaign.

Important biological functions often entail large displacements of protein main chains or loops, and industrially useful modeling of protein structure needs to assess such motion and its impact on protein-ligand affinity or ligand-directed signaling. Unfortunately, there is little commercial software that can cost-effectively predict important protein motions. Therefore we have developed a strategy (Inverse Docking) for analyzing main chain movements that conform a G-Protein Coupled Receptor (Dopamine D2S) to a nanomolar D2 antagonist, spiperone.

horizontal rule

50 - FAST Predictions of protein stability and flexibility

Prof. Dennis R. Livesay, Dr. Hui Wang, Prof. Donald J. Jacobs. Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States; Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC, United States

Accurate descriptions of stability and flexibility are necessary for a complete understanding of protein structure and function. As such, we have developed “FAST” to provide a Flexibility And Stability Test on proteins in aqueous solutions. Herein, all intramolecular interactions are assigned enthalpy and entropy values. Total enthalpy is the sum of all components, whereas efficient graph-rigidity algorithms account for entropy nonadditivity. FAST has been designed from the ground up to account for dependence on temperature, pressure, pH, salt concentration, etc. As such, free energy landscapes as a function of multiple thermodynamic variables can be quickly calculated. FAST also calculates a wide variety of mechanical properties related to structural rigidity and flexibility with virtually no increase in computational expense. This talk will summarize our general approach and recent improvements with regard to speed and accuracy. This work has been supported by grants from the NIH (R01-GM073082) and the Charlotte Research Institute.

horizontal rule

51 - Patentability of computer simulations and models

Noah Malgeri. Law Office of Noah V. Malgeri, Uxbridge, Massachusetts, United States

In recent years, several companies and individuals, including major industry leaders, have filed patent applications for computer models, particularly in the areas of control systems, project management simulations, and pathology modeling. This presentation will address the subject of scientific software patentability.

horizontal rule