#237 - Abstracts

ACS National Meeting
March 22-26, 2009
Salt Lake City, UT

 1 A career working with engineers and scientists: Lessons learned
Judith Siess, Information Bridges Intl, 830 Sedgegrass Drive, Champaign, IL 61822

In 1982 I completed my M.S.L.I.S. thesis at the University of Illinois at Urbana-Champaign on “Information Needs and Information-Gathering Behavior of Research Engineers.” After receiving my degree, I worked for 14 years with engineers and scientists. Since then I have kept up with the profession through electronic lists, blogs, and personal networking. What lessons have I learned? First, when faced with a problem, engineers will first consult their own resources (notes, private library, etc.), then a colleague, and the librarian only as a last resort. A good, proactive librarian can decrease the time “wasted” before they come to the library. Practice “reference by walking around.” Second, how do you teach them to use online or print resources? Make use of “the teachable moment”: instruction at the point of need. Third, “feed them and they will come.” Especially with younger engineers, providing food will almost ensure that they come to an open house and will greatly increase their attendance at instruction sessions. And fourth, to encourage them to use the library, offer more than standard library services. Provide maps, computer magazines, comfortable places to read or work, and chocolate. Even if you can't change the way engineers and scientists think, you can change the way they think about information, libraries, and librarians.

 2 Fulfilling specialized information needs of engineers
Diana Bittern, dbittern@knovel.com, Product Management, Knovel, 489 Fifth Avenue, New York, NY 10017

It's becoming increasingly evident that people engaged in applied engineering have significantly different needs from researchers and information consumers in other disciplines, such as health. Public search engines, like Google, are still popular destinations for gathering general information or surveying the landscape, but more often, the engineer's demands center on fast and reliable access to highly specialized and specific data. This information typically does not reside in journals, but is more likely found in reference texts, handbooks, and databases. Engineers spend upwards of 20% of their time in Excel! Giving engineers the tools for simulating data models, manipulating calculations, and comparing process and materials specifications is a key facilitator in satisfying their demands for pinpointing relevant, reliable information. This paper provides a more in-depth look at the findings of user interviews and two recent studies of hundreds of ASME and AIChE member engineers conducted on behalf of Knovel. The studies focus attention on how respondents currently work with information and what they are looking for in process- and productivity-enhancing information tools.

 3 Meeting the information needs of chemical engineering students
Ann D. Bolek, bolek@uakron.edu, Science-Technology Library, The University of Akron, Akron, OH 44325-3907

Chemical engineering students need many of the same resources that chemistry students do but, in addition, need sources for bulk chemical prices, process flow diagrams, vapor-liquid equilibria, thermodynamic data, loss prevention, and business information about their chemicals. Their favorite sources are Perry's Chemical Engineers' Handbook and the Kirk-Othmer Encyclopedia of Chemical Technology. Bulk chemical prices used to be found in Chemical Market Reporter, but it has changed its title to ICIS Chemical Business Americas and has changed its focus. Process flow diagrams can be found in Kirk-Othmer, but also in Kent and Riegel's Handbook of Industrial Chemistry and Biotechnology, Ullmann's Encyclopedia of Industrial Chemistry, McKetta's Encyclopedia of Chemical Processing and Design, patents, and some journal articles. Vapor-liquid equilibria can be found in Gmehling's Vapor-Liquid Equilibrium Data Collection, Knovel's Critical Tables, and Beilstein CrossFire; the latter two sources are also good for thermodynamic data. Lees' Loss Prevention in the Process Industries, available in book format or online in Knovel, is a good source for loss prevention. Business information can sometimes be found in the same databases that business students use, such as Lexis-Nexis, Business Source Complete, ABI/Inform, and Business and Industry, although two online databases may be more targeted: Chemical Business NewsBase and Chemical Industry Notes. These latter two databases, however, are not easily available to most students.

 4 The history, evolution, and adoption of the IUPAC InChI/InChIKey
Stephen R. Heller, srheller@nist.gov1, Stephen E. Stein, steve.stein@nist.gov1, Dmitrii V. Tchekhovskoi, dmitrii.tchekhovskoi@nist.gov1, Igor V. Pletnev, pletnev@analyt.chem.msu.ru2, and Alan D. McNaught, mcnaught@ntlworld.com3. (1) Physical and Chemical Properties Division, NIST, Gaithersburg, MD 20899-8380, (2) Chemistry Department, Lomonosov Moscow State University, GSP-3 Vorobyovy Gory, Moscow, 119899, Russia, (3) Cambridge, United Kingdom

The IUPAC InChI is an open-source, public-domain, international standard for representing a defined chemical structure. This presentation will describe the history, evolution, adoption, and use of the IUPAC InChI/InChIKey project from its beginnings in 1999 to its current state of use and acceptance by the worldwide chemical community. The remaining portion of this symposium will be devoted to "case studies" from commercial, non-profit, and government organizations that are using InChI/InChIKey, with a panel session at the end for questions and discussion.
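As a quick orientation to the identifier discussed above: a standard InChI is a single slash-delimited string whose layers (formula, connectivity, hydrogens, and optionally charge, stereochemistry, and isotopes) can be pulled apart with plain string handling. The sketch below is purely illustrative and is not part of the IUPAC software.

```python
# Illustrative sketch: split a standard InChI into its major layers.
# Layer prefixes: 'c' = connectivity, 'h' = hydrogen, 'q' = charge,
# 'i' = isotope, 't'/'m'/'s' = stereochemistry.
def inchi_layers(inchi):
    body = inchi.split("=", 1)[1]            # drop the "InChI=" prefix
    version, *parts = body.split("/")
    layers = {"version": version, "formula": parts[0]}
    for part in parts[1:]:
        layers[part[0]] = part[1:]           # key each layer by its prefix
    return layers

# Ethanol as a worked example
print(inchi_layers("InChI=1S/C2H6O/c1-3-2/h3H,1-2H3"))
# -> {'version': '1S', 'formula': 'C2H6O', 'c': '1-3-2', 'h': '3H,1-2H3'}
```

Because each layer carries a one-letter prefix, downstream tools can compare structures at a chosen level of specificity, for example ignoring stereo layers, simply by dropping keys.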

 5 Going a mile InChI by InChI: Enabling online chemistry at ChemSpider
A J Williams, tony@chemspider.com, ChemZoo, 904 Tamaras Circle, Wake Forest, NC 27587

The task of finding chemical information online can be daunting, since even the most rudimentary query on Google can return tens to hundreds of thousands of links to peruse. While there has been an increase in the number of online chemical structure databases, until now there has been no central online resource allowing integrated chemical structure searching of chemistry databases, chemistry articles, patents, and web pages such as blogs and wikis. ChemSpider provides a significant knowledge base and resource for chemists working in different domains. From the perspective of the InChI identifiers, this project can be considered a success story, since ChemSpider has used both the InChI and the InChIKey in the development of the database and in the provision of fast searching routines. ChemSpider has provided web services for both InChI generation and searching, leading to a proliferation of InChI in the web-based domain of chemistry. This talk will provide an update on ChemSpider's functionality.

 6 Development and use of a molecular structure ontology
Henry E. Dayringer, henry.e.dayringer@pfizer.com1, Leiming Zhu, leiming.zhu@pfizer.com1, Matthias Nolte, matthias.nolte@chemitment.com2, and Chris L. Waller, chris.waller@pfizer.com1. (1) Chemistry Informatics, Pfizer, Inc, 575 Maryville Center Drive, St. Louis, MO 63141, (2) chemITment, Inc, 47 Lake Road, Amston, CT 06231

As the number of compound structures of potential interest continues to grow, so does the problem of correlating those compounds. Both internal and external compounds of interest must be indexed such that corresponding and closely related compounds can be quickly found and reported to interested client software applications. To address this growing problem at Pfizer, we have used the InChI encoding scheme for chemical structures. Designed to be segmented at various levels of specificity, the InChI makes the problem of finding, for example, stereochemically related structures simpler and faster than other structure representations do. We have processed all of the compounds in our files, from both internal and external origins, through a unique tautomer canonicalization followed by generation of the InChI string for each separate non-bonded fragment of the input molecular structure. Each unique InChI string is registered to an Oracle database, with composition records, as required, for each registered compound component pointing to these unique InChI records. Using a file of known salt and solvent fragments, components of the molecule are assigned the type parent, salt, or solvent at the time of registration. A web-based service was created that can return to client applications information about the compounds related to any given compound ID, structure, SMILES string, or InChI string. Related compounds found in the database are categorized by match type, such as Identical Parent, Different Stereo (mirror image), or Different Isotopic Labeling. Using the results of this service, client applications can provide their users with detailed information about compounds closely related to any compound identified by or of interest to the user, helping users fully explore the known internal and external data around a compound or compound series of interest.
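A minimal sketch of the registration scheme described above, with all names and fragment lists chosen for illustration (this is not Pfizer's actual system): each non-bonded fragment is keyed by its InChI, reused if already registered, and typed against known salt and solvent lists.

```python
# Illustrative registry: unique fragment InChIs map to record ids, and a
# known-fragment file (here, two tiny sets) assigns parent/salt/solvent roles.
KNOWN_SALTS = {"InChI=1S/ClH/h1H"}        # hydrochloride
KNOWN_SOLVENTS = {"InChI=1S/H2O/h1H2"}    # water

registry = {}  # unique InChI string -> record id

def register(compound_id, fragment_inchis):
    """Register a compound as a composition of typed fragment records."""
    composition = []
    for inchi in fragment_inchis:
        rec = registry.setdefault(inchi, len(registry))  # reuse if seen before
        if inchi in KNOWN_SALTS:
            role = "salt"
        elif inchi in KNOWN_SOLVENTS:
            role = "solvent"
        else:
            role = "parent"
        composition.append((rec, role))
    return composition

# Contrived example: an ethanol hydrochloride monohydrate record
print(register("CMPD-1", [
    "InChI=1S/C2H6O/c1-3-2/h3H,1-2H3",   # ethanol -> parent
    "InChI=1S/ClH/h1H",                  # -> salt
    "InChI=1S/H2O/h1H2",                 # -> solvent
]))
# -> [(0, 'parent'), (1, 'salt'), (2, 'solvent')]
```

A second compound sharing any of these fragments would point at the same unique InChI records rather than creating duplicates, which is the point of the scheme.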

 7 Project prospect and the InChI
Colin R Batchelor, batchelorc@rsc.org, Royal Society of Chemistry, Thomas Graham House, Milton Road, Cambridge CB4 0WF, United Kingdom

The award-winning Project Prospect was launched in early 2007 and would not have been possible without having InChIs to represent chemical compounds, both as a compact representation within XML and as a transport medium over the web. We describe how the existence of InChI provided the impetus to set up Project Prospect in the form it took, how we have built it into our workflows, the needs that InChI doesn't satisfy and how we are dealing with those, as well as giving an insight into our staff development programme. We hope sharing our experiences will speed the uptake of InChI among participants.

8 InChI as a publishing application
Graeme Whitley, John Wiley & Sons, 111 River Street, Hoboken, NJ 07030-5774 and Bernd Berger, John Wiley & Sons, Boschstrasse 12, Weinheim, D-69469, Germany

A science publisher's output comprises hundreds of brands and products sourced from thousands of different authors and many different software systems. A critical part of the publishing process is handling these different inputs efficiently and producing a consistent product. A second challenge, unique to a chemistry publisher, is that many of our publications contain novel compounds that have not yet been registered in any compound registry and therefore have no unique identifier associated with them. Wiley was one of the first publishers to employ InChI, and a precursor of the InChIKey, as a publishing solution. Our publishing requirements included a compound identifier, a means of quickly identifying replicate records for the same compound, and a means of quickly matching lookup requests. We describe our approach and our experience deploying InChI in a real-world publishing environment.
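To illustrate the replicate-detection requirement, the hedged sketch below groups article records by InChIKey, a fixed-length key derived from the InChI, so it works even for novel compounds absent from any registry. The record identifiers are invented; the two keys shown are the well-known InChIKeys of ethanol and water.

```python
# Illustrative duplicate detection: records carrying the same InChIKey
# refer to the same compound, regardless of which source supplied them.
def deduplicate(records):
    """Group (record_id, inchikey) pairs; return only the replicated keys."""
    seen = {}
    for record_id, inchikey in records:
        seen.setdefault(inchikey, []).append(record_id)
    return {k: ids for k, ids in seen.items() if len(ids) > 1}

dupes = deduplicate([
    ("art-001", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),  # ethanol
    ("art-002", "XLYOFNOQVPJJNP-UHFFFAOYSA-N"),  # water
    ("art-003", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),  # same compound again
])
print(dupes)
# -> {'LFQSCWFLJHTTHZ-UHFFFAOYSA-N': ['art-001', 'art-003']}
```

Because the key is a fixed-length string, it also indexes efficiently in an ordinary database column, which suits the fast-lookup requirement mentioned above.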

9 A systematic nomenclature for codifying engineered nanostructures
Warren C. W. Chan, warren.chan@utoronto.ca, Institute of Biomaterials and Biomedical Engineering, University of Toronto, 164 College Street 408, Toronto, ON M5S 3G9, Canada and Darcy Gentleman, dgentlem@chem.utoronto.ca, Department of Chemistry, University of Toronto, 80 St. George St., Toronto, ON M5S3H6, Canada.

Nanotechnology's growing applications are fueled by the synthesis and engineering of myriad nanostructures, yet there is no systematic naming and/or classification scheme for such materials. This lack of a coherent nomenclature is confusing the interpretation of data sets and threatens to hamper the pace of progress and risk assessment. A systematic nomenclature that encodes nanostructures' overall composition, size, shape, core and ligand chemistry, and solubility is presented. A typographic string of minimalist field codes facilitates digital archiving and searches for desired properties. This nomenclature system could also be used for nanomaterial hazard labeling.

10 Nanotech nomenclature in environmental sciences
Gopal Coimbatore, gopal.coimbatore@tiehh.ttu.edu, Institute of Environmental and Human Health, Texas Tech University, Box 41163, Lubbock, TX 79416

A report published in C&EN in 2005 on nanotech terminology said this: “It's basically been a free-for-all in the world of nanotech terminology. Quantum dots, nanoshells, nanopeapods—nanoscientists have been inspired by everything from Polish dumplings to Inuit landmarks when naming new nanomaterials.” Three years on, that state of flux hasn't gone away, although several recent efforts have clarified the picture and helped crystallize a rudimentary framework for nanotech nomenclature. Environmental science, a derivative area of science (and of nanotechnology), has as usual displayed a phase lag in maturing its nomenclature. Yet the pace of progress is such that, by the time this presentation finally lights up on the screen, the author's preliminary thoughts on the topic will be obsolete. Despite that, we will attempt to summarize the most recent approaches to nomenclature in the environmental sciences.

 11 Nanotechnology at CAS: Size matters
Roger J. Schenck, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202

Literature related to the field of nanotechnology is growing rapidly – 1.8 percent of the records covered by Chemical Abstracts Service in 2000 contained the term “nano”; in 2005 that percentage had grown to 4.9 percent, and in 2007 it was over 8 percent. As expected, CAS scientists are seeing a quickly evolving nomenclature in this relatively new field of science. This presentation will discuss some of the examples and problems encountered in processing nano information, and solutions that CAS is adopting for indexing and substance representation. Specific examples will be illustrated.

12 Patenting nanotechnology: Correlating size and language to describe nanotech inventions
Jeffrey A. Lindeman, jlindeman@nixonpeabody.com, Nixon Peabody LLP, Suite 900, 401 9th Street NW, Washington, DC 20004-2128

The words used to describe and claim an invention determine the scope of its patent protection. Those words do not necessarily change with size: they can be the same for the macroscale as for the nanoscale. Common words may be “too big” for nanotechnology, and simply using the prefix “nano” may not be sufficient to accurately describe an invention. This presentation considers how words in patent claims are interpreted and how that interpretation affects nanotech inventions. As nanotechnology continues to develop, so does nanotechnology patent practice. This presentation also considers the application of patent law and practice to nanotechnology and discusses how to use patent strategy to obtain effective, robust nanotech patents.

13 What is nanotechnology?
Peter Hartwell, peter.hartwell@hp.com, Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA 94304

Nano. A popular culture term, a marketing term, and a magical key to unlocking research funding. But what makes something nanotechnology? The answer can depend on your background as much as your intentions. A few things are certain. The field is an interdisciplinary meeting of scientists, engineers, and companies. Diverse backgrounds create a wide range of interpretations, expectations, and conventions. The concepts are similar but the descriptive language can be quite different. Attempting to define nanotechnology provides a great jumping off point to begin to tame the challenge of defining standard terminology for this field.

I will explore the definition of nanotechnology from the perspective of a MicroElectroMechanical Systems (MEMS) engineer working in a group of self-proclaimed nanotechnologists. Starting with our “top down” techniques for device fabrication, including nano-imprint lithography, atomic layer deposition, reactive ion etching, and plasma-enhanced chemical vapor deposition, I will introduce our language. This contrasts with synthesis techniques from chemistry or biology, known as “bottom up,” in which molecules or systems are assembled from the molecular level upward. I will look for common ground at a high level and drill down on some topics to illustrate the current state, offering suggestions on how to get these diverse communities to use the same terminology. I will use examples from the ACS Nanotations wiki to highlight how an online community can help develop standard terminology.

14 Use and utility of InChI in PubChem
Evan Bolton, bolton@ncbi.nlm.nih.gov, National Center for Biotechnology Information, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894

PubChem is a free, online public information resource located at the National Center for Biotechnology Information (NCBI). The system provides information on the biological properties and activities of chemical substances, linking together results from different sources on the basis of chemical structure and/or chemical structure similarity. PubChem utilizes InChI as a means of structure input and output. This presentation will detail ways in which InChI may be used in conjunction with the PubChem resource, including data integration and data mining aspects.

15 InChI keys as standard global identifiers in chemistry web services
Russ Hillard, russ.hillard@symyx.com and Keith T Taylor, keith.taylor@symyx.com. Product Marketing, Symyx Technologies Inc, 2440 Camino Ramon, San Ramon, CA 94583

The role of calculated compound identifiers is increasingly important as large collections of chemical structures are made available in online systems. The ability to correlate molecules and reactions across multiple sources is critical to high-performance delivery of related records from different sources. Historically, the progression from topologically derived text strings (WLN, SMILES) to connection tables (molfiles, SDfiles, RDfiles) and derived values (SEMA, InChI, NEMA, and others) has brought us ever closer to the ultimate goal of a unique, globally standard, computed compound identifier. The role of InChIKeys and related values in delivering high-performance access to large datasets of chemistry-related information via web services will be examined.

16 Chemical journal publishing in an online world
Jason Wilde, Nature Publishing Group, 4 Crinan Street, London, N19XW, United Kingdom

Online searchable databases of structures, 3-D imagery, and searchable formulae take chemistry information light years beyond what the printed page made possible. Chemists have also been amongst the most active in embracing blogging and other Web 2.0 initiatives such as open lab notebooks. In this environment, the challenge is on for publishers to deliver journals that go well beyond traditional publishing models. Nature Publishing Group, in launching a new chemistry journal, has been able to take stock of the current state of chemistry publishing and develop a new generation of tools and approaches embracing these new opportunities. This talk will outline these developments and pose questions for the future.

17 InChI/InChIKey vs. NCI/CADD Identifiers: A comparison
Markus Sitzmann, sitzmann@helix.nih.gov, Laboratory of Medicinal Chemistry, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, 376 Boyles St, Frederick, MD 21702, Igor V. Filippov, igorf@helix.nih.gov, Laboratory of Medicinal Chemistry, SAIC-Frederick, Inc., NCI-Frederick, 376 Boyles St, Frederick, MD 21702, and Marc C. Nicklaus, mn1@helix.nih.gov, Center for Cancer Research, National Institutes of Health, National Cancer Institute, Laboratory of Medicinal Chemistry, 376 Boyles Street, Frederick, MD 21702

We present a comparison of the IUPAC InChI/InChIKey Identifiers with our CACTVS hashcode-based NCI/CADD Identifiers. Both types of identifiers are calculated in the context of, and are available in, our Chemical Structure Lookup Service (CSLS) available at http://cactus.nci.nih.gov/lookup, which currently indexes approx. 57 million chemical structure records representing about 40 million unique chemical structures. Like the IUPAC identifiers, our NCI/CADD Identifiers have been specifically designed to enable a fine-tunable yet rapid compound identification even in very large datasets. They can be set to be sensitive to a variety of chemical features such as tautomerism, different resonance structures drawn for a charged species, and fragments such as counterions. We will discuss the differences in structure identification between the NCI/CADD and the IUPAC identifiers that we have observed in this very large structure set, and what these discrepancies can tell us about definition and design, scope, limitations and problems in either set of identifiers.
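As a rough illustration of how a sensitivity-tunable, hash-based identifier can work in principle (a toy sketch, not the CACTVS hashcode or the actual NCI/CADD implementation): flags decide which normalization steps are applied before hashing, so coarser settings collapse related forms, such as a parent structure and its salt, onto the same value.

```python
import hashlib

def structure_hash(fragment_inchis, tautomer_insensitive=False,
                   counterion_insensitive=False):
    """Hash a structure's fragment InChIs; the flags coarsen the identifier
    so that related forms map to the same value. Purely illustrative."""
    frags = list(fragment_inchis)
    if counterion_insensitive:
        frags = [max(frags, key=len)]      # keep only the largest fragment
    if tautomer_insensitive:
        # Crude stand-in for real tautomer canonicalization: drop h-layers.
        frags = ["/".join(p for p in f.split("/") if not p.startswith("h"))
                 for f in frags]
    digest = hashlib.sha256("|".join(sorted(frags)).encode()).hexdigest()
    return digest[:16]                     # shortened for readability

ethanol = "InChI=1S/C2H6O/c1-3-2/h3H,1-2H3"
hcl = "InChI=1S/ClH/h1H"
full = structure_hash([ethanol, hcl])                               # salt-sensitive
coarse = structure_hash([ethanol, hcl], counterion_insensitive=True)  # salt-blind
```

With the coarse setting, the hydrochloride and the free base hash identically; with the full setting they do not. This mirrors the fine-tunable identification described in the abstract, though the real identifiers encode far more chemistry than this.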

18 Combining quantitative data and qualitative knowledge to score reaction energies
Chloe-Agathe Azencott, cazencot@ics.uci.edu1, Matthew A. Kayala, mkayala@ics.uci.edu1, and Pierre Baldi, pfbaldi@uci.edu2. (1) Bren School of Information and Computer Science, IGB at University of California, Irvine, 6210 Donald Bren Hall, Irvine, CA 92697, (2) Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, Irvine, CA 92697

Predictive scoring functions based on statistical learning techniques generally require large amounts of quantitative training data. Unfortunately, this numerical knowledge is usually unavailable or prohibitively expensive to obtain. For practical applications, however, experts often require only qualitatively correct results that define accurate ranking orders. Inspired by the inherent reaction-prediction capability of human chemists, we propose a novel machine learning technique in the context of state energy calculations. QM/MM calculations and wet-lab experiments can supply some quantitative energy data but are impractical to run on a large scale. In contrast, chemists exhibit significant problem-solving ability without making exact numerical calculations; their decisions are based on qualitative knowledge of trends and ranking orders in molecule stability and reaction rates. Our method combines the limited quantitative experimental data available with this qualitative information to yield scoring functions accurate enough to reproduce the problem-solving capability of human experts.

19 Multiobjective approach to optimizing scoring functions for docking
Iain P. Mott, i.mott@sheffield.ac.uk1, Peter Gedeck, peter.gedeck@novartis.com2, and Valerie J. Gillet, v.gillet@sheffield.ac.uk1. (1) Department of Information Studies, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, United Kingdom, (2) Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Novartis Horsham Research Centre, Wimblehurst Road, Horsham, West Sussex, RH12 5AB, United Kingdom

Current scoring functions often fail to correctly prioritise compounds according to their known binding affinities. Previously, negative training data has been employed in scoring function optimisation. A genetic algorithm optimises a function to rank a known binding mode in preference to noisy decoy poses, this having the advantage of explicitly accounting for disfavoured interactions. We present a more targeted multiobjective approach. Using the Astex diverse dataset, we dock with an impaired version of GOLD to generate diverse decoys for each protein. Using a multiobjective evolutionary algorithm, we demonstrate a scoring function optimisation protocol. Optimising every pair-wise combination of the 85 members of the Astex diverse dataset, we show that contentions exist in the optimal scoring function, suggesting that no global function exists for all targets. We extend this method to cross-docking to incorporate protein flexibility, optimise to particular targets and target classes, and demonstrate performance in virtual screening.

20 Reaction simulation expert system for synthetic organic chemistry
Jonathan H. Chen, chenjh@uci.edu and Pierre Baldi, pfbaldi@uci.edu. Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, Irvine, CA 92697

The long term goal of this project is to develop a computerized system with problem-solving capabilities in synthetic organic chemistry comparable to those of a human expert. At the core of such a system should be the ability to predict the course of chemical reactions to, for instance, validate synthesis plans. Our first approach, based on encoding expert knowledge as transformation rules, achieves predictive power competitive with chemistry graduate students, but requires significant knowledge engineering to expand its coverage to new reactivity. To overcome this limitation and achieve greater predictive power, our current approach is not based on specific rules, but instead upon general principles of physical organic chemistry. These principles allow the system to elucidate the mechanistic pathways and reaction coordinate energy diagrams of simulated reactions. These results directly mimic the qualitative problem-solving ability of human experts, but with the speed, precision, and combinatorial power of an automated system.

21 Wavelet compression of GRID fields for similarity searching and virtual screening
Richard L. Martin, lip06rlm@sheffield.ac.uk1, Eleanor J. Gardiner, e.gardiner@sheffield.ac.uk1, Valerie J. Gillet, v.gillet@sheffield.ac.uk1, and Stefan Senger, stefan.x.senger@gsk.com2. (1) Department of Information Studies, University of Sheffield, Sheffield, United Kingdom, (2) Computational, Analytical and Structural Sciences, GlaxoSmithKline, Medicines Research Centre, Gunnels Wood Road, Stevenage, SG1 2NY, United Kingdom

Perhaps the most commonly used molecular interaction potential is the GRID field, comprising a discrete grid placed over a molecule, with potential interaction energies between the molecule and a probe group (e.g., water) calculated at each vertex. However, GRID fields can be very large, making it infeasible to align molecules based on their full GRID representations. We show that the Daubechies 4-tap wavelet transform can be exploited to represent finely sampled GRID maps in 1.1% to 1.5% of the storage of the original fields. The reduced representations can be used in ligand-based similarity searching without significant loss of accuracy compared with using the whole field. The efficacy of other wavelets and of the fast Fourier transform is also examined. We also describe the impact of wavelet approximation on the retrieval of actives from decoys, and a method for generating molecular alignments based on the reduced GRID fields.
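For readers unfamiliar with the transform named above, here is a hedged one-dimensional sketch: one analysis step of the periodic Daubechies 4-tap wavelet, followed by discarding small detail coefficients. The abstract applies the idea to 3-D GRID maps; nothing here reproduces the authors' actual code.

```python
import math

# Standard Daubechies-4 analysis filters (orthonormal: sum of squares = 1).
_S3, _S2 = math.sqrt(3.0), math.sqrt(2.0)
H = [(1 + _S3) / (4 * _S2), (3 + _S3) / (4 * _S2),
     (3 - _S3) / (4 * _S2), (1 - _S3) / (4 * _S2)]   # scaling (low-pass)
G = [H[3], -H[2], H[1], -H[0]]                        # wavelet (high-pass)

def dwt_d4(x):
    """One periodic D4 analysis step on an even-length signal."""
    n = len(x)
    approx = [sum(H[k] * x[(i + k) % n] for k in range(4))
              for i in range(0, n, 2)]
    detail = [sum(G[k] * x[(i + k) % n] for k in range(4))
              for i in range(0, n, 2)]
    return approx, detail

def compress(x, keep=2):
    """Zero all but the `keep` largest-magnitude detail coefficients."""
    approx, detail = dwt_d4(x)
    cutoff = sorted(map(abs, detail), reverse=True)[keep - 1]
    return approx, [d if abs(d) >= cutoff else 0.0 for d in detail]
```

Because the periodic transform is orthogonal, signal energy is preserved exactly, so discarding the near-zero detail coefficients loses little information; iterating the step and thresholding across levels is what yields the large storage reductions reported above.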

22 Where does the tetrazole ring belong? Insight to the binding pose of AT1 antagonists using homology modeling, molecular dynamics, and docking
N. J. Maximilian Macaluso, njmm2@cam.ac.uk and Robert C. Glen. Unilever Centre For Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, United Kingdom

The angiotensin II type I (AT1) receptor is a family A GPCR that mediates the renin-angiotensin system (RAS), a well-characterized pathway for blood pressure regulation. A class of non-peptide AT1 antagonists, called “sartans”, have been used to successfully treat hypertension. The ability to design more effective AT1 antagonists is of great pharmaceutical interest and relies on understanding the role of Lys199 in the binding site. Does this positively charged residue interact with the anionic tetrazole ring of many sartan drugs or not? In order to address this question, a comparative model of the AT1 receptor was constructed using the newly crystallized β2-adrenergic receptor as a template. This structure was relaxed using molecular dynamics in explicit solvent and lipids. Diverse AT1 antagonists were docked in this model, guided by SAR data and binding affinity trends. The results agree well with experimental information and suggest a novel binding orientation.

23 Fragment library design: What have we learned so far?
Ijen Chen, i.chen@vernalis.com and Roderick E. Hubbard. Vernalis (R&D) Ltd, Granta Park, Cambridge, CB21 6GB, United Kingdom

Fragment-based methods have become established over the past ten years as a powerful approach in structure-based lead discovery, with a number of compounds now entering clinical trials. The recent successes have led to the methods being adapted to varying degrees within most pharmaceutical companies.

As with any screening approach, the design of the library is crucial. As well as the usual criteria of compound diversity and chemical suitability, a fragment library is also constrained by the methods used to detect binding and how the fragments are going to be used. The initial versions of the Vernalis library were selected based on fairly well defined criteria that included cheminformatics filters and manual assessments of chemical tractability. Over the past seven years, the library has evolved considerably based on our experience in screening a wide variety of different target classes. The new factors that are taken into account include experience of the medicinal chemists with evolving the fragments, design of new fragments to explore binding hypotheses and the challenge of new protein-protein interaction targets. In addition, practical considerations such as compound stability and continued commercial availability have had an impact.

This presentation will briefly review the evolution of the library and our experience of utilising fragments for drug discovery projects. The main focus will be on a recent analysis that contrasts the physico-chemical properties of the library with the hits seen against various classes of targets. We will discuss what implications this experience has for the design of the next refresh of our library.

24 De novo design using reaction vectors: Application to library design
Valerie J. Gillet, v.gillet@sheffield.ac.uk1, Hina Patel, lip05hp@sheffield.ac.uk1, Michael Bodkin, BODKIN_MICHAEL@LILLY.COM2, and Beining Chen, b.chen@sheffield.ac.uk3. (1) Department of Information Studies, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield, S1 4DP, United Kingdom, (2) Eli Lilly UK, Erl Wood Manor, Windlesham, GU20 6PH, (3) Department of Chemistry, University of Sheffield, Sheffield, S3 7HF, United Kingdom

One of the outstanding issues in de novo design is the generation of molecules that are synthetically accessible and which also represent non-obvious structural transformations. We have developed a knowledge-based approach to de novo design which is based on reaction vectors that describe the structural changes that take place at the reaction centre, along with the environment in which the reaction occurs. The reaction vectors are derived automatically from a database of reactions which is not restricted by size or reaction complexity. A structure generation algorithm has been developed whereby reaction vectors can be applied to previously unseen starting materials in order to suggest novel syntheses. The approach has been implemented in KNIME and is validated by reproducing known synthetic routes. We then present applications of the method in different drug design scenarios including lead optimisation and library enumeration.

25 Virtual screening for fragment based drug discovery
Qiong Yuan, qyuan@cas.org, Cynthia Liu, and Fred Winer. Chemical Abstracts Service, PO Box 3012, Columbus, OH 43210

Fragment-based drug discovery has become an active field in academia and industry. CAS has been identifying key concepts and substances found within the associated documents in the world's largest repository of chemistry-related information. It has previously been reported that bioactivity-related concepts and more than 30,000 specific targets have been associated with specific substances. Making use of these features, it is possible to explore the chemical space around fragments and discover new relationships. Specific examples of this functionality will be provided.

26 Reagent-based fragment space for hit generation
Atipat Rojnuckarin, arojnuckarin@arqule.com, Rocio Palma, rpalma@arqule.com, and Mark A. Ashwell. ArQule, Inc, 19 Presidential Way, Woburn, MA 01801

ArQule's parallel synthesis technology is a powerful tool and it is continually being expanded. We recently undertook a case study to construct a new virtual chemical space from fragments derived from available reagents and more than 30 ArQule Platform Chemistries. This procedure seeks to improve the synthetic accessibility of potentially valuable hit molecules and takes advantage of the diversity of commercially available reagents. FTree-FS software from BioSolveIT GmbH allows efficient searching across this space for novel chemical matter that shares chemical features with the known active molecules but with improved synthetic accessibility. We also describe the application of this new chemical space searching paradigm as part of ArQule's Kinase Inhibitor Platform (AKIP™) to identify kinase inhibitors with a type IV mechanism of action.

27 LoFT: Focused library design using feature tree similarity
J. Robert Fischer1, Uta Lessel, Uta.Lessel@bc.boehringer-ingelheim.com2, and Matthias Rarey1. (1) Center for Bioinformatics (ZBH), University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany, (2) Department of Lead Discovery - Computational Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riss, 88397, Germany

We present LoFT, a new approach for focused combinatorial library design. In contrast to existing methods, chemical fragment spaces, which mainly consist of a collection of fragments and connection rules, are used as the underlying search space. By selecting one or several core fragments with the same link pattern, a focused library can be designed.

LoFT combines classical physicochemical design criteria with the feature tree descriptor for similarity/dissimilarity measurement. By applying the comparison directly at the fragment level, we are able to design focused libraries efficiently without explicitly combining the fragments. Several stochastic algorithms are provided for traversing the search space, employing a weighted multi-objective scoring function, filtering rules, and diversity mechanisms. Besides simulated annealing and threshold acceptance, a cherry-picking option, which selects the n best products from the search space, is available.
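The weighted multi-objective scoring and cherry-picking selection described above can be illustrated with a minimal sketch. The objective names, weights, and candidate data below are hypothetical placeholders, not LoFT's actual criteria:

```python
# Minimal sketch of weighted multi-objective scoring with cherry picking:
# score each candidate product by a weighted sum of per-objective terms
# (each scaled 0..1, higher is better) and keep the n best.

def weighted_score(props, weights):
    """Combine per-objective scores by their weights."""
    return sum(weights[k] * props[k] for k in weights)

def cherry_pick(candidates, weights, n):
    """Select the n best-scoring products from the search space."""
    ranked = sorted(candidates, key=lambda c: weighted_score(c, weights),
                    reverse=True)
    return ranked[:n]

# Hypothetical candidate products with illustrative objective values:
candidates = [
    {"similarity": 0.9, "logP_ok": 1.0, "diversity": 0.2},
    {"similarity": 0.7, "logP_ok": 0.5, "diversity": 0.9},
    {"similarity": 0.4, "logP_ok": 1.0, "diversity": 1.0},
]
weights = {"similarity": 0.6, "logP_ok": 0.3, "diversity": 0.1}
best = cherry_pick(candidates, weights, n=2)
```

In the real method the ranking would be interleaved with the stochastic search rather than applied to an enumerated list.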

For validation, LoFT was applied to several drug design scenarios. Starting with known drug molecules, we generated focused libraries within desired property ranges.

28 Chemistry Librarians: What's on the horizon? How do we get there?
Elizabeth Anne Brown, ebrown@binghamton.edu, Libraries, Binghamton University Libraries, P.O. Box 6012, Binghamton, NY 13902-6012

The chemical information profession and libraries in general face a host of new technologies that have transformed, and will continue to transform, the daily practice of librarianship. These include search engine applications such as Google Scholar, Open Source applications, new publishing models such as Open Access, and an emphasis on building digital repositories and preserving digital library content. This lecture gives an overview of these technologies and of the challenges to traditional library spaces and services, and offers solutions for staying relevant in this changing environment, as well as strategies for long-term career growth and skill building.

29 Google generation and nontraditional chemistry information training
Norah N. Xiao, nxiao@usc.edu, Science and Engineering Library, University of Southern California, 910 Bloom Walk SSL 101, Los Angeles, CA 90089

As academic libraries continue providing seamless access to information in electronic format, more and more users tend to study and find information online. This is most evident in the Google Generation of students who grew up navigating multimedia and information technologies. However, do they really get what they are looking for? What are they actually searching? How do they feel about their information search skills and the information resources they use? And how can we, as librarians, provide effective chemistry information research training to them? In particular, with more new technologies available, how can we apply them effectively to our services and train students to become chemistry-information literate? This presentation will try to answer these questions based on the author's work with chemistry graduate students at the University of Southern California. Many new technologies, such as blogs, YouTube, SlideShare, and online tutorials created with Camtasia, have been adopted and applied to library training on different occasions (e.g., new-student orientations, ongoing workshops, and reference services). It is hoped that this work can provide an example of present-day chemistry information training. The related links are the orientation page (http://chemusc.wordpress.com/for-students/new-graduate-students-orientat...) and the online tutorials (http://www.usc.edu/libraries/subjects/engineering/tutorial/index.php).

30 Evaluating, recommending, ranking, linking: Traditional or new roles of chemical information professionals?
Martin P. Brändle, braendle@chem.ethz.ch1, Jana Sonnenstuhl, jana.sonnenstuhl@googlemail.com2, and Engelbert Zass, zass@chem.ethz.ch1. (1) Informationszentrum Chemie Biologie Pharmazie, ETH Zürich, HCI G 5.3, 8093 Zürich, Switzerland, (2) Informationszentrum Chemie Biologie Pharmazie, ETH Zürich, HCI G 5, 8093 Zürich, Switzerland

The traditional role of the chemical information specialist as a searcher and mediator is challenged by the increased availability of databases and electronic publications at the researcher's workbench. This results in a decrease in customer contact and retention. Teaching information literacy is often taken up as a counter-strategy. Due to time constraints it must focus on the most important sources, and hence must be complemented with information services that support the user in locating and judging the appropriate source. One important quality that comes into play here is the knowledge of subjects and information sources that the information professional acquires through evaluation of information products and through cooperation with publishers, information providers, database producers, and customers. We will present strategies, individual projects, and examples – such as a recent evaluation of the chemical content of Wikipedia and the Römpp Chemistry Lexikon – of how sharing the specialist's and the user's knowledge may enhance products and library information services.

31 Learning spaces and library places
Andrea B. Twiss-Brooks, atbrooks@uchicago.edu, John Crerar Library, University of Chicago, 5730 S. Ellis, Chicago, IL 60637

The dramatic changes in the nature of scientific publishing and communication in recent years have had a direct impact on the nature of the academic library as place. Spaces previously earmarked for the growth of print collections should be reexamined for other purposes as we move significant portions of our collections into storage facilities, cancel print journal subscriptions, and withdraw unneeded materials. Space in the center of our campuses for offices, classrooms, and other facilities is at a premium, and libraries are being targeted by administrators as a source of new space for these purposes. Libraries need to forge new partnerships with campus units whose programs complement our own, and librarians must become versed in the principles and practices of programmatic planning, design, and assessment of learning spaces. Recent activities and thinking about reinventing science library spaces at the University of Chicago Library are described.

32 Addressing researchers' current awareness and personal information management needs
Meghan Lafferty, mlaffert@umn.edu, Science & Engineering Library, University of Minnesota, 108 Walter Library, 117 Pleasant St SE, Minneapolis, MN 55455

Over the last few years, the University of Minnesota Libraries conducted two studies of research-related habits of faculty, graduate students, and other researchers; one addressed the social sciences and humanities, and the other focused on the sciences. A major goal was to identify needs not currently being met where the libraries might be able to play a role in providing solutions. One key observation was the growing difficulty scholars and researchers have keeping up with the literature in their fields and subsequently managing that information. In response to this problem, the libraries formed an exploratory group with the goal of finding a more systematic approach to current awareness and personal information management. The group assessed existing tools and potential opportunities for collaboration and services. I will be discussing their recommendations and the resulting best practices guidelines for researchers.

33 Computational tools for fragment based drug design
A. Peter Johnson, P.Johnson@leeds.ac.uk1, Zsolt Zsoldos2, Aniko Valko, aniko.valko@keymodule.co.uk3, and Vilmos Valko, vilmos.valko@keymodule.co.uk3. (1) School of Chemistry, University of Leeds, Leeds, LS2 9JT, United Kingdom, (2) SimBioSys Inc, 135 Queen's Plate Dr, Suite 520, Toronto, ON M9W 6V1, Canada, (3) Keymodule Ltd, Leeds, United Kingdom

Although originally developed for complete de novo ligand design, the SPROUT software suite provides a set of tools ideally suited to the design of ligands incorporating one or more small fragments known through experimental methods (such as X-ray crystallography or NMR) to bind to specific regions of a target protein. In the case of a single fragment with a known binding pose, SPROUT LeadOpt is able to apply a reaction knowledgebase and a set of available starting materials to carry out virtual reactions on the fragment, generating hypothetical ligands which are both readily synthesisable and predicted to bind strongly to the target. Where two or more fragments bind in different regions, SPROUT is able to link them together, redocking to maintain the original poses while allowing some movement limited by user-selected tolerances.

The technology used will be discussed together with examples illustrating its application.

34 Design and application of fragment libraries for protein crystallography
John Badger, john@zenobiatherapeutics.com, Zenobia Therapeutics, 505 Coast Blvd South, Suite 111, La Jolla, CA 92037

The selection of appropriate molecules for incorporation into a fragment screening library is driven by the experimental technique with which binding will be detected and the way in which the hit information will ultimately be used in the lead development process. To design libraries for crystallographic fragment screening we have developed both general methodologies and rule-based filtering software to select appropriate fragment molecules from large commercial collections. Application of these procedures enables the flexible design (and redesign) of libraries for general target screening and the design of small focused libraries in which the molecules have more specific properties. Our approach to identifying early lead development candidates from collections of purchasable or synthesizable compounds uses the specific binding information from small fragment hits found in the crystallographic screen and incorporates procedures that maintain consistency between the dimensions of the lead development molecule and the structure of the target site.

35 Docking small fragments using MCSS minimization
Jürgen Koska, jkoska@accelrys.com1, Eric Yan1, Lakshmi S. Narasimhan2, Qiyue Hu, jerry.hu@pfizer.com2, Jim Na2, and Allister J. Maynard1. (1) Accelrys, 10188 Telesis Court Suite 100, San Diego, CA 92121-3752, (2) Structural and Computational Biology, Pfizer Global Research and Development, La Jolla Laboratories, 10578 Science Center Drive, San Diego, CA 92121

In this work we demonstrate that MCSS (Multiple Copy Simultaneous Search) is a powerful CHARMm-based method for docking and minimizing small ligand fragments in an active protein-binding site. The performance and ability to recover the positions of native ligand-protein complexes was investigated using a novel, fully automated, workflow-based MCSS implementation. Accurate scoring and placement of fragments is crucial when using MCSS in fragment-based ligand design, and we present validation using several small protein-fragment complexes. The results show that MCSS is able to recover the X-ray poses and, with only a few exceptions, score the poses correctly.

36 The discovery of AT7519 and AT9283 using fragment based drug design
Valerio Berdini, v.berdini@astex-therapeutics.com, Computational Chemistry, Astex Therapeutics Ltd, 436 Cambridge Science Park, Milton Road, Cambridge, CB4 0QA, United Kingdom

Astex Therapeutics has pioneered the application of fragment based drug design. Here we will briefly describe how fragment based drug design was used to identify AT7519, a novel CDK inhibitor which is currently in clinical trials. We will then go on to discuss how fragments and molecules identified during the CDK project were used to develop novel Aurora inhibitors. This work led to the identification of AT9283 which is also currently in clinical trials. The talk will discuss how state-of-the-art computational tools and structure based drug design were used to optimise the candidates.

37 Scientific data stewardship: Meeting the challenge in academic libraries
Barbara A. Losoff, Barbara.losoff@colorado.edu, Libraries, University of Colorado at Boulder, Boulder, CO 80309

E-research, or networked science, is data driven: both a consumer and a producer of data. Faced with this data deluge, academic librarians have been redefining their role, and potentially the mission of their libraries. As more government granting agencies require institutional data archiving and public access, librarians have become sought-after partners for input on research grants. Although the issues surrounding data management remain complicated, the basic principles guiding librarianship and research support still apply: harvesting, describing, archiving, and access. As the libraries at Purdue, Stanford, and Caltech offer models of data management, the real challenge for academic librarians will be in tailoring data stewardship at the institutional level.

38 Data awareness: Should chemistry information professionals care?
Barbara A. Losoff, Barbara.losoff@colorado.edu, Libraries, University of Colorado at Boulder, Boulder, CO 80309

Data-based research, or eScience, is a growing area in chemistry. Access to research data involves capture, indexing, curation, preservation, and rights management, all areas with which chemistry librarians have some knowledge and experience. An increasing number of white papers on data see librarians as having a significant role in the success of data science, especially in education and curation. As this research approach develops within the chemistry discipline, issues are mounting, especially with current publishing models. Chemistry information professionals need to be aware of data issues, data science, and data curation in order to support the increased data needs of all chemists, understand the development of eChemistry as a research area, and interface with the publishing and communication venues in chemistry. This presentation will discuss the state of eScience in chemistry research and how we at the Cornell University Library are beginning to approach it, with implications for essential skill building for science librarians.

39 NIH public access policy: Opportunity for a new library service?
Erja Kajosalo, kajosalo@mit.edu, MIT Libraries, Massachusetts Institute of Technology, 14S-134, 77 Massachusetts Ave, Cambridge, MA 02139-4307

This presentation will cover the basic information of the new NIH public access policy and author rights issues related to this policy, and report on a recent ethnographic study of NIH funded authors. During fall 2008, the MIT Libraries conducted a qualitative study of NIH funded researchers' publication process in order to better understand the decision-making and workflow process that researchers use to disseminate their research. The results of this study will inform the Libraries about appropriate services to offer to assist NIH researchers when publishing.

40 Yogendra Patel, yogendra.patel@manchester.ac.uk1, Catherine Heyward, C.A.Heyward@liverpool.ac.uk2, and Professor Douglas B. Kell, dbk@manchester.ac.uk1. (1) Manchester Interdisciplinary Biocentre, University of Manchester, 131 Princess Street, Manchester, M1 7DN, United Kingdom, (2) School of Biological Sciences, University of Liverpool, Crown Street, Liverpool, L69 7ZB

The transcription factor nuclear factor-κB (NF-κB) is a protein complex found in almost all animal cell types. NF-κB is involved in regulating the immune response to infection and has been linked to cancer, inflammatory and autoimmune diseases, septic shock, viral infection, and improper immune development. It is also involved in cellular responses to stimuli including stress, free radicals, and bacterial/viral antigens. Using over 400 known agonists and antagonists of NF-κB obtained from the literature, we computationally identify structurally similar clusters of compounds which interact at specific locations of the NF-κB cellular pathway.

41 Molecular similarity searching using inference network
Ammar Abdo, ammar_utm@yahoo.com and Naomie Salim, Naomie@utm.my. Information Systems, Chemoinformatic Group, Universiti Teknologi Malaysia, Faculty of Computer Science & Information Systems, D07, Skudai, 81310, Malaysia

Many methods have been developed to capture the biological similarity between two compounds for use in drug discovery. One of the disadvantages of conventional 2D similarity searching is that molecular features or descriptors that are not related to the biological activity carry the same weights as the important ones. To overcome this limitation, we introduce a novel similarity-based virtual screening approach based on a Bayesian inference network, in which features carry different statistical weights and statistically less relevant features are deprioritized. Here, the similarity searching problem is modeled using inference, or evidential reasoning, under uncertainty. An important characteristic of the network model is that it permits the combination of multiple queries, molecular representations, and weighting schemes. Our experiments demonstrate that the similarity approach based on the network model outperforms the Tanimoto similarity approach by a reasonable margin, thus offering a promising alternative to existing similarity search approaches.
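For illustration, here is a minimal sketch of the two kinds of measure being compared: the standard Tanimoto coefficient, which weights every fingerprint bit equally, and a weighted variant in the spirit of (but not identical to) the inference-network idea, where statistically less relevant bits can be down-weighted. The fingerprints and weights are invented for the example:

```python
# Fingerprints are modeled as sets of "on" bit indices.

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two bit sets: |A & B| / |A | B|."""
    a, b = set(fp_a), set(fp_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def weighted_tanimoto(fp_a, fp_b, w):
    """Same ratio, but bit i contributes weight w.get(i, 1.0)."""
    a, b = set(fp_a), set(fp_b)
    inter = sum(w.get(i, 1.0) for i in a & b)
    union = sum(w.get(i, 1.0) for i in a | b)
    return inter / union if union else 0.0

query = {1, 4, 7, 9}
screen = {1, 4, 8, 9}
plain = tanimoto(query, screen)                        # 3 shared / 5 total bits = 0.6
weighted = weighted_tanimoto(query, screen, {7: 0.1})  # bit 7 down-weighted as irrelevant
```

Down-weighting the unshared, activity-irrelevant bit 7 raises the score, which is the intuition the network model formalizes probabilistically.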

42 A fragment based de novo application in the context of the active site
Carsten Detering, detering@biosolveit.de, Holger Claußen, Holger.Claussen@biosolveit.de, and Marcus Gastreich, marcus.gastreich@biosolveit.de. BioSolveIT GmbH, An der Ziegelei 75, 53757 Sankt Augustin, Germany

FlexNovo is a molecular design program for structure-based de novo searching in large fragment spaces following a sequential growth strategy. Taking the active site as structural information, it uses fragment spaces as input that consist of several thousand chemical fragments and a corresponding set of rules, which primarily specify how the fragments can be connected with each other. Synthesizability can be ensured by several placement-geometry, drug-likeness, and diversity filter criteria that are directly integrated in the build-up process.

FlexNovo can be used for fragment expansion, e.g., starting from an X-ray structure produced in a fragment screen, or entirely in a de novo fashion, where the algorithm places fragments arbitrarily in the pocket and then grows the compound from the most promising ones.

We demonstrate the performance of FlexNovo on a few relevant medicinal chemistry projects.

43 Pharmacophore guided fragment based drug design for lead optimization
Scott D. Bembenek, sbembene@its.jnj.com, Computer Aided Drug Discovery, Johnson & Johnson Pharmaceutical Research & Development, LLC, 3210 Merryfield Row, San Diego, CA 92121 and Shikha Varma-O'Brien, shikha@accelrys.com, Accelrys Inc, 10188 Telesis Court, San Diego, CA 92121.

Using a pharmacophore to describe the interactions between a biological target and its corresponding ligands is an established virtual screening tool at the early stages of drug discovery. In recent years, the use of fragment-based approaches in drug discovery has gained wide popularity. In general, a fragment-based approach is very desirable since starting with low-molecular-weight fragments (rather than full-sized molecules) offers the advantage of increased sampling of chemical space and the possibility of improved drug-like properties. We have introduced an in silico method that utilizes pharmacophores for a combinatorial fragment-based approach applicable both to the design of novel compounds and to lead optimization. Given a pharmacophore, small molecular fragments can be rapidly assembled into new molecules. Here we illustrate how applying this methodology was instrumental in our lead refinement efforts.

44 Fragment based docking and linking engine of eHiTS
Zsolt Zsoldos, SimBioSys Inc, 135 Queen's Plate Dr, Suite 520, Toronto, ON M9W 6V1, Canada

Theoretically, any docking engine can be used to place small-molecule fragments into the active sites of receptors and score them. However, most methods suffer from under-defined constraints (a small fragment in a large cavity) and thus perform inadequately. In contrast, the eHiTS engine [1] has been designed to work in exactly this scenario: it breaks larger ligands down into small fragments, docks those independently, and then reconnects the poses. eHiTS provides very accurate (about 0.5 Å RMSD) pose prediction for small fragments and is capable of linking them up without significant loss of accuracy. The method will be presented with practical examples of how to use eHiTS for fragment-based structure design. Validation results will be presented to demonstrate the method's accuracy.
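The 0.5 Å figure above is the usual heavy-atom root-mean-square deviation between a predicted pose and the crystallographic one. A minimal sketch of that criterion, with illustrative coordinates, the same atom ordering assumed for both poses, and no superposition step:

```python
import math

def rmsd(pose_a, pose_b):
    """RMSD over paired (x, y, z) coordinates, in the same units (here Å)."""
    assert len(pose_a) == len(pose_b)
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(pose_a, pose_b))
    return math.sqrt(sq / len(pose_a))

# Hypothetical three-atom fragment: crystallographic vs predicted pose.
xray      = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.3, 1.1, 0.0)]
predicted = [(0.1, 0.0, 0.0), (1.5, 0.2, 0.0), (2.2, 1.0, 0.1)]
deviation = rmsd(xray, predicted)  # ~0.16 Å, well under the 0.5 Å mark
```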

[1] Z. Zsoldos, D. Reid, A. Simon, S.B. Sadjad, A.P. Johnson: eHiTS: a new fast, exhaustive flexible ligand docking system. J. Mol. Graph. Model. 26 (2007) 198-212; doi:10.1016/j.jmgm.2006.06.002

45 Druglike pieces for the virtual chemistry jigsaw puzzle: Toward optimized fragment spaces
Christof Wegscheid-Gerlach, christof.wegscheid-gerlach@bayerhealthcare.com1, Jörg Degen2, Hans Briem1, Matthias Rarey2, and Andrea Zaliani2. (1) GDD-LGO-MCB-MC-VII, Bayer-Schering Pharma AG, Muellerstr. 178, Berlin, 13342, Germany, (2) Center for Bioinformatics (ZBH), University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany

In silico approaches considering descriptor-, ligand-, or structure-based information for navigating chemical fragment spaces have been established within the lead-finding phase of drug design projects. One open question remains: the compilation and setup of the fragment spaces themselves. We have therefore compiled a new and elaborate set of rules for the breaking of retrosynthetically interesting chemical substructures (BRICS) and used it to obtain chemical fragments from biologically active compounds and vendor catalog sources.

Based on our studies, three new fragment sets have been compiled, with performance optimized for retrieving random sets of queries from different sources; they are available at http://www.zbh.uni-hamburg.de/BRICS.

In addition, we performed a comparative study of the BRICS fragment space and fragment spaces derived from kinase inhibitors. In our presentation we will highlight the similarities as well as the differences between these two fragment universes.

46 Application of computational methods in pharmaceutical solid form selection
Robert Docherty, Materials Sciences, Pfizer Global Research and Development, Building 530 (IPC 435), Ramsgate Road, Sandwich Kent, CT13 9NJ, United Kingdom

The selection of the solid form for development is a milestone in the conversion of a new chemical entity into a drug product. An understanding of the materials science and crystallisation of a new active pharmaceutical ingredient is crucial at the interface of drug substance manufacturing and drug product processing. In this presentation the broad challenges facing pharmaceutical scientists as a consequence of polymorphism and of hydrate and solvate formation during product design will be highlighted. The opportunities presented by structure-based computational tools to help address these challenges will be presented in terms of a framework that addresses both the business need and the new emerging regulatory environment.

47 Investigating crystal engineering principles using a data set of pharmaceutical cocrystals
Peter A. Wood, wood@ccdc.cam.ac.uk, Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, United Kingdom

Cocrystallization has recently been gaining popularity within the pharmaceutical industry as a viable method for producing a solid dosage form. The effective use of cocrystallization for this purpose clearly depends on the degree to which the solid forms produced can be controlled and predicted. The number of potential coformers that could be used in a cocrystallization screen for a drug molecule is, for example, vastly greater than the number of possible counterions for a salt screen. To this end, the systematic analysis of a family of structures containing a common molecular species can aid significantly in the understanding of the solid-state behaviour of the particular system and of cocrystallization in general. This contribution introduces a recently developed set of computational tools for analysing crystal packing patterns and applies them to investigate concepts such as motif competition and adherence to Etter's first rule in a dataset of pharmaceutical cocrystals.

48 Cocrystal design and packing analysis based on a family of crystal structures containing a common molecule
Scott L Childs, EmTechBio, Emory University, 1256 Briarcliff Rd. NE, Atlanta, GA 30306

Crystal engineering studies can be credited with giving rise to the recent interest in cocrystals (molecular complexes) of pharmaceuticals as a means of improving the physical properties of pharmaceutical dosage forms. Hydrogen bonds have been the traditional tool used for cocrystal design as well as for analysis of crystal structures. The fact that hydrogen bonds can be observed in crystal structures and visualised easily does not necessarily mean that they are 'structure-directing'; other, less directional interactions may be energetically competitive. Topics will include the importance of dispersion-dominated packing interactions in cocrystals relative to that of hydrogen bonds, as well as packing similarity, polymorphism, pseudo-isostructurality, and the occurrence of common 1D channel structures within a family of related cocrystals containing a common active pharmaceutical ingredient (API).

49 Hydrogen bond propensities: Knowledge-based predictions to aid solid form selection
Peter T. A. Galek, galek@ccdc.cam.ac.uk, CCDC, 12, Union Road, Cambridge, CB2 1EZ, United Kingdom

A methodology has been developed to assess the likelihood of hydrogen bond occurrence in crystal structures [1]. A reliable prediction is potentially very valuable during pharmaceutical solid form selection since these strong, consistent interactions are crucial to structural stability [2], and likely variations often indicate polymorphism [3].

Its application will be demonstrated on a selection of existing polymorphic APIs. Characterisation literature is available for these, providing relative stabilities for comparison. Stable and metastable polymorphs are shown to differ significantly in the extent of low-propensity hydrogen bonds.

The methodology is based on a model function optimized on hydrogen bonding data of related, known compounds. Once a model is derived, only a target chemical diagram is required for prediction owing to the form of descriptors: topological and chemical parameters which describe influences such as steric accessibility, competition between groups, and donor and acceptor type. Their form and influence will also be discussed.
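As a rough sketch of such a model function, one can picture a logistic model over descriptors of the kind named above. The descriptor names, coefficients, and intercept here are hypothetical, invented purely to illustrate the shape of a knowledge-based propensity prediction, not those of the published method:

```python
import math

def propensity(descriptors, coef, intercept):
    """Logistic model: estimated probability that a donor/acceptor
    pair forms a hydrogen bond in the crystal."""
    z = intercept + sum(coef[k] * descriptors[k] for k in coef)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted coefficients (would come from regression on
# hydrogen-bonding outcomes in related, known structures):
coef = {"steric_accessibility": 1.8, "competition": -1.2, "acceptor_strength": 0.9}

# Hypothetical descriptor values for one donor/acceptor pair:
pair = {"steric_accessibility": 0.7, "competition": 0.4, "acceptor_strength": 1.0}
p = propensity(pair, coef, intercept=-0.5)  # 0..1; low values flag unlikely bonds
```

A structure rich in bonds with low predicted propensity would, in this picture, be the kind flagged as metastable.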

1. Galek, P.T.A., Fábián, L., Allen, F.H., Motherwell, W.D.S. & Feeder, N. (2007). Acta Cryst. B63, 768-782.

2. Bernstein, J. (1993). J. Phys. D: Appl. Phys. 26, B66-B76.

3. Singhal, D. & Curatolo, W. (2003). Adv. Drug Deliv. Rev. 56, 335-347.

50 Crystal structure prediction: A decade of blind tests
Frank JJ. Leusen, Institute of Pharmaceutical Innovation, University of Bradford, Bradford, United Kingdom

The goal of predicting the solid state structures of an organic molecule from its molecular structure alone has attracted considerable industrial interest. The difficulty of the task is demonstrated by the regular Blind Test in Crystal Structure Prediction (CSP), which is hosted by the Cambridge Crystallographic Data Centre. In this contribution, the previous Blind Tests are briefly reviewed and the successful application of a new CSP approach to all four compounds (including a co-crystal) of the 2007 Blind Test is presented (see also Neumann, Leusen and Kendrick, Angewandte Chemie International Edition, 47: 2427 – 2430 (2008)). The central part of the new approach is a hybrid method for the calculation of lattice energies that combines density functional theory simulations with an empirical van der Waals correction. Typical applications of the new methodology will be discussed, as well as its limitations.

51 Force field based scoring of protein-ligand binding affinities
Johan Aqvist, aqvist@xray.bmc.uu.se, Department of Cell and Molecular Biology, Uppsala University, Biomedical Center, POB 596, SE-751 24 Uppsala, Sweden

We will discuss recent advances in applying molecular mechanics based scoring methods to protein-ligand complexes. Some key issues that will be addressed are sensitivity to the 3D receptor model, treatment of solvation, effects of conformational sampling, discrimination between binding modes, high-throughput applications and the use of force field energies in QSAR models.

52 Structure based drug design and LIE models for GPCRs
Peter Kolb, kolb@blur.compbio.ucsf.edu1, Daniel M. Rosenbaum, drosenb1@stanford.edu2, Anne Marie Munk Jorgensen3, John J. Irwin, jji@cgl.ucsf.edu1, Brian K Shoichet, shoichet@cgl.ucsf.edu1, and Brian K. Kobilka2. (1) Department of Pharmaceutical Chemistry, University of California, San Francisco, 1700 4th Street, San Francisco, CA 94158, (2) Department of Molecular and Cellular Physiology, Stanford University, 157 Beckman Center, Stanford, CA 94305, (3) Department of Computational Chemistry, H. Lundbeck A/S, Ottiliavej 9, Dk 2500 Valby, Denmark

Aminergic GPCRs have been a focus of pharmaceutical research for decades. Due to the lack of crystal structures, however, all efforts had to be limited to ligand-based and homology-model-based methods. The recently solved structure of the β2-adrenergic receptor now offers the opportunity to use structure-based design approaches. Consequently, we carried out a virtual screening campaign using the program DOCK and the 1 million molecules of the "lead-like" subset of the ZINC library. Upon testing of 31 selected molecules, six were found to be active, with binding affinities below 7 µM and the best compound binding with a Kd of 17 nM.

In order to evaluate routes for improving the ranking and to investigate the energetic contributions to binding to the β2-adrenergic receptor, we calculated Linear Interaction Energy (LIE) models based on binding data obtained from the literature. Specifically, we used the LIECE (Linear Interaction Energy with Continuum Electrostatics) approach developed by Huang and Caflisch. The resulting model, which showed good predictivity, was used to reevaluate the six hits of the primary screen as well as an in-house data set. Interestingly, the coefficients for the energy terms differ significantly from previously published LIECE models for proteases and kinases, which demonstrates the distinctness of GPCR binding sites.
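For context, the underlying LIE functional form is a linear combination of averaged ligand-surrounding interaction-energy differences. The sketch below uses generic textbook-style coefficients purely for illustration; the abstract's point is precisely that coefficients fitted for a GPCR differ from such protease/kinase-derived parameterizations:

```python
# Sketch of a linear interaction energy (LIE) estimate. d_vdw and d_elec
# are the bound-minus-free differences in average van der Waals and
# electrostatic interaction energies of the ligand with its surroundings.
# alpha/beta/gamma are the fitted coefficients; the defaults below are
# commonly quoted generic values, used here only for illustration.

def lie_dg(d_vdw, d_elec, alpha=0.18, beta=0.50, gamma=0.0):
    """Estimate binding free energy (kcal/mol) from energy averages."""
    return alpha * d_vdw + beta * d_elec + gamma

# Hypothetical interaction-energy differences for one ligand (kcal/mol):
dg = lie_dg(d_vdw=-30.0, d_elec=-8.0)  # 0.18*(-30) + 0.50*(-8) = -9.4
```

Refitting alpha, beta, and gamma against measured affinities for a receptor family is what produces the target-specific models discussed above.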

53 Learning scoring function parameters from binary data
Markus HJ. Seifert, mhj.seifert@gmx.de, Chem- and Bioinformatics, 4SC AG, Am Klopferspitz 19a, Planegg-Martinsried, D-82152, Germany

Target-specific optimization of scoring functions for protein-ligand docking can achieve significant improvements in the discrimination of active and inactive molecules. This concept can be extended by taking into account not only a single target structure but an ensemble of structures from a target family. The objective function, however, has to be generalized for this case, and a suitable global optimization algorithm has to be applied. It is shown that virtual screening performance for kinases improves significantly when scoring function parameters optimized specifically for that target family are used. Additionally, the major reason for the improved screening performance on kinase targets is identified. In summary, a general framework for the global, multi-objective optimization of scoring functions is presented which allows prior knowledge to be exploited in a systematic, effective, and robust way.
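As an illustration of the general idea (not the author's algorithm; the score terms, weights and data below are hypothetical), scoring-function parameters can be tuned to maximize the discrimination of actives from decoys, measured here by the area under the ROC curve:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical docking score terms for actives and decoys against one target:
# column 0 = hydrogen-bond term, column 1 = lipophilic term (arbitrary units).
actives = rng.normal([2.0, 1.0], 0.5, size=(50, 2))
decoys = rng.normal([1.0, 1.0], 0.5, size=(200, 2))

def auc(pos, neg):
    """ROC AUC via the rank-sum (Mann-Whitney) statistic; higher score = active."""
    scores = np.concatenate([pos, neg])
    ranks = scores.argsort().argsort() + 1
    r_pos = ranks[: len(pos)].sum()
    return (r_pos - len(pos) * (len(pos) + 1) / 2) / (len(pos) * len(neg))

# Grid-search the relative weight of the two terms to maximize enrichment,
# the simplest instance of target-specific parameter optimization.
best_w, best_auc = 0.0, -1.0
for w in np.linspace(0.0, 1.0, 101):
    a = auc(w * actives[:, 0] + (1 - w) * actives[:, 1],
            w * decoys[:, 0] + (1 - w) * decoys[:, 1])
    if a > best_auc:
        best_w, best_auc = w, a
print(best_w, best_auc)
```

The abstract's framework generalizes this one-target, one-parameter picture to an ensemble of structures from a target family and a genuinely multi-objective, global optimization.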

54 2-D and 3-D adaptive scoring functions for iterative kinase medium-throughput screening (ikMTS) with Profile-QSAR and AutoShim
Eric J. Martin, eric.martin@novartis.com1, David C. Sullivan2, and Prasenjit Mukherjee, pkmukher@olemiss.edu1. (1) Novartis Institute for Biomedical Research, 4560 Horton St, Emeryville, CA 94530, (2) Anacor Pharmaceuticals, Inc, 1020 East Meadow Circle, Palo Alto, CA 94303

Screening our 1.5 million compound archive requires 6 months and $1,000,000. Profile-QSAR is a novel kinase-specific, fragment-based, 2D modeling method that combines data for >100,000 compounds against >70 kinases to produce fast, accurate kinase activity predictions for iterative screening. Since fragment-based methods lose accuracy for novel chemotypes, docking is also employed. However, conventional docking suffers from three limitations: 1) it requires a target protein structure, 2) it is slow, and 3) its scores do not correlate well with affinity. Using medium-throughput experimental activity data, AutoShim adjusts pharmacophore “shims” to produce highly predictive, target-specific scoring functions. Over 5 months, our entire archive was pre-docked into a “Universal Kinase Surrogate Receptor” of 16 diverse kinase crystal structures. AutoShim can now be “shimmed” for new kinases with experimental binding data to accurately predict activity for 1.5 million compounds in hours instead of weeks, without a crystal structure. Together, Profile-QSAR and AutoShim produced effective iterative screens.

55 Combining quantitative data and qualitative knowledge to score reaction energies
Chloe-Agathe Azencott, cazencot@ics.uci.edu1, Matthew A. Kayala, mkayala@ics.uci.edu1, and Pierre Baldi, pfbaldi@uci.edu2. (1) Bren School of Information and Computer Science, IGB at University of California, Irvine, 6210 Donald Bren Hall, Irvine, CA 92697, (2) Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, Irvine, CA 92697

Predictive scoring functions based on statistical learning techniques generally require large amounts of quantitative training data. Unfortunately this numerical knowledge is usually unavailable or prohibitively expensive to obtain.

In practice, however, experts often require only qualitatively correct results that define accurate ranking orders. Inspired by the inherent reaction prediction capability of human chemists, we propose a novel machine learning technique in the context of state energy calculations. QM/MM calculations and wet-lab experiments can supply some quantitative energy data, but are impractical to run on a large scale. In contrast, chemists exhibit significant problem-solving ability without making exact numerical calculations. Rather, their decisions are based solely on qualitative knowledge of trends and ranking orders in molecule stability and reaction rates. Our method combines the limited quantitative experimental data available with this qualitative information to yield scoring functions accurate enough to reproduce the problem-solving capability of human experts.

56 Drug development and solid form selection: Multicomponent crystals
William Jones, wj10@cam.ac.uk, Department of Chemistry, University of Cambridge, Lensfield Road, CB2 1EW Cambridge, United Kingdom

Developing new and stable crystal forms for drug product development remains a challenge, both from a commercial viewpoint and from our need to further understand molecular aggregation and crystal packing. Our understanding of molecular recognition, supramolecular chemistry and crystallization phenomena helps in what is frequently referred to as “crystal engineering”. The ability to couple experimental observations with data in the CSD presents real opportunities. Multicomponent crystals (where two or more distinct chemical species are present in the crystal) are an area of particular interest to pharmaceutical chemists, since salts, hydrates and cocrystals (amongst others) can all be possible outcomes of a crystallization process. Screening for all possibilities becomes critical, and while addressing some of the above issues I will also outline recent developments in mechanochemical methods as a screening tool.

57 Supramolecular heterosynthons and their role in cocrystal design
Mike Zaworotko, xtal@usf.edu, Miranda L Cheney, mcheney@cas.usf.edu, and David Weyna. Department of Chemistry, University of South Florida, CHE205, Tampa, FL 33647

Crystal engineering facilitates the discovery of new crystal forms of long-known molecules of practical utility, such as active pharmaceutical ingredients (APIs). This contribution will focus upon an emerging class of crystal form, pharmaceutical cocrystals, with emphasis upon the following:

- A historical perspective of this long known but little studied class of compounds;

- Statistical analysis of the probability that certain supramolecular heterosynthons will exist in the presence of competing functional groups, i.e. how to select co-crystal formers for APIs using statistics generated from the Cambridge Structural Database;

- Examples of new co-crystals that include some long known natural products and APIs and how they fine tune physical properties of clinical relevance;

- An analysis of polymorphic co-crystals that focuses upon the persistence of supramolecular heterosynthons in polymorphs.

58 Cambridge Structural Database analysis of complementary molecular properties in cocrystals
László Fábián, fabian@ccdc.cam.ac.uk, Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, CB2 1EZ, United Kingdom

A database of organic cocrystal structures was extracted from the Cambridge Structural Database. Molecular descriptors were calculated for all molecules in the cocrystal dataset. The resulting database describes pairs of molecules that form cocrystals with each other in terms of the calculated molecular properties.

The properties that are generally similar or complementary for molecules in a cocrystal were identified by using correlations between the corresponding molecular descriptors. Two-dimensional density plots and box plots were created to visualise the observed trends and to elucidate their statistical significance.

The results show that cocrystals are usually formed by molecules of similar shapes and polarities. Analysis of previous cocrystal screening experiments clearly demonstrates that the efficiency of screening can be increased by considering shape and polarity descriptors. Unusual cocrystals that are formed by molecules of different polarities and shapes may help in the qualitative understanding of the chemical reasons behind the statistical results.

59 Applications of the CSD to structure determination from powder data
Alastair J. Florence, alastair.florence@strath.ac.uk1, Ryan Taylor1, Norman Shankland1, and Kenneth Shankland2. (1) Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, 27 Taylor Street, Glasgow, G4 0NR, United Kingdom, (2) ISIS Facility, STFC Rutherford Appleton Laboratory, Chilton, Didcot, Oxfordshire, OX11 0QX, United Kingdom

A key element of preclinical drug development includes the assessment of physical form diversity. In this context, it is not uncommon to see crystal structures of novel polymorphs, solvates and salts being solved from XRPD data using global optimization approaches, such as simulated annealing. As a general rule of thumb, the more complex the structure, the more difficult it is to locate the global minimum in the real-space search. It is, therefore, beneficial to develop strategies that maximise the chances of solving structures that have a high number of degrees of freedom (DoF). Here, we investigate strategies based on the CSD and the Mogul program. When the number of internal DoF in a global optimization is high, Mogul torsion angle search space restrictions can increase the chances of solving a structure. CSD-derived geometry information is also advantageous in Z-matrix construction and in the derivation of restraints in Rietveld refinement.

60 Pharmaceutical crystal forms at the dynamic intersection of science and intellectual property law
Andrew V. Trask, avtrask@jonesday.com, Intellectual Property Legal Intern and Registered Patent Agent, JONES DAY, 222 East 41st Street, New York, NY 10017

Crystal form technology is a powerful tool that can present certain scientific and legal opportunities during innovation in pharmaceutical materials development. From a scientific perspective, the intelligent and efficient design of an optimum crystal form can potentially facilitate development and expedite regulatory approval. From a legal standpoint, these same potential advantages may, in certain cases, confer patentability on innovative advances in the crystal form technology surrounding a development candidate. As a result, crystal form technology represents an important intersection between science and the law—an intersection that continually evolves in response to both scientific and legal developments. This presentation will summarize the latest prominent court cases addressing patentability in the pharmaceutical field. It will then discuss significant recent advances in crystal form technology, and it will offer an outlook on how such scientific advances may impact the legal standard for patentability in this key area of pharmaceutical development.

61 Developing scoring functions for a class of proteins
C. M. Venkatachalam, Shikha Varma-O'Brien, Tedman J. Ehlers, and Jurgen Koska. Accelrys, Inc, 10188 Telesis Court, San Diego, CA 92121

This work concerns the development of scoring functions to prioritize ligand poses in a receptor site. Our past efforts in this area resulted in the LigScore1 and LigScore2 functions (1), which were obtained by seeking scoring functions that reproduce observed binding affinities (pKi values) using experimentally observed ligand poses. Those studies employed a variety of protein systems. The functions have had some success in predicting binding affinities in the cases tested, while work with other systems suggests that their performance needs improvement. The scoring problem is a complex one, as the efforts described in the literature show; the task of using a single scoring function to handle a wide variety of protein systems is a tall order. One then asks whether a scoring function can be developed for a single class of proteins. If that can be done to a higher degree of accuracy for a handful of protein systems, one may then be able to address the problem of understanding the changes in scoring functions required for different classes of proteins.

We present a workflow of statistical algorithms to fine-tune LigScore functions for a specific class of proteins. The workflow involves preparation of the proteins, including protonation using a pK prediction algorithm and hydrogen addition using the HBUILD algorithm. The statistical methods involve regression on the LigScore terms to obtain coefficients that optimally fit the observed pKi values for each ligand. We compare the LigScore coefficients obtained for two different classes of proteins, HIV protease and kinases.

(1) Journal of Molecular Graphics and Modelling, Vol. 23, Issue 5, April 2005, Pages 395-407

62 Development of novel iterative knowledge-based scoring functions for protein-ligand and protein-protein interactions
Xiaoqin Zou, zoux@missouri.edu, Department of Physics, Department of Biochemistry, Dalton Research Center and Informatics Institute, University of Missouri-Columbia, 134 Research Park, Columbia, MO 65211

We have recently developed novel iterative knowledge-based scoring functions for protein-ligand and protein-protein interactions, referred to as ITScore and ITScore-PP, respectively. The key idea is to extract atom-based, distance-dependent pair potentials from a large training set of native and decoy complex structures. ITScore and ITScore-PP have been extensively tested for binding mode and affinity predictions using diverse test sets published in the literature, and the results were compared with those of other scoring functions. ITScore and ITScore-PP showed very good performance, and inclusion of entropic and desolvation effects further improved the predictions.
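The core of a knowledge-based pair potential of this kind can be sketched by Boltzmann inversion of an observed distance distribution against a reference state. The snippet below is a minimal illustration with made-up counts, not the ITScore derivation itself (which iteratively refines the potentials against decoy structures):

```python
import numpy as np

# Hypothetical distance counts for one atom-pair type, binned at 0.5 A,
# pooled from a set of complex structures.
bins = np.arange(2.0, 8.0, 0.5)  # bin lower edges (Angstrom)
observed = np.array([1, 4, 30, 80, 60, 40, 30, 25, 22, 20, 19, 18], float)

# Reference counts grow roughly as r^2 for a uniform density of atom pairs
# in spherical shells; normalize to the same total as the observed counts.
reference = (bins + 0.25) ** 2
reference *= observed.sum() / reference.sum()

kT = 0.593  # kcal/mol at ~300 K
eps = 1e-9  # guard against log(0) in empty bins
# Boltzmann inversion: w(r) = -kT * ln[ g_obs(r) / g_ref(r) ]
potential = -kT * np.log((observed + eps) / (reference + eps))
print(np.round(potential, 2))
```

The minimum of `potential` falls at the most over-represented distance, i.e. the most favorable pair separation; an iterative scheme then adjusts these potentials until they rank native poses above decoys.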

63 eHiTS scoring function
Zsolt Zsoldos, SimBioSys Inc, 135 Queen's Plate Dr, Suite 520, Toronto, ON M9W 6V1, Canada and Danni Harris, danni@simbiosys.ca, Computational Chemistry, SimBioSys Inc, 135 Queen's Plate Dr, Suite 520, Toronto, ON M9W 6V1, Canada

The eHiTS [1] scoring function departs from traditional atom-based scoring and instead scores interactions based on Interacting Surface Points (ISPs). A statistically derived empirical function is constructed using a 4-parameter geometric description of the relationship between ISP pairs. The energy associated with ISP pairs is deduced from the statistics using the Boltzmann distribution function. Temperature factors were considered to account for the variable uncertainty of atom positions in PDB X-ray structures. Additional scoring terms include desolvation energy, ligand conformational strain, entropy loss upon binding, pose depth within the binding pocket, and reproduction of key interaction patterns. Receptor cavities are automatically clustered based on shape and surface similarity, and specific weight sets are adapted for each cluster. Results are demonstrated on the Acetylcholine Binding Protein (AChBP) with its key cation-Pi interactions [2]. eHiTS produces the correct pose with the best score and gives good correlation with experimental binding affinities.

[1] doi:10.1016/j.jmgm.2006.06.002

[2] doi:10.1038/sj.emboj.7600828

64 Development of scoring functions for computing protein-ligand binding affinities
Richard A. Friesner, rich@chem.columbia.edu, Department of Chemistry, Columbia University, 3000 Broadway, New York, NY 10027

We will report on our latest version of the Glide XP scoring function which has been developed to calculate binding affinities for diverse compounds. Our new results demonstrate both the ability to rank order diverse compounds, and to reject random database ligands with a proficiency that is significantly better than previous efforts along these lines. The scoring function is global with the exception of core reorganization parameters which are associated with significant induced fit structural changes of the receptor; such effects cannot be modeled in principle by an empirical function which considers only protein-ligand interactions, and therefore must be incorporated into the model as offsets. A number of novel components, including receptor strain energy induced by ligand rings, explicit use of a water displacement functional generated by molecular dynamics, and many special terms for unusual chemical interactions such as pi-cation interactions, have been incorporated into the scoring function.

65 Development of scoring functions for computing protein-ligand binding affinities
Robert D Clark, bclark@bcmetrics.com, Biochemical Infometrics, 827 Renee Lane, St Louis, MO 63141

The question has often been asked of late whether structure-based virtual screening is inherently superior to ligand-based screening or vice versa. A little reflection shows that the distinction between the two approaches is largely an artificial one, particularly when 3D QSAR methods are being compared to docking with adaptive scoring functions. Both areas have a marked proclivity for producing misleading statistics, especially where "performance" is concerned, but they have other things in common as well. The underlying similarities and differences will be discussed, along with recommendations for minimizing the problems encountered in applying either prospectively, where the distinction between "empirical" approaches and those based on "first-principles" is probably more important.

66 Navigating the Family History Archive: Digitizing the Family History Library collection
Dennis L. Meldrum, meldrumdl@familysearch.org and Jeri Jump, jumpjl@familysearch.org. Book Digital Processing Team, FamilySearch, 50 East North Temple St., RM. 599, Salt Lake City, UT 84150

The Family History Archive is a growing collection of thousands of digitized (full text) published genealogy and family history books. The archive includes family, county and local histories, how-to books, magazines and periodicals, medieval books, and international gazetteers. The books come from the FamilySearch Family History Library, and several other major genealogical collections nationwide. It can be accessed from www.familysearch.org or from www.familyhistoryarchive.byu.edu, free of charge. Items may be searched by author, title, surname, keyword, or full text.

We will also briefly talk about the history of the Archive, why partner libraries joined the project and how they were selected, and what criteria they follow to place books in the Archive.

The presentation will largely focus on the processes we follow to digitize so many books, what equipment and software we use – including how it works and what modifications were made so that it would meet our needs.

67 Understanding genetic genealogy and the importance of DNA databases
Bennett Greenspan and Max Blankfeld, max@familytreedna.com. Family Tree DNA, 1445 North Loop West, Suite 820, Houston, TX 77008

Genetic genealogy is a powerful new tool used in conjunction with family history research. FamilyTreeDNA pioneered this field when, in April 2000, it made available to the general public what had until then been restricted to academia and research. There are two basic types of DNA tests available for genealogy: Y-DNA and mtDNA tests. The Y-DNA test is only available for males, since it involves testing the Y-chromosome, which is passed from father to son. Both males and females inherit mtDNA from their mothers, and testing mtDNA provides information about a person's direct female line. Because the Y-chromosome typically follows surnames, there is a much wider range of applications for Y-DNA testing and a much broader spectrum of problems that can be solved and information that can be acquired, especially when utilizing a large comparative database. This will be the main focus of the presentation.

68 Identification of the remains found at the crash site of Northwest Flight 4422 using forensic genealogy and DNA analysis
Colleen Fitzpatrick, CFitzp@aol.com, Yeiser & Associates, 18198 Aztec Ct, Fountain Valley, CA 92708 and Odile Loreille, odile.loreille@us.army.mil, Armed Forces DNA Identification Laboratory, 1413 Research Boulevard, Rockville, MD 20850.

We describe how forensic genealogy and DNA analysis were used to identify severely compromised remains found in the debris field of Northwest Airlines Flight 4422, which crashed in 1948 in a remote area of Alaska. The frozen human arm and hand, discovered in 1999, were assumed to belong to one of the thirty crash victims. Despite the challenges of applying DNA analysis and fingerprint matching to such degraded remains, by September 2007 all but two victims had been ruled out by one or both techniques. Victim #29 presented additional problems due to the difficulty of locating a mitochondrial DNA reference for his maternal family line in Ireland. We report how these challenges were overcome by forensic methods of genealogical research combined with new DNA analysis techniques to make a positive identification of remains that had been preserved in a glacier for over 50 years.

69 How the Pilgrims brought colon cancer to the New World and how Utah Population Database outed them
Deborah W. Neklason, deb.neklason@hci.utah.edu1, Jeffery Stevens, jstevens@genetics.utah.edu2, Kenneth Boucher, ken.boucher@hci.utah.edu1, Richard Kerber, rich.kerber@hci.utah.edu1, Geraldine Mineau, geri.mineau@hci.utah.edu1, and Randall Burt, randy.burt@hci.utah.edu3. (1) Department of Oncological Sciences, University of Utah, Huntsman Cancer Institute, 2000 Circle of Hope, Salt Lake City, UT 84112, (2) Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, (3) Department of Medicine, University of Utah

The Utah Population Database (UPDB) is a unique resource of genealogic data linked through probability modeling to causes of death and to Utah and Idaho cancer records. UPDB has been used by geneticists to select families likely to have a genetic condition and to identify the genes involved. The APC gene, responsible for Familial Adenomatous Polyposis (FAP) and colorectal cancer, is one such example. A Utah pioneer family from the 1840s and a family from New York carry an attenuated form of FAP (AFAP). They were linked through genealogy records to a couple who came to America from England around 1630 and whose descendants span 16 generations to the present day. Genetic analysis of fifteen families from across the USA with this same APC mutation shows that they are related. In view of the apparent age of this mutation, a notable fraction of colorectal cancers in the USA could be related to this founder mutation.

70 Adaptive scoring for comparing ligand binding sites and predicting binding modes and affinities
Leslie A. Kuhn, KuhnL@msu.edu1, Matthew E. Tonero, toneroma@msu.edu2, Jeffrey R. Van Voorst, vanvoor4@msu.edu3, and Mária I. Závodszky, zavodszk@msu.edu2. (1) Departments of Biochemistry & Molecular Biology and Computer Science & Engineering, Michigan State University, 502C Biochemistry Building, East Lansing, MI 48824-1319, (2) Department of Biochemistry & Molecular Biology, Michigan State University, 502C Biochemistry Building, East Lansing, MI 48824-1319, (3) Departments of Computer Science & Engineering and Biochemistry & Molecular Biology, Michigan State University, 502C Biochemistry, East Lansing, MI 48824-1319

We have developed a scoring function training and testing paradigm in which linear combinations of terms can be constructed in a systematic way, with weights determined to fit the metric of interest. This metric may be the RMSD of fit between related ligand binding sites, or the RMSD of a ligand docking relative to its crystallographic binding mode, or the least-squares fit between predicted and experimentally characterized binding affinities. This procedure includes iterative repartitioning of the training set to assess the stability of terms' weights across different sets of proteins, followed by cross-validation of predictive accuracy on proteins not included in the training. Results will be presented for aligning and quantifying similarity between binding sites, and for improving ligand docking and ranking in virtual screening.
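A minimal sketch of such a training-and-testing paradigm, using synthetic data and a plain least-squares fit (the actual term definitions and iterative repartitioning scheme are the authors'), fits weights for a linear combination of scoring terms to affinities and assesses them by leave-one-out cross-validation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical scoring terms (e.g. H-bond, vdW, desolvation) for 20 complexes,
# with noisy "experimental" affinities generated from known weights.
terms = rng.normal(size=(20, 3))
true_w = np.array([1.5, 0.8, -0.6])
affinity = terms @ true_w + rng.normal(0, 0.1, 20)

def fit(X, y):
    """Least-squares weights for a linear combination of scoring terms."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Leave-one-out cross-validation: refit without each complex, then predict it.
preds = np.empty(len(affinity))
for i in range(len(affinity)):
    mask = np.arange(len(affinity)) != i
    w = fit(terms[mask], affinity[mask])
    preds[i] = terms[i] @ w

r = np.corrcoef(preds, affinity)[0, 1]  # cross-validated correlation
print(round(r, 3))
```

Swapping the affinity vector for binding-site RMSDs or docking-pose RMSDs changes only the target metric, which is the flexibility the abstract describes.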

71 Avoiding pitfalls in molecular docking
Sally Ann Hindle, Sally.Hindle@biosolveit.de, Carsten Detering, detering@biosolveit.de, Marcus Gastreich, marcus.gastreich@biosolveit.de, and Holger Claußen, Holger.Claussen@biosolveit.de. BioSolveIT GmbH, An der Ziegelei 75, 53757 St. Augustin, Germany

Despite the generally good quality of deposited protein structures, uncertainties remain when it comes to defining an active site. The correct physico-chemical surroundings of a ligand can, however, be crucial in molecular docking and screening. For example, defining metal atoms as pharmacophores, assigning alternative amino acid and protonation states, and the correct insertion and orientation of (displaceable) water molecules all play a vital role in the preparation of an active site for docking. It may also be necessary to scale the contribution of certain interactions to the overall score.

We present recent advances in preparing an active site for docking and screening, together with several proof-of-concept examples.

72 Designing drugs against multiple parameters: Scoring functions for multiparameter ligand based de novo design
Brian B. Masek, bmasek@tripos.com1, Karl M. Smith1, Stephan C. Nagy, snagy@tripos.com1, James R. Damewood2, and Charles L. Lerman2. (1) Informatics Research Center, Tripos International, 1699 S. Hanley Rd., St. Louis, MO 63144, (2) CNS Chemistry, AstraZeneca, 1800 Concord Pike, Wilmington, DE 19850

Successful drug discovery often requires optimization against a set of biological and physical properties. We describe our work on multi-parameter approaches to ligand-based de novo design, and studies that demonstrate their ability to generate lead hops or scaffold hops between known classes of ligands for example receptors. Multiple design criteria, including pharmacophoric similarity, shape similarity, and structural (fingerprint) similarity, can be employed alongside various selectivity- or ADME-related properties (e.g. Lipinski properties, polar surface area, similarity to off-targets) to guide the evolution of structures that meet multiple design criteria.

73 Scoring synthetic feasibility: A very different problem
A. Peter Johnson, P.Johnson@leeds.ac.uk, School of Chemistry, University of Leeds, Leeds, LS2 9JT, United Kingdom, Krisztina Boda, OpenEye Scientific Software, 9 Bisbee Court, Suite D, Santa Fe, NM 87508, Glenn J. Myatt, gmyatt@leadscope.com, Leadscope, Inc, 1393 Dublin Road, Columbus, OH 43215, and J. Christian Baber, Chemical and Screening Sciences, Wyeth Research, 200 CambridgePark Drive, Cambridge, MA 02140

A huge amount of effort has gone into the problem of predicting the binding affinity of given poses of hypothetical ligands docked to protein binding sites. However, if these hypothetical ligands have been produced by de novo design, an equally important consideration is whether they are synthetically accessible. Over the past decade, we have attempted to address this problem in a variety of ways. The CAESA program combines an empirical approach to molecular complexity with a relatively rapid retrosynthetic analysis to find starting materials, the hypothesis being that complexity contained within readily available starting materials is apparent rather than true complexity. An alternative approach, incorporated into the SPROUT program, analyses structural complexity by comparing substitution patterns of ligand structures with those found in known drugs and databases of commercially available starting materials. The relative merits of these approaches will be discussed.

74 Text mining for chemistry and building a public platform for document markup
A J Williams, tony@chemspider.com, ChemZoo, 904 Tamaras Circle, Wake Forest, NC 27587

The identification of chemical names in documents has provided platforms enabling structure-based searching of patents and the markup of chemistry publications. A natural extension is the ability to make chemistry articles, blog pages, wiki pages and other documents searchable by the extracted chemical structures. The ChemSpider database contains over 21 million unique chemical entities from close to 200 data sources and provides a rich resource of information for chemists. We will report on our efforts to integrate chemical name extraction with the ChemSpider platform to enable structure searching of Open Access chemistry articles and online chemistry materials. We will unveil our online document markup platform, which lets chemists make both their open- and closed-access publications searchable by the language of chemistry – the structure.

75 Extending the scope of journal articles: Certifying and publishing experimental data
Irina Sens, irina.sens@tib.uni-hannover.de1, Jan Brase, Jan.Brase@tib.uni-hannover.de1, Susanne Haak, Susanne.Haak@thieme.de2, and Guido F. Herrmann, guido.herrmann@thieme.de2. (1) German National Library of Science and Technology (TIB), Welfengarten 1B, 30169 Hannover, Germany, (2) Thieme Chemistry, Georg Thieme Verlag KG, Rüdigerstrasse 14, Stuttgart, 70469, Germany

Experimental and theoretical data (primary data) constitute the backbone of research in chemistry. Primary data are recorded, analyzed and stored every day in every chemistry laboratory. Typical primary data in chemistry are created:

* Using the vast array of analytical techniques (GC, HPLC etc.)

* Employing spectroscopic methods (NMR, MS, UV/VIS, IR, X-Ray etc.)

* As a result of theoretical calculations (quantum mechanics, simulation of spectra etc.)

* Or by using the various high-throughput technologies in medicinal chemistry.

Efficient access to primary data is a prerequisite for successful chemical research. Chemists need access to their own data and to reference data from the chemical literature.

So far, chemists have not developed a managed system for storing and publishing their primary data. Some journals offer the possibility of augmenting a publication with supplementary material; however, the accessibility of this material remains far from optimal.

The TIB aims to improve this situation. Since 2005 the TIB has been recognized as the world's first registration agency for primary scientific data. The geological sciences are among the first disciplines to systematically publish and care for their primary data. These data, which remain on local servers, receive from the TIB a permanent and individual Digital Object Identifier (DOI). As with a journal article, this technology allows easy, permanent and error-free reference and retrieval of the primary data.

One objective is to extend this technology to other scientific disciplines. In this context the TIB and Thieme Chemistry have started a collaboration to develop the technology, rules and procedures for publishing chemical primary data with their own DOIs.

The talk will present first results and invite discussion and input from the community.

76 Ontologies for nanotechnology
Colin R Batchelor, batchelorc@rsc.org, Royal Society of Chemistry, Thomas Graham House, Milton Road, Cambridge CB4 0WF, United Kingdom

Ontologies, formal computer-readable descriptions of the objects of interest in a particular field, are widely used in molecular biology and, along with the InChI identifier, form the basis of the RSC's award-winning Project Prospect. Hitherto the approaches of formal ontology have not been applied to nanotechnology. In this talk we outline good practice in ontology development and describe our recent successes in developing ontologies to represent nanoparticles themselves and the methods used to create them.

77 Pistoia alliance: Emerging cross pharma collaboration
Ashley George, ayg8615@gsk.com1, Debra Igo, debra.igo@novartis.com2, Kevin Hebbel, kevin.c.hebbel@pfizer.com3, Nick Lynch, nick.lynch@astrazeneca.com4, Thomas Mueller2, Matthias Nolte, matthias.nolte@chemitment.com5, and Chris L. Waller, chris.waller@pfizer.com6. (1) Cheminformatics, GlaxoSmithKline, New Frontiers Science Park, Third Avenue, Harlow, CM19 5AW, United Kingdom, (2) Novartis, Cambridge, MA 02139, (3) Pfizer, Groton, CT 06340, (4) AstraZeneca, Alderley Park, SK10 4TF, United Kingdom, (5) chemITment, Inc, 47 Lake Road, Amston, CT 06231, (6) Chemistry Informatics, Pfizer, Inc, Eastern Point Road, Groton, CT 06340

The primary purpose of the Pistoia Alliance is to streamline non-competitive elements of the pharmaceutical drug discovery workflow through the specification of common business terms, relationships and processes. Every pharma company and software vendor is challenged by the technical interconversion, collation and interpretation of drug and agrochemical discovery data, and much of the resulting duplication, conversion and testing could be reduced if a common foundation of data standards, ontologies and web services were promoted and, ideally, agreed within a nonproprietary and non-competitive framework. This would allow interoperability between a traditionally diverse set of technologies, to the benefit of the healthcare sector. Through global collaboration, this pragmatic community will derive, instantiate and make available web services for consumption by academic institutions, vendors and companies under an open-source framework. We will describe current progress and lessons learned, and how companies, academics and others can participate in this approach.

78 Cleaning up chemistry for the pharma industry: Delivering a flexible platform for interrogating the FDA DailyMed website
A. Williams, antony.williams@chemspider.com, ChemZoo Inc, 904 Tamaras Circle, Wake Forest, NC 27587 and Rudy Potenzone, rudolphp@microsoft.com, World Wide Industry Technology Strategies, Microsoft, WA

DailyMed is a website hosted by the FDA providing access to information about marketed drugs, including FDA-approved labels (package inserts); it serves as a standard, comprehensive, up-to-date look-up and download resource for medication content and labeling as found in package inserts. While enhancing the dataset to make it searchable by chemical structure and substructure, we determined that the data contained numerous chemistry errors. We have therefore used a combination of text mining and automated and manual curation to improve the quality of the data set. In so doing we have also made querying of the data more flexible: specifically, we have used Microsoft SharePoint technology to create a portal allowing both text-based and structure-based querying. We will report on the advantages such an approach delivers in terms of flexible interrogation of DailyMed.

79 The use of EPA software and Scranton University green chemistry web page in the green engineering course in Universidad de los Andes
Gabriel Camargo, gcamargo@uniandes.edu.co, Department of Chemical Engineering, Universidad de Los Andes, Cra 1E # 19 A - 40, Bogota, Colombia, Francisco Segura, fr-segur@uniandes.edu.co, Chemical Engineering Dpt, Universidad de los Andes, Carrera 1E No 18 A -70, Bogota, Colombia, Astrid Altamar, aaltamar_posgrado@unilibre.edu.co, Environmental Engineering, Postgraduated Institute, Cra 70 No 53 - 40, Bogota D. C, Colombia, and Joaquin E Tirano, jtirano@uniandes.edu.co, Chemical Engineering Department, Universidad de los Andes, Cra 1E # 19 A -40, Bogota, Colombia

The green engineering course was proposed as an elective subject at Universidad de los Andes. A set of planned activities was carried out for the development of this course. Three laboratory practices were performed: synthesis of catalyst supports, production of biodiesel, and glycerin oxidation by heterogeneous catalysis with impregnated catalysts. In these practices the greenness of the processes was followed by measuring the material balance and the waste generated, and the EPI Suite program was used to evaluate the environmental performance of the reactant, intermediate and product substances. Other software available on the EPA web page was used as well. The green chemistry web page at Scranton University was used to evaluate the engineering aspects of the different green topics, and the bibliographic resources of the Universidad de los Andes library were used for further information on these topics. Another web page was used by the students for catalyst characterization techniques, with good results.

80 Building blocks for automated elucidation of metabolites: Machine learning methods for NMR prediction
Christoph Steinbeck, steinbeck@ebi.ac.uk1, Stefan Kuhn1, Steffen Neumann2, Björn Egert2, and Gilleain Torrance1. (1) Chemoinformatics and Metabolism, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Cambridge, CB10 1SD, United Kingdom, (2) Department of Stress and Developmental Biology, Leibniz Institute of Plant Biochemistry, Weinberg 3, Halle, 06120, Germany

Current efforts in metabolomics, such as the Human Metabolome Project, collect structures of biological metabolites as well as data for their characterisation, such as spectra for identification of substances and measurements of their concentration. Still, only a fraction of existing metabolites and their spectral fingerprints are known. Computer-Assisted Structure Elucidation (CASE) of biological metabolites will be an important tool to address this gap. Indispensable for CASE are modules to predict spectra for hypothetical structures.

This talk describes our experiments with different statistical and machine learning methods to perform predictions of proton NMR spectra based on data from our open database NMRShiftDB [1].

A mean absolute error of 0.18 ppm was achieved for the prediction of proton NMR shifts ranging from 0 to 11 ppm. Random forest, J48 decision tree and support vector machines achieved similar overall errors.

NMR prediction methods applied in the course of this work delivered precise predictions which can serve as a building block for Computer-Assisted Structure Elucidation for biological metabolites.

We will also elaborate on our first predictions and structure-recall experiments with large public databases and demonstrate how this can be useful in de-novo CASE contexts. All experiments described in this talk were performed with our open-source chemoinformatics library, the Chemistry Development Kit (CDK) [2], and open-access data.

[1] Steinbeck, C., Krause, S. & Kuhn, S. NMRShiftDB - Constructing a free chemical information system with open-source components. J. Chem. Inf. Comput. Sci. 43, 1733-1739 (2003).

[2] Steinbeck, C. et al. The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics. J. Chem. Inf. Comput. Sci. 43, 493-500 (2003).
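The idea of data-driven shift prediction described above can be illustrated with a toy memory-based predictor: estimate a proton shift as the mean of the k most similar atom environments in a reference table. The environment descriptors and shift values below are invented for illustration; real systems (for instance HOSE codes over NMRShiftDB data) encode the chemical environment far more richly.

```python
# Illustrative sketch of memory-based proton NMR shift prediction.
# (environment descriptor vector, observed shift in ppm) - invented data
reference = [
    ((1, 0, 0, 2), 0.9),   # e.g. an aliphatic CH3 environment
    ((1, 0, 1, 2), 2.1),   # CH3 next to a carbonyl
    ((0, 1, 0, 1), 7.2),   # aromatic CH
    ((0, 1, 1, 1), 7.8),   # aromatic CH near a substituent
]

def predict_shift(env, k=2):
    """k-nearest-neighbour prediction by Euclidean distance."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(reference, key=lambda r: dist(r[0], env))[:k]
    return sum(shift for _, shift in nearest) / k

print(round(predict_shift((0, 1, 0, 1), k=2), 2))  # averages the two aromatic entries: 7.5
```

A quality measure such as the mean absolute error quoted in the abstract is then simply the average of |predicted - observed| over a held-out test set.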

81 Development of test systems for pharmacophore elucidation
Jason C. Cole, cole@ccdc.cam.ac.uk1, Eleanor J. Gardiner, e.gardiner@sheffield.ac.uk2, Valerie J. Gillet, v.gillet@sheffield.ac.uk2, and Robin Taylor, robin.t@virgin.net3. (1) Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB5 8QD, United Kingdom, (2) Department of Information Studies, University of Sheffield, Western Bank, Sheffield, United Kingdom, (3) 54 Sherfield Avenue, Rickmansworth WD3 1NL, United Kingdom

Pharmacophore elucidation is a difficult problem involving the determination of the 3D description of ligand-protein interactions in the absence of the protein receptor. One reason for the lack of progress in the field is the lack of appropriate test data, which hampers algorithm development and can lead to programs that perform well on well-studied examples and poorly in unknown situations. We are developing a challenging set of test systems (ranging in size from 2 to 16 ligands), based on a study of the Astex cross-docking test set. Currently the pharmacophore test set contains ten systems. Previously we developed a Multi-Objective Genetic Algorithm (Cottrell et al., JCAMD, 20, 735-749, 2006). We describe the construction of the test sets and give results obtained by the MOGA on a selection of the test complexes, illustrating some of the problems posed by this challenging set.

82 Integrating LSQ, PLS and robust regression visualization to find best QSPR models
George D. Purvis III, gpurvis@us.fujitsu.com, Scigress Development, Biosciences Group, Fujitsu, 15244 NW Greenbrier Pkwy, Beaverton, OR 97006 and David T. Stanton, stanton.dt@pg.com, Procter & Gamble, Miami Valley Innovation Center, 11810 East Miami River Road, Cincinnati, OH 45252

Integrated visualization of least squares, partial least squares and robust regression quantitative-structure-property models enables rapid (1) identification of problems in modeled data and structures, (2) location and characterization of outliers, and (3) insights into model interpretation. This talk demonstrates how integrated visualization facilitates the creation of a QSPR model for surface tension from a data set of 399 measurements. Bad data, bad leverage points, bad structures, and inadequacies in descriptor space are rapidly identified and corrected.
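The "bad leverage point" diagnostic mentioned above has a simple closed form for a straight-line fit, sketched below on invented data; the talk's actual models and data set are not reproduced here.

```python
# Sketch of the leverage diagnostic behind "bad leverage point" detection
# in simple least-squares regression. Data are invented for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 10.0]   # the last x lies far from the others
ys = [1.1, 2.0, 2.9, 4.2, 9.5]

n = len(xs)
xbar = sum(xs) / n
sxx = sum((x - xbar) ** 2 for x in xs)

# Leverage of point i for a straight-line fit:
#   h_i = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2
leverages = [1 / n + (x - xbar) ** 2 / sxx for x in xs]

# Leverages always sum to the number of fitted parameters (here p = 2);
# a common flag for high leverage is h_i > 2p/n.
flagged = [i for i, h in enumerate(leverages) if h > 2 * 2 / n]
print(flagged)  # the isolated x = 10.0 point is the high-leverage one
```

Plotting leverage against residual is one standard way to visualize which points are merely far from the bulk of the data (high leverage) versus actively pulling the fit off course (high leverage plus large residual).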

83 Automated compound submission and active learning using HT-ADME in silico models
Rishi R. Gupta, CS CoE, Pfizer Inc, Eastern Point Road, MS 8260-1422, Groton, CT 06340, Eric M. Gifford, CS CoE, Pfizer Global Research and Development, Eastern Point Road, MS 8260-1534, Groton, CT 06340, and Matthew Troutman, Director, PDM ADME Technology & Toxicity, Pfizer, MS 8118W-150, Eastern Point Road, Groton, CT 06340.

Recent emphasis on the assessment of the true prediction scope of in-silico models allows us to define a chemical space where we can expect a model to perform within a given accuracy guideline. We can also capture internal statistics for individual models. Using these models, we can reevaluate any need for compound screening while, at the same time, allowing active learning for the model.

We present herein the status of our work towards automated compound submission and active learning. We introduce the concept of “automated submission”: a mechanism that uses in-silico models and sends for screening only those compounds that cannot be predicted with a high level of confidence. This mechanism not only decreases the number of compounds being screened but also allows a model to iteratively expand its chemical space where it has limited prediction scope.

We believe that there are several practical applications of this concept. For example, the model can choose compounds outside of the training sets' chemical space to send for screening and, thus, increase chemical space coverage over time. This process delivers significant cost savings.
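The confidence-gated loop described above can be sketched conceptually as follows. The model, its confidence estimate, and the screening "assay" here are toy stand-ins; the actual Pfizer workflow and models are not public.

```python
# Conceptual sketch of confidence-gated compound submission with a toy
# 1-D "compound" representation. All names and thresholds are invented.
import random

random.seed(0)

def model_predict(compound, training):
    """Toy model: confidence is high only if similar compounds were seen."""
    similar = [c for c in training if abs(c - compound) < 0.5]
    confidence = min(1.0, len(similar) / 3)
    prediction = sum(similar) / len(similar) if similar else None
    return prediction, confidence

def assay(compound):
    return compound + random.gauss(0, 0.1)   # stand-in for the wet assay

training = [1.0, 1.1, 1.2, 5.0]
queue = [1.05, 5.1, 9.0]                     # compounds nominated for screening
CONF_THRESHOLD = 0.6

for c in queue:
    pred, conf = model_predict(c, training)
    if conf >= CONF_THRESHOLD:
        print(f"{c}: predicted in silico ({pred:.2f}), not screened")
    else:
        result = assay(c)                    # screen only uncertain compounds
        training.append(c)                   # active learning: expand coverage
        print(f"{c}: screened, model updated")
```

Only the two compounds outside the model's confident region reach the assay, and each screened compound enlarges the training set, so later, similar compounds can be handled in silico.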

84 Reaction simulation expert system for synthetic organic chemistry
Jonathan H. Chen, chenjh@uci.edu and Pierre Baldi, pfbaldi@uci.edu. Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, Irvine, CA 92697

The long term goal of this project is to develop a computerized system with problem-solving capabilities in synthetic organic chemistry comparable to those of a human expert. At the core of such a system should be the ability to predict the course of chemical reactions to, for instance, validate synthesis plans. Our first approach, based on encoding expert knowledge as transformation rules, achieves predictive power competitive with chemistry graduate students, but requires significant knowledge engineering to expand its coverage to new reactivity. To overcome this limitation and achieve greater predictive power, our current approach is not based on specific rules, but instead upon general principles of physical organic chemistry. These principles allow the system to elucidate the mechanistic pathways and reaction coordinate energy diagrams of simulated reactions. These results directly mimic the qualitative problem-solving ability of human experts, but with the speed, precision, and combinatorial power of an automated system.

85 A BLAST-like tool for chemoinformatics and drug discovery
Pierre Baldi, pfbaldi@uci.edu, Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, Irvine, CA 92697

Small molecules can be used as combinatorial building blocks for chemical synthesis, as probes for analyzing biological systems, and for the discovery of drugs and other useful compounds. Large repositories containing millions of small molecules have recently become publicly available. The tools to search these repositories, however, lack the statistical precision and effectiveness of comparable tools developed to search repositories of biological sequences, such as BLAST. A fundamental bottleneck is that the theory of the distribution and statistical significance of chemical similarity scores has not yet been developed. Here we remove this bottleneck by developing: (1) chance models of molecular fingerprints; (2) accurate approximations to the similarity score distribution; (3) accurate approximations to the extreme value distribution of similarity scores; (4) z-scores and e-values (p-values) to measure the statistical significance of chemical similarity scores. The approach is validated in several projects, including finding new drug leads against important targets.
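The basic quantities involved can be sketched as follows: a Tanimoto similarity between bit-string fingerprints and a z-score against a background of random-pair similarities. The fingerprints and the empirical background below are toy stand-ins; the paper's contribution is the analytical theory of these distributions, which this sketch does not reproduce.

```python
# Sketch: Tanimoto similarity on toy fingerprints, plus a z-score against
# an invented empirical background of random-pair similarities.

def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient on fingerprints packed as Python ints."""
    inter = bin(a & b).count("1")
    union = bin(a | b).count("1")
    return inter / union if union else 0.0

query = 0b10110110
hit   = 0b10110010

background = [0.12, 0.18, 0.22, 0.15, 0.20]   # toy random-pair similarities
mu = sum(background) / len(background)
sigma = (sum((s - mu) ** 2 for s in background) / len(background)) ** 0.5

score = tanimoto(query, hit)
z = (score - mu) / sigma                      # standardized similarity score
print(round(score, 3), round(z, 1))
```

A BLAST-like tool replaces the empirical background with analytically derived chance models, so that each hit can be reported with a p-value or e-value rather than a raw similarity score.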

86 Binding of alkali metal cations (Li+, Na+ and K+) with mono- and bi-cyclic ring fused benzenes: A theoretical study
T. C. Dinadayalane, dina@ccmsi.us and Jerzy Leszczynski, jerzy@ccmsi.us. Computational Center for Molecular Structure and Interactions, Department of Chemistry, Jackson State University, 1400 JR Lynch Street, PO Box 17910, Jackson, MS 39217

The interactions of alkali metal cations (Li+, Na+ and K+) with the cup-shaped molecules tris(bicyclo[2.2.1]hepteno)benzene and tris(7-azabicyclo[2.2.1]hepteno)benzene have been investigated at the MP2(FULL)/6-311+G(d,p)//MP2/6-31G(d) level of theory. The geometries and interaction energies are compared with those of the metal-ion-bound complexes of trindene, benzotripyrrole and benzene. The cup-shaped molecules exhibit two faces, or cavities (top and bottom), and their cavity selectivity toward alkali metal ions is discussed. As evidenced by the pyramidalization angles, the host molecule becomes a deeper bowl when the lone pairs of the nitrogen atoms participate in binding with the cations. Molecular electrostatic potential surfaces nicely explain the cavity selectivity in the cup-shaped systems and the variation of interaction energies for different ligands. Vibrational frequency analysis is useful in characterizing the different metal ion complexes and in distinguishing the top- and bottom-face complexes of metal ions with the cup-shaped molecules.


