#234 - Abstracts

ACS National Meeting
August 19-23, 2007
Boston, MA

 1 Mining the receptorome: A powerful approach for predicting efficacies and side-effects of repositioned medications
Bryan L. Roth, National Institute of Mental Health Psychoactive Drug Screening Program and Department of Biochemistry, University of North Carolina at Chapel Hill, School of Medicine, Department of Pharmacology, 8032 Burnett-Womack, CB # 7365, Chapel Hill, NC 27599, bryan_roth@med.unc.edu, Phone: 919-966-7535

The in vitro pharmacological profiling of drugs using a large panel of cloned receptors, an approach that has come to be known as 'receptorome screening', has unveiled novel molecular mechanisms responsible for the actions and side effects of certain drugs. For instance, receptorome screening has been employed to uncover novel molecular targets involved in the actions of antipsychotic medications and the hallucinogenic mint extract salvinorin A. Receptorome screening has also implicated serotonin (5-hydroxytryptamine) 2B receptors in the adverse cardiovascular effects of several medications, and subsequent clinical studies have corroborated this prediction (see Roth NEJM, 2007). Receptorome screening represents one of the most effective methods for identifying potentially serious drug-related side effects at the preclinical stage, thereby avoiding significant economic and human health consequences. Receptorome screening also represents a powerful approach for rationally repositioning existing medications.

Supported by NIMH PDSP and Grants from NIMH and NIDA

 2 Side effect profile prediction: Computational tackling of big pharma's worst nightmare at an early stage
Josef Scheiber, Jeremy L. Jenkins, Andreas Bender, Steven Whitebread, Jacques Hamon, Laszlo Urban, Kamal Azzaoui, James H. Nettles, Meir Glick, and John W. Davies, Lead Finding Platform, Novartis Institutes for BioMedical Research, 250 Massachusetts Avenue, Cambridge, MA 02139, josef.scheiber@novartis.com, Phone: 617-871-3697

Adverse effects of drugs that are only identified after a compound enters the clinic seriously limit therapeutic potential and could result in withdrawal from the market. Two well-known examples in recent years were Rofecoxib (Vioxx®) and Cerivastatin (Baycol®, Lipobay®), but there are other examples. Avoiding such adverse effects is therefore a key goal in the development of a drug. It would be desirable to have a computational tool that predicts possible problems even before a compound has been synthesized. Bender et al. have shown a proof-of-principle for predicting adverse events based on chemical structure. In the current study we present an advancement of this method. Approximately 200 marketed drugs were tested against 80 different targets in the Novartis Safety Profiling Panel and the IC50 values were determined. The well-documented adverse effects of these marketed drugs were stored in a database using standard MedDRA terminology. For every target, models were calculated and validated using both a Naïve Bayesian classifier and Linear Discriminant Analysis in conjunction with two chemical descriptors (Extended Connectivity Fingerprints and MDL Public Keys). We present results demonstrating correlations between chemical features and adverse effects on the one hand, and between targets and adverse effects on the other. Therefore the method can be used for predicting adverse events based on chemical structure alone. Furthermore, novel links between targets and adverse effects can be unraveled which are of interest in their own right, but which can also be applied to select targets for in vitro compound profiling.
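
The per-target modeling step described above can be sketched roughly as follows. This is a minimal illustration of a Bernoulli naive Bayes classifier with Laplace smoothing on binary fingerprints, not the authors' implementation; the fingerprints, labels, and function names are hypothetical toy values rather than data from the Novartis panel.

```python
import math

def train_bernoulli_nb(fingerprints, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on equal-length 0/1 fingerprints."""
    n_bits = len(fingerprints[0])
    model = {}
    for cls in sorted(set(labels)):
        rows = [fp for fp, y in zip(fingerprints, labels) if y == cls]
        # Laplace-smoothed probability that each bit is set within this class.
        p_bit = [(sum(r[i] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for i in range(n_bits)]
        model[cls] = (math.log(len(rows) / len(labels)), p_bit)
    return model

def predict(model, fp):
    """Return the class with the highest log posterior for one fingerprint."""
    def log_posterior(cls):
        log_prior, p_bit = model[cls]
        return log_prior + sum(
            math.log(p if bit else 1.0 - p) for bit, p in zip(fp, p_bit))
    return max(model, key=log_posterior)

# Hypothetical toy data: 4-bit fingerprints, label 1 = active at the target.
train_fps = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0],
             [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]]
train_labels = [1, 1, 1, 0, 0, 0]
nb_model = train_bernoulli_nb(train_fps, train_labels)
print(predict(nb_model, [1, 1, 0, 0]), predict(nb_model, [0, 0, 1, 1]))
```

In practice one such model would be trained per target (or per adverse-effect term), with real fingerprints replacing the toy bit vectors.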

 3 Development of high-throughput repurposing technologies
J. Prous Jr. and D. Aragones

Over the past decade, and despite major advances in new technologies, the pharmaceutical sector has witnessed how the number of new drugs introduced in the market every year has stayed level or decreased while the cost of drug discovery and development has significantly increased. The safety of drugs used in clinical practice is under constant scrutiny and the withdrawal of several compounds in recent years confirms the serious productivity challenges faced by modern biomedical research. BioEpisteme, a knowledge-based project, was initiated to overcome these productivity bottlenecks and to contribute to the faster discovery of new and safer drugs, as well as the finding of new uses for known molecules. In-house developed datamining algorithms have led to a model that characterizes more than 400 different molecular mechanisms of action simultaneously. The development of the project and its application in explaining new therapeutic applications for angiotensin AT1 antagonists will be presented.

 4 Use of integrative pharmacology in drug repositioning
Thomas Barnes, Genomic Pharmacology, Gene Logic, Inc, 38 Sidney St., Cambridge, MA 02139, TBarnes@genelogic.com, Phone: 617-649-2034

Across the pharma and biotechnology industry, reduced hurdles in lead identification are resulting in the screening of druggable targets with weaker disease hypotheses, which will increase the risk and thus incidence of programs that fail in the intended therapeutic area due to lack of efficacy. Nevertheless, these activities will result in a set of chemical tools with which to probe target function and thereby link the corresponding compounds to new therapeutic utility. What is required is sufficiently high-throughput methodologies to make de novo links between specific compounds and disease.

We have integrated a set of technologies that provide the means of efficiently associating compounds with potential new therapeutic utility. This is in stark contrast to the unsystematic and serendipitous observations that are classically relied upon to reveal alternative or new drug indications. The promise of these technologies is to expeditiously reduce pipeline gaps within a pharmaceutical industry whose growth is threatened by reduced (and increasingly costly) new product flow.

 5 Transcriptional connectivity map for biomedical discovery
Justin Lamb, Broad Institute of MIT and Harvard, Seven Cambridge Center, Cambridge, MA 02142, justin@broad.mit.edu, Phone: 617-252-1522

Genome-wide transcriptional analysis provides a comprehensive molecular representation of cellular activity, suggesting that mRNA expression profiling could serve as a practical universal functional bioassay. High-throughput high-density gene-expression profiling solutions raise the possibility of capturing the consequences of small-molecule and genetic perturbations at library and genome scale, respectively, and associating these disparate perturbagens with each other and external organic phenotypes to discover decisive functional connections between drugs, genes and diseases. The talk will describe our technology platform, analysis methods and interpretive tools, and illustrate how our solution can be used to identify valuable new activities of bioactive small molecules, with particular emphasis on existing pharmaceuticals.

 6 GAUDI: An integrated tool for navigating through the small molecule - target protein space
Jordi Mestres, Chemotargets SL, Parc de Recerca Biomèdica (483.04), Doctor Aiguader 88, 08003 Barcelona CAT, Spain, Fax: +34 93 2240875, jmestres@imim.es, Phone: +34 93 2240882, and Tudor I. Oprea, Division of Biocomputing, University of New Mexico School of Medicine, MSC11 6145, University of New Mexico, Albuquerque, NM 87131, toprea@salud.unm.edu, Phone: 505 272 6950

In modern drug discovery, it is no longer acceptable to test compounds synthesized within a hit or lead optimisation program against one primary target and a couple of anti-targets. Efforts towards the construction of annotated chemical libraries are connecting hundreds of thousands of compounds to hundreds of protein targets and thus highlight the need for novel integrative tools for the in silico pharmacological profiling of compounds, with potential applications from side-effect alert systems to drug repurposing. GAUDI is a tool designed to extract knowledge from the complex interaction space between small molecules (e.g., chemical genomics) and protein targets (e.g., proteomics). In its first release, it provides an integrative vista for navigating across WOMBAT [1], an annotated chemical library covering a chemical space of 190,000 molecules and a target space of over 1450 proteins. The integration between chemical and biological spaces is achieved by simultaneously combining bio- and chem-informatics tools for the classification of small molecules and target proteins, respectively. The result is a new generation of integrative datamining tools to extract knowledge from data stored in annotated chemical libraries.

[1] M. Olah et al., WOMBAT and WOMBAT-PK. In: Chemical Biology; edited by S.L. Schreiber, T.M. Kapoor, G. Wess. Wiley-VCH 2007, Weinheim, pp 760-786.

 7 Relating protein pharmacology by ligand chemistry
MJ Keiser1, Bryan L. Roth2, BN Armbruster2, P Ernsberger2, John J Irwin1, and Brian K. Shoichet1. (1) Department of Pharmaceutical Chemistry, University of California, San Francisco, 1700 4th Street, San Francisco, CA 94143, keiser@gmail.com, jji@cgl.ucsf.edu, Phone: 415-514-4253, (2) National Institute of Mental Health Psychoactive Drug Screening Program and Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599

We present a technique that quantitatively groups and relates proteins based on the chemical similarity of their ligands. Starting with 65,000 ligands annotated into sets for hundreds of drug targets, we computed a similarity score between each pair of sets using ligand topology. The resulting similarity scores, normalized using a statistical model to assess their significance, were expressed as a minimum spanning tree to map the sets together. Although these maps are connected solely by chemical similarity, biologically sensible clusters nevertheless emerged. Links among unexpected targets also emerged, among them that methadone, emetine and loperamide (Imodium) may antagonize muscarinic M3, alpha2 adrenergic and neurokinin NK2 receptors, respectively. These predictions were subsequently confirmed experimentally. Relating receptors by ligand chemistry organizes biology to reveal unexpected relationships that may be assayed using the ligands themselves. It has not escaped our notice that this approach may be useful for drug repurposing.
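
The idea of scoring ligand sets against each other and linking them in a spanning tree can be sketched as below. This is an illustrative simplification under stated assumptions: the raw score is a sum of pairwise Tanimoto coefficients at or above an arbitrary threshold, the statistical normalization mentioned in the abstract is not reproduced, and the target names and fingerprints are hypothetical. The tree is built with Kruskal's algorithm, taking the most similar pairs first (equivalent to a minimum spanning tree on dissimilarity).

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def set_similarity(ligands_a, ligands_b, threshold=0.5):
    """Raw set score: sum of pairwise Tanimoto values at or above a threshold."""
    scores = (tanimoto(x, y) for x in ligands_a for y in ligands_b)
    return sum(s for s in scores if s >= threshold)

def spanning_tree(targets):
    """Kruskal's algorithm, linking the most similar target sets first."""
    names = sorted(targets)
    edges = sorted(
        ((set_similarity(targets[u], targets[v]), u, v)
         for u, v in combinations(names, 2)),
        key=lambda edge: -edge[0])
    parent = {name: name for name in names}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    tree = []
    for score, u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # skip edges that would close a cycle
            parent[root_u] = root_v
            tree.append((u, v, round(score, 3)))
    return tree

# Hypothetical ligand sets; each ligand is a set of fingerprint bit indices.
toy_targets = {
    "target_A": [{1, 2, 3}, {1, 2, 4}],
    "target_B": [{1, 2, 3, 5}, {2, 3, 4}],
    "target_C": [{7, 8, 9}, {2, 3, 4, 6}],
}
ligand_tree = spanning_tree(toy_targets)
print(ligand_tree)
```

Edges in the resulting tree connect targets whose ligands resemble each other, which is where an unexpected pairing would show up as a candidate for experimental follow-up.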

8 Hypothesis-driven drug reprofiling based on a novel systems biology approach
Fredric S. Young, Vicus Therapeutics, LLC, 55 Madison Avenue, Suite 400, Morristown, NJ 07960, fyoung@vicustherapeutics.com, Phone: 973-919-0549

We have developed a hypothesis-driven drug re-profiling approach to create a pipeline of product candidates in pre-clinical and clinical development. We identify combination therapies of marketed drugs that target reaction blocks associated with the central disease-causing processes. We identify the central processes through use of a pattern classifier of homeostasis and pathology. The pattern classifier is derived from the set of flux invariants associated with a self-organized control state of an absorbing state phase transition with multiple fluxes and multiple compartments. In our systems biology approach, biological systems are modeled using an object process methodology with a top-down control analysis based on a universal flux control module. The roles of specific genes, proteins and drug targets are defined as a function of their place in the hierarchical network of flux modules that carry out specific disease-causing processes.

9 CRC Handbook of Chemistry and Physics: E-book and beyond
Fiona Macdonald1, David R. Lide2, and Robert Morris1. (1) Taylor and Francis/CRC Press, 6000 Broken Sound Parkway NW, Boca Raton, FL 33411, Fax: 561-998-2559, Fiona.macdonald@taylorandfrancis.com, Phone: 561-998-2564, (2) CRC Press, Editor, Gaithersburg, MD 20878

For more than 90 years the CRC Handbook has been a fixture on laboratory shelves. Its transition from print classic to interactive reference tool will be discussed, and plans for the future will be unveiled.

10 Challenges in building e-books collections
Andrea Twiss-Brooks, John Crerar Library, University of Chicago, 5730 S. Ellis Ave, Chicago, IL 60637-1403, atbrooks@uchicago.edu, Phone: 773-702-8777

Faculty, researchers, and students who use academic libraries are familiar and comfortable with print journals and books. Recent changes in scientific journal publishing have been embraced by users, and electronic-only access to journals is more or less accepted by the academic community. Scholarly books and monographs are viewed somewhat differently by faculty and students, and e-books have not yet become entrenched in quite the same way as e-journals have. Some of the challenges facing acceptance of e-books in academic libraries will be examined. Among the issues addressed will be search and discovery, integration of e-books into the research process, and collection management.

 11 Doing 18th century chemistry in the 21st century: the value of 18th and 19th century digitized books and journals
Stephen A. Koch, Department of Chemistry, State University of New York at Stony Brook, Stony Brook, NY 11794-3400, Fax: 631-632-7960, Stephen.Koch@sunysb.edu, Phone: 631-632-7944

The 1997 discovery that the hydrogenase enzymes have cyanide as native ligands for iron in their active sites caused the author to begin research in the area of iron-cyanide chemistry. The fact that this area has been under active investigation for more than 300 years provided interesting challenges when it came to doing literature searches of previous work. The recent availability of backfiles of digitized/searchable journals and the even more recent availability of digitized/searchable chemistry books have greatly aided this effort. Most important, reading and understanding the early work in the area has actually had a direct effect on our current research direction. As an added bonus, the ability to integrate 18th and 19th century chemistry and chemists with my research results has made my lectures on my research work much more interesting and enjoyable. My approach to using digitized 18th and 19th century books and journals will be presented.

12 eBook customers and product design
Caroline F Wain, Marketing and Sales, Royal Society of Chemistry Publishing, Thomas Graham House, Science Park, Milton Road, Cambridge, United Kingdom, wainc@rsc.org, Phone: +44 1223-420-0066

eBook platforms provide a service to both the scientist and the librarian. The publisher is challenged with delivering functionality to meet the requirements of both customer groups, considering their needs at each stage of the product development process.

13 Expect at least six times more usage from e-books than the print version: The acquisitions and usage of a large e-book collection at Texas A&M University
Rustin Kimball1, Gary Ives2, and Kathy M Jackson1. (1) Reference Department, Evans Library, Texas A&M University, 5000 TAMU, College Station, TX 77843-5000, Fax: 979-458-0112, rkimball@lib-gw.tamu.edu, kathy-jackson@tamu.edu, Phone: 979-862-1909, (2) Acquisitions Department, Evans Library, Texas A&M University, College Station, TX 77843-5000

While the availability of e-books generally is met with great enthusiasm from college students, providing library access to e-books from a variety of vendors (whose platforms often are very different) presents many challenges to libraries. The Texas A&M University Libraries provide access to large NetLibrary and Ebrary collections, both of which include many chemistry-related titles. In addition, we purchased the electronic reference books from Wiley and Elsevier a year ago. We also offer Knovel, CHEMnetBase, ENGnetBase, and Safari. Recently, our library placed an order for all of the Springer electronic books as well. Currently, our library offers over 60,000 electronic books. This presentation will discuss the different types of electronic books, and will cover the acquisitions and service issues for each type. We will compare the usage figures for electronic books in chemistry and related sciences to the circulation figures for print books in those cases in which we have both the electronic and print book. We will discuss the reaction of users to these collections, as well as the methods employed by science librarians to publicize their availability.

14 MLR-1023: A drug candidate for type II diabetes with a novel molecular target discovered by using an in vivo drug repositioning approach
Michael S. Saporito1, Christopher A. Lipinski2, Alexander Ochman1, Dana Koemer1, Jan Batten1, and Andrew Reaume1. (1) Melior Discovery, Inc, 860 Springdale Drive, Exton, PA 19341, msaporito@meliordiscovery.com, Phone: 610-280-0633, (2) Scientific Advisor, Melior Discovery, Waterford, CT 06385-4122

Drug repositioning is increasingly recognized as an effective strategy to uncover new therapeutics with reduced developmental risk. Melior Discovery has a unique repositioning approach involving a platform comprised of 35 in vivo models representing diverse therapeutic areas. The power of this platform is illustrated by our lead compound, MLR-1023. This compound, originally developed for ulcers, exhibits robust activity in a panel of clinically relevant models of type II diabetes. For example, when compared to metformin in acute studies, MLR-1023 produced an equivalent glucose lowering response at a significantly lower dose. In comparison to rosiglitazone in chronic studies, MLR-1023 exhibited equivalent efficacy without promoting weight gain. Of importance was the identification of a previously unknown molecular target for type II diabetes. This example of Melior Discovery's approach demonstrates the potential for capturing new indications from existing molecules, and the potential for expanding our understanding of the underlying biological basis of disease.

15 Construction of a virtual library of endocrine disruptors for in silico target fishing
Christian Laggner1, Lyubomir G. Nashev2, Daniela Schuster1, Thierry Langer1, and Alex Odermatt2. (1) Department of Pharmaceutical Chemistry, Computer Aided Molecular Design Group, University of Innsbruck, Institute of Pharmacy, Innrain 52c, Innsbruck A-6020, Austria, Fax: +43-512-507-5269, Christian.Laggner@uibk.ac.at, Phone: +43-512-507-5268, (2) Institute of Molecular and Systemic Toxicology, Department of Pharmaceutical Sciences, University of Basel, Basel 4056, Switzerland

The accumulated exposure to naturally occurring compounds, drugs, consumer products, and industrial chemicals that disturb endocrine functions may cause serious health problems, such as sexual and behavioural disorders, asthmatic and allergic diseases, as well as certain forms of cancer. We present a chemical library of compounds with suspected endocrine disrupting effects that is suitable for different virtual screening approaches, thus facilitating the identification of potential targets of endocrine disruptors. Names and CAS numbers for over 143,000 substances related to effects on the endocrine system were taken from the publicly available Endocrine Disruptor Priority Setting Database and were used to retrieve the corresponding chemical structures from the PubChem Project, a rapidly growing collection of chemical information from a variety of sources. The combined entries were filtered for errors before constructing our final screening database. The wide applicability of this library underlines the power and usefulness of publicly available chemical information.

16 Emergency discovery of novel antimicrobials among known drugs in response to new and re-emerging infectious threats
Artem Cherkasov, Division of Infectious Diseases, University of British Columbia, 2733 Heather Str, Vancouver, BC V5Z 3J5, Canada, Fax: 604-875-4013, artc@interchange.ubc.ca, Phone: 604-875-4588

Emergence of new infections is an increasing public health threat. The problem is that conventional antibiotic development is time-consuming, not very efficient and expensive. Moreover, current legal regulations require years of rigorous studies before a new antibiotic can enter the public sector. It is becoming increasingly evident that this methodology does not keep pace with emerging and re-emerging infections. As a partial but very rapid solution to this challenge we propose to identify established therapeutics with already approved toxicity and bioavailability properties that also exhibit sufficient activity against novel and re-emerging human pathogens. To assist such discoveries we have developed several QSAR approaches such as quantitative models of 'Antibiotic-likeness' and 'Bacterial-metabolite-likeness' enabling accurate recognition of antimicrobial substances from large collections of chemical structures. The developed models were able to relate several drugs from the Merck database (with no antimicrobial annotation) to predicted antimicrobial action, which was later confirmed by other literature sources.

17 Effective and rapid bio-activity profiling using pharmacophore-based parallel screening
Theodora M. Steindl1, Daniela Schuster2, Johannes Kirchmair3, Remy Hoffmann4, Christian Laggner2, Gerhard Wolber3, and Thierry Langer2. (1) Computer-Aided Molecular Design Group, University of Innsbruck, Innrain 52c, Innsbruck A-6020, Austria, Fax: +43-512-507-5269, Theodora.Steindl@uibk.ac.at, Phone: +43-512-507-5264, (2) Department of Pharmaceutical Chemistry, Computer Aided Molecular Design Group, University of Innsbruck, Institute of Pharmacy, Innrain 52c, Innsbruck A-6020, Austria, Fax: +43-1-8174955-1371, thierry.langer@uibk.ac.at, Phone: +43-699-1507-5252, (3) Inte:Ligand GmbH, 2344 Maria Enzersdorf, Austria, (4) Accelrys, Orsay 91898, France

3D Pharmacophore-based parallel screening is introduced as an in silico method to predict the potential biological activities of potential drug molecules. This study presents an application example employing a Pipeline Pilot-based screening platform and a collection of structure-based pharmacophore models built using the LigandScout software for automatic large-scale virtual activity profiling. An extensive set of HIV protease inhibitor pharmacophore models was used to screen different test sets consisting of active and inactive compounds. In addition, we investigated whether it is possible in a parallel screening system to differentiate between similar molecules, or molecules acting on closely related proteins, and therefore we incorporated a collection of other protease inhibitors including aspartic protease inhibitors. The results of the screening experiments show a clear trend towards an enhanced signal to noise ratio (true positives/false positives and true negatives/false negatives).
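
The two signal-to-noise ratios named above can be computed from a hit list as in the following sketch. The compound identifiers and the helper name `screening_ratios` are hypothetical; this shows only the arithmetic, not the Pipeline Pilot or LigandScout workflow.

```python
def screening_ratios(predicted_active, known_active, all_compounds):
    """Return (TP/FP, TN/FN) for one pharmacophore screen."""
    predicted_active = set(predicted_active)
    known_active = set(known_active)
    all_compounds = set(all_compounds)

    true_pos = len(predicted_active & known_active)
    false_pos = len(predicted_active - known_active)
    false_neg = len(known_active - predicted_active)
    true_neg = len(all_compounds - predicted_active - known_active)

    def ratio(numerator, denominator):
        # A screen with no errors of a given kind has an unbounded ratio.
        return numerator / denominator if denominator else float("inf")

    return ratio(true_pos, false_pos), ratio(true_neg, false_neg)

# Hypothetical test set of ten compounds, four of them known actives.
compounds = [f"c{i}" for i in range(1, 11)]
actives = ["c1", "c2", "c3", "c4"]
hits = ["c1", "c2", "c3", "c5"]
signal_to_noise = screening_ratios(hits, actives, compounds)
print(signal_to_noise)
```

Comparing these pairs across pharmacophore models, or before and after adding decoy protease inhibitors, gives a simple view of whether a model set is becoming more selective.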

18 Knowledge-based prediction for alternate indications and targets for known drugs
A. W. Edith Chan, BioFocusDPI, Commonwealth House, 1 New Oxford Street, WC1A 1NU London, United Kingdom, Fax: +44 (0) 207 074 4700, edith.chan@glpg.com, Phone: +44 (0) 207 074 4642, and John P Overington, BioFocusDPI, London WC1A 1NU, United Kingdom

The concept of finding new uses for known drugs represents a significantly lower risk commercial strategy compared to developing New Chemical Entities (NCEs). There are two general approaches to expanding clinical utility for a known drug: 1) predicting new indications for the compound through the known molecular target and pathway, and 2) predicting new targets (and then new indications) for a drug. Both of these approaches rely crucially on integration of multiple information sources, but rely on fundamentally different approaches for their implementation. Several of our approaches use these databases, along with a series of target sequence and compound structure similarity calculations to make predictions of likely alternate targets or bioactivities for a compound. In this presentation, we outline our approaches of building and then applying a series of highly normalized pharmacology databases to the problem of predicting the primary or alternate molecular targets for a series of known drugs. Secondly we outline the application of these databases to a series of clinical microarray datasets. Finally, some results from a large scale prediction on a collection of 'historical' drug candidates will be shown.

19 Synergistic advantages of drug reprofiling and clinical trial offshoring in India
J. Maki, Vicus Therapeutics, LLC, 55 Madison Avenue, Suite 400, Morristown, NJ 07960, jmaki@vicustherapeutics.com, Phone: 973-919-0549

The combination of drug reprofiling and clinical trial offshoring offers dramatic improvement in the cost, speed, and risk of clinical development of drug products. We will provide a case study of our FDA-sanctioned Phase 2 clinical trial being conducted in the US and India of our reprofiled drug product for cancer cachexia. Cancer cachexia is a catastrophic wasting disorder associated with advanced cancer for which there is no FDA approved therapy. We will highlight the unique synergistic advantages of reprofiling and offshoring and the steps necessary to realize such advantages. In addition, we will review recent changes in the DCGI (Indian FDA-regulatory equivalent), Indian clinical research infrastructure, and the acceptance of Indian data by the FDA that are driving the explosive growth of clinical research in India.

20 Drug reprofiling platform as a risk leverage strategy in drug discovery
Akinori Mochizuki, Sosei, 4F Ichiban-cho FS Bldg, 8 Ichiban-cho, Chiyoda-ku, Tokyo 102-0082, Japan, Fax: +81 (0)3 5210 3291, amochizuki@sosei.com, Phone: +81 (0)80 3469 1998

Although significant advances have been achieved in various toxicological predictions, the attrition rate in drug development up to Phase II appears to have remained at the same level over the past 20 years.

Sosei's approach to reducing the risk of failure at the proof-of-concept (POC) stage in humans is to utilise compounds that are already known to be tolerated in humans. Sosei collects such compounds and incorporates them into a unique compound library. Together with technology-based biotechnology companies as alliance partners, Sosei applies various technologies to the library to unlock hidden pharmacology. Developing these once-halted compounds for new uses is expected to carry a lower risk of failure than developing new chemical entities.

In addition to this unique strategy, we carry out various reprofiling approaches, including conventional medicinal chemistry methods and formulation development on existing drugs, in order to leverage the risks in discovery and development, and concurrently maximise business opportunities and revenues.

21 Taking advantage of therapeutic switching: commercialisation in a world of generic substitution
David Cavalla, Cambridge, England, david.cavalla@ntlworld.com, Phone: +44-1223-858577

Therapeutic switching, the discovery and development of secondary uses for existing drugs, has three substantial advantages in terms of reduced risk, cost and time. Together with the opportunity created by new therapeutic use patents, this represents a highly efficient route to commercially protected new medicines. There are multiple classes of such programs, depending on whether the composition of matter patent supporting the original development is still in force, and whether the active ingredient was ever successfully developed for its original indication. However, the potential for off-label competition from generic products needs to be carefully considered in order to realise the optimal potential from this approach. Case histories will be presented, including examples in the fields of fibrosis and cachexia. These highlight (i) the value of new biology and (ii) the importance of differentiation among an ostensibly similar class of agents to identify improved non-genericisable therapeutics.

22 Improving cross-searchability of interactive e-books in Knovel Library by normalizing chemical data
Sasha Issac Gurke, Product Development, Knovel Corporation, 13 Eaton Avenue, Norwich, NY 13815, Fax: 607-337-5090, sgurke@knovel.com, Phone: 607-337-5600

To ensure cross-searchability and consistency in presentation of interactive tables, chemical names and property data are normalized as e-books are loaded into Knovel Library. The challenges and techniques used for normalization are discussed.

23 Aligning authors, publishers, and customers - finding the right solution for eBooks in chemistry
Michael Forster, STM Books, John Wiley & Sons, Inc, 111 River Street, Hoboken, NJ 07030, mforster@wiley.com, Phone: 201-748-7699

The benefits and advantages of the electronic medium are perhaps greater in the field of chemistry for information that is currently published in the form of print books than in any other discipline. However, the challenges faced by publishers and their customers in delivering these benefits to users are not trivial - and are posed by issues of technology, user behavior, and the marketplace, to name a few. This presentation will provide a brief look at the current market in eBooks, look at some specific issues that exist with respect to chemistry as content matter, and then examine the issues that affect publishers, authors, and the customers and users who make up this community of interest. Some possible future developments and trends for the short and medium term will also be identified, as well as their associated limiting factors.

24 E-Books in chemistry: Are they being used?
Beth Thomsett-Scott, Reference and Information Services, University of North Texas Libraries, P.O. Box 305190, Denton, TX 76226, Fax: 940-565-3695, bscott@library.unt.edu, Phone: 940-369-6437

Electronic journals were widely and rapidly accepted by most faculty and students in chemistry. This paper will examine the trends in usage between e-books and e-journals in chemistry at one university to see if chemistry students and faculty have adopted e-books as quickly as they adopted e-journals. In addition, usage statistics for e-books will be compared to those of their print counterparts. Results will be presented and conclusions discussed with thoughts for the future.

25 Transformation of reference books in chemistry from print to electronic: What works and what doesn't
Meghan Lafferty, Science & Engineering Library, University of Minnesota, 108 Walter Library, 117 Pleasant St SE, Minneapolis, MN 55455, Fax: 612-625-5583, mlaffert@umn.edu, Phone: 612-624-9399

An increasing number of classic reference works in chemistry long available in book form are now also available online. While undeniably more convenient to our users, some of these works take better advantage of the unique features of the electronic medium than others. I will examine how well a variety of chemistry-related reference works have been converted into online versions. Some of the works I will compare include the Kirk-Othmer Encyclopedia of Chemical Technology, the Merck Index, chemistry reference books in Knovel, and CHEMnetBASE and other netBASE products. I will address the following questions and make recommendations. What features offer an improvement over the print? What features send users to the print versions unless they have no other choice? Which works are the best examples of truly transformed books and why?

26 Fractal properties of representations of chemical libraries
Martin Grigorov, BioInformatics, Nestlé Research Center, PO Box 44, Canton de Vaud, Lausanne 1000, Switzerland, Fax: +41 21 785 94 86, martin.grigorov@rdls.nestle.com, Phone: +41 21 785 89 39

There is emerging evidence that real-world datasets are statistically self-similar and thus fractal. In this work I investigate some global topological properties of representations of chemical libraries in spaces defined by molecular descriptors. New algorithms are developed and used in this work, such as the dimension reduction of such chemical data sets by singular value decomposition and the introduction of the correlation dimension as a natural dimension of a chemical space. It is shown that the representations of molecular data sets in chemical spaces possess self-similar properties, characteristic of fractal objects. This important insight allows for a compact statistical description of the datasets as well as for the inference of the number of chemically similar structures existing in the vicinity of any member of such a fractal set.
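
The correlation dimension mentioned above is commonly estimated as the slope of log C(r) against log r, where C(r) is the fraction of point pairs closer than r (the Grassberger-Procaccia estimate). The sketch below applies that standard estimate to a hypothetical toy set; it is not the author's algorithm, the singular value decomposition step is omitted, and the radii are arbitrary choices.

```python
import math
from itertools import combinations

def correlation_dimension(points, radii):
    """Least-squares slope of log C(r) versus log r over the given radii."""
    distances = [math.dist(p, q) for p, q in combinations(points, 2)]
    log_r, log_c = [], []
    for r in radii:
        # Correlation sum: fraction of pairs separated by less than r.
        c = sum(d < r for d in distances) / len(distances)
        if c > 0:
            log_r.append(math.log(r))
            log_c.append(math.log(c))
    mean_x = sum(log_r) / len(log_r)
    mean_y = sum(log_c) / len(log_c)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_r, log_c))
    variance = sum((x - mean_x) ** 2 for x in log_r)
    return covariance / variance

# Hypothetical "descriptor space": 200 points on a straight line embedded
# in three dimensions, so the intrinsic dimension should come out near 1.
line_points = [(t, 2 * t, -t) for t in (i / 200 for i in range(200))]
line_dimension = correlation_dimension(line_points, [0.05, 0.1, 0.2, 0.4])
print(round(line_dimension, 2))
```

Because C(r) scales roughly as r raised to the correlation dimension, the same relation is what permits estimating how many structures lie within a given radius of a library member.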

27 Recent trends in library design: "rational design" revisited
Dora Schnur, Computer-Assisted Drug Design, Bristol-Myers Squibb, Pharmaceutical Research Institute, P.O. Box 5400, Princeton, NJ 08543-5400, Phone: 609-818-4004, and Cullen L. Cavallaro, Pharmacopeia, Inc

Diversity has historically played a critical role in the design of combinatorial libraries, screening sets and corporate collections used for lead discovery. Large library design in the 1990s ranged from arbitrary through property-based reagent selection to product-based approaches. Over time, however, there has been a downward trend in library size as information about the desired targets increased due to the genomics revolution and the increasing availability of target protein structures from crystallography and homology modeling. Concurrently, computing grids and CPU clusters have facilitated the development of structure-based tools that screen hundreds of thousands of molecules. Smaller “smarter” combinatorial and focused parallel libraries have replaced those unfocused large libraries in the twenty-first century drug design paradigm. While diversity still plays a role in lead discovery, target-family and target-specific approaches dominate current efforts in library design. This talk will highlight these library design trends and explore the use of software developed by R. Pearlman for sparse matrix library design.

28 Generating diverse and biologically relevant ensembles of ligand conformers: Addressing flexible rings using a generalized knowledge-based approach
Brian B. Masek1, Roman Dorfman2, Karl M. Smith1, and Robert D. Clark3. (1) Tripos, Inc, 1699 S. Hanley Rd., St. Louis, MO 63144, Fax: (314)-951-3409, bmasek@tripos.com, Phone: (314)-951-3409, (2) Informatics Research Center, Tripos, Inc, St. Louis, MO 63144, (3) Informatics Research Center, Tripos, St. Louis, MO 63144

Very rapid conformational sampling is critically important in many areas of computer-aided drug design. We have developed an alternative approach wherein a selected force field is used to minimize randomized conformations of a drug-like training set of molecular structures. Torsional profiles characteristic of each type of bond in the training set are then extracted from these conformations. We have extended this method to encompass the treatment of flexible rings and the inversion of pyramidal nitrogen. The conformations produced are biochemically relevant, as indicated by their ability to efficiently reproduce ligand conformations found in X-ray crystal structures.

29 Ligand based virtual screening using BCUT descriptors
Uta Lessel, Department of Lead Discovery, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riss 88397, Germany, Fax: +49/7351/83-3062, Uta.Lessel@bc.boehringer-ingelheim.com, Phone: +49/7351/54-3062

The presentation shows some ways to apply DiverseSolutions and the BCUT descriptors for ligand-based virtual screening. The results are compared with enrichments produced by other ligand-based virtual screening techniques, e.g. Daylight Fingerprints or Feature Trees.
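The abstract does not say which enrichment measure was used. One common choice is the enrichment factor: the hit rate among the top-ranked fraction of a screened library divided by the hit rate of the whole library. The sketch below assumes that definition; the function name and the input format are hypothetical.

```python
def enrichment_factor(ranked_is_active, fraction):
    """Enrichment factor for the top `fraction` of a ranked compound list.

    ranked_is_active: booleans ordered from best to worst score,
                      True where the compound is a known active.
    fraction:         share of the list to inspect, e.g. 0.01 for the top 1%.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, round(n_total * fraction))
    hits_top = sum(ranked_is_active[:n_top])
    # (hits_top / n_top) / (n_actives / n_total), written to avoid rounding noise
    return hits_top * n_total / (n_top * n_actives)
```

A value of 1 means the ranking does no better than random selection. The maximum attainable value depends on the chosen fraction and on the proportion of actives in the library.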

30 ChemModLab: A Web-based cheminformatics modeling laboratory
S. Stanley Young1, Atina D. Brooks1, William Welch1, Morteza G. Khaledi2, Douglas Hawkins1, Kirtesh Patil1, Gary W. Howell1, Raymond T. Ng1, Moody T. Chu1, and Jacquline M. Hughes-Oliver1. (1) NISS, PO Box 14006, Research Triangle Park, NC 27709, Fax: 919 685 9300, young@niss.org, Phone: 919 685 9328, (2) Department of Chemistry, North Carolina State University, Raleigh, NC 27695

ChemModLab is a free, web-based toolbox for fitting and assessing quantitative structure-activity relationships (QSARs). Its elements include a cheminformatic front end to supply molecular descriptors, a set of statistical methods for fitting models, and methods for validating the resulting model. Input is an SD file for compounds and a text file for biological activity, or users can directly input their own descriptors (keeping compound structures confidential). Submitted data sets can be made public or kept private. Five types of descriptors are available and twelve different statistical methodologies are included, largely from the R platform. As promising new QSAR methodologies emerge from the statistical and data-mining communities, they will be incorporated into ChemModLab. The Web site also incorporates links to public data sets. The capabilities of ChemModLab are illustrated using a variety of data sets. Predictive quality varies greatly with the descriptor and modeling method choice.

31 Bridging the gap between discovery data and development decisions
Jeffrey M. Skell, Genzyme, DMPK & Pharmaceutics, 153 Second Ave., Waltham, MA 02451, Fax: 508-661-8517, Jeffrey.Skell@genzyme.com, Phone: (781) 434-3601

Many current discovery programs incorporate pharmaceutics properties (e.g., solubility, permeability, and lipophilicity) into their hit identification and lead optimization testing cascades. However, the uses of these properties are often limited to a pass/fail criterion. Upon candidate nomination, a broader range of laboratory activities are initiated including physical (salt/crystal) form selection and formulation development. The development scientist charged with advancing an optimized formulation of the nominated compound often has little or no control over the single most critical aspect of his charge: the molecular entity. Recently introduced screening techniques attempt to address this deficiency by incorporating standardized physical form and formulation tests into the lead optimization process, at the cost of significantly increasing compound requirements before compound nomination. This presentation will explore the concepts implemented in software tools for modeling solution-phase properties, their ability to address solid-phase properties, and their impact on bridging the gap between discovery data and development decisions.

32 CONCORD and early 3D Search systems.
Andrew Rusinko III, Alcon Laboratories, Inc, 6201 S. Freeway, Ft. Worth, TX 76134, Fax: 817-302-3701, Phone: 817-551-8140, and Karl M. Smith, Tripos, Inc, St. Louis, MO 63144

The exploration and development of search systems based on the three-dimensional (3D) structure of a molecule, and not just its connection table, represented a major paradigm shift in cheminformatics and molecular modeling in the late 1980s and early 1990s. Techniques such as molecular surface area and volume calculations, 3D-pharmacophore and shape search, as well as docking studies required accurate small-molecule molecular geometries as a starting point. Since the primary source of large collections of structures at the time was corporate databases, a method was needed to automatically produce reasonable geometries of drug-like molecules quickly from corporate collections. The computer program CONCORD was developed to “rapidly generate high-quality approximate molecular coordinates.” This presentation traces the origins of CONCORD and the impact it had on early 3D search systems, and describes the current status of this classic program, some 20 years later.

33 Application of DiverseSolutions (DVS) in the establishment and validation of a target class-directed chemistry space
Eugene L. Stewart, Peter J. Brown, James A. Bentley, and Timothy M. Willson, Computational, Analytical, and Structural Sciences, GlaxoSmithKline, Five Moore Drive, Research Triangle Park, NC 27709, Fax: 919-315-0430, Eugene.L.Stewart@gsk.com, Phone: 919-483-0152

We illustrate the use of DiverseSolutions (DVS) in its application to a problem of pharmaceutical interest, target class-directed compound selection and synthesis. We will present the use of DVS in the establishment of a chemistry space for the nuclear receptor (NR) target class and the application of this space in the selection of compounds for screening against orphans within this family of receptors. We will also present the results of a prospective validation study that will demonstrate the utility of these methods and the effectiveness of the chemistry space in this instance. Lastly, we will discuss techniques and general workflow for the application of the NR-directed chemistry space in selecting monomers and compounds for synthesis which meet the appropriate target class criteria.

34 Flexible ligand alignment protocols and their use in de novo design
James R. Damewood and Charles L. Lerman, CNS Chemistry, AstraZeneca, 1800 Concord Pike, Wilmington, DE 19850, Fax: 302-886-5792, Phone: 302-886-5792

A major activity in the design phase of drug discovery involves generating viable ideas of what to make next. NovoFLAP is a ligand-based computer-aided design (CAD) approach that generates new, medicinally relevant ideas starting from compounds known to be active at a biological target of interest. NovoFLAP combines the evolutionary de novo design capabilities of EA-Inventor with FLAP, a robust, ligand-based scoring algorithm. Specific examples of how NovoFLAP has been used to successfully design new and interesting ideas in drug discovery programs will be presented.

35 Cheminformatics for computational chemistry and computer-aided molecular discovery
R. S. Pearlman1, Yubin Wu1, Karl M. Smith2, and Brian B. Masek2. (1) Laboratory for the Development of CADD Software, University of Texas, College of Pharmacy, Austin, TX 78712, pearlman@naphthyl.phr.utexas.edu, Phone: 512-471-3383, (2) Tripos, Inc, St. Louis, MO 63144

Traditional cheminformatics technologies were designed to address the traditional needs of (1) identifying chemical compounds and (2) associating experimentally derived information with those compounds. This presentation will address the evolving needs of (3) identifying the various structures – 2D protomers, 2.5D proto-stereomers, and 3D proto-stereo-conformers – which individual chemical compounds can and do exhibit in various natural environments (e.g., crystal, solvent, membrane, receptor, etc.) and (4) associating computationally derived information with those structures and the corresponding compounds. In particular, we will address the typically unappreciated consequences which protonation and tautomerization equilibria can have upon both atom-centered and bond-centered chiralities of proto-invertible chiral centers. We also need (5) a robust method to associate any given structure with its corresponding, canonically identified compound. This presentation will discuss algorithms and software tools which address these needs. We will also suggest a “bio-activity-oriented” hierarchical approach for the management of both experimentally and computationally derived chemical information.

36 Molecular profiling of inhibitor analogs of Indinavir and the HIV mutation pattern
Barun Bhhatarai, Department of Chemistry, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699-5810, bhhatarb@clarkson.edu, Phone: 315-268-2357, and Rajni Garg, Department of Chemistry & Biochemistry, California State University San Marcos, San Marcos, CA 92096

A study of mutants associated with Indinavir and its related congeners was performed and the results were analyzed using a cheminformatics approach. In continuation of our previous understanding of ‘different pocket sizes for different mutants’, this study aimed to explore the effect of substituents' binding on three major pockets of HIV protease, viz. P1', P2 and P3. The information obtained was used to design effective substituents which can be used with other novel pharmacophore(s) to generate new leads. Different mutant variants such as K60C, V18C, NL4-3 (molecularly cloned strain), 4X and Q60C, as well as WT, were considered and their binding patterns relating to IC50 and CIC95 data were studied. A maximum of 36 data points for each mutant position aimed at a particular viral pocket was retrieved from the literature. A total of 36x5 data points for each biological activity was collected. Quantitative statistical relationships were developed using various descriptors and regression techniques. It is anticipated that the results of this study will help in the development of efficient chemical probes/leads by evolution of existing examples.

37 Pharmacodynamic modeling of C2 symmetric HIV-1 protease inhibitors
Raghava Chaitanya Kasara, Chemistry Department, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699, Fax: 315-265-6610, kasarar@clarkson.edu, Phone: 315-265-2357, and Rajni Garg, Department of Chemistry & Biochemistry, California State University San Marcos, San Marcos, CA 92096

The overall goal of this comparative study was to predict the pharmacodynamic profiles of the C2 symmetric HIV-1 protease inhibitors published by Kempf et al. To understand the binding patterns of the drug molecules at the receptor site, QSAR studies were performed using various statistical analyses. A large dataset was compiled and QSAR models of the pharmacodynamic data were developed. These models indicate that important physicochemical parameters play a vital role in the binding interaction at the receptor site (viral protease). These models have the potential to be used as an in silico virtual screening tool for predicting the pharmacodynamic profiles of HIV-1 protease inhibitors.

38 Proper use of cross-validation while descriptor-thinning: Naïve versus true q2
Ramanathan Natarajan1, Subhash C. Basak1, Douglas M. Hawkins2, and Jessica Karaker3. (1) Center for Water and the Environment, Natural Resources Research Institute, University of Minnesota, 5013 Miller Trunk Highway, Duluth, MN 55811, Fax: 218-720-4328, rnataraj@nrri.umn.edu, Phone: 218-720-4342, (2) School of Statistics, University of Minnesota, Minneapolis, MN 55455, (3) Department of Mathematics, University of Wisconsin-Eau Claire, Eau Claire, WI 54702-4004

In QSAR modeling of the property/bioactivity of chemicals using calculated molecular descriptors, we are faced with the usual problem of “few compounds and many descriptors.” Hence, variable-selection (descriptor-thinning) methods are used to select a proper subset of descriptors to develop QSAR models. It is vital to incorporate the descriptor selection, as well as any parameter selection, as part of the modeling procedure to be cross-validated for assessment of the model. When the cross-validation step does not include all such elements of the modeling procedure, the “naïve q2” thus estimated suffers from an upward bias. Application of proper cross-validation that includes descriptor thinning is necessary for developing QSAR models with good predictive ability. The importance of embedding descriptor selection as well as parameter selection inside the cross-validation step, resulting in calculation of the “true q2”, is highlighted by a comparison of true q2 with naïve q2 for a few sets of compounds.
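The difference between the two estimates can be sketched in a few lines. This is a deliberately simplified illustration, not the authors' procedure: descriptor thinning is assumed to mean picking the single descriptor most correlated with the activity, the model is a one-variable least-squares line, and the cross-validation is leave-one-out, with q2 = 1 - PRESS / SS_total. All function names and the random test data are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient (0.0 if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def select_descriptor(X, y):
    """Descriptor thinning: index of the column most correlated with y."""
    return max(range(len(X[0])),
               key=lambda j: abs(pearson([row[j] for row in X], y)))


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx


def loo_q2(X, y, reselect):
    """Leave-one-out q2.

    reselect=False: descriptor chosen once on ALL compounds (naive q2).
    reselect=True:  descriptor re-chosen inside every fold (true q2).
    """
    fixed = None if reselect else select_descriptor(X, y)
    press = 0.0
    for i in range(len(y)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        j = select_descriptor(X_train, y_train) if reselect else fixed
        slope, intercept = fit_line([row[j] for row in X_train], y_train)
        press += (y[i] - (slope * X[i][j] + intercept)) ** 2
    mean_y = sum(y) / len(y)
    return 1.0 - press / sum((v - mean_y) ** 2 for v in y)


def make_noise_dataset(n, p, seed):
    """Pure-noise data: p random descriptors carrying no real signal about y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    return X, y
```

On pure-noise data with many more descriptors than compounds, the naive estimate tends to look encouraging because the held-out compound already influenced which descriptor was chosen. The estimate with selection inside each fold tends to fall to around zero or below, which is the correct verdict for data containing no signal.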

39 Using a chatbot to access chemical information
Dazhi Jiao, School of Informatics, Indiana University at Bloomington, Wells Library 043, Bloomington, IN 47408, djiao@indiana.edu, Phone: 812-856-0089

A chatbot is a computer program designed to interact with users through intelligent conversations. AIML, the Artificial Intelligence Markup Language, is a technology commonly used in developing chatbots. In this poster, I will propose chatting as an interface for scientists to retrieve chemical information and perform scientific computations. Chatbot technologies such as ALICE and AIML will be introduced. I will also discuss a prototype AIML-based chatbot that can be used to access information in PubChem and other chemical databases through web services.
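To make the idea concrete, here is a rough sketch of AIML-style matching, written in Python rather than AIML and not taken from the prototype. It assumes a simplified model in which each category pairs an upper-case pattern containing a `*` wildcard with a response template, and the first matching category wins. Real AIML interpreters use a more elaborate matching order. The in-memory dictionary is a hypothetical stand-in for a web-service call to a chemical database.

```python
import re

# Hypothetical stand-in for a web-service lookup in a chemical database.
TOY_DATABASE = {"WATER": "H2O", "BENZENE": "C6H6"}


def lookup_formula(name):
    """Return a molecular formula for a compound name, or None if unknown."""
    return TOY_DATABASE.get(name)


def normalize(text):
    """AIML-style normalisation: drop punctuation, upper-case, squeeze spaces."""
    return " ".join(re.sub(r"[^A-Za-z0-9 ]", " ", text).upper().split())


def pattern_to_regex(pattern):
    """Turn a pattern with '*' wildcards into an anchored regular expression."""
    parts = ["(.+)" if word == "*" else re.escape(word) for word in pattern.split()]
    return re.compile("^" + " ".join(parts) + "$")


def answer_formula(star):
    """Template for formula questions; `star` is the text matched by '*'."""
    formula = lookup_formula(star)
    if formula is None:
        return f"I could not find {star}."
    return f"The molecular formula of {star} is {formula}."


# Ordered (pattern, template) categories; the first match wins.
CATEGORIES = [
    ("WHAT IS THE FORMULA OF *", answer_formula),
    ("FORMULA OF *", answer_formula),
    ("*", lambda star: "Please ask about a compound, e.g. 'formula of benzene'."),
]


def respond(text):
    """Answer one line of user input using the first matching category."""
    sentence = normalize(text)
    for pattern, template in CATEGORIES:
        match = pattern_to_regex(pattern).match(sentence)
        if match:
            return template(match.group(1))
    return "Please say something."
```

In a working system the `lookup_formula` stub would be replaced by a request to the database's web service, and further categories would cover other properties and computations.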

40 Revised chemical component dictionary for the Worldwide Protein Data Bank
Muhammed Yousufuddin1, Dimitris Dimitropoulos2, Zukang Feng1, Jeramia Ory1, Hyunmi Sun1, John Westbrook1, Kim Henrick2, and Helen Berman1. (1) Rutgers, The State University of New Jersey, Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Chemistry, 610 Taylor Rd, Piscataway, NJ 08854, myousuf@rcsb.rutgers.edu, Phone: 732-445-0103, (2) EMBL Outstation-Hinxton, MSD-EBI, Cambridge, United Kingdom

The RCSB PDB, in collaboration with the MSD-EBI, has developed and released a new and expanded chemical component dictionary. This new dictionary, which now contains around 8000 small molecules, has been improved by removing redundant ligands such as SUL, correcting any valence errors, and providing IUPAC atom labeling for standard amino acids and nucleotides. In addition, the new dictionary contains many additional data items such as stereochemical assignments, idealized coordinates, and SMILES strings.

The contents of this new chemical component dictionary have been used to remediate the entire PDB archive, which currently contains over 42,000 entries. The detailed annotation of small molecules in the archive makes greater integration with cheminformatics databases and pharmaceutical applications possible.

The wwPDB is accessible from www.wwpdb.org. We acknowledge the support of our funding agencies: RCSB PDB (NSF, NIGMS, DOE, NLM, NCI, NCRR, NIBIB, NINDS) and MSD-EBI (Wellcome Trust, EU, CCP4, BBSRC, MRC and EMBL).

41 Systematic, automated analysis of solubilising groups in oral drugs
Paul N Mortenson1, Miles S. Congreve2, and Christopher W. Murray2. (1) Computational Chemistry Group, Astex Therapeutics, 436 Cambridge Science Park, Milton Road, Cambridge CB4 0QA, United Kingdom, Fax: +44 1223 226201, p.mortenson@astex-therapeutics.com, Phone: +44 1223 435014, (2) Astex Therapeutics, Cambridge CB4 0QA, United Kingdom

Medicinal chemistry programs frequently produce molecules that are potent but relatively insoluble in water. Such compounds present a range of problems, but most importantly they are unlikely to be suitable candidates for oral (or intravenous) administration. A common solution to these problems is the addition of a polar solubilising group to the molecule. We present here the results of a systematic analysis of solubilising groups present in marketed oral drugs. Proprietary software tools have been written and used to automatically extract these groups from a database of oral drugs, as well as a larger database of advanced drug candidates. A design tool has also been created that allows chemists to create virtual libraries of solubilised molecules, starting from a template that they sketch. Appropriate solubilising groups are virtually attached to the template, and the library thus created can then be further profiled for example by docking.

42 MUT-HIV: Mutation database of HIV proteases
Rajni Garg, Department of Chemistry & Biochemistry, California State University San Marcos, 333 S. Twin Oaks Valley Rd., San Marcos, CA 92096, Phone: 760-750-8069, Srinivas Alla Reddy, Organic Division I, Indian Institute of Chemical Technology, Hyderabad 500 007, India, Xiaoyu Zhang, Department of Computer Science, California State University San Marcos, San Marcos, CA 92096, and Ahmad R Hadeagh, Computer Science, California State University San Marcos, San Marcos, CA 92096

Identifying mutational patterns and designing smart HIV drugs that remain active even after a mutation occurs are challenging problems. As part of our continuing interest in this area, we have developed a database of HIV protease proteins (MUT-HIV). The MUT-HIV database contains information about both wild-type and mutated HIV proteases. All the crystal structures of the HIV proteases deposited in the PDB are extracted. Details of mutated amino acids, along with other properties such as crystallization conditions and bound inhibitors, are stored in the database. Several physical, chemical and electronic properties of the bound inhibitors and binding pockets are calculated and organized in the database. The information obtained from the mutation patterns will be correlated with the inhibitors' descriptors. This database will be a valuable tool for predicting the types of mutations that can occur for newly designed inhibitors and for designing inhibitors for a multi-target approach.

43 Desktop cheminformatics: A new free application for end users
Tim Dudgeon, Petr Hamernik, Gyorgy Priok, Szilard Dorant, and Ferenc Csizmadia, ChemAxon Kft, Márámaros köz 3/a, 1037 Budapest, Hungary, tdudgeon@chemaxon.com, Phone: +44 1865 331167

InstantJChem is an extensible desktop application designed to bring sophisticated cheminformatics to chemists. Structure databases can be quickly created in embedded or enterprise databases (allowing collaboration between multiple users). Each database allows structural and non-structural data in multiple formats to be quickly imported/exported. Chemical business rules can be applied using Standardizer to allow structure canonicalization (nitro representation, salt removal...). Structure based calculations and predictions (logP, pKa, RuleOf5, bioavailability...) are available using the Chemical Terms language. Advanced structure searching techniques can be combined with queries on data fields and Chemical Terms filters and applied rapidly to large data sets. Results can be viewed in a tabular format or with custom designed forms. As such, InstantJChem provides a simple platform to perform complex structure based analysis and prediction, including HTS analysis, SAR analysis, library overlap analysis, compound acquisition and ADMET predictions. The core functionality of InstantJChem is freely available.
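As one example of the kind of structure-based filter listed above, the following sketch counts violations of Lipinski's rule of five from precomputed properties. It is not ChemAxon's Chemical Terms implementation: the class, the function names and the example values are hypothetical, and the "at most one violation" cutoff is a commonly used convention rather than something stated in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed properties of one compound (values supplied by the caller)."""
    mol_weight: float       # molecular weight in Da
    logp: float             # octanol-water partition coefficient (log P)
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(p):
    """Number of Lipinski criteria the compound violates (0 to 4)."""
    return sum([
        p.mol_weight > 500,
        p.logp > 5,
        p.h_bond_donors > 5,
        p.h_bond_acceptors > 10,
    ])


def passes_rule_of_five(p, max_violations=1):
    """True if the compound has no more than `max_violations` violations."""
    return rule_of_five_violations(p) <= max_violations


# Illustrative values for a small drug-like molecule and a large lipophilic one.
small = Properties(mol_weight=180.2, logp=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Properties(mol_weight=800.0, logp=6.5, h_bond_donors=6, h_bond_acceptors=12)
```

A filter like `passes_rule_of_five` can then be combined with structure searches and data-field queries when narrowing down a large compound set.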

44 Decision making for research informatics: Technical, financial and organizational considerations and method
Gregory Fond, Kelaroo, Inc, 312 S. Cedros Ave., Suite 320, Solana Beach, CA 92075, gfond@kelaroo.com, Phone: 858-259-7561

As start-up companies accumulate scientific data, they must assess many competing options for managing drug discovery and development information. Many resources are available to assist start-up companies in selecting and accessing the best research informatics (RI) platforms and applications. This presentation introduces a method to assist small and medium size companies in sorting through the many RI options available to them. It is based on Kelaroo's experiences with over 30 biotechnology and pharmaceutical companies whom Kelaroo has provided with custom and commercial cheminformatics and bioinformatics products and professional services. The formalization of the method is achieved using basic elements of technical, financial and organizational analysis. This presentation also illustrates how in the maturing RI industry small and medium-size biopharmaceutical companies can mitigate the trade-offs between cost, flexibility and scalability. The findings are derived empirically from discussions with industry analysts, biopharmaceutical companies of various sizes, providers of RI platforms as well as developers of custom and commercial software applications. The presentation includes case studies for illustration purpose.

45 Chemical Compliance Analytical System (C-CAS)
George R. Thompson, Chemical Compliance Systems, Inc, 706 Route 15 South, Suite 207, Lake Hopatcong, NJ 07849, Fax: 973-663-2378, georgethompson@chemply.com, Phone: 973-663-2148

Chemical inventories are a valuable resource of information for numerous departments and applications throughout an organization, when properly constructed and effectively analyzed. However, an inventory system can be no “smarter” than the data it contains. At least five primary databases are required for the broadest benefits from a chemical inventory: (1) chemical/product container, (2) chemical cross-reference dictionary, (3) MSDSs, (4) chemical health/safety/ecological hazards, and (5) applicable regulatory List of Lists. Additional data and criteria will greatly enhance the utility of the chemical inventory—e.g., physical/chemical properties, process and product usage, “green” and biobased criteria, hazard ranking criteria, generic chemical classes, incompatible and/or alternative chemicals, etc.

C-CAS is a true cradle-to-grave container tracking system that can include all of the above databases (and more), identifies each container by a bar code, and tracks the precise location of that container in real time throughout its lifetime. Literally, hundreds of reports are available from C-CAS: by chemical/product/location, by manufacturer, at reorder thresholds, and for hazard classes by room, department, building, or the entire organization. C-CAS can also identify any of 650 regulations that affect a chemical, or product, and can calculate when reporting thresholds are exceeded. Additionally, C-CAS serves as the input module to our Chemical Hazard and Environmental Management System and our Chemical Homeland Security System, and can feed quantitative data into any pre-existing ISO-14001 EMS. In short, C-CAS is an invaluable tool for diverse users with seemingly unrelated responsibilities.

46 Chemistry informatics in academic laboratories
Michael P. Hudock, Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, 607 S. Mathews Avenue, Urbana, IL 61801, hudock@uiuc.edu, Phone: 217-333-4335

A substantial amount of early discovery chemistry is occurring every day not only in large industrial laboratories, but also in non-industrial settings, such as academic institutions, hospitals and government research centers. In many situations these non-industrial settings are generating substantial amounts of data but do not have a formal informatics solution to manage and mine the resulting data. The most basic requirements of such systems generally do not differ from one setting to another. Using primarily open source software, we show it is possible to build a client-server based system that handles these most basic requirements of uniting chemical structures with activity data, and also offers more advanced features for data mining and modeling structure-activity relationships. Using a rapid development model and standardized database architecture, feature requests can be accommodated on a short timescale. This system is routinely used in our group and has been able to detect otherwise unrecognized trends in data.

47 Complete chemical inventory management
Robert D. Feinstein, James Moeder, Bret Daniel, Andrew Reum, and Gregory Fond, Kelaroo, Inc, 312 S. Cedros Ave., Suite 320, Solana Beach, CA 92075, rdf@kelaroo.com, Phone: 858-254-6727

Chemistry intensive organizations need to search, source and manage thousands of reagents, building-blocks and advanced intermediates. Increasingly, they must do this while minimizing IT infrastructure, avoiding disparate applications, rigid solutions and expensive licensing. We will describe Kelaroo's experience with systems addressing these operational and business needs.

This presentation will address the integration of workflows involving chemistry, purchasing, stockroom and EH&S departments. Enabling capabilities include simultaneous searching of in-house reagents and commercial catalogs; acquiring reagents from both inventory and vendors efficiently and economically; managing chemical inventory effectively (including receiving, dispensing, tracking, reconciling and EH&S reporting); and enforcing business policies to save companies substantial time and money.

The presentation will also discuss technical and business trends that are reshaping the Research Informatics landscape. This represents a paradigm shift towards integrating best-of-breed applications that are plug-and-play, web-based, full-featured, highly configurable and available as internal systems or hosted as a pay-as-you-go service.

48 Developing proprietary systems in a small company environment
Chandu Nair, Scope e-Knowledge, 515 Madison Avenue, 21st Floor, New York, NY 10022, chandu@scopeknowledge.com, Phone: 646-706-2575

As a remote knowledge services company, Scope fulfills diverse content and data requirements of various clients. Obtaining off-the-shelf, ready to use products catering to Scope's requirements is difficult and not very cost effective either.

Scope has therefore put in place a hybrid software team which comprises an internal and external team of experts to create proprietary systems.

In the knowledge space, Scope believes that achieving 100% automation is unrealistic; therefore, Scope has a philosophy of “assisted automation”. Applications are developed so that they are scalable, ensure better control and enable constant improvement. Scope follows the approach of continuous development and quantum deployment: applications are continually tweaked, but deployment is done in a staged manner when it reaches a critical mass.

AGILE methodology is used in developing software. The software team and project operations team finalize the requirements together. Consequently, the applications developed are user friendly and meet user requirements more precisely.

Case studies will be discussed to illustrate these points.

49 Outsourcing of discovery informatics -the new Indian model
Eric A. Jamois and Sai Subramaniam, Strand Life Sciences, 1902 Wright Place – Suite 200, Carlsbad, CA 92008, Fax: 760-918-5505, ejamois@strandls.com, Phone: 760-918-5582

Although outsourcing to China, Russia and India has reached mainstream status, its success rides on the execution. Interestingly, some companies have questioned the viability of overseas operations, primarily on grounds of operational efficiency. Initial outsourcing models were founded on the allocation of large amounts of junior resources towards projects with disappointing returns. More recently, India and other geographies have turned to senior resources recruited directly from their target markets. With direct insight into requirements and a project-level understanding of the challenges at hand, there is now a direct shift toward greater efficiency and higher-end deliverables.

We will describe several projects undertaken at Strand Life Sciences in terms of their challenges and solutions provided. We will discuss the implementation of a data analysis and visualization platform in pharmaceutical discovery. We will also describe how some specific components can be integrated in an existing environment to provide new capabilities for image analysis, SAR and other applications.

50 Chemical inventory services
John Jegla, Symyx Technologies, 70 Wood Avenue, Iselin, NJ 08830, john.jegla@symyx.com, Phone: 802-242-9017, and Mitchell A. Miller, Symyx Technologies, Fairfax, VT 05454

Over the years, Symyx has built a number of applications to manage repositories of chemical materials and physical inventories. This work has been done for a variety of organizations in the chemical and pharmaceutical industries. Our experiences have revealed there to be significant differences between organizations regarding the definitions of chemical entities and the operations required to support inventory-related workflows, even within a given industry segment. This puts a premium on providing software solutions that are not just comprehensive and flexible, but also extensible at all levels via customer-accessible developer kit functionality. To support such efforts, we have dissected the ensemble of requirements and features into a set of application components:

Data model (representations of the primary entities in an inventory system)
Application functions (searching, browsing, object life cycle maintenance, user experience management)
Application features (client configuration)

Understanding these components and designing them for easy reuse leads to greater efficiency in operation and more satisfied users in the long run. Here we present our experience in the hope that it will be instructive to others.

51 Use of Chem SW CisPro for inventory and MSDS management
Scott C. Boito, North American Info Center, Rhodia, Inc, 350 George Patterson Blvd, Bristol, PA 19007, Fax: 215-781-6002, scott.boito@us.rhodia.com, Phone: 215-781-6229

Rhodia moved laboratory facilities in late 2005, and with the new location it was decided to implement a new chemical inventory system. Chem SW's CisPro system was selected because it allowed management of our MSDS collection electronically with access to both the inventory records and the accompanying MSDS. The implementation of the system and its continuing evolution will be discussed.

52 Partnering with the libraries: Chemical information instruction for a large freshmen core chemistry course
Angela Locknar, Engineering and Science Libraries, MIT, 14S-134, 77 Massachusetts Ave., Cambridge, MA 02139, Fax: 617-253-6365, locknar@mit.edu, Phone: 617-253-9320, and Donald R. Sadoway, Department of Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Room 8-203, Cambridge, MA 02139-4307, Fax: 617-253-5418, dsadoway@mit.edu, Phone: 617-253-3487

Providing instruction in finding and using information ("library" skills) is common in first year English courses, but these skills are just as relevant for first year students in the sciences. Should faculty in the sciences be expected to teach these skills, or should they call upon their librarian colleagues? This presentation will describe the collaboration between an engineering and science librarian and a faculty member to deliver information skills to a large freshman level core chemistry course. An innovative pilot course, using the students themselves to help determine how to teach their peers, will be discussed. Scaling this project to reach the over 500 students enrolled in the core course, including the creation of online tutorials, will also be addressed.

53 Providing for graduate student information needs at a large research university
Jeremy R Garritano, Mellon Library of Chemistry, Purdue University, 504 W. State St., West Lafayette, IN 47907, jgarrita@purdue.edu, Phone: 765-496-7279

Graduate students at a large research university often have many information needs, from choosing a research advisor, to creating and pursuing their research agenda, to deciding on where to go after graduation. In addition, many of them have insufficient information-seeking skills. The M.G. Mellon Library of Chemistry at Purdue University attempts to address many of these issues by the focused and proactive provision of resources and services to graduate students. Besides instructing graduate students on common chemical information resources, the staff of the Chemistry Library provides additional services, such as after-hours access and assistance with bibliographic management software, to enhance the educational and research experience. In addition, a biweekly series of seminars, called Ice Cream Seminars, is provided every year to help acclimatize new graduate students to these and other resources provided by the Purdue University Libraries. This talk will highlight the services provided by the Chemistry Library, from the day potential graduate students visit campus to the day they graduate and are off to pursue future endeavors (and sometimes, even after that).

54 Chemical information course at a small public liberal arts college
Allan K. Hovland, Department of Chemistry, St. Mary's College of Maryland, 18952 E Fisher Road, St. Mary's City, MD 20686, Fax: 240-895-4996, akhovland@smcm.edu, Phone: 240-895-4354

St. Mary's College of Maryland is a small public liberal arts college. The introduction to chemical literature course was first offered about 15 years ago. The role of the course in the chemistry curriculum was heightened when a requirement for a year-long research experience was implemented a few years ago. A strong collaboration between the chemistry faculty and the library staff has developed. As is universally true, the issue of access to materials has been one of the greatest challenges. This has been met in part by the participation in consortial arrangements. The role of information literacy in chemistry will be expanding in light of the implementation of a new core curriculum requiring information literacy components across the curriculum.

55 Undergraduate cooperative access to information resources
R. G. Landolt, Department of Chemistry, Texas Wesleyan University, 1201 Wesleyan Street, Fort Worth, TX 76105, Fax: 817-531-4275, rlandolt@txwes.edu, Phone: 817-531-4890

This project addresses the following objectives, to: Provide FACULTY with fundamental sophistication in Chemical Informatics; Teach STUDENTS to determine if information exists, how to retrieve it and assess its quality; and Enable DECISION-MAKERS to see how institutional resources may be used efficiently. To date, students and faculty at 4-year institutions have been provided insights regarding access and use of Chemical Abstracts and ACS Publications Journals. Optimum progress has occurred by establishing consortia of institutions, with active involvement of faculty and institutional librarians. Efforts are underway to identify issues of concern regarding online access for 2-year programs, including Community College Chemistry and Chemical Technology.

56 Bio- and chem-Informatics: Where do the twain meet?
N. Sukumar1, Curt M. Breneman2, Kristin P. Bennett3, Charles Bergeron3, Theresa Hepburn2, C. Matthew Sundling4, Shekhar Garde5, Rahul Godawat5, Ishita Manjrekar5, Margaret McLellan2, and Mike Krein2. (1) Department of Chemistry and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute / RECCR Center, 110 8th St., Troy, NY 12180-3590, Fax: 518-276-4887, nagams@rpi.edu, Phone: (518)276-4235, (2) Department of Chemistry / RECCR Center, Rensselaer Polytechnic Institute, Troy, NY 12180, (3) Department of Mathematics, Rensselaer Polytechnic Institute, Troy, NY 12180, (4) Department of Chemistry and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180-3590, (5) Department of Chemical and Biological Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180

With continuing advances in epigenetics, proteomics, interactomics, metabolomics and RNA interference, bioinformatic data is increasingly becoming 3-D (structure-based) rather than just linear (sequence-based). A unified approach to cheminformatics and bioinformatics can thus enable a rich cross-fertilization of computational methods developed independently in different disciplines. We have developed a gamut of new software tools (Dixel, Protein-Recon, QPEST) and descriptor families (sequence similarity kernels, hydration-based descriptors) at Rensselaer that present bioinformatics data in a format familiar to cheminformaticians and cheminformatics data in bioinformatics-like format. Some modeling applications such as prediction of binding affinities of T cell receptors to leukemia vaccine polypeptides, ranking of transcription factor binding sequences and identification of pyruvate kinase activators and inhibitors will be presented.

57 Enabling systems biology: Automated elucidation of metabolite structures
Christoph Steinbeck, Miguel Rojas, Tobias Helmus, Egon Willighagen, and Stefan Kuhn, Research Group for Molecular Informatics, Cologne University Bioinformatics Center (CUBIC), Zuelpicher Str. 47, D-50674 Cologne, Germany, c.steinbeck@uni-koeln.de, Phone: 0049-221-470-7426

Identification and structure elucidation of unknown metabolite structures based on their spectroscopic properties form the basis for successful metabolome simulations. In a process known as dereplication, a scientist would record molecular fingerprint spectra and search spectral databases to check whether the compound at hand is already known. Only if this search is unsuccessful is it reasonable to reach for one of the more sophisticated ab initio tools for computer-assisted structure elucidation. Here we describe the use of free software, the easy access provided by the World Wide Web and the collaborative potential of the Open-Source movement to build a completely transparent system for computer-assisted structure elucidation and identification. Methods for the prediction of mass and NMR spectra have been developed and used as part of a fitness function in our structure elucidation systems based on stochastic chemical space generators.

58 Genome scale enzyme-metabolite and drug-target interaction predictions using Support Vector Machines
Jean-Loup Faulon, Computational Bioscience Dept, Sandia National Laboratories, P.O. Box 5800 - MS 1413, Albuquerque, NM 87185, Fax: 505-284-1323, jfaulon@sandia.gov, Phone: 505-284-0770

Biological and chemical databases are increasingly populated with information linking protein sequences and chemical structures (KEGG, PubChem, DrugBank, MDDR). There is now sufficient information to apply machine learning techniques to predict interactions between chemicals and proteins on a genome-wide scale. Current machine learning techniques use as input either protein or chemical information. A novel Support Vector Machine method will be presented for predicting protein-chemical interaction using heterogeneous input consisting of both sequences and chemical structures. The method relies on fusing protein sequence data with chemical structure data by representing each with a common cheminformatics description. The approach will be demonstrated by predicting proteins that can catalyze reactions, even when the reactions have no known enzymatic catalysts, and predicting when a given drug can bind a target, also in the absence of prior binding information for that drug and target.

59 Structural similarity of binding sites in analogous enzymes
Yang Shen1, Dmitri Beglov2, Ryan Brenke3, Dima Kozakov2, and Sandor Vajda2. (1) Department of Manufacturing Engineering, Boston University, Boston, MA 02215, yangshen@bu.edu, (2) Department of Biomedical Engineering, Boston University, 44 Cummington St, Boston, MA 02215, Fax: 617-353-6766, vajda@bu.edu, Phone: 617-353-4757, (3) Program in Bioinformatics, Boston University, Boston, MA

Two enzymes are analogous if they have the same EC number (or their EC numbers differ only in the last digit), but are evolutionarily unrelated, i.e., they lack both sequence and structural similarity. Analogous enzyme pairs are relatively rare, but occur in all major classes and are assumed to be the result of convergent evolution. Research on analogous enzymes is very limited: it consists of searches for non-homologous enzymes with the same EC number and studies of specific cases of convergent evolution. It is known that at least in a number of cases the spatial arrangement of the catalytic residues is conserved, but very little is known about the similarity of the binding sites that occur on different protein scaffolds. In this work we use a new method developed to assess molecular similarity for the structural superimposition of enzyme binding sites. The physicochemical properties of the cavity-flanking residues are represented by pseudocenters. Given two sets of such pseudocenters, our goal is to find the largest subset of pseudocenters in both clefts that are in direct correspondence with each other geometrically as well as chemically. The proposed method performs an exhaustive evaluation of the correlation function in the discretized 6D space of mutual orientations of the two point sets using a very efficient algorithm involving Fast Fourier Transforms. The method is applied to a number of analogous enzyme pairs. Advantages over the more traditional structure comparison method based on the maximum clique algorithm are discussed.

60 Using reaction mechanism to measure enzyme similarity
Noel M. O'Boyle1, Gemma L. Holliday2, Daniel E. Almonacid3, and John B. O. Mitchell3. (1) Cambridge Crystallographic Data Centre, 12 Union rd, Cambridge CB2 1EZ, United Kingdom, Fax: 0044-1223-336033, oboyle@ccdc.cam.ac.uk, Phone: 0044-1223-762531, (2) EMBL-EBI, Cambridge CB10 1SD, United Kingdom, (3) Department of Chemistry, Unilever Centre for Molecular Science Informatics, University of Cambridge, Cambridge CB2 1EW, United Kingdom

As more and more mechanistic data on enzymes becomes available, the ability to identify similar mechanisms in other enzymes is becoming more important. Such information may be used to identify mechanistically convergent or divergent enzymes, to study the link between structure and function, to perform literature searches, and to validate experimental results. However, existing methods for measuring enzyme similarity (evolutionary distance, structural similarity, classification by function) do not take chemical mechanism into account. We have developed the first method to give a quantitative measure of the similarity of reactions based upon their explicit mechanisms. The method combines classic cheminformatics techniques (Tanimoto coefficient, Euclidean distance of fingerprints) with the Needleman-Wunsch alignment algorithm used in bioinformatics. We present an analysis of the MACiE database of enzyme mechanisms using our measure of similarity, contrast functional and mechanistic classification schemes, and identify some examples of convergent evolution of chemical mechanism.
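The combination described above, a Tanimoto coefficient for individual steps and a Needleman-Wunsch alignment over the sequence of steps, can be illustrated with a minimal sketch. The abstract does not give the actual fingerprints, gap penalty, or normalisation, so the feature sets, the zero gap penalty, and the function names below are assumptions made purely for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets (1.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def align_score(mech_a, mech_b, gap=0.0):
    """Needleman-Wunsch global alignment score of two step sequences.

    Each mechanism is a list of feature sets, one per reaction step.
    The gap penalty of 0.0 is an assumed value, not taken from the abstract.
    """
    n, m = len(mech_a), len(mech_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + tanimoto(mech_a[i - 1], mech_b[j - 1]),
                score[i - 1][j] + gap,   # step of mech_a aligned to a gap
                score[i][j - 1] + gap,   # step of mech_b aligned to a gap
            )
    return score[n][m]


def mechanism_similarity(mech_a, mech_b):
    """Alignment score normalised by the length of the longer mechanism."""
    longest = max(len(mech_a), len(mech_b))
    return align_score(mech_a, mech_b) / longest if longest else 1.0


# Hypothetical step descriptions, invented for this example only.
mech_a = [{"proton_transfer", "His"}, {"nucleophilic_attack", "Ser"}, {"elimination"}]
mech_b = [{"proton_transfer", "His"}, {"nucleophilic_attack", "Cys"}]
similarity = mechanism_similarity(mech_a, mech_b)
```

Normalising by the longer mechanism keeps the value between 0 and 1 and penalises unmatched steps; other normalisations are equally plausible.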

61 Introduction to licensing chemical technology and intellectual property
Brian C. Meadows, Needle & Rosenberg, PC, 999 Peachtree Street, Suite 1000, Atlanta, GA 30309, Fax: 678-420-9301, bmeadows@needlerosenberg.com, Phone: 678-420-9300

The increasing investment in and reliance on intellectual property for value creation has elevated the licensing of technology to the forefront of today's global economy. The licensing process can vary from seeking revenues for your own existing technology to seeking rights in the technology of others. This presentation will explore various perspectives and strategies for creating value through the licensing of chemical technology and intellectual property.

62 Licensing and technology transfer: An academic perspective
Tena Herlihy, Technology Licensing Office, MIT, Room NE25-230, Five Cambridge Center, Kendall Square, Cambridge, MA 02142, Fax: 617-258-6790, tenazara@mit.edu, Phone: 617-253-6966

Licensing and technology transfer agreements between the academic and industrial sectors are increasing in number. It is important to understand the nature and obligations of academic institutions and thus their needs, policies, and limitations. This presentation will cover strategies for handling a number of issues that often arise when negotiating patent licenses with academic institutions. A brief history of the Bayh-Dole Act will be given as a background to how universities came to be involved in technology transfer, followed by a discussion of topics that are unique to the academic environment. For example, universities are often very limited in the representations and warranties they can give. Also, unlike commercial agreements, there will always be retained rights under an academic agreement, and the agreement is likely to include due diligence provisions to make sure the technology is developed. The presentation will conclude with a report on likely changes in academic agreements as a result of recent case law.

63 Licensing and technology transfer: Planning for the future
Craig M. Sorensen, Director, Strategic Research Alliances, Vertex Pharmaceuticals Incorporated, 130 Waverly Street, Cambridge, MA 02139, Fax: 617-444-6865, Craig_Sorensen@vrtx.com, Phone: 617-444-6523

The pharmaceutical research environment today is very different from what it was even five years ago. The demands for a robust, productive research pipeline are arguably greater now than they have ever been, and with them the challenge of finding new ways to meet these demands is increasing as well. It is widely accepted that strategic in- and out-licensing is one way to augment internal efforts to generate a robust pipeline. While this is not in itself a new concept, the strategies and practices that have been put in place around these licensing activities have become entrenched and may no longer be sufficient to meet the challenges of tomorrow. What we are rapidly discovering is that the best practices of yesterday and today may not necessarily be the best practices for tomorrow. As a result of the increasing globalization of research and the ever more complex interplay between pharma companies, CROs, and academia, a requirement is rapidly emerging for new paradigms of interaction in order to ensure success for all parties. In this presentation, some of the concepts and paradigm shifts around licensing that Vertex has developed and successfully implemented will be presented.

64 Role of information management in pharmaceutical licensing and partnering
Shuntai Wang, Pfizer, Inc, 50 Pequot Avenue, B2231, New London, CT 06320, shuntai.wang@pfizer.com, Phone: 860-732-1941

At large pharmaceutical companies, information management as a cross-functional process leverages all sources of public information for the identification and assessment of licensing opportunities. These sources include news, scientific and medical conferences, pipeline databases, company public disclosures, scientific literature and patents. Information management complements direct human interactions for successful licensing and partnering. Even for those smaller companies with limited resources, effective information management can play an important role in successful product licensing.

65 Government and academic issues on IP rights and licensing in Europe
Stephen R. Adams, Magister Ltd, Crown House, 231 Kings Road, Reading RG1 4LS, United Kingdom, Fax: +44 118 966 6620, stevea@magister.co.uk, Phone: +44 118 966 6520

This talk will examine some of the mechanisms for technology dissemination in Europe, with particular reference to the influence of industry interests and the European Union (EU). Methods for funding scientific research in Europe are markedly different to those of the US, as are the resulting methods of handling of the IP rights arising from that research. Central government funding is still a major contributor, with relatively little coming from private endowments or alumni foundations. There is a mixed experience of creating spin-off commercial enterprises from university-based research; whilst some are world leaders, not all have been successful in assisting the technology transfer process. The handling of IP rights on inventions from academia varies between the countries of Europe, although there are some EU-wide regulations, particularly in relation to technology transfer. In 2004, there was a major reform in EU technology transfer block exemptions, similar to US 'safe harbor' regulations.

66 Impact of recent court decisions and intellectual property trends on licenses and agreements
Patrick Waller, Shareholder, Biotechnology and Chemical Groups, Wolf Greenfield, 600 Atlantic Avenue, Boston, MA 02210, pwaller@wolfgreenfield.com, Phone: 617-646-8223

This presentation will review the impact of recent court decisions on reach-through royalties, implied licenses, and rights to improvements relating to chemical and pharmaceutical products. The discussion will address specific decisions on licensing and patent issues and also explore intellectual property trends relating to pharmaceutical compounds, formulations, salts, and structural derivatives in the context of a license or agreement.

67 Development of mathematical biodescriptors for proteomics maps
Subhash C. Basak and Brian D. Gute, Center for Water and the Environment, Natural Resources Research Institute, University of Minnesota, 5013 Miller Trunk Hwy, Duluth, MN 55811, Fax: 218-720-4328, sbasak@nrri.umn.edu, Phone: 218-720-4230

In the post-genomic era, "omics" technologies are generating copious data related to the effects of biological and chemical agents on living systems. Proteomics methods such as two-dimensional gel electrophoresis (2-DE) provide data on 1,000 to 2,000 proteins. Novel methods are needed in order to extract meaningful information from these proteomics maps. Our research team has developed four classes of methods for characterizing proteomics maps using discrete mathematics and statistics: 1) association of graphs/matrices with proteomics maps, 2) information theoretic biodescriptors, 3) spectrum-like representations of proteomics maps, and 4) statistical approaches to identify critical protein biomarkers. Each of the first three methods generates a single, compact, numerical biodescriptor or a set of numerical descriptors to characterize the map. The fourth approach identifies a set of critical proteins related to the bioactivity or toxicity being studied.
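As a rough illustration of what an information theoretic biodescriptor of a proteomics map could look like, the sketch below computes the Shannon entropy of the normalised spot abundances. The abstract does not state which formula the authors use, so this particular choice, and the function name `map_information_content`, are assumptions for illustration only.

```python
import math


def map_information_content(abundances):
    """Shannon entropy (in bits) of the normalised spot abundances of a 2-DE map.

    `abundances` is a list of non-negative spot intensities. Using the
    entropy of the abundance distribution as the descriptor is an assumed,
    illustrative choice, not the authors' published definition.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("at least one spot must have positive abundance")
    entropy = 0.0
    for value in abundances:
        if value > 0:
            p = value / total
            entropy -= p * math.log2(p)
    return entropy


# Four equally abundant spots carry the maximum of log2(4) = 2 bits;
# a map dominated by a single spot carries close to 0 bits.
uniform_map = map_information_content([5.0, 5.0, 5.0, 5.0])
dominated_map = map_information_content([100.0, 0.0, 0.0, 0.0])
```

A single number of this kind condenses a whole map into one compact value, which is the sense in which the first three method classes are said to generate a compact numerical biodescriptor.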

68 Prediction of small molecule targets based on protein domains: Extrapolation into unknown target space
Jeremy L. Jenkins1, Andreas Bender1, and Dmitri Mikhailov2. (1) Lead Finding Platform, Novartis Institutes for BioMedical Research, 250 Massachusetts Avenue, Cambridge, MA 02139, jeremy.jenkins@novartis.com, Phone: 617-871-7155, (2) Lead Discovery Informatics, Novartis Institutes for Biomedical Research, Cambridge, MA 02139

Cheminformatics and bioinformatics databases are often maintained in silos with minimal effort to federate chemical and genomic data. One successful cross-over between disciplines we recently presented was the prediction of ligand targets by mining target-annotated chemical databases. However, one restriction of this approach is that only targets present in the original database could be predicted. In this work we further push those boundaries in target space. By annotating 1,300 target classes in the WOMBAT database with the InterPro domains found in the targets, we have created thousands of probabilistic models that associate chemical substructures with protein domains. The models can be applied to orphan compounds for in silico "domain fishing", enabling target predictions that extrapolate to proteins outside the training set. Examples of employing the approach to triaging cell-based high-throughput screens are provided, as well as their application in ranking the proteins pulled down in small-molecule affinity chromatography experiments.

69 "Phylochemical Tree" for drug targets: Putting biological activities into context via ligand-based similarity measures
Andreas Bender, Jeremy L. Jenkins, and John W. Davies, Lead Finding Platform, Novartis Institutes for BioMedical Research, 250 Massachusetts Avenue, Cambridge, MA 02139, andreas.bender@novartis.com, Phone: 617-871-3972

After recognizing that polypharmacology (i.e., activity against multiple targets) is an inherent property of most, if not all, small molecules which modulate biological functions, a subsequent question arises: Which targets are more frequently associated with each other than others? The answer to this question puts biological activities in relation to each other[1,2], not via sequence-based similarities, but rather by means of ligand-based commonalities of ligands showing the same activity in biological systems. We employ methods from evolutionary biology informatics to produce graphical representations of ligand-based bioactivity space and further apply the resulting phylochemical mappings to rationalize off-target effects. These representations are still based on conventional ligand-based fingerprints. On the other hand, biological readouts can be used directly to represent compounds by means of their impact on biological systems[3] and an analogous analysis can be performed, this time on experimental readouts. We present visualizations of chemical space, based on ligand properties, as well as applications to the prediction of off-target (or, rather, secondary target) effects.

[1] "Bayes Affinity Fingerprints" Improve Retrieval Rates in Virtual Screening and Define Orthogonal Bioactivity Space: When Are Multitarget Drugs a Feasible Concept? A. Bender, J. L. Jenkins, M. Glick, Z. Deng, J. H. Nettles and J. W. Davies, J. Chem. Inf. Model., 2006, 46, 2445-2456.

[2] Relating protein pharmacology by ligand chemistry. M. J. Keiser, B. L. Roth, B. N. Armbruster, P. Ernsberger, J. J. Irwin and B. K. Shoichet, Nature Biotech., 2007, 25, 197-206.

[3] Chemogenomics Data Analysis: Prediction of Targets and the Advent of "Biological Fingerprints". A. Bender, P. A. Clemons, J. L. Jenkins, D. Mikhailov, D. W. Young, J. H. Nettles, M. Glick and J. W. Davies, Comb. Chem. High Throughput Screen., 2007, in press.

70 Structural genomics approach to the assessment of biologically relevant diversity of compound collections
Jerry Osagie Ebalunode1, Zheng Ouyang2, Jie Liang3, and Weifan Zheng1. (1) Department of Pharmaceutical Sciences, Biomanufacturing Research Institute and Technology Enterprise (BRITE), North Carolina Central University, 1801 Fayetteville Street, Durham, NC 27707, jebalunode@nccu.edu, Phone: 919-530-7013, (2) Bioengineering Department, University of Illinois at Chicago, Chicago, IL 60612, (3) Bioengineering Department, Carolina Exploratory Center for Cheminformatics Research (CECCR), University of Illinois at Chicago, Chicago, IL 60612

In the past decade, high throughput screening (HTS) and rapid parallel synthesis (RPS) have dramatically changed the drug discovery industry. In recent years, these same technologies have also come to form the very basis for the NIH chemical genomics initiative. One of the main issues in both drug discovery and chemical genomics research is how to assess the diversity of compound collections so that the information obtained from HTS is maximal and meaningful. Traditional diversity measures only look at the self dissimilarity of the compound collection and ignore information from the biological space. We have developed a new approach (BioMD) to the problem whereby binding site shape information derived from computational geometry analysis of the structural genome is used in evaluating the fitness (i.e. biological relevancy) of molecules in a collection. This strategy allows us to consider not only the self dissimilarity but also the biological relevance of the individual compounds in the diversity assessment process. In this talk, I will present some preliminary data demonstrating the application of BioMD to several publicly available databases and virtual libraries derived from Diversity Oriented Synthesis (DOS).

71 Combining natural language processing with substructure search for efficient mining of scientific literature
Shaillay Kumar Dogra and Ramesh Hariharan, Cheminformatics, Strand Life Sciences Pvt. Ltd, No. 237, Sir C V Raman Avenue, Raj Mahal Vilas, Bangalore, India, Fax: +91-80-23618996, shaillay@strandls.com, Phone: +91-80-23611349

Running a 'Natural Language Processing' engine on scientific literature can yield information on interactions between biological entities like proteins and small molecules. Such an approach, when run on Medline abstracts in December 2005, yielded around 231,400 protein-small molecule and 110,850 small molecule-small molecule interactions. Clearly, there is a plethora of information available for analysis. However, the nature of the search, which is 'text' driven, limits such an approach. What is of immensely more use is to run a 'substructure' search using the query compound of interest against the small molecule interactions database. The resulting hits can then be analyzed to check if the query compound has potentially similar biological interactions. This gains significance in a drug discovery setting wherein compounds are being virtually designed and optimized for good ADME properties. An additional dimension to optimize now could be avoiding undesirable interactions with specific biological targets or with other small molecules.

72 Methods for effective virtual screening and scaffold-hopping in chemical compounds
Nikil Wale1, George Karypis1, and Ian A Watson2. (1) Department of Computer Science, University of Minnesota, Minneapolis, MN 47408, nwale@cs.umn.edu, Phone: 612-626-9874, (2) Eli Lilly and Company, Indianapolis, IN 46285

Methods that can screen large databases to retrieve a structurally diverse set of compounds with desirable bioactivity properties are critical in the drug discovery and development process. In this presentation we will show a set of such methods, which are designed to find compounds that are structurally different to a certain query compound while retaining its bioactivity properties (scaffold hops). These methods utilize various indirect ways of measuring the similarity between the query and a compound that take into account additional information beyond their structure-based similarities. Two sets of techniques are presented that capture these indirect similarities using approaches based on automatic relevance feedback and on analyzing the similarity network formed by the query and the database compounds. Experimental evaluation shows that many of these methods substantially outperform previously developed approaches both in terms of their ability to identify structurally diverse active compounds as well as active compounds in general.
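One way to picture an indirect similarity based on automatic relevance feedback is the sketch below: the top hits of a direct search are treated as additional reference compounds, and the library is re-ranked by mean similarity to the query plus those hits. The abstract does not describe the actual algorithms, so the set-based descriptors, the value of `k`, the mean-based scoring, and all names here are assumptions for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two descriptor sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def direct_rank(query, library):
    """Rank library compounds by direct similarity to the query."""
    return sorted(library, key=lambda name: (-tanimoto(query, library[name]), name))


def feedback_rank(query, library, k=2):
    """Re-rank using the top-k direct hits as extra reference compounds.

    A compound that shares little with the query but much with its
    closest hits can move up, which is the scaffold-hopping idea.
    """
    seeds = [query] + [library[name] for name in direct_rank(query, library)[:k]]

    def score(name):
        return sum(tanimoto(seed, library[name]) for seed in seeds) / len(seeds)

    return sorted(library, key=lambda name: (-score(name), name))


# Hypothetical descriptor sets; letters stand in for structural features.
query = set("abcd")
library = {
    "A": set("abce"),
    "B": set("abef"),
    "C": set("befg"),   # far from the query, close to hits A and B
    "D": set("abxy"),
}
direct = direct_rank(query, library)
with_feedback = feedback_rank(query, library)
```

In this toy library, compound C ranks last by direct similarity but overtakes D after feedback, because it resembles the compounds that resemble the query.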

73 Bioinformatics to chemistry to therapy: Some case studies deriving information from the literature
Donald Walter, Customer Training, Thomson Scientific, 1725 Duke Street Suite 250, Alexandria, VA 22314, Fax: 703 519 5838, Don.Walter@Thomson.com, Phone: 703-706-4220

Bioinformatic information in nucleic acid and amino acid sequences can be the first step in devising chemotherapeutic treatments for a variety of ills. I am going to show several techniques using a variety of literature and patent sources where patented sequences can be linked to specific drugs and types of drugs.

74 Automated generation of pharmacophore type constraints to improve FlexX docking
Andrea Volkamer, Thomas Lengauer, and Andreas Kämper, Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Stuhlsatzenhausweg 85, D-66123 Saarbrücken, Germany, Fax: +49 681 9325-399, Phone: +49 681 9325-303

The use of pharmacophore constraints is an established technique for improving docking results. Usually the pharmacophore has to be specified manually. In this work we present a fully automated technique that incorporates information on the buriedness of the binding pocket and structure-based pharmacophore features into the docking engine of FlexX. Key interaction points in the active site are calculated with a GRID-based energy function. Those points that pass several newly developed filters are merged into a small number of pharmacophore features. The automatically generated pharmacophores agree well with manually derived results. The performance of the method has been validated on several difficult docking tasks as well as on virtual screening scenarios using FlexX-Pharm, the pharmacophore module of FlexX. Docking results are improved in 95% of the test cases. In general, the enrichments in virtual screening runs are higher and the compute times are shorter than in the respective unconstrained screenings.

75 Scoring function to rank pharmacophoric alignments and its application to H1 antagonists
Thuan T. H. Huynh Buu1, Gerhard Wolber2, Thierry Langer2, Peter Lackner3, and Gerald Lirk1. (1) University of Applied Science Hagenberg, Hauptstraße 117, 4232 Hagenberg, Austria, thuan.huynhbuu@yahoo.de, Phone: +43-650-377-0479, (2) Inte:Ligand GmbH, 2344 Maria Enzersdorf, Austria, (3) Department of Molecular Biology, University of Salzburg, 5020 Salzburg, Austria

Virtual screening using 3D pharmacophores has evolved into an important and successful method for drug discovery over the last decades. We recently presented an efficient alignment method for superpositioning shared chemical features of pharmacophores and/or molecules in 3D space. Although efficient superpositioning techniques are of utmost importance to guarantee high throughput in virtual screening technologies, there is a need for automatically assessing the relevance and quality of a specific alignment when processing large data sets. Although scoring functions in docking approaches have well-known problems, the ranking approach presented here has a different scope, since the position in 3D space is already defined by the single alignment solution coming from the alignment algorithm. The presented scoring function is therefore designed not to select poses of one single molecule, but to select those molecules that fit a pharmacophore (or a shared feature pharmacophore hypothesis) better than others. Geometric, steric and energetic contributions have been used for implementation and parametrization and applied to a diverse set of H1 antagonists. We used a pseudo-structure-based approach using a homology model and docked a data set of selected, active H1 receptor ligands using GOLD, and compared this to a ligand-based approach using multiple conformations generated by OMEGA within the LigandScout framework.

76 New approaches to 3D pharmacophore searches in virtual screening for bioactive molecules
Yevgeniy Podolyan, Department of Computer Science & Engineering, University of Minnesota, 4-192 EE/CS Building, 200 Union St SE, Minneapolis, MN 55455, podolyan@cs.umn.edu, Phone: 612-626-9873, and George Karypis, Department of Computer Science, University of Minnesota, Minneapolis, MN 55455

Virtual screening for bioactive molecules is becoming increasingly popular as microprocessor prices decline and processor speeds increase. This allows a large library of molecules that are potentially active against a specific target to be screened quickly and completely in silico. One approach is to search for molecules that are similar to the known active ones using techniques such as 3-dimensional alignment or descriptor-based methods of various dimensionality. One such technique is based on pharmacophores, which are the functional or structural elements of the molecule that are believed to be responsible for biological activity. Analog-based methods that use pharmacophores in virtual screening include those using 3- and 4-point pharmacophore binary fingerprints, feature vectors, maximum common substructure searching, etc. to find analogs. We will discuss the benefits and shortcomings of such methods and present results from methods based on new approaches to using 3D pharmacophores in virtual screening.
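
As an illustration of the 3-point pharmacophore binary fingerprints mentioned above, the following minimal Python sketch enumerates feature triplets for a single conformer and bins their pairwise distances. The bin edges, the feature names, and the function names are assumptions made for this example; they are not taken from the authors' work or from any particular software package.

```python
from itertools import combinations, permutations
from math import dist

# Hypothetical distance bin edges in angstroms; real tools choose their own.
BIN_EDGES = (3.0, 6.0, 9.0)


def bin_distance(d):
    """Return the index of the distance bin that d falls into."""
    for i, edge in enumerate(BIN_EDGES):
        if d < edge:
            return i
    return len(BIN_EDGES)


def triangle_key(triplet):
    """Canonical key for three (feature_type, xyz) points.

    All six vertex orderings are generated and the smallest is kept,
    so the key does not depend on the order of the input features.
    """
    candidates = []
    for (ta, pa), (tb, pb), (tc, pc) in permutations(triplet):
        bins = (
            bin_distance(dist(pa, pb)),
            bin_distance(dist(pb, pc)),
            bin_distance(dist(pa, pc)),
        )
        candidates.append(((ta, tb, tc), bins))
    return min(candidates)


def three_point_keys(features):
    """Set of 3-point pharmacophore keys (the 'on' bits) for one conformer."""
    return {triangle_key(t) for t in combinations(features, 3)}


def tanimoto(keys_a, keys_b):
    """Tanimoto similarity between two sets of fingerprint keys."""
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 1.0
```

Representing the fingerprint as a set of keys instead of a fixed-length bit string keeps the sketch short; hashing each key to a bit position would give the usual binary form.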

77 Plate cherry picking: A novel semi-sequential screening paradigm for cheaper, faster, information-rich compound selection
Meir Glick, Lead Finding Platform, Novartis Institutes for BioMedical Research, 250 Massachusetts Avenue, Cambridge, MA 02139, meir.glick@novartis.com, Phone: 617-871-7130

This study describes a novel semi-sequential technique for in silico enhancement of high-throughput screening (HTS) experiments now employed at Novartis. It is used in situations in which the size of the screen is limited by the readout (e.g., high content screens) or the amount of reagents or tools (proteins or cells) available. By performing computational chemical diversity selection on a per plate basis (instead of a per compound basis), 25% of the 1,000,000-compound screening collection was optimized for general initial HTS. Statistical models are then generated from target-specific primary results (percentage inhibition data) to drive the cherry picking and testing from the entire collection. Using retrospective analysis of 11 HTS campaigns, we show that this method would have captured on average two thirds of the active compounds (IC50 < 10 µM) and three fourths of the active Murcko scaffolds while decreasing screening expenditure by nearly 75%. This result is true for a wide variety of targets, including G-protein-coupled receptors, chemokine receptors, kinases, metalloproteinases, pathway screens, and protein-protein interactions. Unlike time-consuming “classic” sequential approaches that require multiple iterations of cherry picking, testing, and building statistical models, here individual compounds are cherry picked just once, based directly on primary screening data. Strikingly, we demonstrate that models built from primary data are as robust as models built from IC50 data. This is true for all HTS campaigns analyzed, which represent a wide variety of target classes and assay types.
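
The abstract does not specify how the per-plate diversity selection is carried out. The following Python sketch shows one plausible reading, assuming each plate is summarized by the set of scaffolds it contains and plates are chosen greedily to maximize scaffold coverage. The function name, the plate identifiers, and the greedy criterion are all hypothetical, and the second stage (statistical models built from primary data) is not covered.

```python
def pick_plates(plates, budget):
    """Greedily choose up to `budget` plates that add the most new scaffolds.

    plates: dict mapping a plate id to the set of scaffold ids on that plate.
    Returns (chosen plate ids in pick order, set of covered scaffolds).
    """
    chosen = []
    covered = set()
    remaining = dict(plates)
    while remaining and len(chosen) < budget:
        # Iterate in sorted order so ties are broken deterministically.
        best = max(sorted(remaining), key=lambda p: len(remaining[p] - covered))
        if not remaining[best] - covered:
            break  # no remaining plate adds a new scaffold
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered
```

Selecting whole plates instead of individual compounds means the chosen subset can be screened without any cherry picking, which is the practical point of working on a per plate basis.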

78 On some aspects of validation of predictive QSAR models
Kunal Roy1, J Thomas Leonard2, and Partha Pratim Roy2. (1) Pharmaceutical Technology, Jadavpur University, Raja S C Mullick Road, Jadavpur, Kolkata 700032, India, kroy@pharma.jdvu.ac.in, Phone: 91-9831594140, (2) Jadavpur University, Kolkata 700032, India

Quantitative structure-activity relationships (QSARs) are predictive models derived by applying statistical tools to correlate the biological activity (including therapeutic and toxic) of chemicals (drugs/toxicants/environmental pollutants) with descriptors representative of molecular structure and/or property. The success of any QSAR model depends on the accuracy of the input data, the selection of appropriate descriptors and statistical tools, and, most importantly, the validation of the developed model. Validation is the process by which the reliability and relevance of a procedure are established for a specific purpose. Leave-one-out cross-validation generally leads to an overestimation of predictive capacity, and even with external validation, no one can be sure whether the selection of training and test sets was manipulated to maximize the predictive capacity of the model being published. In this paper, we present some representative examples of validation of QSAR models in order to explore the possible importance of the method of selecting training set compounds, the training set size, and the impact of variable selection for training set models in determining the quality of prediction.
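
To make the two validation statistics concrete, the following Python sketch computes a leave-one-out cross-validated q² and an external predictive R² for a one-descriptor linear model. The single-descriptor model and the function names are simplifying assumptions for illustration; the models discussed in the abstract are not specified.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(xs, ys):
    """Leave-one-out q2: each compound is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / total


def r2_pred(train_x, train_y, test_x, test_y):
    """External predictive R2: test residuals relative to the training-set mean."""
    slope, intercept = fit_line(train_x, train_y)
    press = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(test_x, test_y))
    mean_train = sum(train_y) / len(train_y)
    total = sum((y - mean_train) ** 2 for y in test_y)
    return 1.0 - press / total
```

Because `r2_pred` depends on which compounds are placed in the test set, computing it for several different training/test splits of the same data gives a direct view of how much the reported value can be influenced by the split.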

79 Tailoring molecular similarity metrics for property estimation
Brian D. Gute1, Subhash C. Basak1, and Douglas M. Hawkins2. (1) Center for Water and the Environment, Natural Resources Research Institute, University of Minnesota, 5013 Miller Trunk Hwy, Duluth, MN 55811, Fax: 218-720-4328, bgute@nrri.umn.edu, Phone: 218-720-4284, (2) School of Statistics, University of Minnesota, Minneapolis, MN 55455

Quantitative molecular similarity analysis (QMSA) methods use a variety of calculated molecular descriptors and experimental properties in the creation of chemical structure spaces. These spaces are often used in selecting structural analogs and estimating a wide variety of properties: physicochemical, pharmacological, and toxicological. Traditionally, descriptor sets are selected arbitrarily, intuitively by an expert, or through a variety of data reduction techniques. 'Tailoring' is a new approach that selects indices that are strongly correlated with the property of interest. Studies have been carried out on a variety of chemical databases to examine the effectiveness of tailored vis-à-vis arbitrary similarity spaces in property estimation. The spaces considered here are all derived from the same set of topological indices; only the selection methods vary. Ridge regression and recursive partitioning will be discussed as useful approaches in descriptor selection.
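
The idea of tailoring can be sketched in a few lines of Python. In this illustration, indices are ranked by the absolute Pearson correlation with the property, and the property of a query compound is then estimated from its nearest neighbors in the selected space. The simple correlation ranking stands in for the ridge regression and recursive partitioning mentioned above; the function names and the unscaled Euclidean distance are assumptions of this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def tailor(descriptors, prop, k):
    """Return the names of the k indices most strongly correlated with prop.

    descriptors: dict mapping an index name to its values over all compounds.
    """
    ranked = sorted(
        descriptors,
        key=lambda name: abs(pearson(descriptors[name], prop)),
        reverse=True,
    )
    return ranked[:k]


def knn_estimate(descriptors, prop, names, query, k=2):
    """Mean property of the k compounds nearest to query in the selected space."""
    def distance(i):
        return sqrt(sum((descriptors[n][i] - query[n]) ** 2 for n in names))

    nearest = sorted(range(len(prop)), key=distance)[:k]
    return sum(prop[i] for i in nearest) / len(nearest)
```

In practice the indices would normally be scaled before distances are computed, so that no single index dominates the similarity space.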

80 PIME: A quantitative predicting application to find the isoelectric points for peptides
Daming Li, Computational Design and Modeling, LITEC Systems Corporation, New York, NY 10006, dli@litecsys.com, Phone: 212-812-6320

Quantitative phosphoproteomics analysis is becoming a hot topic and makes it possible to study the dynamics of protein phosphorylation and to better understand the regulatory networks of key processes in cells. In this paper we present a quantitative application that predicts the isoelectric points of peptides with and without methyl esterification. Numerical simulation of this model shows that methylated phosphopeptides and non-phosphopeptides can be grouped on the basis of the number of phosphate groups and basic residues in each peptide. The theoretical results are supported by experiments. We developed a SOAP web service component and an Excel add-in, both of which can be easily integrated into Spotfire™ DecisionSite and SciTegic™ Pipeline Pilot.
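
The abstract does not describe the PIME model itself. As a generic illustration of how a peptide isoelectric point can be predicted, the following Python sketch sums Henderson-Hasselbalch charge contributions and finds the pH of zero net charge by bisection. All pKa values are illustrative assumptions, as is the treatment of methyl esterification as removing the charge of the C-terminus and of Asp and Glu side chains; none of this is taken from PIME.

```python
# Illustrative pKa values (assumptions for this sketch, not PIME parameters).
POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM = 8.0
C_TERM = 3.6
PHOSPHATE = (1.2, 6.5)  # two ionizations per phosphate group


def net_charge(seq, ph, n_phospho=0, methylated=False):
    """Net charge of a peptide at a given pH."""
    charge = 1.0 / (1.0 + 10.0 ** (ph - N_TERM))
    for aa in seq:
        if aa in POSITIVE:
            charge += 1.0 / (1.0 + 10.0 ** (ph - POSITIVE[aa]))
        elif aa in NEGATIVE:
            if methylated and aa in "DE":
                continue  # esterified carboxyl groups carry no charge
            charge -= 1.0 / (1.0 + 10.0 ** (NEGATIVE[aa] - ph))
    if not methylated:
        charge -= 1.0 / (1.0 + 10.0 ** (C_TERM - ph))
    for pka in PHOSPHATE:
        charge -= n_phospho / (1.0 + 10.0 ** (pka - ph))
    return charge


def isoelectric_point(seq, n_phospho=0, methylated=False):
    """pH at which the net charge is zero, found by bisection on [0, 14].

    The net charge decreases monotonically with pH. If it never reaches
    zero inside the interval, the result converges to an interval end.
    """
    lo, hi = 0.0, 14.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid, n_phospho, methylated) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With these assumed parameters, adding a phosphate group lowers the predicted isoelectric point, while methyl esterification raises it because the remaining acidic groups are then mainly the phosphates.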

81 Learning optimum Decision Trees: Influence of parameter choice and feature selection
Shaillay Kumar Dogra, Cheminformatics, Strand Life Sciences Pvt. Ltd, No. 237, Sir C. V. Raman Avenue, Raj Mahal Vilas, Bangalore, India, Fax: +91-80-23618996, shaillay@strandls.com, Phone: +91-80-23611349

The Decision Tree (DT), as a classification algorithm, has certain advantages over other methods such as Neural Networks or Support Vector Machines. Apart from producing interpretable models, DTs can inherently select, during tree building itself, those descriptors that are relevant to modeling the given property. However, DTs tend to suffer in the context of cheminformatics data, which is characterized by a high-dimensional feature space and a small number of samples available for training. Here, 'parameter tuning' and 'feature selection' become important. In this study, we present our findings about the influence of parameters such as 'attribute selection measure', 'tree stopping criterion' and 'tree pruning method' on the size and performance of the learned Decision Trees. Further, we introduce an initial feature selection, using wrappers, before invoking DT learning to take care of high-dimensional data. Finally, we compare our results with those obtained from 'Decision Forest', which is an ensemble of DTs.
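
Two widely used attribute selection measures are entropy-based information gain and the Gini index. The following Python sketch, which is a generic illustration and not the authors' implementation, scores a threshold split on one descriptor with either measure; the function names are chosen for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(values, labels, threshold, impurity):
    """Impurity decrease from splitting at value <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(labels) - weighted


def best_threshold(values, labels, impurity):
    """Threshold with the largest impurity decrease, or None if no split exists."""
    candidates = sorted(set(values))[:-1]
    if not candidates:
        return None
    return max(candidates, key=lambda t: split_gain(values, labels, t, impurity))
```

Passing `entropy` or `gini` as the `impurity` argument switches the attribute selection measure without changing the rest of the tree-building logic, which is how the influence of this parameter can be compared on the same data.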

82 Evaluation of 3D descriptors in virtual screening
Xia Ning, Department of Computer Science & Engineering, University of Minnesota, 464 DTC, 117 Pleasant Street, SE, Minneapolis, MN 55455, xning@cs.umn.edu, Phone: 612-624-5384, and George Karypis, Department of Computer Science, University of Minnesota, Minneapolis, MN 55455

In recent years there has been an increased interest in using structural descriptors in conjunction with advanced supervised learning algorithms (e.g. support vector machines and neural networks) for solving various problems arising in virtual screening. This research resulted in the development of highly effective activity and/or property prediction methods and has provided an objective and data-driven assessment of the characteristics that a descriptor set should have in order to achieve good performance. Unfortunately, this research has primarily focused on topological descriptors and to a large extent has ignored the various 3D descriptors.

In this talk we discuss our results in evaluating the various parameters of the design space for 3D descriptors and how they impact machine-learning-based virtual screening approaches. Specifically, our work focuses on questions such as: What kinds of 3D elements of the compound structures are the most significant for bioactivity, and how can they be extracted efficiently? How can these significant 3D elements be quantitatively measured and represented in descriptors so as to optimally balance the trade-off between generality and specificity of structure representation? What is the best way of using 3D descriptors in kernel-based machine learning approaches in order to take full advantage of both the descriptors and the learning method? We address these questions by performing a comprehensive experimental evaluation using different 3D descriptors on a wide range of datasets.

83 Relative chirality index: Novel approach for the numerical characterization of molecular chirality
Ramanathan Natarajan and Subhash C. Basak, Center for Water and the Environment, Natural Resources Research Institute, University of Minnesota, 5013 Miller Trunk Highway, Duluth, MN 55811, Fax: 218-720-4328, rnataraj@nrri.umn.edu, Phone: 218-720-4342

Quantitative treatment of chirality is essential because successful chirality measures will be able to direct the asymmetric synthesis of new agrochemicals and pharmaceuticals. Although the Cahn-Ingold-Prelog rules are very successful in discriminating configurational isomers and assigning them an absolute configuration, they fail to quantify molecular chirality. Even several of the commonly used topological indices, 3-D descriptors, and quantum chemical descriptors of energetics cannot differentiate enantiomers or diastereomers. Some attempts to develop topological indices that differentiate stereoisomers and enantiomers have not been very successful because they treat chirality as a discontinuous measure (+1 or -1) and hence are of limited use in QSAR of diastereomers. We have developed a novel topological index describing molecular chirality. This new index treats chirality as a continuous measure, and hence we prefer to call it the Relative Chirality Index (RCI). Calculation of relative chirality indices and their application in QSAR modeling will be presented with appropriate examples.

84 An address book for chemical space: The Chemical Structure Lookup Service (CSLS).
Markus Sitzmann1, Igor V. Filippov2, Wolf-Dietrich Ihlenfeldt3, and Marc C. Nicklaus1. (1) Laboratory of Medicinal Chemistry, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, Frederick, MD 21702, sitzmann@helix.nih.gov, Phone: 301-846-5974, (2) Laboratory of Medicinal Chemistry, SAIC-Frederick, Inc., NCI-Frederick, Frederick, MD 21702, (3) Xemistry GmbH, D-35094 Lahntal, Germany

We give an overview of our recent work in the context of our Chemical Structure Lookup Service (CSLS). This service comprises (at the time of this writing) a collection of approx. 80 chemical structure databases from commercial and public sources, indexes approximately 40 million molecules representing approximately 27 million unique chemical structures, and continues to grow. We focus on our procedure for the normalization of the chemical structures, which is a crucial step in the processing of chemical databases coming from different sources. It is needed for finding a canonical representation of a chemical which otherwise might be missed because of differing encoding due to certain chemical features (e.g. different tautomers, different resonance structures etc.) or to ill-defined parts of the structure (e.g. misdrawn functional groups, missing hydrogen atoms, missing charges or incorrect valences). This structure normalization is performed for any incoming structure set that is to be registered in CSLS or used as a search query. We also discuss our structure-based hashcode identifiers, which are calculable for any small molecule. They are specifically designed to enable a fine-tunable yet rapid compound identification even in very large datasets. They can be set to be sensitive to a variety of chemical features such as tautomerism, different resonance structures drawn for a charged species, and presence or absence of certain fragments like counterions. One such identifier, called FICuS, is one of the crucial mechanisms for identification and lookup of chemicals in CSLS – enabling CSLS to function essentially as an “address book” for any small molecule. FICuS and the other identifiers are, however, not dependent on the infrastructure of this service. CSLS is freely available at http://cactus.nci.nih.gov/lookup. The service recognizes over 20 chemical structure representation formats as input data, including SD files, SMILES strings, InChI identifiers, and FICuS hashcodes.
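
The idea of a hashcode identifier with tunable sensitivity can be illustrated with a toy Python example. This is not FICuS and does not reproduce the CSLS normalization. It assumes the input SMILES string is already canonical, treats every dot-separated fragment other than the longest one as a counterion, and uses SHA-1 purely for convenience; the function name and the flag are hypothetical.

```python
import hashlib


def structure_hash(smiles, ignore_counterions=True):
    """Toy structure identifier computed from a (pre-canonicalized) SMILES string.

    With ignore_counterions=True only the longest dot-separated fragment is
    hashed, so a salt and its parent ion receive the same identifier.
    """
    fragments = smiles.split(".")
    if ignore_counterions:
        fragments = [max(fragments, key=len)]
    # Sort fragments so the identifier does not depend on their order.
    key = ".".join(sorted(fragments))
    return hashlib.sha1(key.encode("ascii")).hexdigest()[:16].upper()
```

For example, `structure_hash("CC(=O)[O-].[Na+]")` and `structure_hash("CC(=O)[O-]")` return the same value, while calling both with `ignore_counterions=False` distinguishes the salt from the free anion. A real implementation would need a cheminformatics toolkit to normalize tautomers, charges, and valences before hashing.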

85 Spectral reference databases: Traditional, open access, or somewhere in between?
Gregory M. Banik, Leo Collins, Marie Scandone, and Ty Abshear, Informatics Division, Bio-Rad Laboratories, Two Penn Center Plaza, Suite 800, 1500 JFK Blvd., Philadelphia, PA 19102, gregory_banik@bio-rad.com, Phone: 267-322-6931

At their inception, spectral reference databases were made available in the same fashion as other primary and secondary published resources: for sale to libraries or individuals on either a perpetual license or annual subscription basis. Recently, the open access initiative has led to the creation of pilot initiatives for open access spectral data such as NMRShiftDB.

The evolution of spectral resources will be discussed from the first reference spectra collection (the Sadtler reference spectra, now celebrating its 60th anniversary) to today's nascent open access initiatives. In addition to the traditional and open access models, a third model will be described for the open deposition of spectra by third parties that retains peer review to ensure quality and accountability in the creation of spectral reference collections. The use of data-driven software technologies to further ensure data quality will also be discussed.

86 Protein-ligand interaction fingerprints: Method, user interface and case studies
Alex M. Clark, Research & Development, Chemical Computing Group, Inc, 1010 Sherbrooke St West, Suite 910, Montreal, QC H3A2R7, Canada, Fax: 514-874-9538, aclark@chemcomp.com, Phone: 514-393-1055

An implementation of protein-ligand interaction fingerprints will be described. Fingerprints are generated according to the presence of hydrogen bonds, ionic interactions and displacement of solvent accessible surface area between the ligand and surrounding residues. Elements of the accompanying user interface, which makes it straightforward to gain insights from the derived data, will be presented. Several case studies will be described, including studies based solely on fingerprints derived from docking poses, improvement of structure-activity relationships by mixing crystal data with docking poses, and comparative examination of selective inhibitors of families of proteins.
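
A minimal Python sketch of an interaction fingerprint is given here for illustration. It assumes one bit per residue and interaction type, with the three interaction types named in the abstract, and compares two poses with the Tanimoto coefficient. The bit layout, residue labels, and function names are assumptions of this example and do not describe the implementation presented in the talk.

```python
# One bit per (residue, interaction type); the type names are assumptions.
INTERACTIONS = ("hbond", "ionic", "surface")


def make_fingerprint(residues, contacts):
    """Build a bit list from a set of (residue, interaction type) contacts.

    residues: ordered list of binding-site residue labels shared by all poses.
    """
    return [
        1 if (res, kind) in contacts else 0
        for res in residues
        for kind in INTERACTIONS
    ]


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two equal-length bit lists."""
    both = sum(a & b for a, b in zip(fp_a, fp_b))
    either = sum(a | b for a, b in zip(fp_a, fp_b))
    return both / either if either else 1.0
```

Because every pose is encoded against the same ordered residue list, fingerprints from crystal structures and from docking poses can be compared or clustered directly.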

87 Using the PDBML schema to disambiguate PDB files
Howard J Feldman, Research, Chemical Computing Group Inc, 1010 Sherbrooke St. W., Suite 910, Montreal, QC H3A2R7, Canada, Fax: 514-874-9538, hfeldman@chemcomp.com, Phone: 514-393-1055

Recently, a collaboration between MSD-EBI, PDBj and RCSB has made available the PDB Exchange Dictionary (http://pdbml.rcsb.org/schema/pdbx.xsd), adapted from the mmCIF dictionary. The data structures provided allow much disambiguation compared to the aging PDB format, for example by introducing the concept of entities – unique molecules within the record. They are also more aligned with modern relational database practices, in which each piece of information is stored only once. However, the problem remains that most popular software uses the PDB format for both input and output. We look at some of the hurdles involved in converting PDB files to PDBML format and present a new database system, Protein SILO (PSILO), which overcomes these. The benefits of using correctly built PDBML files include more accurate interpretation of ligands and small molecules, more precise definitions of experimental conditions, and far more powerful search capabilities when stored in a relational database.

88 Visualizing biological activity profiles using target affinity maps
Fabian Bendix1, Vlad Sladariu1, Thierry Langer2, and Gerhard Wolber3. (1) Computer Science Group, Inte:Ligand GmbH, Mariahilferstrasse 74B/11, 1070 Vienna, Austria, Fax: +43181749551371, bendix@inteligand.com, Phone: +4369915075555, (2) Department of Pharmaceutical Chemistry, Computer Aided Molecular Design Group, University of Innsbruck, Institute of Pharmacy, Innsbruck A-6020, Austria, (3) Inte:Ligand GmbH, 2344 Maria Enzersdorf, Austria

While high-throughput virtual screening is used simply to narrow down a list of potential drug candidates among a number of compounds, mining results from cross-target screening can quickly become very complex. The analysis of complex activity patterns is a human-intensive, explorative task. Our goal in this work is to provide a powerful environment for analyzing activity profiles. These profiles are defined as biological activity patterns, each corresponding to a set of computationally predicted target affinities. Our framework visualizes and categorizes the results as activity maps that can be perceived quickly and directly. These maps can then be used to identify the activity scope of one molecule or a set of molecules at a glance, and they support application scenarios such as identifying unwanted biological effects or minimizing off-target effects. Our activity maps are enhanced for interactive use with linking and brushing techniques that directly link molecule lists to target points on the map. The power of visualization and human exploration abilities are combined to solve the crucial task of mining drug candidates and quickly identifying those with better simulated activity profiles.

89 Tautomer and conformer focusing in structure-based drug discovery
Hongyao Zhu, Computational Chemistry & Cheminformatics, Plexxikon, Inc, 91 Bolivar Dr, Berkeley, CA 94710, Fax: 510-647-4048, hzhu@plexxikon.com, Phone: 510-647-4114

In ligand-protein co-crystal structures, it is often observed that a bound ligand is not in the lowest-energy conformation or not in the preferred tautomerization state of the unbound ligand. This phenomenon is referred to as “conformer focusing” or “tautomer focusing”. Assessing the binding free-energy contribution from such conformer focusing and tautomer focusing is therefore essential to reliably estimating binding affinity in ligand design. Taking into account the energy differences between tautomerization states and between conformation states in binding free-energy calculations represents one of the most challenging problems in computational chemistry. Several common structural motifs are reported to illustrate general considerations of tautomer and conformer focusing in structure-based molecular design.
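
One simple way to put a number on the focusing penalty is to ask what fraction of the unbound ligand already populates the bound state. The following Python sketch is a textbook Boltzmann-population estimate, not the author's method. It assumes the unbound ligand can be described by a discrete set of states (conformers or tautomers) whose relative free energies in solution are known; the function name is hypothetical.

```python
from math import exp, log

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def focusing_penalty(energies, bound_index, temperature=298.15):
    """Free-energy cost (kcal/mol) of restricting a ligand to its bound state.

    energies: relative free energies (kcal/mol) of the unbound ligand's states.
    bound_index: position in `energies` of the state observed in the complex.
    The penalty is -RT ln(p), where p is the Boltzmann population of the
    bound state in the unbound ensemble.
    """
    rt = R_KCAL * temperature
    e_min = min(energies)
    weights = [exp(-(e - e_min) / rt) for e in energies]
    population = weights[bound_index] / sum(weights)
    return -rt * log(population)
```

Under these assumptions, a ligand that binds in a tautomer lying 2 kcal/mol above its preferred form pays slightly more than 2 kcal/mol, which is large enough to change the rank order of candidate compounds.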