#227 - Abstracts
ACS National Meeting
March 28-April 1, 2004
Dixel modeling of gene expression
N Sukumar1, Curt M. Breneman1, Kristin P. Bennett2, Charles Lawrence3, and Inna Vitol3. (1) Department of Chemistry, Rensselaer Polytechnic Institute, Cogswell Laboratory, 110 8th Street, Troy, NY 12180-3590, Fax: 518-276-4045, email@example.com, firstname.lastname@example.org, (2) Department of Mathematics, Rensselaer Polytechnic Institute, (3) Wadsworth Center
Sequence-specific binding of proteins to DNA is arguably the most important foundation of cellular function, since it exerts fundamental control over the abundance of virtually all cellular functional macromolecules. Identification of promoter sequences and transcription factor binding sites in the genome thus represents one of the grand challenges of the post-genomic era. The most successful bioinformatics methods today are based on models that represent DNA by sequences of letters (motif methods). Unfortunately, the sequence data used for training and validation is quite limited. Motif models are thus hampered both by small sample sizes and by an abstract representation that has little to do with the energetics of binding. It is here that cheminformatics can supply additional information and introduce a more accurate and sensitive chemical representation of DNA-protein interactions. Drawing upon our experience with E. coli transcription factors and sigma factors, we show how characterization of DNA through features of electron densities sampled on the vdW surfaces of the major and minor grooves (“Dixels”) captures the effects of environmental perturbations of neighboring base pairs, without requiring additional sequence data for training.
Integration of biological and chemical information: Faster decisions from linked data and visualizations
Gavin M Fischer, Application Scientist, OmniViz Inc, 2 Clocktower Place, Suite 600, Maynard, MA 01754, email@example.com
Visualizations are the best way for people to understand data. Presenting anyone with long lists of numbers rarely helps them understand the data, never mind the interconnectedness within it. This is even more true when crossing between domains, such as between chemistry and biology. Both sides understand, in theory if not in practice, what the other is doing. However, the lack of a common language between them necessitates new approaches for integrating analysis; visualizations are key to this. Understanding HTS data, with linked biological pathways illustrating the context in which the target is being tested and microarrays showing how responses map against the genome, allows for more rapid decisions. Both chemists and biologists have analysis techniques that can, and should, aid the other. I will show some examples of this integration working, and talk about linking this with literature analysis to understand the BIG picture, whilst not losing sight of the details on either side.
The BioPrint® pharmaco-informatics platform: A large profile database for the development of relevant predictive models
Frédérique Barbosa, Molecular Modelling, Cerep, 128, rue Danton, 92500 Rueil Malmaison, France, Fax: 33 1 55 94 84 10, F.Barbosa@cerep.fr
Linking biological and chemical information for use in computational approaches in order to predict biological activity, ADME profiles and adverse drug reactions (ADR) is critical for enhancing the drug discovery process. However, modeling approaches have been hampered by the lack of large, robust and standardized training datasets. In an extensive effort to build such a dataset, the BioPrint® database is continuously constructed by systematic profiling of drugs available on the market, as well as numerous reference compounds (at present, BioPrint includes more than 2,200 compounds and 172 different assays). The database is composed of several large datasets: compound pharmacology profiles, and complementary clinical data including therapeutic use information, pharmacokinetics profiles and ADR profiles. These data have allowed the development of predictive QSPR and QSAR models. Models based on chemical structure are strengthened by in vitro results that can be used as additional compound descriptors to predict complex in vivo endpoints.
Keeping up with the changing face of Medline and MeSH - 3 keys to improving searches
Soaring Bear, MeSH, NLM/NIH, 8600 Rockville Pike B2E17, Bethesda, MD 20894, Fax: 301-402-2002, firstname.lastname@example.org
The National Library of Medicine provides dozens of medical, chemical, sequence, and structural databases, which can all be searched at one time with the new Entrez interface (http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi). The information explosion requires prudent search strategies for finding the data gems you are seeking more quickly in the growing haystack of scientific results. Ambiguities of word meanings confound and frustrate. To help, the MeSH group of the National Library of Medicine continually updates the terms and concept structure of the MeSH indexing vocabulary (http://www.nlm.nih.gov/mesh/2003/MBrowser.html) used for Medline (http://Pubmed.gov). Some recent examples of these changes in biology and chemistry are described, along with how you can keep up with and use these changes for better search results. Three easy steps to better Medline searches will be presented by an NLM expert. A balance of widening (with OR terms) and narrowing (with NOT terms) can be facilitated with three tools provided by Pubmed: Details, Display Citation, and the MeSH Browser.
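The widening/narrowing balance described above can be sketched as a simple query-string builder. The helper below is hypothetical (it is not an NLM or PubMed tool), though it emits PubMed-style Boolean syntax and the `[mh]` MeSH field tag:

```python
def build_query(widen, narrow=(), mesh_term=None):
    """Combine synonyms with OR (widening), optionally restrict to a MeSH
    heading with the [mh] field tag, and exclude terms with NOT (narrowing)."""
    q = "(" + " OR ".join(widen) + ")"
    if mesh_term:
        q += f" AND {mesh_term}[mh]"
    for term in narrow:
        q += f" NOT {term}"
    return q

query = build_query(["heart attack", "myocardial infarction"],
                    narrow=["review"], mesh_term="aspirin")
# -> "(heart attack OR myocardial infarction) AND aspirin[mh] NOT review"
```

Pasting the resulting string into the PubMed search box (and checking the Details tab) shows how the OR terms widen and the NOT terms narrow the result set.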
Steric and electronic requirements of enzyme reactions
Johann Gasteiger1, Martin Reitz1, and Oliver Sacher2. (1) Computer-Chemie-Centrum and Institute of Organic Chemistry, University of Erlangen-Nuremberg, Naegelsbachstr. 25, Erlangen 91052, Germany, Fax: +49-9131-85 26566, Gasteiger@chemie.uni-erlangen.de, (2) Molecular Networks GmbH
Genes express proteins, including enzymes, that govern biochemical reactions. A more detailed understanding of these reactions requires an analysis of how the substrates fit into the enzymes and of the physicochemical effects influencing bond breaking and making in enzyme reactions. In order to advance such studies we have built a database of biochemical pathways that represents chemical structures and reactions at the atomic level, giving access to each atom and bond of the substrates of enzyme reactions. This database allows the study of transition state hypotheses of enzyme reactions. Furthermore, the analysis of the physicochemical effects operating at the reaction site allows a classification of enzyme reactions that goes beyond the traditional EC code for enzymes.
Linking chemical scaffolds to gene families to help elucidate molecular mechanisms
Chihae Yang1, Paul E. Blower1, Kevin Cross1, Glenn Myatt1, Wolfgang Sadée2, and Ying Huang2. (1) Leadscope, Inc, Columbus, OH 43212, Fax: 614-675-3732, email@example.com, (2) College of Medicine and Public Health, The Ohio State University
The significant investment in “omics” technologies and large amount of information generated by these new paradigms have not yet led to dramatic productivity increases in the drug discovery process. Linking biology to chemistry still remains the bottleneck. To link the vast amount of genomics information to small molecule discovery, we previously correlated the gene expression profiles of 60 NCI cancer cell lines to compound activity patterns of the same cell lines, resulting in many possible gene-compound pairs. In this paper, genes in specific biological process pathways were correlated with active chemical scaffolds, whose associations were used to build molecular hypotheses. Gene hierarchical classifications, based on biological process, were used to differentiate gene expression patterns of various cell types. The results from the gene hierarchy analysis are compared to other computational methods for extracting subsets of differentiating genes. This methodology allows us to extend our hypotheses from individual gene-compound pair mappings to a systems approach of linking gene families to compound scaffolds.
Streamlining drug discovery informatics: Accelerating the flow from gene to structure to pre-clinical candidate
Dean R. Artis, Informatics, Plexxikon Inc, 91 Bolivar Drive, Berkeley, CA 94710, Fax: (510) 548-4785, firstname.lastname@example.org
Plexxikon’s Scaffold-Based Drug Discovery™ platform relies on a unique combination of low-affinity biochemical screening of a proprietary target-neutral compound library and structural characterization via high-throughput x-ray crystallography, coupled to a powerful infrastructure for computational analysis and design that bridges traditional bioinformatics and cheminformatics. Use of these integrated systems has resulted in the identification of many novel chemical starting points with facile synthetic approaches and a target structure-directed optimization path. This has enabled the efficient synthesis of lead compounds with compelling bioactivity against proteins of interest in the kinase, phosphodiesterase and nuclear receptor families. Examples highlighting the role of Informatics approaches in Plexxikon’s efforts will be discussed, including efforts leading to the rapid development of a new class of anti-diabetic compounds with excellent potency, selectivity, pharmaceutical properties and in vivo efficacy.
Linking bioinformatics to cheminformatics in biological networks
Barbara A. Eckman, Life Sciences, IBM, 1475 Phoenixville Pike, West Chester, PA 19380, email@example.com, and Julia E. Rice, IBM Almaden Research Center
As high-throughput biology generates large volumes of data about the "parts list" of living organisms, the need grows for robust, efficient systems to manage metabolic and signaling pathways, chemical reaction networks, protein interaction networks, etc. Network data is arguably best represented as graphs, which are not well supported by standard relational database management systems. IBM Research is extending DB2 with advanced graph operations, to support such queries as: "Find all proteins related to protein A (i.e. within a given path length of A) in a protein interaction graph, and retrieve related assay results and compound structures.” “Find all pathways where compound x inhibits or slows a reaction, and retrieve Gene Ontology classifications for all proteins involved in the reaction.” “Find a subgraph of a large pathway that has the same structure and involves the same enzyme as the subgraph that I have circled, and retrieve associated protein and compound annotations.”
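The first query quoted above ("all proteins within a given path length of A") has simple graph semantics. As a minimal sketch of those semantics, not of IBM's DB2 graph extension itself, a bounded breadth-first search over an adjacency-list graph:

```python
from collections import deque

def within_path_length(graph, start, max_len):
    """Return all nodes reachable from `start` in at most `max_len` edges."""
    seen = {start: 0}          # node -> distance in edges
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_len:   # don't expand past the length bound
            continue
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return set(seen) - {start}

# Toy protein-interaction graph (hypothetical data).
ppi = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["E"]}
related = within_path_length(ppi, "A", 2)   # {"B", "C", "D"}
```

In the database setting, each returned protein ID would then join against assay-result and compound-structure tables, which is exactly the step relational engines already do well.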
Technical and people disconnects hindering knowledge exchange between chemistry and biology
Christopher A. Lipinski, Exploratory Medicinal Sciences, Pfizer Global Research and Development, Groton Laboratories (retired), Eastern Point Road, mail stop 8200-36, Groton, CT 06340, Fax: 860-715-3149, firstname.lastname@example.org
Both technical and people factors hinder knowledge exchange between chemistry and biology. In both disciplines, software effort is expended on data with little value. For example, capture and subsequent analysis of large volumes of primary HTS data is difficult because of the very high noise factor, and hence is not very useful. Public access to primary literature data is very different between the disciplines: much of searchable biology data is in the public domain, while most chemistry structural data is not. Batch-mode data searching is feasible in biology, but in chemistry batch-mode searching capability is primitive. A problem exists with chemistry's need for batch-mode chemical structure searching, for example with CAS SciFinder, a leading software search tool. The time course of data capture and the very different complexity levels of gene and protein structure representation compared to chemical structure representation contribute to this issue. On the people side, software lags in capturing high-level metadata, i.e., why decisions are made. Metadata capture is complicated by people issues, particularly those between chemists and biologists. Discipline-based disconnects occur distressingly often and are frequently overlooked as a cause of lost productivity. Many of the problems between chemists and biologists are directly traceable to differences in training, and hence in attitudes and outlook. Most synthetic chemists are math averse, and any communication to chemists relying on mathematical equations will be under-appreciated or even ignored. Chemists are superb at pattern recognition, but biologists are not; this causes confusion and conflict with biology when a medicinal chemist makes a judgment in just a few seconds as to the quality of a compound structure. Expert systems that could capture the pattern-recognition skills of medicinal chemists are badly needed.
Relating chemical and biological space: An in-silico platform technology approach to accelerate the discovery of novel medicinally relevant small molecules
Stephan C. Schürer, Director, Content Development, Sertanty, Inc, 1735 N. First Street, Suite 102, San Jose, CA 95112, Fax: 408 487 4011, email@example.com
In the post-genomic era of drug discovery, a promising approach appears to be the systematic exploration of target families. It is critical in this process to utilize all available and relevant SAR data and consider various synthetic methodologies to most efficiently arrive at novel molecules that have desired properties and are also amenable to further optimization. Sertanty, Inc. has developed a discovery informatics platform – LUCIATM – that facilitates archival, sharing, integration, and exploration of synthetic methods and biological activity data. Using LUCIA, novel small molecules can be generated in-silico and prioritized against computationally efficient eScreensTM and ADMET models. eScreens are derived from an integrated gene family-wide SAR knowledge base and can improve as new experimental data is generated. Successful application of the technology has resulted in the identification of novel ABL Kinase inhibitors in a four month project and offers promise in both accelerating and enriching the success-rate of collaborative hit identification and lead optimization. Our next generation ChIP (Chemical Intelligence Platform) system explores chemical space in-silico based on forward analysis of synthetic pathways. Utilizing dynamic transforms that are generated from common representations of chemical reactions, ChIP prospectively “mix-n-matches” compatible synthetic strategies to generate novel compositions of matter with probable improvements in potency, selectivity and ADMET profiles.
Critical assessment of chemo- and bio-informatics applications development, or, "It's the infrastructure, stupid"
Doron Chema, Department of Medicinal Chemistry, Hebrew University of Jerusalem, School of Pharmacy, Jerusalem 91120, Israel, firstname.lastname@example.org
The increasing need to bridge chemo- and bio-informatics is an excellent opportunity to reassess application development in these fields and the expected consequences of bringing the disciplines together. Examination of the current situation may lead to the conclusion that both fields currently suffer from a software crisis, one that involves several aspects of the application-development process. The data-format standardization problem is a well-known aspect of this crisis: many similar file and database formats co-exist, serving similar goals. Another aspect may be called “too many tools for too small missions.” Even a modest project usually requires developers to manage several code environments, each designed and implemented with specific scientific goals in mind. Ironically, the existence of many niche tools effectively causes a lack of appropriate development tools. This often ends in a situation where much of the development work is done from scratch, causing a huge waste of resources. It is our belief that these major difficulties, found frequently in both fields, are already causing major bottlenecks with even greater potential to block or delay significant progress in the integrated field. In this talk an approach for overcoming these barriers at the infrastructure level will be described, followed by the introduction of a new infrastructure technology.
Cross-discipline analysis made possible with data pipelining
J.R. Tozer, SciTegic, Inc, 9665 Chesapeake Dr. #401, San Diego, CA 92123, Fax: 858 279 8804, email@example.com
While cheminformatics and bioinformatics use completely different data formats and analysis tools, the data pipelining approach makes it possible to apply them together. Chemical compound structures and activities can be processed in the same computing environment that analyzes gene expression profiles or protein sequences. We will discuss some interesting research questions that can only be addressed by coordinated analysis in bioinformatics and cheminformatics (e.g., clustering gene targets using the correlation of their expression levels in a series of cells with the biological activity of a set of test compounds on those cells).
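The gene-compound correlation in that example can be sketched in a few lines. The data and the nearest-compound assignment below are illustrative stand-ins, not SciTegic's pipeline components:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length measurement vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: expression of each gene and activity of each compound
# measured across the same three cell lines.
expression = {"geneA": [1.0, 2.0, 3.0], "geneB": [3.0, 2.0, 1.0]}
activity   = {"cmpd1": [1.1, 1.9, 3.2], "cmpd2": [2.9, 2.1, 0.8]}

# Group genes by the compound whose activity profile they track best --
# a crude one-nearest-neighbor clustering on correlation.
best_match = {g: max(activity, key=lambda c: pearson(expr, activity[c]))
              for g, expr in expression.items()}
```

A pipelining tool strings steps like these together, so the chemical half (activity readout) and the biological half (expression readout) flow through one analysis graph.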
Informatics integration at Arena Pharmaceuticals
Gareth Jones, Arena Pharmaceuticals, Inc, 6166 Nancy Ridge Drive, San Diego, CA 92121, Fax: 8584537210, firstname.lastname@example.org
The development of platform-independent web-based computing allows ordinary users unprecedented access to corporate information. At Arena we have developed a web-based informatics system that allows all employees access to chemical, screening, genomic and gene-expression data. This system was designed specifically to allow users with little or no computing experience the ability to browse, analyze, update and edit chemical and biological data. This results in real-time distribution of experimental data and allows on the fly analysis and search of information. Additionally, communication between disparate groups working on the same project has been greatly facilitated.
The data system is based on a three-tier system with an Oracle database in the back-end. The middle tier comprises a web-server with perl CGI and Java programs. Extensive use has been made of Java applets on the client web-browser. A separate Linux cluster provides cheminformatics services to the middle tier, which are accessed using XML/RPC protocols.
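The middle-tier-to-cluster link can be sketched with Python's standard-library XML-RPC modules. The service below is a hypothetical stand-in for a real cheminformatics routine; it illustrates only the RPC wiring, not Arena's actual implementation:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Hypothetical cluster-side service: a crude heavy-atom tally that just
# counts uppercase element symbols in a SMILES string.
def heavy_atom_guess(smiles):
    return sum(1 for ch in smiles if ch.isupper())

# Port 0 asks the OS for any free port; a real deployment would use a
# fixed port on the Linux cluster.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(heavy_atom_guess)
port = server.server_address[1]

# Serve a single request in the background, then act as the middle tier
# calling across to the cluster.
threading.Thread(target=server.handle_request, daemon=True).start()
result = ServerProxy(f"http://127.0.0.1:{port}").heavy_atom_guess("CCO")
```

The appeal of the design is that the web tier never links against cheminformatics libraries; it only speaks XML-RPC to whichever machines provide the service.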
Systematic bioactivity classification of ligands onto a protein target ontology: Application for library design and virtual profiling of a compound collection
Mark A. Hermsmeier1, Dora Schnur2, and Bradley C. Pearce1. (1) New Leads Chemistry, Bristol-Myers Squibb, P.O. Box 4000, Princeton, NJ 08543, Fax: 609-252-7446, (2) Computer-Assisted Drug Design, Bristol-Myers Squibb
Profiling the in-silico biological content of our screening deck and creating target-class libraries are greatly facilitated by a data platform that integrates ligand databases and a protein target ontology. The data platform that has been developed integrates the non-proprietary Gene Ontology from the GO Consortium with three commercially available ligand databases. The structures in these ligand databases have in turn been linked to the screening compounds by atom-pair similarity. The activity associations and similarity results are stored in a relational database for rapid retrieval of results. A web interface has been deployed that allows browsing the protein target ontology and drilling down to view associated ligands in the commercial databases and similar structures in the screening deck. The data platform also allows rapid in-silico profiling of the screening compounds.
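Atom-pair similarity can be illustrated with a deliberately simplified sketch: molecules as element labels plus a bond list, descriptors as (element, element, topological distance) triples, compared by Tanimoto coefficient. Real atom-pair descriptors encode richer atom types (neighbor counts, pi electrons), so this is only the shape of the idea:

```python
from collections import deque
from itertools import combinations

def atom_pairs(atoms, bonds):
    """Set of (element_i, element_j, shortest-path bond distance) triples."""
    adj = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    pairs = set()
    for i, j in combinations(range(len(atoms)), 2):
        dist = {i: 0}                    # BFS from i until j is reached
        q = deque([i])
        while q and j not in dist:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        if j in dist:
            pairs.add((*sorted((atoms[i], atoms[j])), dist[j]))
    return pairs

def tanimoto(a, b):
    return len(a & b) / len(a | b)

# Toy inputs: ethanol (C-C-O) vs. 1-propanol (C-C-C-O).
ethanol  = atom_pairs(["C", "C", "O"], [(0, 1), (1, 2)])
propanol = atom_pairs(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
sim = tanimoto(ethanol, propanol)   # 3 shared pairs of 5 total -> 0.6
```

Linking ligands to screening compounds then amounts to keeping, for each ligand, the deck members whose similarity exceeds a chosen threshold.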
Proteomica™ – An integrated system for analysis of biological and chemical data
Michael Farnum1, Sergei Izrailev1, and Dimitris Agrafiotis2. (1) 3-Dimensional Pharmaceuticals, Inc, 665 Stockton Dr, Exton, PA 19341, Fax: 610-458-8249, email@example.com, (2) Research Informatics, 3-Dimensional Pharmaceuticals, Inc
In recent years, there has been an explosion of the amount of chemical and genomic data. Chemical information has been driven by high-throughput screening and analysis of large libraries of chemical compounds, both physical and virtual, while genomic information has been generated through full genome sequencing and annotation as well as by DNA microarray and other high-throughput experiments. The number of protein crystal structures deposited in the Protein Data Bank has also grown at an unprecedented rate. Much effort has been made to relate the structure and properties of chemical compounds to the structure and function of genes and proteins. However, chemical and protein sequence information has been largely analyzed separately, in part because very few databases and software packages provide the connectivity required for analyzing and browsing the data simultaneously. Proteomica™ is an architecture designed to integrate both types of information. It is leveraged by advanced dimensionality reduction techniques and provides the capability to visualize similarity in both the property space of small molecules and the sequence space of target proteins. Proteomica™ enables scientists to ask iterative questions about biochemical experiments by combining information from external and in-house sources. This presentation will demonstrate both the principles and implementation of the system.
Fedora: Federated access to chemical and biological data
Scott Dixon, Vera Povolna, and David Weininger, Metaphorics, 441 Greg Ave, Santa Fe, NM 87501, firstname.lastname@example.org, email@example.com
Fedora is a technology which enables the rapid development of special purpose HTTP servers designed for the analysis and integration of biological and chemical information. These servers containing seemingly disparate data can communicate with one another via a web browser and provide the capability to mine data for complex relationships. The Fedora servers include a metabolic pathway network (Empath), Protein-Ligand Association Network (Planet), Traditional Chinese Medicines (TCM), the World Drug Index (WDI), and others.
Case study of IP information management at a small pharmaceutical company
Susan Wollowitz, Wollowitz Associates, 455 Moraga Rd, Suite C, Moraga, CA 94556, Fax: 925-247-1289, firstname.lastname@example.org
A case study will be presented of how a small pharmaceutical company addressed its intellectual property information acquisition and document management needs. The situation was initially evaluated, including the demand for IP creation and prosecution, current capabilities, and operational constraints. The issues identified were the need for an improved document tracking system, better access to patent information, and the ability to proactively monitor the competitive landscape. The presentation will discuss the options considered and selected, as well as a retrospective evaluation of the decision's success.
Low-income patent management
John Santacruz, Division of Small Chemical Businesses, 1263 Fulton Street, Rahway, NJ 07065, email@example.com
Patent management on a low-income budget is a growing concern for Small Chemical Businesses due to limited resources and multitasking of personnel. Two methods of legal representation that significantly reduce the annual costs of patent management will be discussed. The two methods will be compared to the traditional method of private law firm representation. The literature and laws in this area will be briefly reviewed.
Minimizing intellectual property cost - maximizing intellectual property return
Gianna Arnold, and Corinne Marie Pouliquen, Epstein Becker and Green, 1227 25th Street, NW, Suite 700, Washington, DC 20037-1175, Fax: 202-296-2882, firstname.lastname@example.org
Today’s small business owner faces a vast array of decisions related to the appropriate protection, utilization, and management of intellectual assets. This discussion will focus on tools and strategies to maximize the use of intellectual property dollars, by minimizing actual cost, and by maximizing return. Topics addressed include establishing a scientific advisory board; establishing process and screening criteria to obtain/maintain patents; promoting and easing the burden of invention disclosure; reducing costs associated with use of outside counsel; capitalizing on intellectual property as a business asset; and aligning intellectual property resources with corporate strategy.
Patent searching for small chemical businesses
Barbara Hurwitz, Barbara Hurwitz, consulting, 36 Waverly Street, Portland, ME 04103, Fax: 207-228-6418
Patent searches are run for small chemical companies either directly for the company or through the company’s outside counsel. Using three small businesses as case studies, we can see how interacting with these small companies differs from working with the staff of a large chemical and pharmaceutical company.
Information sources for small companies
Sandy Burcham, Service Is Our Business, Inc, 111 Lincoln Terrace, Norristown, PA 19403-3317, Fax: 610-630-0863, email@example.com
This paper will discuss the various information sources available to small companies, in order to help them determine how best to spend their resources.
Comparison of free Internet-based intellectual property (IP) tools with contracting IP research to third party information professionals
Michael I. Montembeau, and Gerri B. Potash, Nerac, Inc, 1 Technology Drive, Tolland, CT 06084, Fax: 860-872-7856, firstname.lastname@example.org
Chemical businesses, whether large or small, have an enormous need for intellectual property information. This need is particularly burdensome for small chemical businesses, which often cannot afford to hire full-time information staff, let alone full-time patent information staff. As a result, small chemical businesses are left to appoint a lead IP person, who must juggle new IP duties with research tasks and other responsibilities.
This presentation will: 1) outline the tools and capabilities of the free internet-based intellectual property resources; 2) compare the internet-based resources with those of a third-party information provider, such as Nerac.com; and 3) discuss the advantages and disadvantages of each resource and how one would make effective use of these resources.
This presentation will also describe how chemical businesses can benefit not only from the intellectual property resources at Nerac, but also from the use of the extensive chemical and engineering related databases Nerac has compiled as a research and analysis tool.
Professional tools and services supporting the small to medium enterprise
Anthony J. Trippe, Science IP/Chemical Abstracts Service, 2540 Olentangy River Rd., Columbus, OH 43210, email@example.com, and Rebecca A. Wolff, Product Marketing Management, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202-1505, Fax: 614-461-7149, firstname.lastname@example.org
Employees at small to medium enterprises must wear many different hats. With each “hat” that they wear, they also strive to optimize their time, present a professional image, and add value to their work. CAS provides a number of tools and services that can assist the multi-hat wearer to not only meet these needs, but to also meet the needs of both their internal and external customers.
This presentation will explore how to use the latest STN software to:
1) take advantage of the patent content available on STN, 2) analyze the results to meet business critical needs, and 3) create professional-looking reports and tables.
For smaller organizations in particular, without the benefit of a sizable staff of information professionals, certain projects may require additional expertise or outside assistance to meet a critical deadline. For these situations, CAS has created Science IP, the CAS Search Service. This service is staffed with searching and analysis experts who can assist on a project-by-project basis. During this presentation, examples of searches with legal ramifications will be discussed and details will be provided on the advantages of working with Science IP on these types of requests.
The Questel-Orbit alternative for chemical information
Elliott Linder, Questel*Orbit, Inc, 7925 Jones Branch Drive, Fax: 703.873.4701, ELinder@questel.orbit.com, and Joseph M Terlizzi, Questel-Orbit, 8000 Westpark Drive, email@example.com
For over 25 years, Questel·Orbit has offered information specialists an extensive collection of online patent databases containing chemical information. For broad subject searching, the European, International, and US classifications in our exclusive PlusPat database can be used, with easy lookup using the ECLA and USPCL dictionary files. Narrower searching can be conducted using the US, EP, and PCT full-text databases. For specific chemical searching, our exclusive Merged Markush Service (MMS) for chemical structure searching is available, as are codes and indexing in databases produced by Derwent, IFI, CAS, INPI, and others. Special features allow the creation of “super” display records composed of fields from any database on the system. The standardization of patent numbers system-wide makes cross-file searching for complementary information simple. Built-in statistical analysis tools are easy-to-use and valuable for competitive intelligence. This presentation will review how the techniques and features outlined above are applicable for small chemical businesses.
Instruments on the Grid: UK national crystallography grid service
Jeremy G. Frey, Chemistry, University of Southampton, Department of Chemistry, Highfield, Southampton SO17 1BJ, United Kingdom, Fax: +44 23 8059 3781, firstname.lastname@example.org
We will describe the processes and infrastructure needed to develop and deploy a grid service for access to, and interaction with, the UK EPSRC National Crystallography Service (NCS), developed as part of the CombeChem e-Science Pilot Project with the assistance of the Centre of Excellence in Combinatorial Chemistry, all largely based at the University of Southampton, UK. Special consideration will be given to the sample-tracking database and the implementation needed to run this national service, the implications for the security of the service, and the system employed to meet these requirements. The user interface, archiving methods, and notification systems will also be described, along with the results of initial users' experience.
Computational science and engineering online: A web-based grid-computing environment for research and education in computational science and engineering
Thanh N. Truong, Department of Chemistry, University of Utah, 315 S, 1400 E, Room 2020, Salt Lake City, UT 84112, Fax: 801-581-4354, email@example.com
We present the development of an integrated, extendable web-based simulation environment called Computational Science and Engineering On-line (CSEO) that allows computational scientists to perform research using state-of-the-art tools, query data from personal or public databases, discuss results with colleagues, and access resources beyond those available locally, all from a web browser. Currently, CSEO provides an integrated environment for multi-scale modeling of complex reacting systems. A unique feature of CSEO is its framework that allows data to flow from one application to another in a transparent manner. A particular example is demonstrated to show how results from fundamental quantum chemistry simulations are used to calculate thermodynamic and kinetic properties of a chemical reaction, which subsequently are used in the simulation of a combustion reactor. Advantages, disadvantages, and future prospects of a web-based simulation approach are then discussed. CSEO can be accessed at http://cseo.net.
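The hand-off from quantum chemistry to kinetics in that example can be illustrated with the transition-state-theory (Eyring) estimate k = (kB·T/h)·exp(−ΔG‡/RT). The 80 kJ/mol barrier below is an arbitrary illustrative number, not a CSEO result:

```python
import math

KB = 1.380649e-23    # Boltzmann constant, J/K
H  = 6.62607015e-34  # Planck constant, J*s
R  = 8.314462618     # gas constant, J/(mol*K)

def eyring_rate(delta_g_kj_mol, temp_k=298.15):
    """Unimolecular rate constant (s^-1) from an activation free energy,
    via k = (kB*T/h) * exp(-dG_act / (R*T))."""
    return (KB * temp_k / H) * math.exp(-delta_g_kj_mol * 1e3 / (R * temp_k))

k = eyring_rate(80.0)   # ~0.06 s^-1 for an 80 kJ/mol barrier at 298 K
```

In a pipeline like CSEO's, the barrier would come from an electronic-structure calculation upstream, and the resulting rate constant would feed a reactor-simulation step downstream.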
Grid computing: How applications are finally catching up to the technology
Chris Crafford, Engineering, United Devices, 12675 Research Blvd., Bldg. A, Austin, TX 78759, Fax: 512-331-6235, firstname.lastname@example.org, and Seetharamulu Peddaiahgari, Director, Life Sciences Applications, United Devices
The completion of the human genome has transformed drug discovery and molecular targeting, vastly increasing the potential number of druggable targets as well as information about their possible binding sites. Computer power is essential to identifying and learning more about these targets. With the appropriate grid solution, researchers can explore drug actions, speed the development cycle and reduce costs, without sacrificing precision. Several research organizations and top pharmaceutical companies are already using the technology to gain a competitive edge. Multiple case studies will be presented illustrating how researchers, with the help of top application providers, are using grid computing now to achieve success.
Virtual screening using grid computing
W Graham Richards, Central Chemistry Lab, University of Oxford, South Parks Road, Oxford, OX1 3QH, United Kingdom, email@example.com
The screen saver project currently involving the Chemistry Department at the University of Oxford, United Devices Inc and Accelrys Inc now involves some 2.5 million PCs in over 220 countries and has provided more than 250,000 years of CPU time: an effective 100 teraflop facility. Such power permits the virtual screening of billions of drug-like molecules against defined protein targets within days or weeks. A review of the project and the results obtained so far and future opportunities will be presented.
OpenMolGRID, a Grid-based large-scale drug design system
Laszlo Urge1, Ákos Papp1, István Bágyi1, Géza Ambrus2, and Ferenc Darvas1. (1) ComGenex Inc, 33-34 Bem rpk, Budapest, H-1027, Hungary, Fax: +361-214-2310, firstname.lastname@example.org, (2) RecomGenex, Ltd
Pharmaceutical companies face the challenge that modern drug discovery requires precise, high-throughput in silico systems that can not only handle millions of structures but also give accurate predictions for the requested properties. At the same time, mergers in the pharmaceutical industry demand the integration of geographically distributed information and computation resources. These challenges make the use of Grid systems indispensable. As a consequence, chemical applications developed for traditional environments have to be redesigned to meet the requirements of this new technology. OpenMolGRID will be one of the first realizations of Grid technology in drug design. The system is designed to build forward- and reverse-QSAR models and to generate novel structures with favorable properties. The lecture details the implementation of traditional chemical IT tools to solve large-scale library design scenarios. The development of OpenMolGRID is partly funded by the European Commission (IST-2001-37238).
BioSimGRID: A distributed database for biomolecular simulations
Jonathan W Essex1, Kaihsu Tai2, Stuart Murdock1, Muan Hong Ng3, Bing Wu4, Steve Johnston3, Hans Fangohr3, Paul Jeffreys4, Simon Cox3, and Mark Sansom2. (1) School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom, Fax: +44 (0)23 8059 3781, email@example.com, (2) Department of Biochemistry, University of Oxford, (3) e-Science Centre, University of Southampton, (4) e-Science Centre, University of Oxford
Biomolecular simulations provide data on the conformational dynamics and energetics of complex biomolecular systems. We aim to exploit the Grid infrastructure developing in the UK to enable large scale analysis of the results of such simulations. The BioSimGRID project (www.biosimgrid.org) will provide a generic database for comparative analysis of simulations of biomolecules of biological and pharmaceutical interest. The system will have a service-oriented computing model using Grid-based Web service technology to deliver analysis. Data mining services will be provided for the biomolecular simulation and structural biology communities, using a Python scripting environment. To address the security problem of the heterogeneous BioSimGRID environment, a Grid certificate-based and a user/password-based authentication mechanism will be integrated across the system. The back-end of BioSimGRID is based on a relational database, with appropriate indexing to optimize performance of the analysis package.
Comb-e-Chem: GRID-enabled chemical crystallography and a new opportunity for structural chemistry
Michael B. Hursthouse, Department of Chemistry, University of Southampton, Southampton SO17 1BJ, United Kingdom, Fax: 44-2380-596723, M.B.Hursthouse@soton.ac.uk
We are exploring the feasibility of an e-Science approach to provide an integrated, GRID-enabled Chemical Structure and Property Environment, incorporating a co-ordinated high-throughput crystal structure determination and property measurement capability with distributed structure and property calculations and database mining. We are developing new software for automated pattern searching in crystal structures, with a view to learning more about crystal structure assembly, polymorphism and materials properties. In the related E-Bank project, we are developing procedures for automated archiving and dissemination of fundamental data, subsequent processing and calculations, and the derived knowledge, so that publications in which the new information is assessed and presented are not compromised by the need to carry the data with them. This presentation will report and review the status of these activities.
Semantic Grid computing - the WorldWideMolecularMatrix
Yong Zhang1, Robert C. Glen2, Peter Murray-Rust3, Henry S. Rzepa4, and Joe A Townsend2. (1) Unilever Centre for Molecular Sciences Informatics, University of Cambridge, Lensfield Road, Cambridge, United Kingdom, firstname.lastname@example.org, (2) Department of Chemistry, Unilever Centre for Molecular Science Informatics, (3) Unilever Centre for Molecular Informatics, University of Cambridge, (4) Chemistry, Imperial College
The Semantic Web is Tim Berners-Lee's vision of knowledge-based computing for the Web. We have shown how this can be adapted to chemistry. Our implementation uses XML-CML for molecules and properties and the new IChI as a unique key calculated directly from the connection table. A molecule can be precisely differentiated from any other and retrieved by conventional database methods.
The NCI database has ca. 250,000 molecules, which we converted into CML using OpenBabel. These are stored in a native XML database, Xindice, and searched with the XPath language. We can retrieve molecules within 50 milliseconds.
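As a rough illustration of this kind of keyed lookup (the element names and the use of Python's xml.etree in place of Xindice are assumptions for the sketch, not the authors' implementation):

```python
# Sketch: retrieving a CML-like molecule record with an XPath-style query.
# The schema below is illustrative, not the actual NCI/CML representation.
import xml.etree.ElementTree as ET

CML = """
<moleculeList>
  <molecule id="mol1"><identifier>KEY-AAA</identifier></molecule>
  <molecule id="mol2"><identifier>KEY-BBB</identifier></molecule>
</moleculeList>
"""

root = ET.fromstring(CML)

def find_by_key(key):
    # ElementTree supports a limited XPath subset; a native XML database
    # such as Xindice evaluates a similar expression server-side.
    mol = root.find(f"molecule[identifier='{key}']")
    return mol.get("id") if mol is not None else None

print(find_by_key("KEY-BBB"))  # mol2
```

With an index on the unique key (here the identifier element), this lookup pattern is what makes millisecond retrieval plausible at the 250,000-molecule scale.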
Molecular properties were calculated using MOPAC2003, using Condor and the spare CPU time on 24 PCs. Times per molecule varied from 0.5 seconds to 500,000 seconds; the calculations took 4 months.
The XML results are openly available on our WorldWideMolecularMatrix (WWMM). A chemist submits a molecule: if its properties already exist, they are returned; otherwise the computation is run. For new molecules the results are provided through an RSS system (CMLRSS).
The system is a peer-to-peer Grid for chemical information and computation. The software can be downloaded, and we invite other groups to run servers with varied functions so that a Semantic Grid for chemistry becomes possible.
Adaptive informatics infrastructure for multi-scale chemical science
James D. Myers1, Larry Rahn2, David Leahy2, Carmen M. Pancerella2, Gregor von Laszewski3, Branko Ruscic4, and William H. Green Jr.5. (1) Collaboratory Group Leader, Battelle / Pacific Northwest National Laboratory, Battelle Blvd. MS K1-87, Richland, WA 99352, Fax: 509-375-6631, email@example.com, (2) Sandia National Laboratories, (3) Mathematics and Computer Science Division, Argonne National Laboratory, (4) Chemistry Division, Argonne National Laboratory, (5) Department of Chemical Engineering, Massachusetts Institute of Technology
The Collaboratory for Multi-scale Chemical Sciences (CMCS, cmcs.org) is enabling the flow of information across physical scales and scientific disciplines ranging from subatomic quantum chemistry to predictive simulations of chemical processes such as combustion. CMCS is using advanced collaboration and metadata-based data management technologies to develop a portal providing distributed research support, community interactions, and data discovery, management, and annotation capabilities. The portal assists in documenting and browsing data pedigree and in communicating dependencies between data produced at one scale and computations using it at the next. A variety of standards-based mechanisms for extracting metadata from files, translating between schema, converting data formats, and integrating external applications (such as Active Thermochemical Tables) are being developed to minimize the work required to adopt CMCS capabilities. These capabilities are being piloted by involving key national chemistry resources (data and software) and by supporting distributed groups performing informatics-based chemical research in combustion science.
The application of distributed computing to computer simulations
Jonathan W Essex1, Christopher J. Woods1, Adrian P. Willey1, Luca A. Fenu1, Andrew C. Good2, Andrew R. Leach3, Richard A. Lewis4, and Jeremy G. Frey1. (1) School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom, Fax: +44 (0)23 8059 3781, firstname.lastname@example.org, (2) Structural Biology and Modeling, Bristol-Myers Squibb, (3) Computational Chemistry and Informatics, GlaxoSmithKline Research and Development, (4) Lilly Research Centre
Distributed computing is a very popular, and potentially very powerful, approach for accessing large amounts of computational power. Under the umbrella of the Comb-e-Chem project, we have examined both freely available and commercial distributed computing software. In this paper, our experiences will be described. The performance of coarsely parallel computations, such as protein-ligand docking, and of more tightly coupled replica-exchange molecular dynamics computer simulations will be assessed. Issues of security will also be discussed, and in particular how security determines the availability and utility of computers within a large organisation.
Virtual Research Parks enable multi-organizational collaboration
Gary G Benesko, Life Sciences, IBM, 755 Cypress Rd., St. Augustine, FL 32086, Fax: 419-735-6288, email@example.com
A Virtual Research Park (VRP) is a secure, state-of-the-art, Web-based research environment that supports and facilitates joint R&D, collaboration, and commercial activities among Life Science Communities whose boundaries extend beyond any one enterprise or geography. Each Community can consist of multiple related organizations and individuals united by common interests, such as
Structure-activity relationships for the design of molecules (STARDoM): The development and implementation of grid-enabled, automated predictive QSAR modeling
Alexander Tropsha1, Scott Oloff2, Alexander Golbraikh1, Chi-Duen Poon3, Terry O'Brien4, Michael Blocksome4, Rich Dulaney4, Madhu Gombar4, and Virinder Batra4. (1) Laboratory of Molecular Modeling, School of Pharmacy, The University of North Carolina at Chapel Hill, 301 Beard Hall, CB# 7360, UNC-CH, Chapel Hill, NC 27599, firstname.lastname@example.org, (2) Department of Pharmacology, University of North Carolina at Chapel Hill, (3) Department of Chemistry, University of North Carolina, (4) IBM Life Sciences
QSAR models are typically generated with a single modeling technique. Our research has demonstrated that multiple models should be generated for any dataset to ensure their statistical significance and predictive power. We have developed a combinatorial QSAR approach which explores all possible combinations of various descriptor sets and optimization methods, coupled with external model validation. This approach required the integration of multiple individual protocols dealing with descriptor generation, model development and validation, and model application to external database mining to identify potentially active hits. The integration of the protocols developed at UNC was achieved in collaboration with IBM's Life Sciences team using the WebSphere framework and implemented on the North Carolina BioGrid through the Globus Toolkit. This solution is automated, efficient, and accessible to users via a web interface. It was successfully applied to the discovery of novel anticonvulsant agents as well as novel ligands of the P2Y12 receptor.
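The combinatorial idea of exhaustively pairing descriptor sets with learning methods and keeping only externally validated models can be sketched as follows. The names and the placeholder scoring function are illustrative assumptions, not the UNC/IBM implementation:

```python
# Sketch: enumerate every (descriptor set, method) pairing and keep those
# whose external-validation score clears a threshold. All names and the
# scoring function are placeholders, not the actual STARDoM protocols.
from itertools import product

descriptor_sets = ["MolConnZ", "MOE-like", "atom-pairs"]   # illustrative
methods = ["kNN", "SVM", "PLS"]                            # illustrative

def external_q2(desc, method):
    # Placeholder for: train on the modeling set, then score predictive
    # q^2 on an external hold-out set (deterministic stand-in here).
    return 0.5 + 0.05 * ((len(desc) + len(method)) % 4)

accepted = [
    (d, m) for d, m in product(descriptor_sets, methods)
    if external_q2(d, m) >= 0.6   # acceptance threshold on validation score
]
print(accepted)
```

In a real workflow the accepted models would then be applied jointly to external database mining, as the abstract describes.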
Development of a personal computing environment for molecular design on Grid
Umpei Nagashima1, Takeshi Nishikawa1, Satoshi Sekiguchi1, Sumie Tajima2, Toru Yagi2, Takeshi Kitayama2, and Makoto Haraguchi2. (1) Grid Technology Research Center, National Institute of Advanced Industrial Science and Technology, Umezono 1-1-1, Tsukuba, Japan, Fax: +81-29-861-5301, email@example.com, (2) Bestsystems Inc
We are developing a personal computing environment for molecular design on the Grid as an exercise in computational chemistry in a Grid environment. In this talk, we introduce two products: MolWorks (http://www.molworks.com) and Gaussian Portal. MolWorks supports molecular modeling, input data generation, output analysis and job control for molecular orbital calculations on the Grid. A property estimation function for molecules is also supported. Gaussian Portal is an attempt to construct a framework for a Grid-enabled application service provider. These two products are expected to realize a desktop virtual laboratory for chemists and to achieve high throughput by integrating PC clusters, supercomputers and databases with an intelligent scheduler.
Heterojunctions of nanomaterials and organic-inorganic nanoassemblies
Cengiz S. Ozkan, Electrical and Chemical Engineering, Biomaterials and Nanotechnology Laboratory, Center for Nanoscience Innovation for Defense, University of California, Riverside, CA 92521, firstname.lastname@example.org
Nanomaterials including carbon nanotubes and nanocrystals have considerable potential as building blocks in future nanoelectronics and bio-nanotechnology applications. The unique electrical, mechanical, and chemical properties of CNTs have made them intensively studied materials in the field of nanotechnology within the last decade. Nanocrystals or quantum dots provide a remarkable opportunity for designing artificial solids, since they possess unique and controllable physical and chemical properties based on composition, structure and size. Another heavily investigated area is the conjugation of inorganic nanomaterials with biomolecules, including DNA and proteins, for various applications in bio-nanotechnology. In this talk, I will first describe approaches for the synthesis of nano-assemblies of carbon nanotubes and quantum dots. Such functional nanostructures could become better alternatives for the fabrication of nanoscale electronic and photonic devices. They could also be useful for the bottom-up assembly of nanosystems as part of larger or microsystem technologies. Detailed chemical and physical characterization of the nanostructures will be presented via transmission electron microscopy and Fourier transform infrared spectroscopy. Next, approaches for encapsulating biological molecules, including DNA, inside carbon nanotubes, which could be useful for a number of applications including novel electronics, DNA sequencing and drug delivery systems, will be presented. DNA oligos labeled with nano-colloid particles are encapsulated into multiwalled carbon nanotubes, and the nanoassemblies are characterized via transmission electron microscopy and energy dispersive spectroscopy.
Effects of the presence of nanotubes on heat transfer in microfluidics
Nishitha Thummala, and Dimitrios V Papavassiliou, School of Chemical Engineering and Materials Science, The University of Oklahoma, 100 E Boyd, SEC T-335, Norman, OK 73019-1004, Fax: 405-325-5813, email@example.com
The drive for technical advancements in the micro/nano world, emerging from the desire to manipulate flow fields at smaller and smaller scales, is indeed challenging. An effective and reliable numerical tool for the analysis of transport properties in microfluidics is the Lattice Boltzmann Method (LBM), which can efficiently link microscopic and macroscopic phenomena. Our group is using LBM to simulate single-phase flow in configurations such as parallel plates and porous media. The paper will focus on simulation of heat transport from surfaces that have nanotubes aligned vertically as line sources or horizontally as point sources. Lagrangian Scalar Tracking (LST) methods are used to track the trajectories of heat particles released in the flow field, and to synthesize the behavior of the mean temperature profile from the behavior of the instantaneous sources of heat. The effect of the presence of nanotubes on the heat transfer characteristics will be discussed.
Computational nanotechnology: Bridging lengthscales with Materials Studio
Amitesh Maiti, Gerhard Goldbeck-Wood, and Scott Kahn, Accelrys Inc, 9685 Scranton Road, San Diego, CA 92121, Fax: 858-799-5100, firstname.lastname@example.org, email@example.com
Nanotechnology holds tremendous economic and scientific potential, yet it will cost industry a considerable amount of time, money, and resources to research and develop new processes, devices, and synthesis techniques. The use of rational materials discovery software tools in conjunction with experimentation can lower this barrier significantly, and lead to new insights that may not be possible otherwise. Technologically important nanomaterials come in all shapes and sizes. They can range from small molecules to complex composites and mixtures. Depending upon the spatial dimensions of the system and properties under investigation, computer modeling of such materials can range from first-principles Quantum Mechanics, to Forcefield-based Molecular Mechanics, to mesoscale simulation methods, to the prediction of structure-property relationships. All of the above computational techniques are available in Accelrys' integrated PC platform Materials Studio™, as illustrated through a number of recent applications: (1) carbon nanotubes (CNTs) as nano electromechanical sensors (NEMS); (2) metal-oxide nanoribbons as chemical sensors; (3) mesoscale modeling of polymer-CNT nanocomposites; and (4) mesoscale diffusion of drug molecules across cell membranes.
Another big challenge for the nanotechnologist is the very large space of possible material parameters and processing routes. Recent developments in Materials Informatics provide crucial knowledge management and data mining tools for better, cheaper and faster materials development. Design of Experiment, Combinatorial and High Throughput materials design software help to focus research and development on the most promising areas.
Chemical information resources for nanotechnology
Robert A Stembridge, Global Marketing Services, Thomson Scientific, 14 Great Queen Street, London, United Kingdom, firstname.lastname@example.org
Nanotechnology is a young area dating back to Richard Feynman's intellectual demonstration in 1959 of the possibility of placing a facsimile of the entire Encyclopaedia Britannica on a pin-head. Much information is still in the realm of research papers published in learned journals and on the web, but increasingly practical applications of the technology are appearing in the patent literature, particularly in the area of chemical nanotechnology. This paper will illustrate these trends, examine the challenges for the user of tracking multiple sources of this information and discuss possible solutions to these problems.
A method for estimating the composite solubility vs. pH profile
Michael B. Bolger, Pharmaceutical Sciences, USC School of Pharmacy and Simulations Plus, Inc, 1985 Zonal Ave. PSC 700, Los Angeles, CA 90089, Fax: 323-442-1390, email@example.com, Christel Bergstrom, Department of Pharmacy, Uppsala University, Robert Fraczkiewicz, Life Sciences Department, Simulations Plus, Inc, and Per Artursson, Division of Pharmaceutics, Uppsala University
Purpose: To predict the shape of the composite solubility vs. pH profile by using purely in silico estimation. Method: The complete solubility vs. pH profile for 25 monobasic drug molecules was collected, and molecular descriptors were generated using QMPRPlus. We then examined relationships between intrinsic solubility and several other molecular descriptors to predict the solubility factor (the ratio of solubility for the ionized over the unionized form). Results: A simple linear relationship showed that the solubility factor is inversely proportional to the experimental value of intrinsic solubility. We then developed a multiple linear regression equation to predict the log of the solubility factor using intrinsic solubility and the numbers of hydrogen bond donors and acceptors as independent variables. Conclusions: A relationship between the log of intrinsic solubility and the solubility factor, when corrected for the number of hydrogen bond donors and acceptors, can provide a good estimate of salt solubility for a small set of monoprotic basic drugs.
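The multiple linear regression step described above can be sketched with ordinary least squares. The data below are synthetic stand-ins, not the 25-drug dataset or the QMPRPlus descriptors:

```python
# Sketch: fit log(solubility factor) from log intrinsic solubility and
# hydrogen-bond donor/acceptor counts via least squares. All values are
# synthetic; the published model was built on 25 monobasic drugs.
import numpy as np

# columns: log intrinsic solubility, H-bond donors, H-bond acceptors
X = np.array([
    [-4.2, 1, 3],
    [-3.1, 2, 4],
    [-5.0, 1, 2],
    [-2.5, 3, 5],
    [-4.8, 0, 3],
])
y = np.array([3.9, 2.8, 4.6, 2.1, 4.4])  # synthetic log solubility factors

A = np.column_stack([X, np.ones(len(X))])        # append an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)     # [b_sol, b_don, b_acc, b0]
pred = A @ coef
print(coef)
```

The inverse relationship the abstract reports would show up here as a negative coefficient on the log intrinsic solubility column for real data.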
A systematic name generator module for Marvin
Szilveszter Juhos, Gyorgy Pirok, and Ferenc Csizmadia, ChemAxon Ltd, Maramaros koz 3/a, 1037 Budapest, Hungary, Fax: +36 1 4532659, firstname.lastname@example.org
Constructing systematic names for single molecules based on IUPAC rules can be rather time-consuming and requires chemists experienced in complex nomenclature. Naming a large number of structures manually is practically impossible so several automatic name generating software tools have been developed.
Our module is a platform-independent Java plugin linked to Marvin to facilitate generating IUPAC names for individual molecule sketches or for whole databases via batch processing. It can be easily integrated into other Java applications or applied over intranet/web pages. The throughput and accuracy of name generation will be demonstrated in the poster.
Chemical information in Medline/PubMed
Beryl M. Benjers, Index section, National Library of Medicine, Bethesda, MD 20894, Fax: 301-402-2433, email@example.com
MEDLINE contains more than 12 million citations from 1966 to present. Pre-1966 citations are now being added in the OldMEDLINE. More than 4,500 journals in languages from around the world are indexed. Last year over 537,000 indexed citations were added to MEDLINE. Indexers analyze the article and index at an average rate of four articles/hour, applying 8-10 subject terms from MeSH, NLM’s controlled vocabulary. New indexers attend a rigorous two-week training course at NLM and then work closely with a reviser, who reviews their work. An asterisk with a MeSH subject term indicates the main point of an article, and that the article will be cited under that term in Index Medicus, the print counterpart of MEDLINE. MEDLINE citations and abstracts are available as the primary component of NLM’s PubMed database and retrieval system, which is searchable free-of-charge via the Internet.
MeSH contains 22,568 descriptors, of which 7,355 are chemical descriptors, supplemented by 138,526 chemical concepts (Supplementary Concept Records). New MeSH descriptors are added annually while Supplementary Concept Records are added daily as they are encountered in the indexed literature. New chemicals are electronically flagged for the chemical specialists, who study, research, update, and/or create new records as needed, and add them to the indexed citation and MeSH Browser. This allows MEDLINE citations to be indexed with the existing terms as well as the new ones.
MEDLINE indexing of chemical concepts includes coordination with a Pharmacological Action (PA) when appropriate. Indexing Information (II) terms may also be added with chemicals (e.g. disease/organism associated with a chemical).
The MeSH Browser is available at http://www.nlm.nih.gov/mesh/2004/MBrowser.html and can be searched by MeSH terms, Supplementary Concepts, ID, II, PA, RN, RR and EC numbers. MEDLINE/PubMed can be searched by MeSH terms, Supplementary concepts, authors, text words, journal, etc.
The National Library of Medicine (NLM) Home pages (http://www.nlm.nih.gov) offer information and links to other databases, such as MEDLINEplus and CHEMIDPlus.
Conformational folding process of a small-peptide predicted by using CONFLEX conformation search and GRID technology
Hitoshi Goto1, Kazuo Ohta2, Umpei Nagashima3, Yoshihiro Nakajima4, Mitsuhisa Sato4, and Hiroshi Chuman5. (1) Department of Knowledge-based Information and Engineering, Toyohashi University of Technology, Toyohashi 441-8055, Japan, Fax: 81-532-48-5588, firstname.lastname@example.org, (2) Conflex Corporation, (3) Grid Technology Research Center, National Institute of Advanced Industrial Science and Technology, (4) Graduate School of Systems & Information Engineering, University of Tsukuba, (5) Faculty of Pharmaceutical Sciences, University of Tokushima
Among the fundamental problems in the elucidation of biomolecular functions with the aid of theoretical and computational chemistry, the first difficulty to overcome is the conformational flexibility problem, especially as related to the protein folding problem. To resolve these challenging problems, we have started improving our original conformational space search method, CONFLEX, using parallel computing and Grid techniques. At the previous ACS meeting, we reported a master-and-worker parallelization and Grid world-wide distributed computing techniques used in the CONFLEX conformation search algorithm, together with performance data for some small peptides. At this Anaheim meeting, the folding process of a small polypeptide, predicted by conformational analyses using a clustering technique based on the conformational distance matrix among backbone conformations, will be presented. Some interesting animations and movies will also be demonstrated.
Combining fingerprints and other descriptors in virtual HTS
Zsuzsanna Szabo, Miklos Vargyas, Ferenc Csizmadia, and Gyorgy Pirok, ChemAxon Ltd, Maramaros koz 3/a, 1037 Budapest, Hungary, Fax: +36-1-453-2659, email@example.com
Various aspects of virtual screening using molecular descriptors of 2-dimensional chemical structures have been investigated over the last two years at ChemAxon. The work involved the implementation of various descriptors and metrics as well as the optimization of some of the parameters. The poster to be presented summarizes our results to date.
When setting up a virtual screening experiment, researchers are faced with the problem of choosing the right combination of the available descriptors. Additionally, some descriptors allow several parameters, which increases the degrees of freedom dramatically. Finally, when comparing descriptor values one can choose from numerous dissimilarity metrics. To cope with this freedom of choice, an automated optimization tool has been implemented.
This tool has proved to be successful in helping chemists to choose suitable descriptors, metrics and parameter values for virtual screening. It will be demonstrated that optimization can increase the enrichment ratio of the screening procedure.
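The enrichment ratio that the optimization tool maximizes is the quantity sketched below: the active hit rate in the top fraction of a ranked screening list relative to the hit rate over the whole list. The ranking data here are synthetic:

```python
# Sketch: enrichment ratio of a ranked virtual-screening list. A ratio of
# 1.0 means random ranking; higher means actives are concentrated on top.
def enrichment(ranked_is_active, fraction=0.1):
    n = len(ranked_is_active)
    top = ranked_is_active[: max(1, int(n * fraction))]
    hit_rate_top = sum(top) / len(top)
    hit_rate_all = sum(ranked_is_active) / n
    return hit_rate_top / hit_rate_all

# 20 compounds ranked by descriptor similarity; 1 = known active
ranked = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(enrichment(ranked, 0.1))  # both top-10% compounds are active -> 5.0
```

An automated optimizer would evaluate this figure for each candidate combination of descriptor, metric, and parameters, and keep the combination scoring highest.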
Drug discovery using grid technologies and DrugML
Michiaki Hamada, Science and Technology Group, Fuji Research Intstitute Corporation, Tokyo 101-8443, Japan, firstname.lastname@example.org, Yuichiro Inagaki, Science and Technology Group, Fuji Research Institute Corporation, Tokyo 101-8443, Japan, email@example.com, Hitoshi Goto, Toyohashi University of Technology, Umpei Nagashima, National Institute of Advanced Industrial Science and Technology, Shigenori Tanaka, Toshiba Research and Development Center, and Hiroshi Chuman, Tokushima University
A number of computer resources, such as CPUs and storage, can be connected over networks to construct a huge virtual computing environment using grid technologies. Our project "g-Drug Discovery" aims at developing a platform for drug design using grid technologies, on which various analyses and calculations are conducted, such as molecular mechanics methods, the replica exchange method, docking with proteins, molecular orbital methods, and 3-dimensional quantitative structure-activity relationships. For storing data on compound structures, descriptors, and calculation results, we are creating DrugML by extending CML. These grid technologies with DrugML can be used for everything from rough screening based on drug-likeness or ADMET properties to screening by very precise calculation.
Investigation of molecular chirality in 3D chemical structure databases
Zengjian Hu1, William M. Southerland1, and Shaomeng Wang2. (1) Department of Biochemistry and Molecular Biology, Howard University College of Medicine and the Howard University Drug Discovery Unit, 520 West Street, Northwest, Room 324, Washington, DC 20059, firstname.lastname@example.org, email@example.com, (2) Departments of Internal Medicine and Medicinal Chemistry, University of Michigan
In recent years, virtual screening of chemical databases using molecular docking has emerged as one of the most important tools and a well-established method in drug discovery for finding new leads. The first step in virtual screening is to create a searchable database of three-dimensional structures of small molecules. In the past few years, we have created 9 small-molecule 3D searchable databases which contain more than 1,000,000 molecular entries and could be used to discover interesting ligands for various pharmaceutical targets. During production of 3D chemical databases for screening purposes, we found that there is no information about absolute stereochemistry (R/S) or double bond geometry (E/Z) for most compounds in the 2D chemical database connection tables. Today more than 50% of marketed drugs are chiral. Chiral drugs, which can be safer, exhibit fewer side effects, and be more potent than the drugs previously used, have become a major focus of most pharmaceutical companies. As chiral molecules will certainly play a role in the exploitation of 3D space for the development of new drugs, the creation of a 3D database that accounts for molecular chirality will be beneficial for the discovery of lead compounds binding to molecular targets. As a first step, we analyzed the chirality of the molecules in our 10 three-dimensional databases. We found that about 29% of the compounds in these databases were chiral, with about 62% of compounds in the CGE database being chiral while only about 14% of compounds in the MCC database have chirality. Most chiral molecules in these 3D databases have only one chiral center, but compounds with more than 10 chiral centers are not rare; the maximum number of chiral centers in a molecule could be more than 60. It is well known that, in general, a molecule with n chiral centers has up to 2^n possible stereoisomers.
Therefore, the number of entries in a 3D database that accounts for chirality will double for molecules with one chiral center, provided there are no symmetry elements in the molecule. The creation of th---[ABSTRACT CUT OFF]
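The exponential growth in stereoisomer count described in this abstract is simple to state in code. This is a minimal sketch of the upper bound only; it does not account for meso compounds or other symmetry that reduces the true count:

```python
# Upper bound on stereoisomers for a molecule with n chiral centers.
# Symmetry (e.g. meso compounds) can make the true count smaller.
def max_stereoisomers(n_chiral_centers: int) -> int:
    return 2 ** n_chiral_centers

print(max_stereoisomers(1))   # 2  -> a 2D record expands to 2 entries in 3D
print(max_stereoisomers(10))  # 1024
```

This is why a database that enumerates stereoisomers explicitly can grow much faster than its 2D source, especially for the molecules with 10 or more chiral centers mentioned above.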
Molecular modelling for organic chemists: A chemical informatics problem
Jonathan M Goodman, Unilever Centre for Molecular Science Informatics, Cambridge University, Department of Chemistry, Lensfield Road, Cambridge CB2 1EW, United Kingdom, Fax: +44 1223 336362, J.M.Goodman@ch.cam.ac.uk, and María A. Silva, Unilever Centre for Molecular Science Informatics, University of Cambridge
Both molecular modelling and organic chemistry generate and use large amounts of information, which should be mutually beneficial. However, it can be difficult to persuade experimental organic chemists to use molecular modelling, as force field methods cannot be applied to many transition states, and molecular orbital methods are too slow to calculate the behaviour of many reactions before the experimental result makes the calculation of less immediate interest. We use a combination of molecular mechanics and molecular orbital methods in a 'Chemical Information Laboratory' (http://www.ch.cam.ac.uk/SGTL/gle/) in order to gain information of experimental relevance quickly enough to be useful. For example, chemical information has been generated about the molecules illustrated using this process, thus improving our knowledge of structure and reactivity.
Chemical education markup language: An XML namespace for educational chemistry software
Daniel C. Tofan, Department of Chemistry, State University of New York, Stony Brook, NY 11794-3400, Fax: 631-632-7960, firstname.lastname@example.org
The Chemical Education Markup Language (ChEdML) is being developed as an XML namespace to allow learning management systems to include chemical content. ChEdML was initially intended to provide extensions to the current IMS specifications for question and test item interoperability (QTI) XML binding. Such extensions allow authors to create items containing responses that use chemical symbolism. Examples include chemical reactions, electron configurations, Lewis structures, measures with units etc. Tags were also developed to format chemical information for display on web pages. A complete XML tag set is now under development to encompass a full curriculum of introductory chemistry. ChEdML also provides a mechanism to parameterize items and to include equations to calculate numeric responses. This allows the generation of item templates that can be instantiated at runtime with appropriate parameters. A Java API is being developed to support the generation and use of ChEdML.
Oligopeptide transporter (PepT1) homology model based on lactose permease (LacY)
Michael B. Bolger, Pharmaceutical Sciences, USC School of Pharmacy, 1985 Zonal Ave. PSC 700, Los Angeles, CA 90089, Fax: 323-442-1390, email@example.com
Purpose. To build a homology model of the oligopeptide/proton co-transporter PepT1 based on the crystal structure of the bacterial lactose/proton co-transporter. Methods. The centers of the transmembrane spanning domains (TMDs) in LacY, plus the 22 amino acids that comprise each of the twelve TMDs, were selected. The software package “Proteotoolbox™” was used to guide the threading of the PepT1 sequence onto the 3D structure of LacY so as to maximize overlap of the 2D and 3D hydrophobic moments. Finally, experimental site-directed mutagenesis results were examined in light of this new homology model to identify the structural basis for those results. Results. Site-directed mutagenesis and cysteine-scanning results for TMDs 5 and 7 were explained on the basis of the PepT1 model. The new model helps to explain the involvement of key histidine residues in the proton translocation process. Conclusions. The new 3D model extends and enhances our previous results (J. Pharm. Sci. 87(11):1286, 1998) and provides additional insight into the structure and function of the oligopeptide transporter.
Multi-conformational 3D databases: Quality assessment and pharmacophore search capabilities in MOE
Morten Langgaard, Berith Bjornholm, Anne Marie Munk Jorgensen, and Klaus Gundertofte, Department of Computational Chemistry, H. Lundbeck A/S, Ottiliavej 9, Dk 2500 Valby, Denmark, Fax: +45 3643 8237, firstname.lastname@example.org
In this study we report our experiences with the software solution MOE with respect to building multi-conformational databases and performing pharmacophore searches. Template pharmacophores derived from crystal structures of known protein-ligand complexes as well as classically derived pharmacophore models are used for the evaluation. Conformational coverage and the quality of each conformation of the developed multi-conformational 3D databases are evaluated thoroughly. The analysis of the search results focusses on hit rate, quality of hits, and the impact of pharmacophoric element selections for the query. Practical issues like speed, storage and management of databases are also addressed. The performance of MOE with respect to the above-mentioned issues will be discussed and compared to the more established method Catalyst.
A combinatorial DFT study of how cisplatin binds to purine bases
Leah Sandvoss, and Mu-Hyun Baik, Department of Chemistry, Indiana University, 1200 Rolling Ridge Way #1311, Bloomington, IN 47403, email@example.com
Cisplatin (cis-diamminedichloroplatinum(II)) continues to attract much attention because of its therapeutic importance as an anticancer drug. It binds primarily to the N7 positions of adjacent guanine (G) sites in genomic DNA, causing intrastrand cross-links that suppress replication and ultimately lead to cell death. Previous work showed both a kinetic and a thermodynamic preference for G over adenine in the platination reaction. The goal of this study is to obtain a chemically intuitive explanation for this selectivity of cisplatin by systematically comparing the electronic structures of a diverse set of functionalized purine bases. A computational combinatorial library of over 1500 purine derivatives was designed, and density functional theory calculations were used to examine in detail how the most important molecular orbitals change with structural variation. The resulting electronic profile of the purine bases reveals how electronic hot spots control reactivity at the N7 position (see figure).
Study of selectivity from a pharmacophore perspective
Klaus Gundertofte, Berith Bjørnholm, and Morten Langgård, Department of Computational Chemistry, H. Lundbeck A/S, Ottiliavej 9, Dk 2500 Valby, Denmark, firstname.lastname@example.org
A number of pharmacophore models covering G protein-coupled receptors and transporters primarily from the monoaminergic families of targets have been developed. The general methodology will be described as well as performance of different methods, e.g. MOE and Catalyst, applied in the development. In order to elucidate selectivity issues across the targets studied, a comparison of the models characterised by their pharmacophoric elements was done. The analysis of the pharmacophore patterns revealed remarkable resemblances or superpharmacophores. Distinct differences between the models were also found. The impact of these findings in medicinal chemistry projects will be discussed.
Successful shape-based virtual screening: The discovery of a potent inhibitor of the type I TGFb receptor kinase (TbRI)
Juswinder Singh, and Claudio Chuaqui, Structural Informatics, Biogen, 12 Cambridge St., Cambridge, MA 02142, Fax: 6176792616, Juswinder_Singh@Biogen.com
We describe the discovery, using shape-based virtual screening, of a potent, ATP site-directed inhibitor of the TbRI kinase, an important and novel drug target for fibrosis and cancer. The first detailed report of a TbRI kinase small molecule co-complex confirms the predicted binding interactions of our small molecule inhibitor, which stabilizes the inactive kinase conformation. Our results validate shape-based screening as a powerful tool to discover useful leads against a new drug target.
HypoRefine: Automated identification of exclusion volumes in pharmacophore models
Allister J. Maynard, Marvin Waldman, and Jon Sutter, Accelrys, 9685 Scranton Rd., San Diego, CA 92121, Fax: 858 799 5100
This presentation provides an overview of the HypoGen pharmacophore generation algorithm. HypoGen is a ligand-based QSAR tool using pharmacophoric overlap to predict activity.
A limitation of HypoGen is that activity prediction is based purely on the presence and arrangement of pharmacophoric features – steric effects are unaccounted for. A novel modification to HypoGen is described (HypoRefine). HypoRefine accounts for steric effects on activity, based on the targeted addition of excluded volume features to the pharmacophores. These excluded volumes attempt to penalize molecules occupying steric regions not occupied by active molecules.
Details of the steric detection and excluded volume addition algorithm are presented, along with some examples illustrating how excluded volumes improve the QSAR pharmacophore models.
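As a rough illustration of the excluded-volume idea described above, the sketch below counts how many atoms of an aligned conformer fall inside forbidden spheres. The sphere representation and the simple count-based penalty are assumptions for illustration only, not the HypoRefine scoring function.

```python
import math

def excluded_volume_penalty(atom_coords, excluded_spheres):
    """Count atoms that fall inside any excluded-volume sphere.

    atom_coords: list of (x, y, z) positions for a pharmacophore-aligned
    conformer. excluded_spheres: list of ((x, y, z), radius) spheres placed
    in regions that active molecules never occupy. A higher penalty would
    down-weight the predicted activity of the molecule.
    """
    penalty = 0
    for atom in atom_coords:
        for center, radius in excluded_spheres:
            if math.dist(atom, center) < radius:
                penalty += 1
                break  # count each atom at most once
    return penalty
```

A molecule whose atoms all stay outside the spheres incurs no penalty, mimicking how actives, which define the allowed space, remain unaffected.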
Automatic generation of multiple pharmacophore hypotheses
Simon Cottrell1, Valerie J. Gillet1, and Robin Taylor2. (1) University of Sheffield, Western Bank, Sheffield S10 2TN, United Kingdom, email@example.com, firstname.lastname@example.org, (2) Cambridge Crystallographic Data Centre
Pharmacophore methods provide a way of establishing a structure-activity relationship for a series of known active ligands. Often, there are several plausible hypotheses that could explain the same set of ligands and in such cases, it is important that the chemist is presented with alternatives that can be tested with different synthetic compounds. Existing pharmacophore methods involve either generating an ensemble of conformers and considering each conformer of each ligand in turn or exploring conformational space on-the-fly. The ensemble methods tend to produce a large number of hypotheses and require considerable effort to analyse the results, whereas methods that vary conformation on-the-fly typically generate a single solution that represents one possible hypothesis even though several might exist. We will describe a new method for generating multiple pharmacophore hypotheses with full conformational flexibility being explored on-the-fly. The method is based on multiobjective evolutionary algorithm techniques and generates a manageable number of different yet plausible hypotheses.
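The key selection step of such a multiobjective approach, retaining every non-dominated hypothesis rather than a single best one, can be sketched generically. The two objectives used here are illustrative assumptions, not the authors' actual scoring terms.

```python
def pareto_front(candidates):
    """Return the non-dominated candidates under multiple objectives.

    Each candidate is a tuple of objective scores (e.g. overlay fit,
    hypothesis simplicity); higher is better for every objective. This
    generic non-dominated sort is the core of multiobjective evolutionary
    selection: it yields a set of different yet plausible trade-offs
    instead of collapsing them into one solution.
    """
    def dominates(a, b):
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))

    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]
```

In the pharmacophore setting, each surviving point on the front corresponds to one alternative hypothesis the chemist can test against new synthetic compounds.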
PepT1 substrate transport pharmacophore determinants: Refinement with data from a single consistent functional assay
Terry R Stouch1, Teresa Faria2, and Julita Timoszyk2. (1) Computer-Assisted Drug Design, Bristol-Myers Squibb Pharmaceutical Research Institute, MS H23-07, PO Box 4000, Princeton, NJ 08543-4000, Fax: 609-252-6030, email@example.com, (2) Exploratory Biopharmaceutics and Stability, Bristol-Myers Squibb, Pharmaceutical Research Institute
PepT1 is a primary intestinal transporter of di- and tripeptides. It also transports large quantities of important pharmaceuticals, such as beta-lactams and ACE inhibitors. The ability to function as a substrate for this transporter can appreciably increase the absorption of drugs whose passive permeation rates might be low or nil. Data were collected on a series of ligands using a recently developed single fluorescent functional assay. The ligands were specifically chosen to elucidate the important determinants of transport. A wide range of transport rates was observed, even among dipeptides. Coupled with conformational analysis and molecular overlays, a fairly simple five-element pharmacophore was developed that can be used to retrieve known substrates.
Structure and information theory derived pharmacophores as pre- and post-filters for docking
Kenneth E. Lind, Erik Evensen, Hans Purkey, Robert McDowell, and Erin K. Bradley, Computational Sciences, Sunesis Pharmaceuticals Inc, 341 Oyster Point Blvd., South San Francisco, CA 94080, firstname.lastname@example.org
Screening virtual compound collections has been a valuable method for finding starting points in the drug discovery process. This is often done through structure-based docking or ligand-based pharmacophore searching. These methods are more effective than random selection, but both have inherent limitations. It would be useful to have methods that make optimal use of both techniques to improve the selection of active molecules. In this study we compare standard docking and pharmacophore search techniques to methods that combine the two in different permutations, such as docking as a pre-filter for a pharmacophore search, or vice versa. The methods are evaluated against CDK-2 for their ability to select known inhibitors and for their overall enrichment rates.
A new method for pharmacophore identification
S. Stanley Young, Jun Feng, and Ashish Sanil, National Institute of Statistical Sciences, 19 T.W. Alexander Dr, Research Triangle Park, NC 27709, email@example.com, firstname.lastname@example.org
The binding of a small molecule to a protein is inherently a 3D matching problem. As crystal structures are not available for most drug targets, there is a need to infer key binding features and their disposition in space, the pharmacophore, from bioassay data. We use fingerprints of 3D features and a new approach to uncover the common pharmacophore for a set of compounds. We describe the algorithm and basic benchmarking. Knowing the 3D pharmacophore for a target should allow better database searching and more efficient compound design.
A 3DPL case study: Finding new active molecules for the inhibition of calcineurin
Tad Hurst, Scientific Software, ChemNavigator, 6126 Nancy Ridge Drive, Suite 117, San Diego, CA 92121, Fax: 858-625-2377, email@example.com
The 3DPL Database Docking system has been demonstrated to be effective at extracting known active molecules from sets of inactive compounds in many test cases. The 3DPL technology can dock structures into a receptor structure at a rate of up to 30 per second, allowing in silico investigation of millions of database structures. In this paper, we detail the application of 3DPL to select 25 screening candidates from over 11 million chemical structures in the ChemNavigator iResearch Library. Samples of these 25 compounds were acquired and tested for calcineurin inhibition. Four of the compounds were found to be micromolar inhibitors. Three of these share a common core structure and represent a new area for possible lead development.
Facilitating virtual screening workflows: The PyFlexX/E/S/-Pharm and PyFTrees modules
Sally Ann Hindle1, Frank Sonnenburg1, Marcus Gastreich2, and Christian Lemmen1. (1) Chemoinformatics, BioSolveIT GmbH, An der Ziegelei 75, 53757 St. Augustin, Germany, Sally.Hindle@biosolveit.de, (2) BioSolveIt GmbH
Virtual screening usually requires several programs. This entails file format conversions, conceptually superfluous I/O, manual selection of data, handling of interim results, and so on.
Python - a widespread, cross-platform, open-source, and easy-to-read scripting language - allows native C applications to be wrapped in a Python layer, creating a modular world of applications that can easily be "plugged" together within a single Python script.
We have recently taken this step with our cheminformatics tools: FlexX/-E/C/-Pharm (docking), FlexS (small molecule alignment), and Feature Trees (similarity comparisons) may now be used within this scripting environment, sharing information instead of transferring it. An instant benefit is the availability of open-source Python packages for analysis and visualisation.
This concept drastically simplifies virtual screening experiments; moreover, it allows rapid prototyping of virtual screening protocols and parameter studies, as will be demonstrated in an application example.
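A minimal sketch of the "plugged together" idea follows. The function names and data fields below are hypothetical stand-ins, not the actual PyFlexX/PyFTrees API; the point is only that wrapped tools can hand results to one another as Python objects instead of through intermediate files.

```python
def feature_tree_similarity(candidates, threshold):
    """Stand-in for a Feature Trees-style similarity pre-filter.

    Illustrative only: field names are assumptions, not the real API.
    """
    return [c for c in candidates if c["similarity"] >= threshold]

def dock(candidates):
    """Stand-in for a docking step; best (most negative) score first."""
    return sorted(candidates, key=lambda c: c["dock_score"])

def screen(library, threshold=0.7):
    # Interim results pass between steps as in-memory Python objects --
    # no file format conversions or superfluous I/O between the tools.
    return dock(feature_tree_similarity(library, threshold))
```

Because each step is an ordinary Python callable, swapping the order of the filters or adding a parameter scan becomes a one-line change, which is what makes rapid prototyping of screening protocols practical.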
Fast Lead Identification Protocol (FLIP) for structure based data mining using 3D fingerprints
Amit S. Kulkarni, Scientific Services, Accelrys Inc, 9685 Scranton Road, San Diego, CA 92121
Structure-based drug design is the method used to identify and optimize pharmaceutical leads when a crystal structure, NMR structure, or homology model of a specific target protein is known. Virtual screening of corporate libraries, external compound collections, and virtual compounds using various docking methods is routine in the drug discovery process. We propose a new virtual high-throughput screening approach, termed “FLIP” (Fast Lead Identification Protocol), that uses the potential protein-ligand interaction sites in the active site of the target protein to data-mine compound collections. This approach has the advantage of being extremely fast and can potentially be used with any target protein structure.
Conformation mining: Shrinking chemical space to find biologically-active molecules
Santosh Putta, Gregory A. Landrum, and Julie E. Penzotti, Rational Discovery LLC, 555 Bryant St. #467, Palo Alto, CA 94301, firstname.lastname@example.org
Discovering the essential three-dimensional steric and chemical features shared by active compounds is an important step in designing drug candidates. However, the flexibility of actives often allows them to adopt several low-energy conformations, some of which are not important for biological activity. Conformational flexibility complicates the task of finding important features by forcing a search through a conformational space with dimensions that increase exponentially with the number of actives. Model building approaches typically address this problem either by using a small subset of conformations (e.g. most extended or lowest energy) or by encoding all of a compound’s conformations in a single fingerprint. The first approach may miss biologically-important conformations while the second risks masking critical information available only from individual conformations.
Here we explore techniques for efficiently mining the conformational space of multiple compounds. Our goal is to find a subset of biologically-important conformations and understand and exploit their commonalities.
Hit-directed nearest neighbor searching
Veerabahu Shanmugasundaram, Computer-Assisted Drug Discovery, Pfizer Global Research & Development, 2800 Plymouth Road, Ann Arbor, MI 48105, Fax: 734-622-2782, Veerabahu.Shanmugasundaram@pfizer.com, and Gerald M Maggiora, Department of Pharmacology and Toxicology, University of Arizona
Follow-up of initial hits resulting from HTS is crucial if the hits are ultimately to give rise to useful lead compounds. Several approaches may be employed to select compounds from the Research Compound Collection or from commercially available collections for follow-up screening. Similarity searching based upon the similarity of the molecular fragments possessed by the molecules, yields compounds that are similar in structure to the hits. Nearest-neighbor searching of BCUT Chemistry Space identifies compounds that have similar BCUT values and hence similar electrostatic, hydrophobic and hydrogen bonding properties. In contrast to molecular fingerprint based similarity searching that looks for similar scaffolds in molecules, nearest neighbor searching identifies isobiological molecular structures with significantly different molecular scaffolds. Several examples illustrating the application and the success of this methodology will be presented.
AGENT: A program generating tautomers for computer-aided drug design
Patrick Ballmer, Pavel Pospisil, Gerd Folkers, and Leonardo Scapozza, Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), Winterthurerstr. 190, 8057 Zurich, Switzerland, Fax: 01141-1-6356884, email@example.com
Several cases documenting the impact of ligand tautomerism on protein-ligand binding are described in the literature. AGENT has been developed as a tool to study this phenomenon. AGENT can be used to create chemically (energetically) reasonable tautomers of molecules stored in a 3D input file. The created tautomeric forms can be used directly in molecular docking studies. The purpose of AGENT is thus to enrich a given small-molecule database with tautomeric forms that may plausibly exist in a protein active site. The number of tautomers created by AGENT is restricted either by chemical rules or by a user-defined energy threshold limiting the tolerated, semiempirically calculated Gibbs free energy of tautomer formation.
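The user-defined energy threshold can be sketched as a simple filter on computed relative free energies. The example values and the default 3 kcal/mol window below are illustrative assumptions, not AGENT's actual rules or defaults.

```python
def select_tautomers(tautomers, max_dG=3.0):
    """Keep tautomers within a free-energy window of the most stable form.

    tautomers: list of (name, dG) pairs, where dG is a semiempirically
    calculated Gibbs free energy (kcal/mol) on a common scale. Tautomers
    more than max_dG above the lowest-energy form are discarded as
    energetically unreasonable. Window and values are illustrative.
    """
    lowest = min(dg for _, dg in tautomers)
    return [(name, dg) for name, dg in tautomers if dg - lowest <= max_dG]

# High-energy forms are pruned; the database is enriched only with
# tautomers that could plausibly be populated in a binding site.
candidates = [("keto", 0.0), ("enol", 2.1), ("zwitterion", 8.5)]
kept = select_tautomers(candidates)
```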
Accord enabling technologies for developing data mining software applications
Shikha Varma, and Tim Aitken, Accelrys, Inc, 9685 Scranton Road, San Diego, CA 92121, Fax: 858-799-5100, firstname.lastname@example.org
We present examples illustrating how a chemistry-aware spreadsheet, together with several developer toolkit components, can be used to create customized chemistry solutions for industrial and academic R&D. These chemistry-enabled components are very efficient in facilitating chemistry workflows in a research environment. Accord chemistry components can be used to build applications that include data generation, chemical calculations and manipulations, and data management according to a given set of rules.
Eli Lilly's chemistry-focused approach to an ELN: A tale of two pilot projects
Mike Kopach1, Daniel Koch1, Keith DeVries2, Jeffrey Christoffersen2, Will Prowse2, and Chavali Balagopalakrishna2. (1) Chemical Process Research & Development, Eli Lilly and Company, Eli Lilly and Company, Lilly Corporate Center, Indianapolis, IN 46285, Fax: 317-276-4507, Kopach_Michael@lilly.com, email@example.com, (2) Eli Lilly & Company
Lilly's rationale for pursuing an electronic laboratory notebook (ELN) system differs little from that of other Pharma companies – namely improving records quality, collaboration, productivity, and knowledge management. However, to manage the project scope, Lilly's effort focused on areas having the highest probable return on investment - areas where experimental protocols, results and/or data were not currently captured electronically. Based in part on this criterion, the chemistry functions within the Discovery and Development organizations began exploring ELN solutions. Two distinct approaches were employed in evaluating and selecting potential vendors/tools in an attempt to address the differing tasks and workflow processes of these organizations. An overview of Lilly's strategy and, more specifically, the Discovery and Development efforts, will be further elaborated.
ELN development and global deployment for Schering AG
Charles S Sodano, Information Services, Berlex Laboratories, 2600 Hilltop Drive, Richmond, CA 94804-0099, firstname.lastname@example.org
Berlex Laboratories, a subsidiary of Schering AG, launched an Electronic Lab Notebook (ELN) system in 1998 that was adopted by all 250 Berlex discovery researchers in 2001. In July 2003, global implementation was completed for 900 users (US, Europe, and Japan). This hybrid system uses Microsoft Word and Excel as authoring tools that communicate with Documentum software (document management) via Visual Basic add-ins. The legal, archived version of each completed experiment is printed to paper, where it is signed and witnessed. There is not yet significant case history in the US to support e-records used in patent litigation. The system holds more than 40,000 completed experiments and is growing at a rate of 2,000 a month.
ELN perspectives: from the multinational to the startup
Simon Coles, President & COO, Amphora Research Systems, PO Box 3940, Bracknell, Berkshire RG42 2XN, United Kingdom, email@example.com
The ELN industry is developing quickly, with departmental and even enterprise-scale deployments bringing successes and some inevitable failures. Recently, ELN technology has evolved to a point where ELNs are practical even for small companies. In the process, valuable new lessons have been learned that apply to any ELN deployment, delivering significant benefits to users and faster, more dependable ROI for the organisation. Critical success factors include the size and scope of any initial ELN deployment, the overall system architecture, and the early and continued involvement of the user community.
Amphora principals have been working with ELNs since 1996, starting as the architect and project manager of a leading-edge, enterprise-wide ELN for Kodak. This work grew into Amphora Research Systems, which worked with several other multinationals before merging with PatentPad in 2003. In addition to enterprise-scale ELN solutions to large companies, Amphora delivers practical, affordable ELNs to smaller companies (e.g., Biotechs) and small R&D departments of larger companies.
These experiences delivering ELNs provide a unique overview into the issues involved and approaches required to deliver good value in ELN implementations. Drawing on experience of successes in both small & large organisations, and lessons from ELN projects that are struggling to meet their potential, the paper will examine the critical success factors for any ELN project and the best practices which can assure a thriving ELN deployment.
GenSys' electronic lab notebook and Collaborative R&D platform: Current status and progress
Prem Mohan, Product Development, Gensys Software, 2434 Main Street, Santa Monica, CA 90405, Fax: 310-309-6715, firstname.lastname@example.org
The GenSys electronic lab notebook (GenSys/ELN(TM)) product is the culmination of 4 years of design and development. The GenSys/ELN was designed to be intuitive and easy to use, to support multiple laboratory workflows, and to be a scalable Collaborative R&D platform for enterprise-wide, secure IP protection and integration with in-house tools and systems. Leading R&D, manufacturing, and testing organizations recognize the requirements for Collaborative R&D platforms and ELNs and are starting to use them in place of paper. GenSys has responded to this demand by competing in numerous industrial and government-initiated RFPs and has been selected as the ELN vendor of choice by a variety of organizations. This presentation will discuss the carefully considered elements that comprise the latest version of the GenSys/ELN software and its success as the solution of choice for large and small end-user companies in the marketplace. A progress update on current customer needs, deployment projects, and future needs identified through deployment experience will also be given.
Integrated data and knowledge management systems supporting real drug discovery processes: Strategy, implementation, and examples
Trevor Heritage, Discover Informatics, Product Development, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144, Fax: 314 647 9241, email@example.com
The Pharmaceutical Industry is faced with an ever increasing number of biological targets and must translate these targets into effective and sustainable lead candidates. This requires that leads can be optimized to incorporate the relevant drug-like characteristics so as to address industry-wide high attrition rates. The success of this process depends on using much more efficient lead generation and medicinal chemistry-based optimization processes than are currently employed.
In order to meet these challenges and to increase inherent efficiency and so decrease attrition, an informatics driven discovery process has been implemented at Tripos Receptor Research Ltd., Tripos’ own chemistry research operation in the UK. There, the Tripos Electronic NotebookTM is being integrated with corporate information systems, including a proprietary LIMS and Tripos’ own reagent registration, inventory, and ordering system, ChemCoreRIOTM.
This informatics driven approach to chemistry concentrates on two key points, do-ability and desirability. Through innovative materials availability assessments and materials management, do-ability is assessed at the earliest stage of the process. Unique design technology is used to ensure the desirability and applicability of potential lead molecules. Applied at each stage of lead finding and lead optimization, the process enhances all aspects of decision-making and facilitates “learning” across different discovery projects.
This presentation will give the underlying concepts and strategies of implementing novel informatics solutions in drug discovery environments. Issues that arise as a consequence of this integrated system approach will be discussed along with lessons learned. This approach to informatics driven chemistry will be illustrated with other real-life examples.
Electronic laboratory notebooks: Lessons learned in three generations of development
Jorge Manrique, and Chris J. Ruggles, Professional Services Dept, CambridgeSoft Corp, 100 CambridgePark Drive, Cambridge, MA 02140, Fax: 650-286-9931, firstname.lastname@example.org
While the principal role of a classic lab notebook is to be the official record of a researcher’s technical work, this is not the main reason why scientists would consider electronic notebooks. The principal objectives of an electronic lab notebook are capturing the findings of individuals, and making those results readily available to the team and the organization. Essentially, investigators want to remedy what they’ve perceived as a fatal flaw of the classic lab notebook system: to avoid reinventing something already accomplished, and to further leverage work already performed. One direct way to succeed in this goal is to pool the information developed, and devise a mechanism to share findings with those that need to access them. This presentation will discuss legal, technical and regulatory issues driving the successful implementation of an electronic notebook system, the factors that impede and facilitate acceptance, and lessons learned in three generations of development.
Making tea: A human-centered approach to designing a pervasive lab book
Jeremy G. Frey1, Gareth V. Hughes2, Hugo R. Mills2, Terry R. Payne2, Monica m.c. Schraefel2, and Graham M. Smith2. (1) Chemistry, University of Southampton, Department of Chemistry, Highfield, Southampton SO17 1BJ, United Kingdom, Fax: +44 23 8059 3781, email@example.com, (2) Electronics and Computer Science, University of Southampton
The eScience community in the UK seeks to provide access to experimental data, annotations, and provenance information as that information is captured. Currently, most of that data is recorded manually in a paper-based lab book. Previous efforts, both commercial and research-based, to translate the lab book into digital form have struggled to achieve widespread acceptance. We present an overview of the elicitation/design process and the resulting model and services we developed to create a usable lab-book replacement for a pervasive chemistry lab. We describe the approach we developed, the prototype we designed based on our technique, and the results of a formative study of the artefact in real use. We show that our design elicitation method strongly contributed to the successful take-up of our prototype. The positive results take us one step closer to the eScience goal of "Publish at Source".
Moving to next-generation informatics for collaborative eR&D
Rich Lysakowski Jr., Executive Director / Chief Science and Technology Officer, Collaborative Electronic Notebook Systems Association, 800 West Cummings Park, Suite 5400, Woburn, MA 01801, Fax: 781-935-3113, firstname.lastname@example.org
Next-generation informatics for Collaborative eR&D and advanced R&D knowledge management (KM) will take advantage of most, if not all, tools in the arsenal for R&D information capture, management, processing, sharing, and reporting. By definition, the next generation has not yet arrived; to discuss moving to future software systems requires projecting future needs onto the generation of products being developed by suppliers and end users now. Collaborative eR&D and advanced R&D KM are maturing from conceptual processes into repeatable, automatable business process knowledge. However, most R&D businesses design and operate Collaborative eR&D and advanced R&D KM processes differently from one another. They use different arsenals of tools, different infrastructures, and even different corporate standards; how these processes and technologies differ is what provides R&D companies with significant competitive advantages in the marketplace. So questions always arise. If these processes, infrastructures, and tools are all different, to what extent can software suppliers help end users reach their process automation and informatics goals? What new classes of informatics and automation tools are needed? What modifications to existing tools are needed? What new standards (formal and de facto) are needed to build or apply these tools? What new infrastructure components are process enablers for Collaborative eR&D and advanced R&D KM? This paper will discuss and provide answers to these and related questions.
Electronic Notebooks: An interface component for semantic records systems
James D. Myers1, Michael Peterson1, K Prasad Saripalli1, and Tara Talbott2. (1) Mathematics and Computational Science, Battelle / Pacific Northwest National Laboratory, Battelle Blvd. MS K1-87, Richland, WA 99352, Fax: 509-375-6631, email@example.com, (2) Mathematics and Computational Science, Battelle / Pacific Northwest National Laboratory, George Fox University
Stand-alone electronic notebooks are limited in their ability to interact with other producers, curators, and consumers of annotations such as workflow and data provenance mechanisms, digital libraries, and autonomous feature-detection agents. The Scientific Annotation Middleware (SAM) project is developing a new generation of semantic middleware with capabilities to view, query, translate, and extend the corpus of metadata generated by multiple applications, environments, and agents, enabling integrated data discovery, annotation, provenance tracking, and records capabilities. A notebook services layer being developed within the project allows this information to be viewed and manipulated from within an electronic notebook interface. An initial integration of the SAM software and the open source Electronic Laboratory Notebook (ELN) highlights the potential of this approach and demonstrates how e-notebooks will be able to evolve to support a richer composite research record while scaling to support increasing experiment complexity.
Capturing chemistry in XML
Joe A Townsend, Department of Chemistry, Unilever Centre for Molecular Science Informatics, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom, firstname.lastname@example.org, and Peter Murray-Rust, Unilever Centre for Molecular Informatics, Cambridge University, UK
Chemical Markup Language (CML) is an XML-conformant Schema that describes molecules, spectra, reactions, and computational chemistry. It is capable of capturing the chemistry in a variety of current publications and is being adopted by many organizations.
We have developed tools for batch conversion of current chemical documents such as primary journal publications and theses into conformant CML. The parser reads many text and molecular formats and extracts chemical concepts into CML fragments that are combined into a single XML file.
The process works well for methodology and analytical data in organic synthesis. The results are stored in an XML database where they can be queried on molecular identity and numeric quantities.
Parsers can also capture the output of computational chemistry programs, extracting essentially all of the information in the logfile. XML stylesheets can then be used to filter and display the results in an interactive manner.
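As an illustration of the kind of extraction such parsers perform, the sketch below tallies atoms in a minimal CML-style molecule to recover its molecular formula. The fragment and code are illustrative only: the element and attribute names follow common CML conventions (molecule/atomArray/atom with elementType), but this is not the Cambridge parser itself.

```python
import xml.etree.ElementTree as ET

# Minimal CML-like fragment for methanol; illustrative markup only.
cml = """
<molecule id="methanol">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="O"/>
    <atom id="a3" elementType="H"/>
    <atom id="a4" elementType="H"/>
    <atom id="a5" elementType="H"/>
    <atom id="a6" elementType="H"/>
  </atomArray>
</molecule>
"""

root = ET.fromstring(cml)

# Tally the elements to recover a molecular formula from the markup.
counts = {}
for atom in root.iter("atom"):
    sym = atom.get("elementType")
    counts[sym] = counts.get(sym, 0) + 1

formula = "".join(f"{el}{n if n > 1 else ''}" for el, n in sorted(counts.items()))
print(formula)  # CH4O
```

Once chemistry is captured this way, the same structured representation supports both the identity queries and the numeric-quantity queries mentioned above.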
OpenScienceAlliance and expML - opening the road to data interchange and archiving for Collaborative eR&D for science and technology
Rich Lysakowski Jr., Executive Director / Chief Science and Technology Officer, Collaborative Electronic Notebook Systems Association, 800 West Cummings Park, Suite 5400, Woburn, MA 01801, Fax: 781-935-3113, email@example.com
In late 2003, CENSA initiated a new XML standards program called "The Open Science Alliance(TM)" with the purpose of creating de facto standards for lab informatics and for a new discipline we call "datakeeping" (i.e., applying archival science principles and practices to data). Many companies decided as a group that traditional voluntary consensus standards processes for lab informatics systems are badly broken and that they need this fast-track process. Our approach creates de facto standards similar to what the Portable Document Format (PDF) de facto standard has done to advance high-quality publishing and the Internet. However, our targets are scientific, engineering, medical, and other technical data. We now know how to circumvent vendor in-fighting, end users not wanting or knowing how to standardize, and the lack of a good rapid consensus process that works. We are creating new technologies and tools, education for end users and suppliers, and innovative policies for buyers to follow. The first standard is called expML(TM); it enables any software that produces experiment data and records with the "Scientific Method" to interchange data and interoperate with other vendors' electronic notebooks, LIMS, CDMS, SDMS, instruments, data processing tools, etc. Our design includes integrating extremely diverse and rapidly changing XML standards from the many concerned communities in science, engineering, product development, and elsewhere. The design also provides for rapid evolution of the standards themselves, so that R&D and innovation are accelerated by standardizing. This talk will explain the OpenScienceAlliance(TM) and expML(TM), report on their progress, and describe how people can join to help accelerate it.
Building spreadsheet or database software for GLP/GMP applications? Why you care about FDA's new guidance
Jay S. Kunin, Symbion Research International, Inc. & U.C. San Diego, San Diego, CA 92130
Chemical scientists engaged in laboratory or manufacturing activities as part of drug development remain subject to 21 CFR Part 11, which defines the FDA’s criteria for accepting electronic records and signatures. While the September 2003 “Scope and Application” guidance reduces the number of systems that must meet the requirements, and promises “enforcement discretion” for some sections of the regulation, it continues to require substantive compliance. Most computer-based systems developed, maintained, or managed for GLP (and GMP) applications must be compliant, especially in the areas of security, training, SOPs, validation, documentation, and electronic signatures. These requirements can be particularly troublesome in relation to spreadsheets and other locally developed software, which should be built and deployed following a rigorous methodology and operated according to SOPs. In all cases, a documented risk assessment should now be part of any procedure to design, use, change, or validate software for these applications.
How the software vendor can assist with compliant-ready products or "Why should you care about structural validation?"
Virginia L. Corbin, Business Development, Waters Corporation, 34 Maple Street, Milford, MA 01757, Fax: 508-482-2773, ginni_L_corbin@waters.com
Assuring government agencies that data supporting the production of regulated products meet quality and consistency requirements is a time consuming and expensive challenge. How are you positioned to meet these challenges with FDA’s new “Risk Based” approach to CGMPs?
Compliance is something that cannot be bought. It can be achieved, however, through the use of compliant-ready solutions together with owner-managed Standard Operating Procedures (SOPs).
Compliant-ready solutions are systems and software that have a documented system development life cycle (SDLC) and all additional tools and services needed to assist you in meeting your compliance requirements. Ensuring that these solutions are developed in this manner is also part of compliance. We will discuss what your vendors can do to assist you in meeting the newest challenges for compliance from the FDA.
Strategies for long term archiving of electronic records
Charles S Sodano, Information Services, Berlex Laboratories, 2600 Hilltop Drive, Richmond, CA 94804-0099, firstname.lastname@example.org
Almost all research and development data, documents, and records today are being authored or generated electronically. However, the archive medium of choice for most operations is still paper, with microfilm as the disaster recovery copy. Laboratory and research records need to be retained for anywhere from a few years to more than 40 years, depending on the importance of the information to a company’s business. As we move further into electronic drug applications, patent submissions, and e-business transactions, there will be an increased emphasis on long-term storage of electronic records. Possible strategies will be described that organizations can adopt right now to assure a smooth transition into electronic record archiving.
Information sources for chemical engineering students
Ann D. Bolek, Science-Technology Library, The University of Akron, Akron, OH 44325-3907, Fax: 330-972-7033, email@example.com
Chemistry and chemical engineering students need to find preparations, properties, reactions, spectra, and safety information for their compounds. Chemistry students usually need this information for the laboratory, whereas chemical engineering students usually need it for pilot-plant and industrial-scale applications. Chemical engineering students also need additional information, such as vapor-liquid equilibria and other thermodynamic data, process flow diagrams, bulk chemical prices, market share, and other business, economic, and marketing information. This paper will give examples of where chemical engineering students can find the information suitable for their particular needs, such as in databases, encyclopedias, handbooks, periodicals, and patents.
Using chemical reaction, supplier, and literature information to meet your process and engineering chemistry needs
Eva M. Hedrick, Robert C. Dana, and Linda S. Toler, Synthetic and Polymer Chemistry, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202-1505, Fax: 614-447-5471, firstname.lastname@example.org
With access to more than seven million single- and multi-step reactions, coupled with more than six million commercially available chemicals and millions of journal and patent references in the area of chemical engineering and industrial chemistry, SciFinder provides a unique resource for process chemists. In just a few short steps, process chemists can locate reaction improvements covered in recent patents, suppliers for key starting materials, or the latest chemical engineering research.
Impact of quantity and quality of critical property data on model reliability: Essential information for process simulation applications
Xinjian Yan, Qian Dong, and Michael Frenkel, Thermodynamics Research Center, National Institute of Standards and Technology, 325 Broadway, Boulder, CO 80305, email@example.com
The modeling of physicochemical property data plays a central role in chemical process simulation; as a result, the capability of process simulators depends heavily on the reliability and applicability of predictive models. A recent call from industry for more accurate physical property data and robust predictive models demonstrated an urgent need for reliable data and models across many industrial applications, spanning process design to product design in a globally competitive environment. There are, however, two major problems in developing reliable thermodynamic models: (1) the quantity and (2) the quality of the data set used for fitting model parameters, both of which have a direct impact on model reliability. A model's deficiencies become apparent when it is used to predict properties of compounds that were not involved in its development. Limitations in the quantity and quality of the data set may not be easily resolved by model developers. Nevertheless, analyzing, reporting, and understanding the effects of these problems on model applicability are crucial for guiding data applications in industrial and chemical engineering.
This presentation focuses on a systematic analysis of the applicability and reliability of models for the critical volume (Vc), the MP (Marrero/Pardillo) model in particular, as well as an evaluation of the quantity and quality of the data bank supplied in "The Properties of Gases and Liquids", Fourth Edition, 1987, on which the MP model was developed. The critical temperature (Tc) and critical pressure (Pc) are also briefly discussed. The reference data sources of experimental critical properties selected for the investigation are the IUPAC Project on Critical Compilation of Vapor Liquid Critical Properties and the NIST/TRC SOURCE Data System. This study reveals a fundamental problem in developing reliable models, and the results may serve as guidance for model developers and industrial users in their selection and application of predictive models for critical properties.
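To make the reliability question concrete, a minimal sketch of the kind of comparison involved: deviation statistics (absolute average deviation and RMSE) between model predictions and reference critical-volume data. All numbers below are invented placeholders, not values from the IUPAC or NIST/TRC compilations or from the MP model.

```python
import math

# Hypothetical reference and predicted critical volumes (Vc, cm3/mol).
reference = {"ethane": 145.5, "propane": 200.0, "benzene": 256.0}
predicted = {"ethane": 148.3, "propane": 203.0, "benzene": 248.0}

# Signed errors, percentage absolute average deviation, and RMSE.
errs = [predicted[c] - reference[c] for c in reference]
aad_pct = 100 * sum(abs(e) / reference[c] for e, c in zip(errs, reference)) / len(errs)
rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
print(round(aad_pct, 2), round(rmse, 2))  # 2.18 5.19
```

The same statistics computed over compounds held out of the fitting set, rather than over the training compounds, are what expose the model deficiencies discussed above.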
Chemical and industrial engineering information in uniquely focused petrochemical databases
John Hack, Research Support Services / Information Research & Analysis, ExxonMobil Research & Engineering Company, 1545 Route 22 East, Annandale, NJ 08801, Fax: 908-730-3230, firstname.lastname@example.org
While both basic and applied research are exhaustively covered in sources like Chemical Abstracts and Beilstein, chemical engineering and industrial information is often difficult to extract. Simple categorization like "Unit Operations" is insufficient to meet the needs of chemical engineers, who typically require detailed studies from a wide array of technologies such as materials, analytics, environment, and kinetics. To fill this information gap, the Ei EncompassLit and Ei EncompassPat databases cut neatly across the chemical literature to focus on chemical engineering and related disciplines. Since they are produced by Elsevier Engineering Information for the petrochemical industries, these databases derive their value from providing information that has predetermined industrial relevance.
This talk will focus on the types of information covered in these databases and sophisticated hierarchical indexing that is designed to identify engineering and chemical substance information on very broad or precise levels.
Discovering hidden value in physicochemical property databases
Qian Dong, Xinjian Yan, and Michael Frenkel, Thermodynamics Research Center, National Institute of Standards and Technology, 325 Broadway, Boulder, CO 80305-3328, email@example.com
With the innovation and success of modern commercial simulators on the chemical engineer's desktop in recent decades, the requirements that chemical engineers place on physicochemical data and data quality have become increasingly demanding. Industrial interest is now focused on highly reliable physical property data and quality-related information, along with robust predictive models developed and validated on such data and information. These changes in industrial needs have clearly set the directions for physicochemical property research and database development. Traditionally, however, the major role of physicochemical property databases was merely to provide easier and faster access to needed data, most of which were supplied with neither uncertainty-related information nor analytical information to assist evaluation and modeling of the physicochemical property data. It is therefore time for some dramatic changes in the development of physicochemical databases, evolving their objectives, design, functionality, and applications to meet these ever-changing needs.
As a first step in this direction, critical properties were selected from more than 100 properties in the NIST/TRC SOURCE experimental data system as a subject for analyzing and extracting information on data quality, quantity, measurement technology, and industrial requirements, given their significant impact on other properties of industrial interest. An attempt is made to discover hidden value in the experimental data of critical properties collected from 1822 to 2003. Such information includes the distribution of compounds and compound classes experimentally studied, quality analysis of uncertainty and related information, the status of duplicate measurements, the evolution of data uncertainty with time and with advances in measurement technology, and problems and progress in measurement technology. Recommended values of critical properties with assigned uncertainties are a fundamental building block used in the data quality analysis. By combining such data and the discovered information, this new functionality of physicochemical databases shows great potential to guide experimental data analysis in industry and to help scientists and chemical engineers focus on evaluating and developing the data and models of interest to them, which at the same time will greatly enhance the capability of process simulators.
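One standard way to turn duplicate measurements with assigned uncertainties into a recommended value is inverse-variance weighting; the sketch below shows the arithmetic with invented data. The actual TRC evaluation procedure is more elaborate, but the principle — precise measurements pull harder on the recommended value — is the same.

```python
# Hypothetical duplicate measurements of a critical temperature: (Tc/K, u/K),
# where u is the assigned standard uncertainty. Data are invented.
measurements = [(562.05, 0.5), (562.2, 1.0), (561.9, 0.3)]

# Inverse-variance weights: more precise measurements count for more.
weights = [1 / u**2 for _, u in measurements]
recommended = sum(w * x for (x, _), w in zip(measurements, weights)) / sum(weights)

# Uncertainty of the weighted mean.
u_rec = (1 / sum(weights)) ** 0.5
print(round(recommended, 2), round(u_rec, 2))
```

This is the kind of recommended-value-with-uncertainty that serves as the fundamental building block for the data quality analysis described above.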
Thermochemical database for industrial high-temperature applications
Mark D. Allendorf1, Ida M. B. Nielsen1, Michelle L. Medlin1, Theodore M. Besmann2, and Carl F. Melius3. (1) Combustion Research Facility, Sandia National Laboratories, Mail Stop 9052, Livermore, CA 94551-0969, Fax: 925-294-2276, firstname.lastname@example.org, (2) Metals and Ceramics Division, Oak Ridge National Laboratory, (3) Lawrence Livermore National Laboratory
Thermochemical data (heats of formation, entropies, and heat capacities) are essential for modeling the high-temperature chemistry occurring in many industrial processes, including chemical vapor deposition, combustion, corrosion, and catalysis. Although accurate data are usually available for traditional combustion environments, they are often lacking for systems involving heavier (i.e., non-first-row main group) elements and noncrystalline condensed phases (e.g., glasses and solutions). Since experimental efforts to measure these data are rare today, often the only recourse is to obtain them by computational methods. In this presentation, we will describe a new on-line database in which thermochemical information for molecular and condensed-phase systems is made available in a practical format for modeling. Molecular thermochemistry is obtained from ab initio electronic structure calculations. New methods under development to predict heats of formation for compounds containing transition metals and fourth-row main group compounds will be discussed. The current database contains data for roughly 750 molecules involving the elements H, B, C, N, O, F, Al, Si, and Cl. Condensed-phase data for variable composition liquids and glasses are derived by application of the associate-species model, which accounts for the non-ideal behavior of these systems. Data applicable to oxide systems involving the elements Na, Al, Cr, Mn, Ni, B, and Si are currently available. All data are available* in the form of polynomial fits as a function of temperature that can be imported into standard equilibrium and reacting-flow codes. In addition, extensive information resulting from the electronic-structure calculations is provided.
*See web site at www.ca.sandia.gov/HiTempThermo/index.html
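Polynomial fits of this kind are typically evaluated as a power series in temperature. A minimal sketch, assuming a NASA-style polynomial form for the dimensionless heat capacity (the database's exact fit form is described on the web site, and the coefficients below are placeholders, not database values):

```python
def cp_over_R(T, a):
    """Dimensionless heat capacity Cp/R from a NASA-style polynomial fit.
    Only the first five coefficients enter Cp; in the full seven-coefficient
    form the remaining two are integration constants for enthalpy and entropy."""
    return a[0] + a[1] * T + a[2] * T**2 + a[3] * T**3 + a[4] * T**4

# For a monatomic ideal gas the fit degenerates to a constant: Cp/R = 5/2.
argon_like = [2.5, 0.0, 0.0, 0.0, 0.0]
print(cp_over_R(1000.0, argon_like))  # 2.5
```

Fits in this shape are what standard equilibrium and reacting-flow codes expect to import.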
Infotherm: The new thermodynamics database on the web
Jost T. Bohlen, FIZ CHEMIE Berlin, Franklinstrasse 11, D-10587 Berlin, Germany, Fax: +49-30-39977135, email@example.com
The Infotherm database delivers quality data on pure compounds and mixtures, including PVT properties, phase equilibria, transport and surface properties, calorific properties, and solid-liquid equilibria. The database currently contains property data for more than 6,300 pure compounds and more than 23,000 mixtures. Each piece of data can be accessed via the chemical name, the trivial name, a formula, or the CAS Registry Number in only a few seconds. The data originate from journal articles, handbooks, and data collections that are evaluated by FIZ CHEMIE Berlin and cover the period from 1985 to the present. The database is updated monthly.
Chemical patent information needs in industrial and engineering chemistry
Donald Walter, Customer Training, Thomson Derwent, 1725 Duke Street Suite 250, Alexandria, VA 22314, Don.Walter@DerwentUS.com
Patents are an incredibly rich source of chemical information, since so much chemical technology appears only in patents and nowhere else. The language of chemistry poses unique challenges for those who would articulate chemical information through patents; furthermore, once the information is obtained, understanding the language of patents is another challenge still. We will also make special mention of the dialect of polymers. We will review some of these issues and explore how they are resolved today, as well as in the past, for the particular needs of the industrial and engineering chemist.
CAS environment for environmental information
Jan Williams, CAS, 2540 Olentangy River Road, Columbus, OH 43221, Fax: 614-447-5470, firstname.lastname@example.org
With its broad coverage of chemistry and the related sciences, CAS provides a unique foundation for environmental information retrieval across multiple databases. Substances, physical properties, patents, journal articles, regulatory data, purchasing details, business news, and more can be obtained from the CAS databases on STN and STN Easy. Although the importance of the published literature is well known, health and safety concerns call for maximizing its value by utilizing citations, thesauri, and other content and functionality capabilities to ensure that “no stone is left unturned.” In addition, older information is sometimes neglected despite its potential to provide key details unavailable elsewhere. This presentation will use polychlorinated biphenyls as a case study in preparing a strategy for obtaining a cross section of the published resources, analyzing the output, and reporting the results of the study.
Computational studies on the analysis of organic reactivity
Ingrid M Socorro1, Jonathan M Goodman1, and Keith T Taylor2. (1) Unilever Centre for Molecular Science Informatics, Cambridge University, Department of Chemistry, Lensfield Road, Cambridge CB2 1EW, United Kingdom, Fax: +44 1223 336362, email@example.com, (2) MDL Information Systems
Our work focuses on the development of computational tools for the study of organic reactivity, with the purpose of predicting and analysing organic reactions. We are developing a reaction prediction program based, in a first stage, on general knowledge of organic chemistry. The program is written in Java and uses the MDL Cheshire chemical scripting language. The system arrives at its conclusions by applying a series of rules designed to consider different features of molecules in determining reactivity. In this way, the program makes decisions on primary aspects of a reaction, such as the determination of reaction sites and which bonds are to be broken or made. New reactivity should therefore be found and analysed when unprecedented reactions are considered, and it will also be possible to predict and analyse the reactivity of unknown reactions. An example of what the program does is given in the figure below.
Computational studies on the analysis of organic reactivity
James A. Platts, Department of Chemistry, Cardiff University, P.O. Box 912, Cardiff, United Kingdom, Fax: 44-2920-874030, firstname.lastname@example.org, and Robert A. Saunders, Dept of Chemistry, Cardiff University
Modifications to the standard definition of polar surface area (PSA) are reported. It is shown that increasing the flexibility of PSA-based models can lead to some improvements in accuracy, but that these still fall well short of previously published methods. To compete with these, PSA-based descriptors are scaled according to the hydrogen-bonding characteristics of common functional groups. Introducing this scaling markedly improves accuracy in validation studies against octanol-water, chloroform-water, and cyclohexane-water partition coefficients. The methods so developed are then applied to a range of important industrial applications, including drug transport, properties of "green solvents", and solvation and partition of metal complexes.
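The scaling idea can be sketched as a group-contribution sum in which each polar fragment's surface-area term is weighted by a hydrogen-bonding factor. All numbers below are hypothetical placeholders chosen for illustration; the fitted values from this work are not reproduced here.

```python
# Hypothetical group surface-area contributions (in square angstroms) and
# hydrogen-bond scaling factors -- placeholders, not the fitted values.
group_area = {"OH": 20.2, "NH2": 26.0, "C=O": 17.1, "O(ether)": 9.2}
hb_scale = {"OH": 1.3, "NH2": 1.5, "C=O": 0.9, "O(ether)": 0.6}

def scaled_psa(groups):
    """Scaled polar surface area: each polar group's area contribution
    weighted by its hydrogen-bonding strength, summed over group counts."""
    return sum(group_area[g] * hb_scale[g] * n for g, n in groups.items())

# A molecule with one hydroxyl and one carbonyl group:
print(round(scaled_psa({"OH": 1, "C=O": 1}), 2))  # 41.65
```

The design choice is that strong donors/acceptors contribute more than their raw surface area, which is what lets a single descriptor track partitioning into solvents of different hydrogen-bonding character.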
How can generic reactions be specific? Virtual synthesis with "smart" reactions
Gyorgy Pirok, Nora Mate, Szilard Dorant, Miklos Vargyas, and Ferenc Csizmadia, ChemAxon Ltd, Maramaros koz 3/a, 1037 Budapest, Hungary
Virtual reactions based on generic reaction equations usually produce many chemically non-feasible products. ChemAxon's solution for this problem consists of three major components:
We build a reusable organic reaction library by integrating generic reaction equations with reactivity and selectivity rules. The core reaction engine for the enumeration of "smart" reactions is Reactor, which generates reaction products while considering chemoselectivity, regioselectivity, and stereoselectivity. Synthesizer evaluates an additional rule layer built into the synthesis definitions of combinatorial libraries to eliminate products outside the areas of interest.
The presentation gives an insight into the "smart" reaction technology and its effective use, with some examples.
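The overall pattern — generic enumeration constrained by a rule layer — can be sketched in a few lines. The reagents, transform, and hindrance rule below are simplified stand-ins, not ChemAxon's Reactor or its rule language.

```python
# Toy reagent lists for a generic amide-formation transform.
amines = ["n-butylamine", "aniline", "t-butylamine"]
acids = ["acetic acid", "benzoic acid"]

def sterically_hindered(amine):
    # Toy selectivity rule: reject amines on a tertiary carbon.
    return amine.startswith("t-")

# Enumerate all combinations, but let the rule layer filter out
# chemically non-feasible products before they reach the library.
products = [
    f"{amine} + {acid} -> amide"
    for amine in amines
    for acid in acids
    if not sterically_hindered(amine)
]
print(len(products))  # 4 feasible products out of 6 raw combinations
```

In a real system the rules operate on structures rather than names, and separate rule layers handle chemo-, regio-, and stereoselectivity, but the control flow is the same: enumeration proposes, rules dispose.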
Recursive Partitioning, models and statistics: What can we extract from categorical data?
Sean E O'Brien, and Marcel J. de Groot, Molecular Informatics, Structure and Design, Pfizer Global Research and Development, Sandwich Laboratories, Ramsgate Road, Sandwich, United Kingdom, email@example.com
Recursive Partitioning (RP) is a multivariate data analysis technique gaining increasing usage in chemo-informatics. It is designed to cope with categorical data, compounds with multiple mechanisms, and many descriptor types. RP enables fast derivation of decision trees for the prediction of activities or properties and can provide readily interpretable results.
Here we review typical problems and scenarios one may encounter when utilising RP. We demonstrate how full analysis of a decision tree can extract useful information from what appears, at first, to be a poor model. This is illustrated with examples drawn from our practical experiences with standard RP and multiple-Y models (PUMP-RP). We show how RP is not only useful for modelling activities but also valuable as a tool to stimulate different ways of evaluating data.
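The core of recursive partitioning is the search for the descriptor split that best separates the activity classes, applied recursively to each resulting subset. A minimal sketch of one such split search with a Gini criterion and invented data (standard RP packages and PUMP-RP add much more, e.g. recursion, pruning, and multiple response variables):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of categorical class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Exhaustively search one-feature thresholds, minimising weighted Gini."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

# Toy descriptors [logP, H-bond donor count] and categorical activities --
# illustrative data only, not from the talk.
X = [[1.2, 0], [3.4, 1], [0.5, 3], [4.1, 0], [2.8, 2], [0.9, 4]]
y = ["inactive", "active", "inactive", "active", "active", "inactive"]
score, feature, threshold = best_split(X, y)
print(feature, threshold)  # splitting on logP <= 1.2 separates the classes
```

Because each node is just a descriptor threshold, the resulting tree can be read directly — which is the interpretability that makes a superficially "poor" model still informative.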
Research patterns in the Earth System Science Department: An interdisciplinary geoscience program at the University of California, Irvine
April M. Love, Science Library Reference, University of California, P. O. Box 19557, Irvine, CA 92623-9557, Fax: 949-824-3114, firstname.lastname@example.org
My analysis of references cited in the publications (from 1999 through 2003) of the UCI Earth System Science faculty researchers will illustrate the interdisciplinary nature of the research of this new department. Additionally, this analysis will provide insights into the research habits of the Earth System Science (ESS) faculty, including itemizing the source journals consulted. The results presented will demonstrate specialized collection development experiences in a university library setting as well as highlight current changes in information-seeking habits and usage in the geosciences. These changes have an impact not only on library users but also on those responsible for collection development in support of research.
The University of California, Irvine (UCI) was founded in 1965. In 1989-90, the School of Physical Sciences examined the possibility of establishing a geosciences program where, up until this time, there had been no geology program included in the UCI campus science curriculum. The Earth System Science Department has its roots in the atmospheric chemistry research of F. Sherwood Rowland's laboratory group in the Department of Chemistry. The focus of the proposed geosciences program was nontraditional and did not emphasize the usual "rock" geology. In 1990 Ralph Cicerone, a specialist in atmospheric chemistry and former director of the National Center for Atmospheric Research's Atmospheric Chemistry Division, joined the UCI faculty. With Dr. Cicerone came a change in the focus for the departmental curriculum; it took on the "global change agenda," and the founding faculty members were hired in the atmospheric sciences, geochemistry, terrestrial and aquatic ecology, oceanography and hydrology.
Pharmaceutical decision making using LeadDecision™
Barry J. Wythoff, Research and Development, Scientific Reasoning, 23B Congress St, Newburyport, MA 01950
The drug discovery process proceeds iteratively and discontinuously. At each iteration a fateful decision must be made, which might be phrased as "Out of the molecules that we can ?, which should we ?", wherein the action denoted by the question mark might be "order", "test", "synthesize", "carry forward", etc. This is, then, a question of selecting among alternatives. Increasingly, this selection must be made in the face of manifold dimensions that we wish to optimize. LeadDecision™ is designed to aid the scientist in rapidly accomplishing such selections using a combination of calculation, visualization, and interaction. The calculation methods that will be described are adapted from economics, statistics, mathematics, artificial intelligence, and operations research.
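One standard operations-research device for selecting among alternatives under several objectives is Pareto-dominance filtering, sketched below with invented scores. This is a generic illustration of multi-objective selection, not a description of LeadDecision's actual calculation methods.

```python
# Each candidate: (name, potency, solubility) -- higher is better for both.
# Scores are invented for illustration.
candidates = [("A", 8.1, 0.2), ("B", 7.5, 0.9), ("C", 6.0, 0.5), ("D", 8.1, 0.1)]

def dominated(x, others):
    """True if some other candidate is at least as good in every objective
    and strictly better in at least one."""
    return any(
        all(o >= v for o, v in zip(other[1:], x[1:])) and other[1:] != x[1:]
        for other in others if other is not x
    )

# The Pareto front: candidates no alternative beats on every dimension.
front = [c for c in candidates if not dominated(c, candidates)]
print([c[0] for c in front])  # ['A', 'B']
```

The front makes the trade-off explicit — A is more potent, B more soluble — leaving the final ranking to the scientist's judgment, visualization, and interaction.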
New ways to integrate data and information
Carl S. Ewig, Life Sciences Solutions, IBM, Suite 300, 4660 La Jolla Village Dr, San Diego, CA 92122, Fax: 858-587-4835
Productivity in research, especially in the life sciences, depends strongly on the efficient retrieval and integration of data and information from a variety of sources and in a number of different forms. The most powerful computational tools for performing this integration are federated data sources and database engines, which integrate multiple heterogeneous data sources into a single virtual database. Recent developments have taken the federation process well beyond simple virtual federated queries of cheminformatics and bioinformatics data files, and now encompass several additional ways of accessing information, including algorithms such as HMMER, repositories such as those accessed through Entrez, and data in XML representations. Libraries of user-defined functions allow pre-processing to be performed, and "extended search" procedures allow data to be obtained from unstructured sources such as web sites. This talk summarizes recent developments at IBM and provides examples of their use with the DB2 Information Integrator in research applications.
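The federation idea — one query spanning several independent sources as if they were a single database — can be illustrated at toy scale with SQLite's ATTACH mechanism. DB2 Information Integrator does this across heterogeneous remote sources with wrappers and user-defined functions, but the shape of the cross-source query is the same. Table names and data below are invented.

```python
import sqlite3

# First "source": a chemistry table in one in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compounds (id INTEGER, smiles TEXT)")
conn.executemany("INSERT INTO compounds VALUES (?, ?)",
                 [(1, "CCO"), (2, "c1ccccc1")])

# Second, independent "source" attached under the schema name "bio".
conn.execute("ATTACH DATABASE ':memory:' AS bio")
conn.execute("CREATE TABLE bio.assays (compound_id INTEGER, target TEXT)")
conn.executemany("INSERT INTO bio.assays VALUES (?, ?)",
                 [(1, "CYP2D6"), (2, "hERG")])

# One query spans both sources as if they were a single virtual database.
rows = conn.execute(
    "SELECT c.smiles, a.target FROM compounds c "
    "JOIN bio.assays a ON a.compound_id = c.id ORDER BY c.id"
).fetchall()
print(rows)  # [('CCO', 'CYP2D6'), ('c1ccccc1', 'hERG')]
```

The value of federation is exactly this: the chemistry and biology data stay where they live, yet the researcher writes a single join.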