#230 - Abstracts

ACS National Meeting
August 28-September 1, 2005
Washington, DC

Titles link to slides when available. Please note: Presentations given at CINF symposia have been posted to the CINF website with express permission granted by the authors who retain the original copyright. These presentations are for information purposes only and cannot be further disseminated without the author's prior written permission.

8:30 1 Fate of chemistry branch libraries: Onward toward 2015
Jeremy R Garritano, Mellon Library of Chemistry, Purdue University, 504 W. State St., West Lafayette, IN 47907, jgarrita@purdue.edu - SLIDES

The pressures of technology, multidisciplinary research, and shrinking budgets have caused many librarians to rethink the roles of chemistry branch libraries in recent decades. Some of these libraries have reinvented themselves, while others have been consolidated into general science and technology libraries. The author will report on the results of a 2005 survey of Association of Research Libraries (ARL) institutions and the status of their chemistry related library resources and facilities. The survey will look at the past, present and future of their chemical information resources, paying particular attention to those that have been or will be combined with other facilities. The reasons for consolidation will be discussed, as well as what other disciplines are included within the combined collection, and other issues regarding administration and outreach.

8:50 2 The Harvard chemistry library: Ghosts aboard the starship
Marcia L. Chapin, Chemistry & Chemical Biology, Harvard University, 12 Oxford St., Cambridge, MA 02138, Fax: 617-495-0788, chapin@chemistry.harvard.edu - SLIDES

The Harvard Chemistry Library has played a quiet but profound role in chemical education and research at Harvard. Since 1927, the Library, located in the heart of the Chemistry & Chemical Biology Department complex, has served as a focal point for chemical information resources, chemical contemplation, and a host of Harvard chemistry community gatherings. The spirit of many an illustrious faculty member is to be felt there. The reading room embodies what students have come to expect from Harvard, a sense of history and elegance. With the advent of digital access to chemical information, the space occupied by the Library is beginning to be scrutinized very closely. Is it reasonable to harvest the current Library space for laboratories and create a small “starship” information center, a new paradigm where most everything would be online? A perfect storm of university politics, space competition, and financial constraints has come to bear on these decisions. How important is the historic space to 21st century teaching and research?

9:10 3 Adaptation of a chemistry library: The University of Chicago experience
Andrea Twiss-Brooks, John Crerar Library, University of Chicago, 5730 S. Ellis Ave, Chicago, IL 60637-1403, atbrooks@uchicago.edu - SLIDES

"You cannot step twice into the same river." Heraclitus (c. 540 - c. 480 BC)

During its eight decades of existence, the Chemistry Library at the University of Chicago has undergone a lot of change. This change has been driven by many factors, including advances in library technology, construction of new library and research buildings, space planning challenges, migration to electronic information resources, and more. In summer 2005, the Chemistry Library was completely closed and the collections merged into the holdings of the main science library. This presentation will explore some of the driving forces behind the closure of the Chemistry Library, the changing role of the Chemistry Librarian, and chemical information reference and instructional services in the context of a centralized science library environment. Effects of the closure on staff, resource reallocation, the process of moving the collections, and service marketing in the new environment will also be addressed.

9:30 4 Metamorphosis of the chemistry library: What will emerge?
William W Armstrong, LSU Libraries, Chemistry Library, Louisiana State University, Baton Rouge, LA 70803, Fax: 225-578-2760, notwwa@lsu.edu - SLIDES

Forces ranging from institutional financial pressures and space constraints to rapid technological advances are acting on the chemistry library causing a metamorphosis. Technological advances have revolutionized the way scientists communicate with one another and the way this information is disseminated. Has the library's role in the flow of information changed in response to these new developments? Have the needs of patrons changed as a result? What shape or role will the library have in the future? What should its role be? The author will provide an overview of some of the changes occurring or likely to occur, while highlighting any positive or negative aspects these changes might entail. We must balance an ideal with a knowledge of the realities which act as constraints, or parameters, in which these changes will take place. Change will occur. Will we merely react, or will we direct this change?

10:00 5 Changing mission, strengthened focus: A new use for the Current Periodicals Room at the University of California, Santa Cruz
Catherine B. Soehner, Christy Hightower, and Wei Wei, Science & Engineering Library, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, Fax: 831-459-2797, soehner@ucsc.edu - SLIDES

The Science & Engineering Library at UC Santa Cruz was built in 1991 and included a beautiful room dedicated to a print collection of current periodicals. During the past two years we have systematically canceled all print journals for which there was an electronic counterpart, thus diminishing the number of journals in the Current Periodicals Room. During a strategic planning effort, the Library determined that it should be identified as the 'Information Center' of the campus and be the 'destination of choice' for students, faculty, staff, and members of our greater community even in this digital age. As a first step toward realizing this goal, the library staff began a lecture series entitled Synergy: Explorations in Science and Society, held in the Current Periodicals Room. This new lecture series highlights research, teaching and grants in science and engineering at UCSC and brings these efforts to the attention of the UCSC and greater Santa Cruz community. The response to this lecture series has been overwhelmingly positive with record attendance. This venture marks the beginning of a successful move toward integrating the library further into the mission of the University and further increases the library's connection with its faculty.

10:20 6 Planning a combined engineering, computer sciences, and physics library at Stanford University
Grace A. Baysinger, Swain Library of Chemistry & Chemical Engineering, Stanford University, Organic Chemistry Building, 364 Lomita Drive, Stanford, CA 94305-5080, Fax: 650-725-2274, graceb@stanford.edu - SLIDES

A new library for the Engineering, Computer Sciences, and Physics communities at Stanford University is slated to open in 2012. It will be a state-of-the-art facility that will be designed as “stackless” or without book stacks. Planning efforts include reviewing trends, assessing issues, and developing future visions for the facility, including its collections, services, and staffing. User needs are being assessed via surveys and interviews. Technical, financial, and legal opportunities and challenges are also being evaluated. This presentation will provide an overview of the vision and planning efforts going into this new library.

10:40 7 Knowledge management at Cytec Industries: Building the library of the future
David A. Breiner1, Joseph J. Kozakiewicz1, Jeanne L. Courter1, Leonard Davis1, Raymond S. Farinato1, Steven Greenhouse1, John H. Hillhouse2, Nimal Jayasuriya1, James A. Jubinsky1, Dana B. Moore1, J. Wilfredo Perez1, and Gary Walters1. (1) Cytec Industries, 1937 West Main Street, Stamford, CT 06904, david.breiner@cytec.com, (2) Phosphine Technical Center, Cytec Industries - SLIDES

Since 2003, the Cytec Technical Information Center (TIC) in Stamford, Connecticut, has undergone a radical transformation. From moving its physical location to hiring a new staff to launching a virtual library, the Cytec TIC has become a center of excellence for learning, idea exchange, and innovation. As its mission, the TIC partners with Cytec R&D to leverage appropriate technology in order to search, archive, and disseminate internal and external information in a cost-effective, user-friendly manner. To achieve its mission, the Cytec TIC has designed and implemented a simple web portal for instant “one-stop” global access to technical information. Primary resources for external information include ACS, MicroPatent, Knovel, Elsevier ScienceDirect, Teltech, and SRI Consulting, while a web-based document management system is utilized for retrieving important internal information. In addition, the Cytec TIC has become a hub for cross-functional R&D activity by hosting scientific discussion forums and weekly poster sessions. This presentation will highlight experiences encountered during a Knowledge Management initiative including identifying system requirements, process design, implementation issues, cultural challenges, and lessons learned.

11:00 8 Virtually virtual: The postmodern pharmaceutical library
Mary Laskow, Lou Ann Di Nallo, and Mary Talmadge-Grebenar, Information & Knowledge Integration, Bristol-Myers Squibb, Rt. 206 & Province Line Rd., PO Box 4000 J12-01, Princeton, NJ 08543, Fax: 609-252-6280, mary.laskow@bms.com

The Research Libraries at Bristol-Myers Squibb serve a wide and varied audience, with chemists forming one of the main user groups. Historically we have given them primary focus from a collection and service viewpoint. As early but sometimes reluctant adopters of ejournals and other electronic resources, BMS chemists have, over time, become comfortable in the virtual world. Increasing demands on our physical library spaces from other parts of the organization have fortunately led to the opportunity to rethink our use of space in a thoughtful fashion. Some of the areas we are addressing include: increasing opportunities for collaboration, aligning chemical information professionals with the clients they serve, and reducing our collection footprint. The physical library will remain for at least the near future, until key chemistry reference resources either become available electronically or pricing models evolve to make them more affordable.

2:05 9 Copyright basics
Eric S. Slater, Publications Division, Copyright Office, American Chemical Society, 1155 Sixteenth Street, NW, Washington, DC 20036, Fax: 202-776-8112, e_slater@acs.org - SLIDES

This session will feature a general discussion of basic United States Copyright Law, including, but not limited to, such topics as subject matter of copyright, exclusive rights of copyright, and duration of copyright. Additionally, there will be a detailed discussion of ACS Publications Division Copyright Policy and how United States Copyright Law ties in to ACS Policy. In this regard, the speaker will cover why ACS requires transfer of copyright from authors and discuss why this approach is beneficial to all parties involved. Other related topics will include a detailed explanation of the ACS Copyright Status Form and, specifically, of the rights that ACS grants back to authors/employers of authors. Finally, the session will conclude with a primer on the permissions process, and why it is important to be aware of copyright when using material posted on the Internet.

2:35 10 Teaching copyright to chemistry students
S. Scott Zimmerman, Department of Chemistry and Biochemistry, Brigham Young University, C205 BNSN, Provo, UT 84602-5700, Fax: 801-422-0153, scott.zimmerman@byu.edu - SLIDES

As chemistry professors and students, we might ask the following questions about copyright: Who owns the copyright to students' research reports and laboratory notebooks? Can instructors make copies of a JACS paper and distribute them to the students in their classes? What published materials can instructors legally include in their course packets? Can graduate students publish papers in scientific journals and then publish the same papers in their theses or dissertations? In this presentation, I will try to answer these and other questions about copyright. I will also outline suggested topics and list online resources that instructors can use in teaching copyright to chemistry students.

3:05 11 Solution provider perspective: A brief case study in serving the customer and their end-users
Robert Weiner, Senior Vice President, Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, Fax: 978-750-0347, bweiner@copyright.com

The demand for digital content is greater than ever, forcing both information content users and rights holders to search for new ways to engender compliance with U.S. copyright law. Rights holders want to maintain control over how their intellectual property is used and at what cost, while information consumers want to reproduce and disseminate material without putting their institutions at risk of infringement litigation. Fortunately, there are solutions.

3:35 12 Intellectual property agreements
Gianna Arnold, Epstein Becker and Green, 1227 25th Street, NW, Suite 700, Washington, DC 20037-1175, Fax: 202-296-2882, garnold@ebglaw.com - SLIDES

Intellectual property assets are critical for technology companies and often account for a large percentage of such companies' capital. Accordingly, appropriate protection and leveraging of such assets can greatly enhance value and can be crucial to success. Patents, trademarks, copyrights, trade secrets and contracts are used to protect and leverage intellectual property assets. This presentation will focus upon the use of contracts – both in-house agreements and strategic alliances. Whether such contracts are used to protect intellectual property rights, improve in-house capability or garner revenue, the goal is to enhance the strength and value of the corporate entity. Items discussed will include types of contracts, the licensing process, and drafting considerations.

4:05 13 Publish and your patent rights may perish
Alan M. Ehrlich, Weiss, Moy & Harris, P.C, 1101 Fourteenth St., N.W, Suite 500, Washington, DC 20005, Fax: 202-216-0083, aehrlich@weissmoyharris.com - SLIDES

Patents are awarded for inventions of articles, methods and compositions that are useful, novel, and not obvious to one ordinarily skilled in the art. A patent's value stems from the fact that a patent owner initially has the exclusive right to exclude others from making, using, selling or importing the invention, and the owner can sell that exclusive right in whole or in part. The novelty is lost if the invention has been published prior to filing of a patent application. Thus, there is a potential conflict between researchers' interests in publishing and their employers' desires to maintain that exclusivity. This paper will outline those disclosures that destroy patentability and ways to balance the interests of publication and commercialization.

4:35 14 Harvesting the scientific information in patent documents: What non-patent specialists should know
William M. Mercier and Jan Williams, Chemical Abstracts Service, Columbus, OH 43210, Fax: 703-435-0827, wmercier@cas.org - SLIDES

CAS databases offer millions of patent references from more than 50 active patent-issuing authorities around the world. These patents can be viewed not only as documents of legal significance, but also as rich sources of scientific information; in fact, over 60 percent of the new small molecules CAS adds each year to the CAS REGISTRY(SM) are from patent documents rather than journal literature. The scientific information contained in these patent records makes a broader scope of data available for research and data analysis. Those patent records that meet CAS selection criteria (those covering chemistry, biochemistry and chemical engineering) are analyzed and fully indexed by CAS scientists in less than 27 days from the date of issue. Complementary to patent information, CAS references a wealth of journal literature dating back to 1907. This information can assist in making business-critical decisions, directing a research project, or assessing prior art for patentability.

8:00 15 Text search anomalies and how to cope with the "tough" searches in Pubmed for your just-in-time knowledge needs
Soaring Bear, MeSH, NLM/NIH, 8600 Rockville Pike B2E17, Bethesda, MD 20894, Fax: 301-402-2002, soaringbear@nih.gov

As much as one fifth of Medline subject header (MeSH) indexing vocabulary (http://www.nlm.nih.gov/mesh/MBrowser.html) is modified each year to keep up with additions and changes in science. Recent changes in MeSH will be presented along with three easy steps you can follow to help you keep up with and use the changes for better and faster search results.

Changes in MeSH usually improve search results but can sometimes confuse searchers and automated informatics tools. For instance, why does a search on the word ‘sweetening’ fail to deliver 100 thousand citations on ‘sweetening agents’? Why does a search on benzo[a]pyrene give a syntax error? Why does a search on ‘plants’ fail to find 20 thousand citations about ‘plant extracts’? Why does a search on ‘anti-inflammatory’ fail to get 60 thousand citations about ‘antiinflammatories’? We at MeSH are doing the best we can to help provide good search results, but the multiplicity of word meanings and the budget limit what any categorization scheme can do. You've got to do the rest. Here's how.

8:30 16 Text and data mining: Together at last!
Anthony J. Trippe, Science IP/Chemical Abstracts Service, 2540 Olentangy River Rd., Columbus, OH 43210, atrippe@cas.org

Many techniques and tools have long been available to information professionals for statistical analysis of fielded (structured) data. Lately, there has been an increased focus on the analysis of textual (unstructured) data. Traditionally, these forms of analysis have been conducted separately. In general, it was not possible for the value and strengths of these approaches to be combined. New software now allows the application of rigorous data mining tools, e.g., data grouping and clean-up, to the creation of bar charts and 2-D matrix charts from fielded data. It also allows the use of text mining elements, including data harmonization, for the creation of concept clusters and maps from unstructured data. Output from both is linked and dynamically interactive. A brief discussion of the software's capabilities will be followed by a case study on how the marriage of text and data mining supports strategic business research by providing rapid, insightful analyses.

9:00 17 Knowing when to say "When..."
Farhad Soltanshahi, Michael S. Brusati, and Robert D. Clark, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144

Sampling large data sets efficiently is a computational challenge but it can also be a philosophical one. Keeping structural diversity within the selected subset high is important, but so is maintaining representativeness of the data set as a whole. As the fraction of the data set selected increases, enhancing diversity becomes increasingly expensive in computational terms, but of progressively less value in practical terms. So when does it make sense to stop worrying about diversity and shift over to straight random sampling? Optimizable k-dissimilarity (OptiSim) is a stochastic selection method that is uniquely positioned for addressing this question, in part because it returns an ordered selection set in which the earlier selections are, on average, measurably more distinctive and more representative than are later ones.
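
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of a k-dissimilarity selection in the spirit of OptiSim as it is commonly described: draw a subsample of up to k candidates that are not too close to anything already chosen, keep the most dissimilar one, and recycle the rest. The function name, parameter names, and the recycling details are assumptions for illustration, not the authors' implementation.

```python
import random

def optisim_sketch(points, n_select, k, radius, dist, seed=0):
    """Return indices of up to n_select points (points must be non-empty).

    k      -- subsample size (hypothetical name); k=1 behaves like random
              sampling, large k approaches maximum-dissimilarity selection
    radius -- candidates closer than this to any selected point are dropped
    dist   -- function giving the dissimilarity between two points
    """
    rng = random.Random(seed)
    candidates = list(range(len(points)))
    rng.shuffle(candidates)
    selected = [candidates.pop()]      # start from a random point
    recycle = []                       # unused subsample members
    while len(selected) < n_select and (candidates or recycle):
        if not candidates:             # pool exhausted: reuse recycled items
            candidates, recycle = recycle, []
            rng.shuffle(candidates)
        subsample = []
        while candidates and len(subsample) < k:
            c = candidates.pop()
            d = min(dist(points[c], points[s]) for s in selected)
            if d >= radius:            # otherwise the candidate is discarded
                subsample.append((d, c))
        if not subsample:
            continue
        subsample.sort()
        _, best = subsample.pop()      # largest minimum dissimilarity wins
        selected.append(best)
        recycle.extend(c for _, c in subsample)
    return selected

# Usage: pick three well-separated values from a small 1-D data set.
data = [0.0, 0.1, 0.2, 5.0, 10.0]
picked = optisim_sketch(data, 3, k=5, radius=0.0, dist=lambda a, b: abs(a - b))
```

Because the returned list is in selection order, truncating it gives a smaller subset that still favors the most distinctive members, which is the property the abstract highlights.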

9:30 18 Maximizing chemical knowledge: New approaches in spectral data mining and search via the successful consolidation of multi-technique spectral data
Gregory M. Banik1, Deborah Kernan2, Kevin Scully3, and Marie Scandone3. (1) Bio-Rad Laboratories, Informatics Division, 3316 Spring Garden Street, Philadelphia, PA 19104, gregory_banik@bio-rad.com, (2) Bio-Rad Laboratories, Informatics Division, (3) Informatics Division, Bio-Rad Laboratories, Inc

It has become standard practice in multiple applications, such as compound verification or unknown sample identification, for scientists to run a sample and, using spectral search software, compare it to commercial and/or proprietary reference databases of spectra. The software mines the reference data and calculates a score or hit quality index (HQI) to describe the correlation or “closeness” of the match between the spectrum being examined and the spectra of known compounds in reference databases.

This paper describes a new approach to spectral searching which gives scientists who analyze samples using multiple spectral techniques the ability to simultaneously combine all spectral information available to yield a single search result. In a series of case studies, we will demonstrate how this approach enables the optimization of chemical similarity and maximizes chemical knowledge in order to identify several unknown samples.
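
To make the idea of a single result from several techniques concrete, here is a small sketch. It assumes, purely for illustration, that the HQI is a squared cosine similarity scaled to 0..100 and that the combined score is the plain average of the per-technique HQIs; the abstract does not state either formula, and all names below are hypothetical.

```python
import math

def hqi(query, reference):
    """Hit quality index as squared cosine similarity on a 0..100 scale.

    This scoring formula is an assumption; vendors use various metrics.
    """
    dot = sum(q * r for q, r in zip(query, reference))
    nq = math.sqrt(sum(q * q for q in query))
    nr = math.sqrt(sum(r * r for r in reference))
    if nq == 0 or nr == 0:
        return 0.0
    return 100.0 * (dot / (nq * nr)) ** 2

def combined_search(query_spectra, library):
    """Rank library compounds by the mean HQI over shared techniques.

    query_spectra -- {technique: intensity list}
    library       -- {compound name: {technique: intensity list}}
    """
    ranked = []
    for name, ref in library.items():
        shared = [t for t in query_spectra if t in ref]
        if not shared:
            continue
        score = sum(hqi(query_spectra[t], ref[t]) for t in shared) / len(shared)
        ranked.append((score, name))
    ranked.sort(reverse=True)          # best match first
    return ranked

# Usage: compound B matches the IR spectrum but not the Raman spectrum.
query = {"IR": [1, 0, 2], "Raman": [0, 1, 0]}
library = {
    "A": {"IR": [1, 0, 2], "Raman": [0, 1, 0]},
    "B": {"IR": [1, 0, 2], "Raman": [1, 0, 0]},
}
results = combined_search(query, library)
```

In this toy case an IR-only search could not separate A from B, while the combined score does.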

10:00 19 Hierarchical k-means clustering using principal components to solve the unsupervised multi-class classification problem
James F. Rathman1, Syed B. Mohiddin1, and Chihae Yang2. (1) Department of Chemical and Biomolecular Engineering, The Ohio State University, Koffolt Laboratories, 140 West 19th Avenue, Columbus, OH 43210-1110, Fax: 614-292-3769, rathman.1@osu.edu, (2) Leadscope, Inc

Current clustering techniques can be grouped as either supervised or unsupervised. In a supervised method, each observation in the training dataset is pre-assigned to a class based on prior knowledge, while an unsupervised method uses no prior knowledge of the class distinction. Numerous supervised techniques have been demonstrated to work well for binary classification and a few of these are reasonably good at making supervised multi-class predictions. However, techniques for unsupervised binary and multi-class predictions have not been fully developed. In this work, we present an analysis technique based on hierarchical k-means using differentially weighted principal component analysis to address unsupervised classification for both binary and multi-class problems. We demonstrate the methodology on both biological (NCI 60 cancer cell lines dataset and acute leukemia dataset) and chemical datasets with the objectives of predicting class membership and identifying non-redundant features most responsible for differentiating the observed classes.
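
As a rough illustration of the hierarchical part only, the sketch below applies k-means with k = 2 recursively to build a binary tree of clusters. It assumes the input rows are already principal-component scores; the projection step and the differential weighting described in the abstract are not reproduced, and the function names and the deterministic initialization are assumptions.

```python
def _d2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def two_means(points, iters=20):
    """Split points into two groups with Lloyd's algorithm (k = 2).

    Deterministic start (an assumption): the first point and the point
    farthest from it serve as the initial centers.
    """
    c0 = points[0]
    c1 = max(points, key=lambda p: _d2(p, c0))
    left, right = list(points), []
    for _ in range(iters):
        left = [p for p in points if _d2(p, c0) <= _d2(p, c1)]
        right = [p for p in points if _d2(p, c0) > _d2(p, c1)]
        if not left or not right:
            break
        c0, c1 = _mean(left), _mean(right)
    return left, right

def hierarchical_kmeans(points, depth):
    """Return the leaf clusters after up to `depth` levels of binary splits."""
    if depth == 0 or len(points) < 2:
        return [points]
    left, right = two_means(points)
    if not left or not right:          # nothing to separate
        return [points]
    return hierarchical_kmeans(left, depth - 1) + hierarchical_kmeans(right, depth - 1)

# Usage: four well-separated pairs are recovered with two levels of splits.
scores = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1), (30, 0), (30, 1)]
leaves = hierarchical_kmeans(scores, depth=2)
```

Splitting two ways at each level avoids choosing the number of classes up front, which is one reason a hierarchical scheme suits the multi-class case.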

10:30 20 Dynamic equation of state evaluation with ThermoData Engine
Chris D. Muzny1, Eric W. Lemmon1, Robert D. Chirico2, Vladimir V. Diky2, Qian Dong1, and Michael Frenkel2. (1) Physical and Chemical Properties Division, National Institute of Standards and Technology, 325 Broadway, Boulder, CO 80305-3328, Fax: 303-497-5044, chris.muzny@nist.gov, (2) Thermodynamics Research Center (TRC), National Institute of Standards and Technology (NIST)

ThermoData Engine (TDE) is a software tool recently released by the Thermodynamics Research Center at the National Institute of Standards and Technology that for the first time implements the concept of dynamic data evaluation for thermodynamic property data. In this talk we will present an extension of TDE that implements the dynamic data evaluation concept for pure fluid equations of state. We will detail the performance of TDE in comparison to established equations of state based on individual static data evaluations. The specific equations of state we compare against are those presented in NIST REFPROP, a software tool that delivers recent, state-of-the-art equations of state for over 80 fluids. Full implementation of the dynamic data evaluation concept requires continuous acquisition and storage of new data. Toward this end we will also present an extension of TDE that allows for on-demand TDE local database updates from a central server.

9:05 21 Leveraging open access chemical information with Text Influenced Molecular Index
Richard D. Hull, Axontologic, Inc, 12565 Research Parkway, Suite 300, Orlando, FL 32826

Research and development of new text mining algorithms for drug discovery have been hampered by the restricted availability of large, open access chemical databases. Recent efforts to make more chemical information available to researchers are opening promising new avenues of research. Text Influenced Molecular Indexing (TIMI) is a process that discovers correlations between structural components of chemical structures and the textual contexts that these structures are described within, namely, the scientific literature, internal research reports, and chemical patents. TIMI can identify recognized and novel latent relationships between compounds, proteins, genes, diseases and other domain concepts that are expressed across very large textual corpora. A linchpin of this technique is the ability to recognize chemical names within these texts and access their corresponding chemical structures. We describe our work with TIMI as an example of what can be done when large numbers of chemical structures are made available for text mining purposes.

9:35 22 PubChem
Stephen H. Bryant, Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bldg. 38A, Rm. 5S504, Bethesda, MD 20894, Fax: 301-480-9241, bryant@ncbi.nlm.nih.gov

PubChem is a new online information resource from NCBI. The system provides open access to information on the biological properties of chemical substances. Following the sequence-deposition model used by GenBank, PubChem's content is derived from user depositions of chemical structure and bioassay data, including data from NIH's Molecular Libraries Roadmap initiative. The PubChem retrieval system supports searches based on chemical names and chemical structure, as well as searches based on bioassay descriptions and activity values. It furthermore provides links to depositor sites, for further information on each substance, as well as links to other NIH resources such as the PubMed biomedical literature database and Entrez's protein 3D structure database.

10:05 23 The ZINC database as a new research tool for ligand discovery
John Irwin and Brian Shoichet, Department of Pharmaceutical Chemistry, University of California, San Francisco, 1700 4th St, San Francisco, CA 94143, jji at cgl.ucsf.edu (email address altered at author's request)

ZINC is a free database of commercially available compounds for virtual screening, available on the web at http://zinc.docking.org. ZINC represents small molecules as biologically relevant models suitable for virtual screening and other related applications. To make the database useful we have focused on addressing commercial availability, "drug likeness", stereochemical and regiochemical ambiguity of many supplier catalogs, physical properties, protonation, charge and tautomeric equilibria. The database may be searched and subsets created using on-line tools. Parts of ZINC have been downloaded by thousands of institutions worldwide in academia, government, and industry. ZINC continues to evolve: a dozen new compound suppliers and millions of new compounds have been added over the past year via quarterly releases. Numerous errors have been corrected thanks to alert and helpful users. This presentation will discuss some applications of ZINC as well as some of the ways we are trying to make ZINC better. ZINC relies extensively upon vendor catalogs, commercial software and GPLed software, which are acknowledged on our website. The delicate balance of providing a freely available service based partly on commercial software will be discussed.

10:35 24 Paper Withdrawn - MOLTABLE: An open access initiative on molecular informatics
M Karthikeyan, Information Division (Digital information Resource Centre), National Chemical Laboratory, Dr. Homi Bhabha Road, Pune 411008, India, Fax: +91-20-5893973, karthi@ems.ncl.res.in, and S Krishnan, Information Division, National Chemical Laboratory

MolTable is an open access initiative[1] to collect, compute and distribute data to the academic and research community. Through this portal one can query a large number of molecules for similarity, computed molecular properties, etc., and download the results in .csv format[2]. Since molecular descriptors are extensively used for QSAR, QSPR, and QSTR studies, it was proposed to compute descriptors such as topological, electronic, and property data for all the molecules[3-4]. These data, in combination with activity, property or toxicity data, can be used for building predictive models with the aid of statistical tools (PLS, PCR, kNN, SVM, ANN etc.). Some of the molecules are linked with Dspace@NCL, an open access initiative[5,6]. Molecular data can be downloaded in standard SMILES format. Visualization of the molecules is achieved with the help of ChemAxon's MarvinViewer. Details will be presented.

1. http://moltable.ncl.res.in/index.htm
2. http://moltable.ncl.res.in/nrm/sample.txt
3. http://moltable.ncl.res.in/nrm/moltable.jsp
4. http://moltable.ncl.res.in/nrm/molprop.jsp
5. http://dspace.ncl.res.in/
6. http://moltable.ncl.res.in/public/thesis_1130.jsp

11:05 25 Open access chemical-information and computer-aided drug design res
Marc C Nicklaus, Laboratory of Medicinal Chemistry, CCR, NCI, NIH, Bldg. 376, Boyles Street, Frederick, MD 21702, Fax: 301-846-6033, mn1@helix.nih.gov, Markus Sitzmann, Laboratory of Medicinal Chemistry, CCR, National Cancer Institute/Frederick, NIH, DHHS, Igor V. Filippov, Laboratory of Medicinal Chemistry, National Cancer Institute, and Wolf-Dietrich Ihlenfeldt, Xemistry GmbH

We present an update on the tools and resources used in the drug design and in silico screening work of the CADD Group at LMC, CCR, NCI. Many of these chemoinformatics resources are implemented in the form of web services, and open access is granted to the public for most of them at http://cactus.nci.nih.gov. Web-based search interfaces are presented for databases with millions of compounds using a search engine operating in distributed mode across a Linux cluster. Many of these databases are being made publicly available, including multi-million collections of commercial screening samples, as well as data sets from various U.S. Government agencies. Also presented are new automated tools for generating such web services, as well as tools and services utilizing new calculable CACTVS hash code-based identifiers useful for rapid compound identification and database overlap analyses.
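
The database overlap analysis mentioned above can be pictured with a short sketch. It does not use the CACTVS hash codes themselves; as a stand-in, it hashes structure strings that are assumed to be already canonical (for example canonical SMILES) with SHA-1 and compares the resulting sets. The function names and the truncation length are illustrative assumptions.

```python
import hashlib

def structure_hash(canonical):
    """Short fixed-length identifier for a canonical structure string.

    SHA-1 is a stand-in here, not the CACTVS hash algorithm.
    """
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:16]

def overlap(db_a, db_b):
    """Count structures shared by, and unique to, two collections."""
    hashes_a = {structure_hash(s) for s in db_a}
    hashes_b = {structure_hash(s) for s in db_b}
    return {
        "shared": len(hashes_a & hashes_b),
        "only_a": len(hashes_a - hashes_b),
        "only_b": len(hashes_b - hashes_a),
    }

# Usage: two small collections with two structures in common.
report = overlap(["CCO", "c1ccccc1", "CC(=O)O"], ["CCO", "CC(=O)O", "CN"])
```

Comparing fixed-length hashes instead of full structures keeps the set operations cheap even for collections with millions of entries, provided the input strings are canonicalized consistently.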

11:35 26 Automatic aggregation of open chemical data
Nick E Day1, Peter Murray-Rust2, Henry S. Rzepa3, Simon M. Tyrrell4, and Yong Zhang4. (1) Department of Chemistry, Unilever Centre for Molecular Sciences Informatics, Lensfield Road, CB2 1EW Cambridge, United Kingdom, Fax: +44-1223-763076, ned24@cam.ac.uk, (2) Unilever Centre for Molecular Informatics, University of Cambridge, (3) Department of Chemistry, Imperial College of Science, Technology and Medicine, (4) Unilever Centre for Molecular Science Informatics, University of Cambridge

Most experimental chemical data (e.g. crystal structures (80%), spectra (99%), comp chem (>99%)) is never published in machine-understandable form and is effectively lost. However, where authors deposit it alongside publication, either in repositories or as supplemental data to journal articles or theses, we show that it can be extracted and preserved.

The components of our process have been automated and are:

  • a workflow to manage the process
  • conversion of legacy structural formula (MOL, ChemDraw, SMI, etc.) to InChI (the IUPAC chemical identifier)
  • conversion of crystallography (CIF), spectra (JCAMP) and computational chemistry (MOPAC, GAMESS, etc.) to CML
  • archival in an Open XML-aware repository
  • publication of metadata through the Open academic repository system (e.g. DSpace, eprints), disseminated using RSS and RDF.

The primary data object is the chemical compound, indexed by InChI and its properties (with standard CML/RDF metadata). Robots can search collections for compounds and properties and compile indexes of different degrees of comprehensiveness or specialisation. We have shown that these are well indexed by conventional search engines (Google(TM), MSN(TM)), thus removing the need for specialised chemical software on the Chemical Semantic Web. The search results are highly customisable and, as they are Open, can be used directly for further scientific research or re-dissemination.

All software in this system ("WorldWideMolecularMatrix", WWMM) is available as Open Source.
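The idea of treating the compound as the primary data object, keyed by its InChI, can be sketched as a simple index that merges records harvested from different sources. This is a minimal illustration and not the WWMM software: the record layout and the `build_index` helper are assumptions made for the example.

```python
from collections import defaultdict


def build_index(records):
    """Group property records by InChI so each compound has a single entry."""
    index = defaultdict(dict)
    for inchi, prop, value in records:
        index[inchi][prop] = value
    return dict(index)


# Hypothetical harvested records: (InChI, property name, value).
records = [
    ("InChI=1S/H2O/h1H2", "name", "water"),
    ("InChI=1S/CH4/h1H4", "name", "methane"),
    ("InChI=1S/H2O/h1H2", "source", "repository A"),
]

index = build_index(records)
print(index["InChI=1S/H2O/h1H2"])
```

Since the InChI is a plain text string, an index of this kind can also be exposed to general-purpose search engines, which is the point the abstract makes about conventional indexing.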

1:30 27 Predictive models for genotoxicity based on discriminating structural features and reassembled medicinal chemistry building b
Constantine Kreatsoulas1, Chihae Yang2, Glenn J. Myatt2, and James F. Rathman3. (1) BMS, Princeton, NJ 08543, constantine_kreatsoulas@merck.com, (2) Leadscope, Inc, (3) Department of Chemical and Biomolecular Engineering, The Ohio State University

A chemical structure-based strategy is used to develop two classes of predictive models of genetic toxicity as determined by the SOS Chromotest assay. The SOS assay has high concordance with the standard Ames assay and has been used successfully for numerous diverse compound classes. In one approach, the MultiCASE algorithm was used to automate the extraction of substructures for the prediction of genotoxicity. This model was then applied to data sets for which SOS data is available.

In addition to modeling the global results, models for chemically similar subsets were also developed. For each specific dataset and endpoint, predictive scaffolds were then constructed using structural features from a library of 27,000 medicinal chemistry building blocks. Scaffolds were built separately for the global dataset and each subset. Results are compared for models built using partial logistic regression for both binomial and multinomial ordinal toxicity endpoints.

2:00 28 Building and using an in-house platform for data mining and analysis integrating open source and proprietary software: I. Designing and constructing the framework
Erik Evensen, Hans E. Purkey, Ken Lind, and Erin K. Bradley, Computational Sciences, Sunesis Pharmaceuticals Inc, 341 Oyster Point Blvd., South San Francisco, CA 94080, Fax: 650-266-3501, ee@sunesis.com

A common problem faced by computational chemists is integrating and transferring data among numerous and disparate systems. This process often involves managing and translating multiple flat files, a process that does not scale well to complex workflows with large data sets. We have constructed a database-backed platform utilizing open source software, primarily MySQL and Python, that enables building complicated data management and analysis processes incorporating data generated by both open and closed source software. In addition, we have developed internal protocols based on open standards such as XML-RPC to make available computational results both within and outside of our platform. By using well-known, open standards, we are able to leverage widely available knowledge and experience. We will present lessons learned and wisdom gained during the development of this platform.
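Exposing computed results over XML-RPC, as described above, can be sketched with the Python standard library alone. This is an assumption-laden illustration, not the authors' platform: the in-memory `RESULTS` dictionary stands in for a database query, and the method name `get_result` and the compound identifiers are hypothetical.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical computed results; a real platform would read these from its database.
RESULTS = {"cmpd-001": {"logP": 2.1}, "cmpd-002": {"logP": 3.4}}


def get_result(compound_id):
    """Return stored results for a compound, or an empty dict if unknown."""
    return RESULTS.get(compound_id, {})


# Port 0 asks the operating system for any free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False, allow_none=True)
server.register_function(get_result, "get_result")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Any XML-RPC client, written in any language, could make the same call.
client = ServerProxy(f"http://127.0.0.1:{port}")
answer = client.get_result("cmpd-001")
print(answer)

server.shutdown()
server.server_close()
```

The design benefit is that the caller needs no knowledge of the storage layer or of the software that produced the numbers; it only needs the method name and the argument types.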

2:30 29 ABCD: From data to insight
Dimitris K. Agrafiotis, Johnson & Johnson Pharmaceutical Research & Development, L.L.C, 665 Stockton Drive, Exton, PA 19341, Fax: 610-458-8249

Johnson and Johnson has recently unveiled ABCD (http://www.bioitworld.com/archive/061704/discovery.html), an informatics platform that bridges multiple continents, data systems and cultures using modern information technology, and provides researchers with an environment that allows them to make better decisions. The system consists of three major components: 1) a data warehouse, which combines data from multiple chemical and pharmacological transactional databases, organized using dimensional modelling principles to support supreme query performance; 2) a state-of-the-art application suite, which facilitates data upload, retrieval, mining and reporting, and 3) a workspace, which facilitates collaboration by allowing users to share queries, templates, results and reports across project teams, campuses, and other organizational units. A central goal of ABCD is to provide users with the means to retrieve, view and analyze multifactorial SAR data. Key to the success of this effort is the ability to combine fast substructure and similarity searching with conventional relational queries, and deliver the results in an expedient and visually compelling format. In this presentation, we give an overview of ABCD, and focus on a few core components that represent the system's "chemical intelligence", including the chemical cartridge, sketcher, molecular spreadsheet and interactive data mining components.

3:00 30 Double focusing by molecular bioactivity and drug likeness
Anwar Rayan, David Marcus, Ohad Givaty, Dinorah Barasch, and Amiram Goldblum, Medicinal Chemistry and Natural Products, Hebrew University of Jerusalem, School of Pharmacy, Jerusalem 91120, Israel, Fax: 972-2-675-8925, anvarr@md.huji.ac.il, amiram@vms.huji.ac.il

We have developed an Iterative Stochastic Elimination (ISE) algorithm to construct sets of best results for highly complex combinatorial problems (1-4). The ISE was used to construct sets of molecular descriptor ranges that serve as filters for distinguishing between drugs and non-drugs. Other methods suggest filters that produce a binary result, acceptance or rejection of a molecule as a drug candidate. We employ large sets of best filters to assign a Drug Like Index (DLI) to any molecule, which corresponds to its chance to belong to a database of drugs. A similar approach is applied to databases of biological activity, for which a Molecular Bioactivity Index (MBI) is produced for any specific activity. We find many molecules with a high DLI value in large databases of non-drugs, and propose to examine them for their bioactivity. These molecules are then assigned values of MBI for a specific bioactivity. This double focusing approach with DLI and MBI is proposed as a process for discovering molecules with specific biological activities in large databases of known or of virtual molecules.


(1) Glick, M.; Rayan, A.; Goldblum, A. Proceedings of the National Academy of Sciences of the United States of America 2002, 99, 703-708. (2) Glick, M.; Goldblum, A. Proteins-Structure Function and Genetics 2000, 38, 273-287. (3) Rayan, A.; Noy, E.; Chema, D.; Levitzki, A.; Goldblum, A. Current Medicinal Chemistry 2004, 11, 675-692. (4) Rayan, A.; Senderowitz, H.; Goldblum, A. Journal of Molecular Graphics and Modelling 2004, 22, 319-333.
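The step from binary filters to a graded index can be illustrated with a small sketch. The abstract does not give the DLI formula, so the scoring rule here (the fraction of filters a molecule passes) is an assumption, as are the descriptor names, the ranges, and the helper names `passes` and `index_score`. The sketch also does not implement the ISE algorithm that would produce the filters.

```python
def passes(filter_ranges, descriptors):
    """True if every descriptor named in the filter lies inside its range."""
    return all(low <= descriptors[name] <= high
               for name, (low, high) in filter_ranges.items())


def index_score(filters, descriptors):
    """Fraction of filters the molecule passes (assumed scoring rule)."""
    return sum(passes(f, descriptors) for f in filters) / len(filters)


# Hypothetical filters: each maps a descriptor name to an allowed (low, high) range.
filters = [
    {"mol_weight": (150, 500), "logp": (-1, 5)},
    {"mol_weight": (200, 450), "h_donors": (0, 5)},
    {"logp": (0, 4), "h_donors": (0, 3)},
    {"mol_weight": (300, 600)},
]

# Hypothetical precomputed descriptors for one molecule.
molecule = {"mol_weight": 250.0, "logp": 2.0, "h_donors": 1}
print(index_score(filters, molecule))
```

A single filter would only accept or reject this molecule; scoring it against many filters gives a value between 0 and 1 that can be used to rank molecules. The same pattern could be reused with activity-specific filters to produce a second index.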

3:30 31 Paper Withdrawn - Chemical datamining approach to scaffold based QSAR studies of NCI anti-tumor dataset
M Karthikeyan, Information Division (Digital information Resource Centre), National Chemical Laboratory, Dr. Homi Bhabha Road, Pune 411008, India, Fax: +91-20-5893973, karthi@ems.ncl.res.in, Letha Sebastian, Dept of Bioinformatics, Amman College, and Alexander Tropsha, Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina

The National Cancer Institute (NCI) has been carrying out in vitro screening of compounds to determine their inhibitory activity against cell growth in the NCI 60 human cancer cell lines for the purpose of anticancer drug discovery. The chemical structures, along with their activity data, were processed to remove duplicate molecules and erroneous structures. In this process, about 32000 molecules with their reported biological activity data (NLOGGI50, NLOGTGI, NLOGLC50) for 60 human tumor cell lines were organized in an Oracle database table. Each molecule and its biological activity data were linked to the corresponding molecular descriptors using a common identifier for querying the database. Various molecular descriptors (topological, electronic, quantum mechanical, 2D and 3D), along with predicted properties such as molar refractivity, solubility, and the logP(o/w) partition coefficient, and drug-likeness information related to the Lipinski rule of 5, including the number of rotatable bonds, number of hydrogen bond acceptors, number of hydrogen bond donors, and total polar surface area, were calculated for all these molecules, as they are essential for QSAR/QSPR analysis. Scaffold and functional group analysis was conducted on the NCI data set to identify the number of common scaffolds [Fig-1]. Selected sets of scaffolds were used for QSAR studies using MOE descriptors and built-in PLS, PCR and other statistical methods. The data-mining methods and computational results are presented.
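The Lipinski rule of 5 mentioned above is simple enough to show as code. The sketch below counts rule violations from precomputed descriptors using the commonly cited thresholds; it is not the authors' implementation, the function name is hypothetical, and the descriptor values in the usage lines are made up for illustration.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count violations of Lipinski's rule of five from precomputed descriptors."""
    checks = [
        mol_weight > 500,   # molecular weight above 500 Da
        logp > 5,           # octanol-water logP above 5
        h_donors > 5,       # more than 5 hydrogen bond donors
        h_acceptors > 10,   # more than 10 hydrogen bond acceptors
    ]
    return sum(checks)


# Illustrative (made-up) descriptor values for two hypothetical molecules.
print(lipinski_violations(350.0, 3.2, 2, 5))
print(lipinski_violations(720.0, 6.1, 6, 12))
```

Stored alongside the other descriptors under the common identifier, a violation count like this can be used directly in database queries to restrict a QSAR set to drug-like compounds.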

4:00 32 The use of Random Forests for modeling in vitro ADMET endpoints
Jason D Hughes, Molecular Informatics, Pfizer, 620 Memorial Dr, Cambridge, MA 02139

A framework for molecular property/activity prediction consisting of a Random Forest model coupled with a custom set of descriptors has been found to be very effective across a variety of endpoints, including kinetic solubility, membrane permeability, metabolic stability, and dofetilide binding. Random Forests are bagged decision tree ensembles that are trained and applied normally but for one exception: only a small, randomly selected subset of descriptors is considered when selecting the best split at each node during tree construction. The descriptors used here are all simple molecular substructure or feature counts encoded as Daylight SMARTS queries. Some mathematical properties of these RF-based models have been explored, including the impact of descriptor and training set selection schemes, nearest neighbor effects, etc. Additionally, examples will be given to demonstrate that the effectiveness of this modeling paradigm compares favorably to a selection of alternatives.
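The one exception that distinguishes a Random Forest from plain bagging, the random descriptor subset tried at each node, can be shown in a compact pure-Python sketch. This is an illustration under stated assumptions, not the author's framework: it is a classifier using Gini impurity (the abstract does not name a split criterion), the feature counts and class labels are invented, and all function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, n_candidates, rng):
    """Best (score, feature, threshold) among a random subset of features."""
    best = None
    for f in rng.sample(range(len(rows[0])), n_candidates):
        # Dropping the largest value keeps both sides of every split non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, n_candidates, rng):
    """Grow a tree until a node is pure or its sampled features cannot split it."""
    if len(set(labels)) == 1:
        return labels[0]
    split = best_split(rows, labels, n_candidates, rng)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree(*zip(*left), n_candidates, rng),
            build_tree(*zip(*right), n_candidates, rng))


def predict_tree(node, row):
    """Follow splits down to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(rows, labels, n_trees=25, n_candidates=1, seed=0):
    """Bagging: each tree is grown on a bootstrap sample of the training set."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_candidates, rng))
    return forest


def predict(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Invented feature counts: [aromatic rings, carboxylic acids, halogens].
rows = [[0, 2, 0], [0, 1, 0], [1, 2, 0], [0, 2, 1],
        [3, 0, 2], [2, 0, 3], [3, 0, 3], [2, 0, 2]]
labels = ["high"] * 4 + ["low"] * 4

forest = train_forest(rows, labels)
print(predict(forest, [0, 2, 0]), predict(forest, [3, 0, 3]))
```

Restricting each node to a random subset of descriptors decorrelates the trees, which is why averaging their votes tends to help more than bagging identical-looking trees would.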

2:00 33 Web services as integrators of public chemistry databases
Gary Wiggins, School of Informatics, Indiana University, 901 E. Tenth Street, Bloomington, IN 47408-3912, Fax: 812-856-4764, wiggins@indiana.edu

PubChem and other chemistry databases on the Web will provide a wealth of chemical and biological information. We are embarking on a series of projects that will utilize computer simulation and visualization environments to create an integrated chemical informatics cyberinfrastructure built on modern distributed service architectures. The projects will use the emerging high-capacity computer networks, powerful data repositories, and computers that comprise the Grid, thus ensuring scalability, computational efficiency, and interoperability among heterogeneous components. A description of the overall architecture of the projects and the planned links to the databases will be presented.

2:30 34 Chemical and biological data from DTP/NCI
Daniel W Zaharevitz, Information Technology Branch, Developmental Therapeutics Program, National Cancer Institute, EPN, Room 8010, 6130 Executive Blvd, Bethesda, MD 20892, Fax: 301-480-4808, zaharevitz@dtpax2.ncifcrf.gov

The Developmental Therapeutics Program (DTP) at the National Cancer Institute has been acquiring compounds for testing since 1955. This effort has resulted in the accumulation of a wealth of chemical and biological information. DTP has made this information useful to the research community by making the data publicly available and by developing tools that search and analyze the data. Over 250,000 chemical structures and over 10 million biological data points are available. Biological data includes measurement of growth inhibition in sixty human tumor cell lines, growth inhibition in yeast strains with defined mutations, protection from HIV in cell culture, anti-tumor activity in numerous mouse tumor models in vivo, and several other assays. Searches can be done by NSC number, CAS registry number, chemical name, or chemical substructure. Development of a data architecture for organizing this data will be discussed as well as plans for future additions to the data.

3:00 35 Public information databases for virtual screening
John Irwin and Brian Shoichet, Department of Pharmaceutical Chemistry, University of California, San Francisco, 1700 4th St, San Francisco, CA 94143, jji@cgl.ucsf.edu

Investigators wishing to apply computational methods such as virtual screening to discover novel ligands for proteins require a database of molecules suitable for docking. To shorten the hypothesis-testing cycle, these compounds should be commercially available and broadly "drug-like". To address this need, which has been a barrier to entry to this field, we developed the ZINC database of purchasable compounds for virtual screening, currently a collection of 3.3M compounds available from over 20 vendors. Notwithstanding our original goal of serving the virtual screening community, ZINC has attracted the attention of cheminformaticists more generally as a source of publicly available chemical structures for research. By the time of this meeting, a large part of ZINC should have been loaded into PubChem, the new database of chemical structures and screening data from NCBI that is tightly linked into the chemical and biological literature. This link from PubChem to ZINC complements the existing links from ZINC into PubChem, and to compound vendor websites. We hope this growth of a web of publicly available chemical information, linking the literature to 3D structures, properties, and chemical suppliers, will be a boon to investigators, particularly those who have hitherto not had access to this information. ZINC is on the web at http://zinc.docking.org.

3:30 36 NIST Computational chemistry comparison and benchmark database
Russell D. Johnson III, Computational Chemistry Group, National Institute of Standards and Technology, 100 Bureau Drive Stop 8380, Gaithersburg, MD 20899, Fax: 301-869-4020, russell.johnson@nist.gov - SLIDES

The NIST Computational Chemistry Comparison and Benchmark Database (CCCBDB) is a website and database which allows users to compare ideal-gas thermochemical properties determined by experiment or by quantum chemical calculations. The database contains experimental data for more than 640 small molecules, and over 100 000 calculations. Types of data include enthalpies of formation, entropies, geometries, vibrational frequencies, and dipole moments. The primary goal of the CCCBDB is to allow comparisons of thermochemistry and related properties (entropies, geometries, vibrational frequencies). The CCCBDB illustrates the question “How good is that calculation?” by providing many examples. This talk will describe the data present in the CCCBDB, the tools available through the website for comparisons, and the future plans of the CCCBDB. The CCCBDB is accessible at http://srdata.nist.gov/cccbdb.

4:00 37 Chemical information databases for environmental fate and exposure assessments
Suzanne Bogaczyk1, Philip H. Howard2, William M. Meylan2, Amy Hueber2, and Jay Tunkel2. (1) Syracuse Research Corporation, 1215 South Clark Street, Suite 405, Arlington, VA 22202, Fax: 703-418-1044, sbogaczyk@syrres.com, (2) Environmental Science Center, Syracuse Research Corporation

Accurate and dependable sources of chemical information are of great importance in the assessment of chemicals for environmental purposes. Syracuse Research Corporation (SRC) produces and maintains several databases of this type, including the Environmental Fate Database (EFDB) and the physical properties database (PHYSPROP). The EFDB, which is continually updated and maintained at SRC, was developed in conjunction with the EPA to allow rapid access to available environmental fate and physical/chemical properties data on chemical substances. PHYSPROP contains a recommended single value for water solubility, octanol water partition coefficient, melting and boiling point, vapor pressure, Henry's Law constant, and hydroxyl radical rate constant for over 25,000 chemicals. SRC also developed ChemS3, a web-based search engine which allows sub-structure searches to be combined with queries of text and numeric data. The compilation and versatility of these databases to effectively search for environmental fate and exposure information on chemical substances will be discussed.

38 3-D Database search queries for colchicine binding site inhibitors
Ann Hermone, Tam Luong Nyguyen, James Burnett, Connor McGrath, Ernest Hamel, Daniel W Zaharevitz, and Rick Gussio, Information Technology Branch, Developmental Therapeutics Program, PO Box B, FVC 310, Frederick, MD 21702, Fax: 301-846-6106, hermone@dtpax2.ncifcrf.gov

Microtubules, which are linear arrays of alternating alpha and beta tubulin, are critical for cellular proliferation and are therefore a target of cancer chemotherapy. Colchicine was the first compound found to bind at the interface of alpha and beta tubulin and to destabilize microtubules. Over the years, a large number of structurally diverse small molecules have been shown to bind at the colchicine site of tubulin and inhibit tubulin polymerization. In other work by our group, docking studies involving the recently-determined X-ray structure of the alpha,beta tubulin/colchicine complex were used to construct binding models for a set of structurally diverse colchicine site inhibitors, which subsequently formed the basis for a common pharmacophore. This study expands on that work by developing internally consistent Catalyst search queries that can discriminate between colchicine site inhibitors and their inactive congeners.

39 Algorithms and cancer drugs: In silico design of S100B ligands to block p53 bin
John L. Whitlow, Department of Chemistry, East Carolina University, 300 Science and Technology Building, Greenville, NC 27858, Fax: 206-424-1645, john@johnwhitlow.com

Cancer is the leading cause of death for persons under the age of 85. Elevated levels of S100B are associated with cancer. This research focused on interactions between S100B and the tumor suppressor protein, p53. S100B disrupts p53's protective function by inhibiting p53's C-terminal regulatory domain phosphorylation. This study designed compounds to block the effects of S100B on p53. Compounds that enhance p53's cellular function may provide potent anticancer therapies.

Accelrys's Cerius2 software was used for de novo drug design. The three dimensional structure of S100B was analyzed to resolve its main interaction sites. Fragment molecules were screened against targets of interaction in the S100B active site. Top fragment molecules were used as scaffolds to design complete ligand molecules. Additionally, public and private molecular libraries were run through docking algorithms to locate existing molecules with high affinities for the S100B active site. ADME and toxicity properties were also investigated.

40 Framework for integrating transcriptomic and proteomic profiles in Escherichia coli
Kunal Aggarwal, Leila H. Choe, and Kelvin H. Lee, School of Chemical and Biomolecular Engineering, Cornell University, 120 Olin Hall, Ithaca, NY 14853, Fax: 607-255-9166, ka62@cornell.edu

We have developed a model experimental system to study the relationship between mRNA and protein expression profiles in genetically perturbed E. coli. Experimental data at the genomic, transcriptomic and proteomic levels from these cells are integrated on a common platform to understand the effects of the introduced genetic and environmental perturbations in the cells at the molecular level. The cells are perturbed to overexpress fragments of rhsA in the presence of IPTG and are observed to have a reduced growth rate. Gene expression and protein abundance data from these cells suggest a perturbed translation machinery and a nonlinear correlation between the mRNA and protein levels in rhsA-overexpressing E. coli cells. The gene expression data are integrated with the connectivity information between genes and their transcription factors using network component analysis to gain information on altered levels of transcription factor activity and to identify parameters that may cause the observed nonlinearity between the mRNA and protein levels.

41 3-D-QSAR CoMFA and CoMSIA studies of novel alkoxylated and hydroxylated chalcones as potential anti-malarial agents
Devendra S Puntambekar and Mange Ram Yadav, Department of Pharmaceutical Chemistry, The M.S University of Baroda, Pharmacy department, Faculty of Technology & Engineering, Kalabhavan, P.O Box - 51, Vadodara, Gujrat, India, Baroda 390 001, India, Fax: +91-0265-2423898/2418927, devendra_res@yahoo.co.uk

Comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) were performed on a series of novel alkoxylated and hydroxylated chalcones as antimalarial agents. Ligand molecular superimposition on the template structure was performed by atom/shape-based RMS fit methods. Removing outliers from the initial set of 69 compounds improved the predictivity of the models. A statistically significant model was established from 52 compounds and validated on a test set of 14 compounds. The atom- and shape-based alignment yielded the best predictive CoMFA model (r2cv = 0.674, r2ncv = 0.957, r2pred = 0.670, F value = 83.040, r2bs = 0.992 with six components), while the CoMSIA model yielded r2cv = 0.610, r2ncv = 0.913, r2pred = 0.726, F value = 50.115, r2bs = 0.947 with seven components. The contour maps obtained from the 3D-QSAR studies were appraised against the activity trends of the molecules. The results indicate that steric, electrostatic, hydrophobic and hydrogen bond donor substituents play a significant role in the antimalarial activity of these compounds.
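For readers unfamiliar with the cross-validated statistic reported above, the sketch below shows one common way such a value is computed: leave-one-out prediction errors (PRESS) compared with the total variance of the observed activities. It is an illustration only. The abstract does not state the cross-validation scheme, CoMFA and CoMSIA use partial least squares on field descriptors, and this example swaps in ordinary least squares on a single made-up descriptor to stay self-contained.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out cross-validated r2: 1 - PRESS / total sum of squares."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without compound i, then predict compound i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / total


# Made-up descriptor values and activities for five hypothetical compounds.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1]
print(loo_q2(descriptor, activity))
```

Because every prediction is made for a compound the model has not seen, the cross-validated value is lower than the conventional fitted r2, which is the pattern visible in the statistics quoted in the abstract.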

42 Automatic molecular library generation of processed bioenzymes by proteolysis methods for bioremediation processes
Vito Librando, Danilo Gullotto, and Zelica Minniti, Department of Chemistry, University of Catania, via Andrea Doria 6, Catania, Catania, Italy, vlibrando@unict.it, envch3@unict.it

The goal of the present work is the implementation of informatics procedures able to interface with application software environments. Procedures were developed for computer processing in the field of molecular modeling; they allow the generation of molecular libraries, including data on the sequence and structure configurations of bio-enzymes. Each library contains molecular structures that differ by several amino acid deletions inside specified regions of the molecule. It is thus possible to obtain a collection of molecular fragments derived from the ancestral protein. Protein side chains obtained by this strategy were compatible with the enzymatic proteolysis methods used in conventional laboratory protocols, which helped decrease the time required to apply experimental procedures. The developed methodology was able to identify many physicochemical properties of the source molecule, guiding the selection procedure toward the residues most suitable as candidates for proteolysis. The program takes into consideration a set of indices and parameters related both to amino acid sequence properties (hydrophobicity) and to the occurrence of amino acid types within secondary structures (helices, sheets and loops). The criteria used to choose residues suitable for proteolysis methods were based on the capability to recognize many features in a protein sequence. The advantage of such a strategy is that it allows proteins to maintain their structural and energetic features without conformational changes in the secondary structure, thereby avoiding a probable loss of protein activity. Finally, this method allows generation of a wide set of optimised fragmented structures that are suitable to be tested and applied in subsequent computational molecular modeling environments.

Acknowledgements: The authors are grateful to MIUR for financial support.

43 Library generation and lead selection for optimal laboratory procedure of environmental biocatalysts
Vito Librando, Danilo Gullotto, and Zelica Minniti, Department of Chemistry, University of Catania, Viale Andrea Doria, 6, Catania 95127, Italy, Fax: +39-095-580138, vlibrando@unict.it

Among contaminated sites in Sicily, particularly the Siracuse Bay, little attention has been given to pollution and remediation. The petroleum products that remain as long-term contaminants include polycyclic aromatic hydrocarbons (PAHs), a family of ubiquitous pollutants with similar biological activity, high toxicity, and mutagenic and carcinogenic power. This paper describes preliminary results of an in situ treatment strategy using engineered enzymes extracted from selected bacteria for low-cost bioremediation of petroleum products that are poorly degraded by naturally occurring bacteria. The effects of sequence modification can be predicted using particular algorithms, and it is possible to design and test numerous different active molecules derived from the original ones. Multiple virtual deletions of the amino acid sequence were obtained by working on the original PDB file, and the new sequences were annealed using force fields in molecular dynamics simulations that took real environmental parameters into account. The structures were analyzed to find those with the best configuration of active site and selective channels for the substrate; multiple docking simulations were then performed for all the different substrates, giving information about the extent of the interactions between enzymes and substrates of environmental interest. A complete scan of the protein surface was carried out using naphthalene as a probe to find possible new inactive binding sites that could hold the substrate far from the active site.

44 Modeling vs. X-ray crystallography: The basal activity of constitutive androstane receptor (CAR)
Björn Windshügel, Institute for Pharmaceutical Chemistry, Martin-Luther-University Halle-Wittenberg, Wolfgang-Langenbeck-Str. 4, Halle (Saale) 06120, Germany, bjoern.windshuegel@pharmazie.uni-halle.de

Abstract not available.

45 Mok: A domain-specific language for molecular information processing
Ivan Tubert-Brohman and William L. Jorgensen, Department of Chemistry, Yale University, 225 Prospect St., New Haven, CT 06520, Fax: 203-432-6299, Ivan.Tubert-Brohman@yale.edu

Mok is a domain-specific language for molecular information processing, based on the same execution paradigm as the AWK programming language. It is derived from Perl and includes specialized functions and command-line options for molecular file input and output, substructure matching, bond perception from 3D coordinates, and an object model for accessing and modifying various properties of the atoms and bonds in a structure. It is freely available on CPAN under the same license as Perl itself.
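The AWK execution paradigm referred to above (a list of pattern/action rules applied to each input record in turn) can be illustrated with molecules as the records. The sketch below is written in Python and is not Mok syntax or the Mok API; the dictionary-based molecule records, the `run` helper, and the example rule are all hypothetical.

```python
def run(rules, molecules):
    """Apply each (pattern, action) rule to every molecule record, AWK-style."""
    output = []
    for mol in molecules:
        for pattern, action in rules:
            if pattern(mol):
                output.append(action(mol))
    return output


# Hypothetical records listing heavy atoms only; a real tool would parse these from files.
molecules = [
    {"name": "ethanol", "atoms": ["C", "C", "O"]},
    {"name": "methane", "atoms": ["C"]},
    {"name": "water", "atoms": ["O"]},
]

# One rule: the pattern selects molecules containing oxygen, the action formats a line.
rules = [
    (lambda m: "O" in m["atoms"], lambda m: f"{m['name']}: contains oxygen"),
]

result = run(rules, molecules)
print(result)
```

In a language built on this paradigm the loop over records is implicit, so a user writes only the rules, with substructure matches typically serving as the patterns.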

46 WinDock: An integrated structure-based drug discovery environment using graphical user interface
Zengjian Hu1, Donnell Bowen2, Shaomeng Wang3, and William M. Southerland1. (1) Department of Biochemistry and Molecular Biology, Howard University College of Medicine and the Howard University Drug Discovery Unit, 520 West Street, Northwest, Room 324, Washington, DC 20059, zhu@howard.edu, (2) Department of Pharmacology, Howard University College of Medicine, (3) Comprehensive Cancer Center and Department of Internal Medicine, The University of Michigan

In recent years, virtual database screening using high-throughput molecular docking (HTD) has emerged as a very important tool and method for finding new leads in the drug discovery process. Most HTD efforts utilize expensive workstations and hard-to-master Unix-like operating systems. With the advent of powerful and inexpensive personal computers (PCs), it is now possible to perform HTD investigations on Windows-based PCs. To make HTD more accessible to a broad community, we present here WinDock, an integrated structure-based drug discovery environment for Windows-based PCs. WinDock integrates searchable 3D small-molecule databases, homology modeling tools, ligand-protein docking programs, and consensus scoring functions into a cohesive system that provides a general tool for a wide range of computer-aided drug discovery applications, including protein homology modeling, lead identification, and lead optimization. WinDock is coded in C++ and is distributed free of charge to all users.

8:00 47 Turbo similarity searching
Jérôme Hert1, Peter Willett1, David J. Wilton1, Kamal Azzaoui2, Edgar Jacoby2, and Ansgar Schuffenhauer2. (1) Department of Information Studies, University of Sheffield, Western Bank, Sheffield S10 2TN, United Kingdom, j.hert@sheffield.ac.uk, (2) Discovery Technologies, Novartis Institute for Biomedical Research

Previous work has shown that fusing the outputs of similarity searches carried out using different isoactive reference compounds produces a more effective ranking than one based on just a single reference compound. Turbo similarity searching applies this strategy using a reference molecule and its nearest neighbours. The similar property principle implies that these neighbour compounds are likely to have a similar bioactivity profile; accordingly it may be worth including them in a fusion procedure. The effectiveness of this method is investigated by means of simulated virtual screening experiments using the MDL Drug Data Report Database. Extensive searches are carried out for eleven diverse activity classes and consistently demonstrate the superiority of turbo similarity searching over conventional similarity search. This method hence represents a simple way of enhancing similarity-based virtual screening methods.
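The procedure described above can be sketched in a few lines: run a conventional search, take the reference's nearest neighbours as additional queries, and fuse the resulting scores. The abstract does not specify the similarity coefficient or the fusion rule, so the Tanimoto coefficient and fusion by maximum score used here are assumptions, as are the toy fingerprints (sets of on-bit positions) and all function names.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(reference, database):
    """Rank database entries by decreasing similarity to the reference."""
    scores = {name: tanimoto(reference, fp) for name, fp in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def turbo_search(reference, database, k=2):
    """Search with the reference plus its k nearest neighbours, fused by maximum score."""
    neighbours = [name for name, _ in similarity_search(reference, database)[:k]]
    queries = [reference] + [database[name] for name in neighbours]
    fused = {name: max(tanimoto(q, fp) for q in queries)
             for name, fp in database.items()}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Toy fingerprints for four hypothetical database compounds.
database = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 5},
    "C": {3, 5, 6, 7},
    "D": {8, 9},
}
reference = {1, 2, 3}

conventional = dict(similarity_search(reference, database))
turbo = dict(turbo_search(reference, database))
print(conventional["C"], turbo["C"])
```

In this toy case compound C shares little with the reference itself but more with neighbour B, so its fused score rises. That is the effect the method relies on when the neighbours share the reference's activity.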

8:10 48 On-line submission and peer review systems
William G Town, Kilmorie Consulting, 24A Elsinore Road, London SE23 2SL, United Kingdom, bill.town@kilmorie.com - SLIDES

In the last ten years, electronic publishing of the results of scientific research has developed from being a novelty to being accepted as the normal method of publishing. Systems for online submission of articles, for peer review and for transmission of approved articles into the production workflow systems which manage both print and electronic publishing are now commonplace. This paper will review the technologies which have made this transition possible and the impact of these systems on authors' and peer reviewers' experience of publishing and on the timeliness of peer review and publishing. The impact of preprint servers will also be discussed.

8:40 49 Path to document recommendation services: Technologies that enabled the development of on-line information systems
Gerry Grenier, Publishing Technologies, IEEE, Inc, 445 Hoes Lane, Piscataway, NJ 08855, g.grenier@ieee.org - SLIDES

Online information services are a 40-year-old phenomenon. The evolution of these services has been on-going, with perhaps the most significant period of change occurring over the past 10 years. The development of the internet infrastructure, the rise of the world wide web and the http protocol spurred an explosion of online information services that has evaporated temporal and spatial barriers to information. Lost in the excitement of the development of the internet are the developments of the previous 30 years that have contributed significantly to the search and discovery capabilities of online information services. Among those developments are full text search, relevancy ranking, and markup languages. This paper will offer a look at the development of these three technologies, and then review the state of the art of the nascent service of document recommendation – a service that is built upon the three aforementioned technologies.

9:10 50 Clustering and meta-search as enabling technologies for rapid creation of vertical web portals
Raul E. Valdes-Perez, Vivisimo Inc, 2435 Beechwood Blvd, Pittsburgh, PA 15217, Fax: 412-422-2495, valdes@vivisimo.com - SLIDES

Specialized web portals, or vortals, provide a comprehensive gateway for information on a scientific specialty. The vortal fad of the late 1990s stalled because of costs, both human and technological, and the lack of a web business model. The situation has now changed: the new technologies of search, meta-search, and clustering enable rapid deployment of vortals that index one's own or public information, meta-search partner search engines, and cluster the combined information into categories. This opens up radical new possibilities for publishers and for industry to deploy internal vortals for their scientists and engineers.

9:40 51 Why your library doesn't do what you want it to
Stuart L. Weibel, Office of Research, OCLC Online Computer Library Center, 6565 Frantz Road, Dublin, OH 43017, Fax: 614 764 2344, weibel@oclc.org - SLIDES

The success and appeal of Lego blocks is more than esthetics; it is rooted in engineering principles that comprise the foundation of the industrial age, and are essential for systems engineering. The Lego metaphor gives us a sound conceptual model for the design of information systems including the principles of standardized interfaces, modular design, and extensibility.

There is a darker aspect of information systems that can be modeled with Lego as well, however, and this metaphor sheds some light on the difficulties we have in the design and use of electronic publishing systems and digital library technology in general. The Week-After-Christmas metaphor evokes a box of dozens or hundreds of unique parts that, while interoperable in some sense, can be recombined in a staggering array of configurations… not all of which make sense. The rapid changes in a broadly distributed information environment make it impossible to anticipate these changes and difficult simply to accommodate them in a coherent way. The result is constant change and a requirement for adaptation that is a new feature of the education and research process.

The challenge for designers and users is to recognize the useful bits that work together, and configure them in sustainable, cost-effective systems that meet the functional requirements of libraries and their constituents.

10:30 52 CAS Registry: An evolving resource for science
Roger J. Schenck, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202, Fax: 614-461-7140 - SLIDES

From the inception of CA in 1907 and the publication of the first CA Substance Index in 1920, the identification and storage of substance information from the publicly-disclosed literature has always been a major focus at CAS. Starting from a manually-curated 3X5 card file, the CAS Registry has evolved into a computer-based collection approaching 80 million records for organic and inorganic molecules, proteins, and sequences. The CAS Registry began as a tool to serve the needs of CAS Production Operations but soon became and remains today an essential adjunct to the work of researchers in academia, industry, and government agencies around the world. This talk will concentrate on the history of the CAS Registry, focusing on the changes in computer technology that have enabled the evolution and huge growth of this world resource.

11:00 53 Why are we still reading "papers" in a digital world? Can papers become digital, too?
David P Martinsen, ACS Publications, American Chemical Society, 1155 16th Street NW, Washington, DC 20036, Fax: 202-872-4389, d_martinsen@acs.org - SLIDES

The last ten years have witnessed a revolution in the way scientists receive information, with a remarkable impact on discovery and delivery. It is much easier than ever before to find articles of interest and to obtain those articles without ever leaving the lab or office. However, the method by which scientists read articles has been evolutionary at best. Most of us still print out a copy of the article to read and make notes. This talk will examine some of the challenges to making reading a more digital experience, to better realize the promises of the enhanced, digital editions of articles.

11:30 54 Electronic data standards for spectroscopy and analytical procedures
Antony N. Davies, Waters Informatics, Europaallee 27-29, 50226 Frechen, Germany, Fax: +49-2234-9207-99 - SLIDES

With the ever-increasing availability of information in electronic form, such as in peer-reviewed journal articles, electronic patent submissions, or pharmaceutical submissions to regulatory bodies, comes equally increasing pressure on standards bodies such as IUPAC and ASTM International to ensure that the associated data can be made available in open standard formats. This talk will review the state of the art and identify good practice that scientists around the globe should adopt.

2:00 55 Science online: Bridging scientific disciplines
Monica M. Bradford, Science, AAAS, 1200 New York Avenue, NW, Washington, DC 20005, Fax: 202-289-7562, Mbradfor@aaas.org - SLIDES

Science, the premier, inter-disciplinary journal of the world, has been helping scientists communicate their peer-reviewed results to the scientific community for over 125 years. For the last 10 of those years, Science has embraced electronic publishing as a means of enhancing scientific communication and helping scientists make their results more accessible. Moving to an entirely electronic workflow for the journal has led to decreased processing times, increased submissions, and a more international review process. Forward and backward reference linking, multi-media enhancements, online supplemental material, and a suite of tools have made the online version of the journal a richer resource. Creation of two online knowledge environments has allowed the staff to experiment with creating online resources that expand beyond the traditional journal. Over the next five years, the challenge will be to match technology to researchers' behaviors to ensure that the communication vehicles match work styles and information needs.

2:30 56 Publishing innovation at the Royal Society of Chemistry
Richard Kidd and Robert Parker, Royal Society of Chemistry, Thomas Graham House, Science Park, Milton Road, Cambridge CB4 0WF, United Kingdom, Fax: +44 1223 420247, kiddr@rsc.org - SLIDES

The RSC implemented an XML production route for its journals in 2000 and has since worked to use the XML to improve its publications. XSL-FO is used to produce our database products and this had promise for journal article make-up. We will demonstrate our innovative data checker, developed with the Unilever Centre at Cambridge University, which has great potential for the extraction of chemistry from previously published work. We will show the ways we are developing our online articles to increase the chemical science content in our journal articles.

3:00 57 Meeting the communications needs of physicists: AIP’s electronic publishing experiences
Tim Ingoldsby, Director of Business Development, American Institute of Physics, 2 Huntington Quadrangle, Suite 1NO1, Melville, NY 11747-4502, Fax: 516-576-2327, tingoldsby@aip.org - SLIDES

Physicists invented the World Wide Web to satisfy their need for rapid communication of research results. AIP responded to this need through early experiments with Web publishing and the creation of an online hosting service, Scitation, to meet the needs of physical science and engineering publishers. The evolution of AIP's online services will be discussed, with special emphasis on the development of linking, one of the primary “value adds” of electronic publishing. Reports about two ongoing projects, the STIX Fonts and Essential Information Objects, will be used to speculate about things to come in the world of electronic scholarly communication.

3:30 58 Electronic publishing and disruptive technologies
Karen Hunter, Elsevier, 360 Park Avenue South, New York, NY 10010, k.hunter@elsevier.com - SLIDES

Paul Saffo of the Institute for the Future has said we are in a period of "unprecedented uncertainty" as to technology and its effects on our business. This paper explores the effects of disruptive technologies and "unprecedented uncertainty" about technology on the process of experimentation, product creation and investment in STM electronic publishing. Specific experiences of Elsevier over the past 25 years are used as illustrations of technology trade-offs and technology-related strategic issues, as well as a look at some of the current technology concerns.

4:00 59 Genesis of ACS electronic journals
Lorrin R. Garson, Publications Division, American Chemical Society (retired), 1155 Sixteenth Street, N.W., Washington, DC 20036-4892, garson9929@yahoo.com - SLIDES

The availability of ACS journals on the World Wide Web is the consequence of over 25 years of effort. In anticipation that “someday” journals would be delivered electronically, a concerted effort was made in journal production, starting in 1975, to create journal data in consistent structures that would enable the creation of digital products. Over the years, various experimental efforts were undertaken in collaboration with universities, publishers, and organizations to bring us to today's journals on the Web.

8:30 60 InChI: Open access/open source and the IUPAC International Chemical Identifier
Stephen R. Heller, Stephen E. Stein, and Dmitrii V. Tchekhovskoi, Physical and Chemical Properties Division, NIST, Gaithersburg, MD 20899-8380, srheller@nist.gov

With the acceptance and use of the Internet throughout the world-wide scientific community, the ability for chemists and their colleagues in related fields to communicate more readily and at less expense has finally arrived. Open Access and Open Source are public domain, freely available projects which allow for the free exchange of information and are having and will continue to have a positive and profound effect on chemists worldwide.

IUPAC has long been involved in the development of systematic procedures for naming chemical substances on the basis of their structure. The resulting rules of nomenclature, while covering a large fraction of compounds, were designed for text-based media. IUPAC has now developed an open source, public domain means of representing chemical substances in a format more suitable for digital processing and the Internet, involving the computer processing of chemical structural information (connection tables). This has led to the development of the IUPAC International Chemical Identifier, InChI. Details of the status and acceptance of InChI and related freely available Open Access information tools, such as chemistry journals (e.g., Beilstein Journal of Organic Chemistry), will be discussed in this presentation.

8:55 61 Promoting data standards and open public access to structure-searchable toxicity databases: DSSTox and coordinated public efforts
Ann M. Richard, US EPA, MD B143-06, Research Triangle Park, NC 27711, Fax: 919-685-3263, richard.ann@epa.gov, and Maritja Wolf, Lockheed Martin, Contractor to the US-EPA

The Distributed Structure-Searchable Toxicity (DSSTox) database project seeks to: promote use of chemical structure standards and file formats for chemical toxicity databases; coordinate with other public efforts to encourage chemical structure annotation, data standardization, and open public access to toxicity databases; and enhance the ability of scientists and regulators within and outside EPA to integrate, explore, and utilize chemical toxicity data from a structural perspective to improve capabilities in predictive toxicology. DSSTox is coordinating with the public LIST ToxML and International Life Sciences Institute on toxicity data standardization efforts in Developmental Toxicity and other areas of toxicology. Additional collaborations are ongoing with the LHASA VITIC structure-activity database effort, the National Cancer Institute's chemical databases and structure-browser, NCBI's PubChem, the IUPAC/NIST InChI chemical information code project, and the NIEHS Chemical Effects in Biological Systems (CEBS) toxicogenomics knowledge-base, among others.

9:20 62 The US EPA contribution to the OECD work on the validation, for regulatory purposes, of (quantitative) structure activity relationships: (Q)SARs
Maurice Zeeman, Kelly Mayo, and Oscar Hernandez, Office of Pollution Prevention and Toxics, U.S. Environmental Protection Agency, 1200 Pennsylvania Ave., NW (7403), Washington, DC 20460, zeeman.maurice@epa.gov

OECD's work on the validation, for regulatory purposes, of (Quantitative) Structure Activity Relationship (aka (Q)SAR) models will be described.

9:45 63 OECD residue chemistry guideline harmonization project
Amy Rispin, Rick Loranger, and Steve Funk, Office of Pesticide Programs (OPP), U.S. Environmental Protection Agency, 1200 Pennsylvania Ave., NW, Washington, DC 20460, rispin.amy@epa.gov

A US-led Expert Group on Pesticide Residue Chemistry, established in Oct. 2003, is developing two guidance documents and five Test Guidelines (and templates for reporting test study summary data).

10:10 64 Performance standards for quality assurance of validated alternative test methods
Amy Rispin, Office of Pesticide Programs (OPP), U.S. Environmental Protection Agency, 1200 Pennsylvania Ave., NW, Washington, DC 20460, rispin.amy@epa.gov, Kailash Gupta, U.S. Consumer Product Safety Commission (CPSC), and William Stokes, Division of Intramural Research, National Institute of Environmental Health Sciences (NIEHS)

Alternative test methods for toxicological tests are being sought to replace animal testing when possible. Such methods must be validated in order to make their use a feasible alternative.

10:35 65 Facilitating electronic submission of chemical information: OECD Harmonized Templates (and XML schema), the U.S. High Production Volume Information System (HPVIS), and the European Union's IUCLID database [panel]
Randall Brinkhuis1, Leslie MacDougall2, Jay Ellenberger3, Brion Cook4, and Todd Holderman4. (1) Office of Pollution Prevention and Toxics, U.S. Environmental Protection Agency, Washington, DC 20460, brinkhuis.randall@epa.gov, (2) OPPT, Risk Assessment Div, U.S. Environmental Protection Agency, (3) Office of Pesticide Programs, U.S. Environmental Protection Agency, (4) OPPT, Information Management Division, U.S. Environmental Protection Agency

This session will include a description of the development of harmonized templates by OECD for submission of individual study report summaries or study evaluation reports. Separate templates are being developed for over seventy different toxicology, ecotoxicology, and physical-chemical property types. In addition, XML schemas are being developed for each template.

Electronic submission of chemical summaries and data will also be discussed, particularly with respect to high-production-volume (HPV) chemicals. The European Union's IUCLID database will also be described.

1:30 66 ELNs: What are they, and what do they need to do?
Keith T. Taylor1, David Hughes1, and Phil McHale2. (1) Marketing, MDL Information Systems, Inc, 14600 Catalina Street, San Leandro, CA 94577, k.taylor@mdl.com, (2) Corporate Communications and Scientific Affairs, MDL Information Systems, Inc - SLIDES

The paper notebook is well understood, and the R&D workflow has been built around its capabilities. The notebook is used to support patent claims and to prove compliance with regulations, for example the FDA's GLP. An ELN must satisfy these basic needs, but it has the potential to do more. The basic functionality of the ELN and its potential to drive the evolution of an electronic R&D environment will be discussed.

2:00 67 Electronic laboratory notebooks in the advanced undergraduate laboratories
Todd E. Woerner, Department of Chemistry, Duke University, Box 90346, Durham, NC 27708, Fax: 919-660-1605, todd.woerner@duke.edu - SLIDES

As part of an ongoing effort to use computer technology in undergraduate courses, we introduced an electronic laboratory notebook (ELN) to the physical chemistry laboratory. Blackboard® software was selected as the host for the ELN because it is familiar to students and meets the necessary requirements of security and accessibility. Our eventual goal is to use the ELN throughout the advanced undergraduate laboratories, thus completing a long-term initiative to upgrade the laboratories with networked computer systems. With the inclusion of the ELN, all aspects of laboratory work, from literature review and experimental design, through instrument control, data collection and record keeping, to analysis and report writing, are managed electronically.

2:30 68 Using electronic laboratory notebooks in an academic environment
Mahesh H Merchant1, Paresh C Sanghani2, and Sonal P Sanghani2. (1) School of Informatics, Indiana University, 719 Indiana Avenue, Walker Plaza, WK319, Indianapolis, IN 46202, mmerchan@iupui.edu, (2) Department of Biochemistry and Molecular Biology, Indiana University - SLIDES

The notion of a paperless laboratory has been around for a long time; however, the adoption of electronic laboratory notebooks has lagged behind. Innovations in technology and advances in the software implementation of electronic notebooks are making this a reality. The School of Informatics at Indiana University has launched a graduate program in Laboratory Informatics. The curriculum of this program includes the use of a commercial electronic laboratory notebook in conjunction with other scientific data management tools. The experiences of the first group of students using this commercial electronic laboratory notebook will be discussed. An upcoming joint project between the School of Informatics and the Masters Program in Biotechnology using Electronic Laboratory Notebooks on Tablet PCs will be presented.

3:00 69 Expanding the available public chemical information using ELNs
Scott E. Schaus, Department of Chemistry, Boston University, Life Science and Engineering Building, 24 Cummington Street, Boston, MA 02215, Fax: 617-353-6466, seschaus@bu.edu

The Center for Chemical Methodology and Library Development at Boston University (CMLD-BU, http://cmld.bu.edu/) is a new center funded by the National Institute of General Medical Sciences (NIGMS) focused on the discovery of new methodologies to produce novel chemical libraries of unprecedented complexity for biological screening. The goal of the CMLD-BU is to explore and expand the diversity of small-molecule libraries by creating general, useful protocols for stereocontrolled synthesis. A major objective of the CMLD-BU is to provide information and chemistry protocols to the public on parallel and chemical library synthesis. The Center has created the Synthesis Protocols Database of electronic laboratory notebook procedures to accomplish this goal.

3:30 70 From collaboration tool to semantic e-record: The evolving role of the Electronic Laboratory Notebook (ELN)
James D. Myers1, Charles E. Arp2, Tara Talbott1, and Michael Peterson1. (1) Mathematics and Computational Science, Battelle / Pacific Northwest National Laboratory, Battelle Blvd. MS K1-87, Richland, WA 99352, Fax: 208-474-4616, jim.myers@computer.org, (2) Records Management Office, Battelle - SLIDES

The open-source Electronic Laboratory Notebook (eln.sourceforge.net) has been in use as a collaboration and productivity tool at Pacific Northwest National Laboratory (PNNL) for nearly a decade. During that time, the ELN has evolved significantly and has incorporated records-related functionality. Independent of this technical evolution, interest at PNNL in fielding a general electronic notebook as a business record has been growing. Over the past year, these trends have resulted in an active dialog between ELN developers and records managers and discussion of pilot deployment. A number of factors, ranging from the incorporation of web service and semantic technologies within the ELN and records systems at PNNL to successes in migrating other forms of records to electronic form, have contributed to the current momentum. This talk will review the drivers of the current activity, highlight enabling factors, and present the technical and organizational progress being made.

4:00 71 Global ELN deployments: Experience from the front lines
Chris J. Ruggles, Professional Services Dept, CambridgeSoft Corp, 100 CambridgePark Drive, Cambridge, MA 02140, Fax: 617-588-9190, cruggles@cambridgesoft.com - SLIDES

In this paper, we will describe the challenges that arise when managing global deployments of ELN systems. The migration from a personal, paper paradigm to a public, electronic model poses complex challenges. We will describe how end user sensitivity to scientific security issues, IT manager concerns for consistency and maintainability, and executive desire for increased productivity can be bridged by providing a common forum in which these issues can be resolved. The ELN deployment process provides a unique opportunity for groups often in conflict to discuss, debate, and resolve these issues. Coordinating requirements for patent attorneys, medicinal chemists, process engineers, biologists, and executive management brings a perspective to each group that is too often lost in the day-to-day focus of individual work. CambridgeSoft will present the results of several global ELN deployments where functional working groups have provided invaluable assistance not only to the success of the deployment, but to the greater organization as a whole.

4:30 72 Electronic Lab Notebooks: How they can impact productivity in the laboratory and how to justify a purchase
Richard M. Stember, Scientific Division, EKM Corporation, 25255 Cabot Rd Ste 103, Laguna Hills, CA 92653, Fax: 949-455-1523, stemberr@ekmco.com - SLIDES

Understanding how an electronic lab notebook can benefit laboratory operations is critical to its justification, purchase, and proper implementation. This presentation will detail the ELN functions that can have the greatest benefit to an organization and review several real-world cases and independent productivity studies. The benefits of collaborative tools, data mining and searching, corporate and regulatory compliance protocols, and interfacing to related applications will be discussed.

An introduction to developing a Return on Investment (ROI) justification based upon these functions and projected productivity enhancements will be presented.

• Which features are needed to improve productivity
• What is needed for Research, R&D, QC, and Services laboratories
• How integration with existing systems impacts productivity
• What may be important outside of the laboratory
• How to develop ROI justification based upon projected productivity gains

8:30 73 Green chemistry and the environmental community: Building bridges with ICE -- information, communication, education
Frederick Stoss, Science and Engineering Library, University at Buffalo - SUNY, Buffalo, NY 14260, fstoss@buffalo.edu - SLIDES

"Green Chemistry" is a concept approaching its adolescence in terms of the number of years it has existed. Much of its growth and development has gone virtually unnoticed by the environmental community. The demand for consumer goods, however, still requires connections to chemistry and chemical processes. This demand for goods has placed a wide variety of well-documented burdens on the environment and the producers of those consumer goods. Combating the environmental impact of "chemicals" has proven to be a costly venture in terms of time, energy, expertise, and money. Green Chemistry emerged as a means to shift the focus of environmental impact to prevention of adverse impacts. This is accomplished through design of chemical processes more compatible with positive environmental outcomes, such as decreasing the amounts of harmful chemicals used in processes, reduction in on-site storage of chemicals, and implementation of strategies to achieve zero-discharge of pollutants and emissions. The concept of Green Chemistry has been closely aligned to that of sustainability and to a lesser degree the tenets of the Precautionary Principle. The "three-legged stool" model of sustainability—Ecology, Economics, Equity—has a logical analog for introducing the concepts of Green Chemistry for stakeholders: Government Agencies, Business and Industry, Environmental Concern. This presentation will examine another model to explore the potential for increased public awareness and outreach: Green Chemistry ICE—Information, Communication, Education—and cite the example of the Center for Environmental Information (Rochester, NY) as the type of organization that can perform such services, programs, and publications.
The roles of components of the American Library Association (e.g., Task Force on the Environment, Social Responsibilities Round Table, and Science and Technology Section of the Association of College and Research Libraries) and the Special Libraries Association (e.g., Environment and Resource Management Division, Chemistry Division, Engineering Division, Science and Technology Division) will also be discussed. The case for development of educational services and information products, information resources sharing networks, and effective cross-disciplinary communications to better inform the communities of environmental concern and responsibility will be made.

9:00 74 IUPAC Ionic liquids database, ILThermo
Qian Dong1, Chris D. Muzny1, Robert D. Chirico1, Vladimir V. Diky1, Joseph W. Magee1, Jason Widegren1, Kenneth N. Marsh2, and Michael Frenkel1. (1) Physical and Chemical Properties Division, National Institute of Standards and Technology, 325 Broadway, Boulder, CO 80305-3328, qian.dong@nist.gov, (2) Department of Chemical and Process Engineering, University of Canterbury - SLIDES

Recent reviews on ionic liquids, one of the emerging topics in chemistry during the past five years, have called for comprehensive investigations of chemical and physical properties in order to understand the nature, functionality, and potential uses of ionic liquids. In 2003, an IUPAC task group was formed to address the need for international scientific cooperation. The goal was to create a web-based comprehensive database for storage and retrieval of metadata and numerical data for ionic liquids, including their syntheses, structures, purity, and properties. The data project consists of two major parts: (1) data capture and storage employing a NIST/TRC data archive system known as SOURCE; (2) data search and retrieval building on a J2EE/ORACLE web platform. It aims to provide users worldwide with up-to-date information on publications of experimental investigations of ionic liquids, including numerical values of chemical and physical properties, measurement methods, sample purity, uncertainty of property values, as well as many other significant measurement details. The database can be searched by means of the ions constituting the ionic liquids, the ionic liquids themselves, their properties, and references. The first edition is scheduled for public release via the internet by the end of September 2005.

9:30 75 New tool to improve access to green chemistry and engineering resources
Jennifer L. Young, Green Chemistry Institute, American Chemical Society, 1155 Sixteenth St., NW, Washington, DC 20036, Fax: 202-872-6206, j_young3@acs.org, and Paul T. Anastas, Green Chemistry Institute, American Chemical Society - SLIDES

The resources for green chemistry today are in many forms, but there is no unifying framework that assembles these widely scattered resources. As a result, chemists and engineers, who may not be experts in green chemistry and green engineering, struggle to access the information and implement pollution prevention in their work. The Green Chemistry Institute is currently working on a tool to aid chemists and engineers in identifying opportunities for green chemistry and engineering and locating the most relevant green chemistry resources. With this tool, the user is guided through an opportunity assessment protocol in relation to their chemical or process design and then directed to the resources most relevant to their application. The tool provides access to a wide variety of resources including software, databases, web sites, examples, journal articles, and keywords. Progress on this project and related information dissemination tools will be presented.

10:00 76 Green chemistry and environmental sustainability: A middle school module
Michael Rottas, Pfizer Global Research & Development - Groton/New London Laboratories, Eastern Point Road, Groton, CT 06340, michael.h.rottas@pfizer.com - SLIDES

Pfizer Global Research & Development, through a grant from the Pfizer Education Initiative, and in collaboration with The Keystone Center (of Keystone, Colorado), is developing an interdisciplinary middle school module focused on environmental sustainability and green chemistry. Middle school students learn about the real-world challenges of product development while balancing the economic, social, and environmental bottom lines. Green chemistry is introduced as a means to succeed in business while ensuring natural resources are available for future generations. The presentation will provide an overview of the ten-day module and the drivers for its development.

10:30 77 CBIAC and homeland security information
James M. King, Chemical and Biological Defense Information Analysis Center, PO Box 196, Gunpowder, MD 21010, Fax: 410-676-9703, kingj@battelle.org - SLIDES

Chemical and Biological Defense Information Analysis Center (CBIAC) is a full service DoD Information Analysis Center. It is DoD's centralized source for Chemical and Biological Defense (CBD) information/technology. CBIAC supports DoD, Federal agencies, contractors, state and local governments, and first responders. CBIAC encompasses all aspects of CBD and homeland security (HLS): Manufacturing Processes for NBC Defense Systems, Chemical and Physical Properties of CBD Materials, Identification, Combat Effectiveness, Counter Proliferation, Counter Terrorism, Decontamination, Defense Conversion and Dual-Use Technology, Demilitarization, Domestic Preparedness, Environmental Fate and Effects, Force Protection, Individual and Collective Protection, International Technology Proliferation and Arms Control, Medical Effects and Treatment, NBC Survivability, Smoke and Obscurants, Toxic Industrial Chemicals and Materials, Toxicology, Treaty Verification and Compliance, and Warning and Identification. This presentation focuses on CBIAC's HLS activities. For more information, see http://www.cbiac.apgea.army.mil/.

11:00 78 Preparing for chemical terrorism response at the Centers for Disease Control and Prevention
David L. Ashley, National Center for Environmental Health, Centers for Disease Control and Prevention, Mailstop F-47, 4770 Buford Highway, Atlanta, GA 30341, Fax: 770-488-0181, dla1@cdc.gov - SLIDES

In the event of domestic terrorism, identification of the agent used and the people affected is critical to an effective response. CDC has been given the federal responsibility for analyzing clinical specimens from people with suspected exposure to chemicals used during a domestic terrorist event, a chemical accident, or an unknown chemical exposure. Specimens may be collected and evaluated for the purpose of identifying the agent used, confirming a preliminary field agent identification, assessing whether individual subjects were exposed, determining the extent of exposure of individual subjects, relating internal dose levels to medical symptoms, assessing individuals for assignment to registries, assessing the geographical and temporal extent of exposure, and/or differentiating exposed subjects from the worried well. Biomonitoring can provide important information to clarify the confusion during a terrorist event and aid the public health response. This presentation will discuss the current status of efforts within DLS to respond to a chemical terrorism event.

11:30 79 Information sharing for science and security: The path forward
Gigi K. Gronvall, Center for Biosecurity of UPMC, 621 East Pratt Street, Suite 210, Baltimore, MD 21202, Fax: 443-573-3305, ggronvall@upmc-biosecurity.org - SLIDES

Since 9/11/2001, a tremendous amount of funding has been devoted to homeland security goals. The people of the US, represented by their government, clearly are counting on scientists to reduce the threat and consequences of another attack. However, are enough scientists truly engaged in this field? At the same time as more funds are becoming available, new laws regarding the handling of biological agents have come into effect, and new norms for scientific publication have been promoted by journal editors. Fundamental public policy questions have arisen, questions that have deep implications for the scientific community. How can the need for research transparency be reconciled with the desire not to lower the barriers to weapons development? How can we bridge the gulf between the scientific and national security communities, so that the best science is brought to bear on homeland security issues? How do US domestic actions square with the global setting in which research takes place and information is exchanged? This presentation will focus on the new roles that scientists, their professional organizations, research institutions, and the government are taking on in order to increase national and international security, the sometimes uncomfortable merging of cultures, and paths forward that would make it easier and more attractive for scientists to work on homeland security.

1:00 80 ArQiologist: An integrated decision support tool for lead optimization
Atipat Rojnuckarin, Research Informatics, ArQule Inc, Presidential Way, Woburn, MA 01730, arojnuckarin@arqule.com

This talk describes ArQiologist, a Web-based tool that integrates chemical, analytical, biological, and computational data to facilitate decision support for lead optimization at ArQule. It features an easy-to-use graphical query builder that allows queries to be saved, reused, and shared by researchers. In addition to being an integrated portal for ArQule's discovery data, ArQiologist offers customizable treatment of the oft-neglected nuances unique to hierarchical compound-centric discovery data.

1:30 81 Integrating R with the CDK for QSAR modeling
Rajarshi Guha and Peter C. Jurs, Department of Chemistry, Pennsylvania State University, 104 Chemistry Building, University Park, State College, PA 16802, rxg218@psu.edu

In this work we describe the development of a framework combining cheminformatics and statistical functionality to provide a freely available platform for QSAR modeling. The Chemistry Development Kit (CDK) is an open-source project that provides a comprehensive framework for cheminformatics projects. Features include the ability to read multiple formats, calculate molecular descriptors and perform substructure searches. The R software package is an open-source project that provides a variety of statistical and data mining functionality. A framework was developed to allow the use of the statistical functionality provided by R from within the CDK. We also describe a publicly available service to create QSAR models as well as obtain predictions from reference models developed using this framework. The service also implements the validity measure developed by Guha et al. (J. Chem. Inf. Model., 2005, 45, 65-73) to provide a measure of confidence in the predictions.

2:00 82 Chemogenomic assessment of SAR data from learned journals
Richard Cox, Ah Wing E. Chan, Bissan Al-Lazikani, David Michalovich, and John P. Overington, Inpharmatica Ltd, 60 Charlotte St, London W1T 2NU, United Kingdom, Fax: +44 (0)20 7074 4700, r.cox@inpharmatica.co.uk

Structure-activity relationships (SAR) between chemical structures and bioactivities are fundamental to understanding drug-target interactions, especially when considered within a gene family, where one needs to understand and predict relative potency and selectivity. Although this kind of information has been available in learned journals for many years, there is no available system to retrieve it. Historically, retrieving such SAR data has involved intensive scientific effort, and it is difficult to assess coverage and accuracy in any individual case. To address this need, we have developed StARLITe, a database that comprises around 300,000 bioactive, synthetically tractable compounds abstracted from primary medicinal chemistry journals. The associated bioactivity data include over 1.3 million activity data points and in excess of 4000 unique molecular targets, enabling both rapid extraction of arbitrary SAR tables and sequence-based entry to the data - a unique departure from the traditional compound-centric view of medicinal chemistry data.

2:30 83 Enterprise knowledge management and the industrial revolution in scientific experimentation
Kenneth Eric Milgram, Laboratory Operations Director, Metabolon, 7820 Kingsbrook CT, Wake Forest, NC 27587, Fax: 919-882-8822, eric.milgram@gmail.com

The term “high-throughput” is widely used throughout the pharmaceutical industry. Most scientists probably have a mental image of a high-throughput lab, where many activities are automated with the aid of robotic instrumentation. However, defining specific criteria to designate methods as high-throughput is challenging. For example, in order to be considered high-throughput, is there a minimum threshold for samples produced or analyzed per unit time? Must a high-throughput method be capable of processing samples in parallel, such as with a MUX interface or 96-channel pipette? Is a high-throughput technique inherently of lower quality than a conventional technique?

In many organizations the term high-throughput has acquired negative connotations, particularly with regard to confidence in the quality of a product or service or the ability to infer knowledge from an experiment. Organizations have an unfortunate tendency to dichotomize products and services depending upon whether they were derived from conventional or high-throughput processes. This presentation has two objectives. The first is to gain a better understanding of the relationship between high-throughput and non-high-throughput processes. The second is to understand the implications of this relationship for enterprise data and knowledge management and to offer some concrete examples based on this author's time in industry as a member of both a large pharmaceutical organization and a vendor of knowledge management solutions for scientists. The insights from this presentation have implications ranging from experimental design and implementation to selection of LIMS and electronic laboratory notebooks (ELN).

3:00 84 Integration of chemoinformatics and fragment-based lead discovery
Kashif Hoda, Structural GenomiX, San Diego, CA 92121, kashif_hoda@stromix.com

Structural GenomiX (SGX) has developed FAST™ (Fragments of Active Structures), a proprietary technology for lead discovery. In this fragment-based technology, large-scale crystallographic and biochemical screening, library design, and synthesis are conducted on a daily basis. It is critical to ensure a seamless data tracking and query system across the entire process, from fragment screening and selection through virtual library generation, library design, and library synthesis, in support of lead discovery efforts. The concept and implementation of the system will be discussed.


