#235 - Abstracts

ACS National Meeting
April 6-10, 2008
New Orleans, LA

1 Can social networks help to increase the use of e-books in academia?
Sasha I. Gurke, Marketing, Knovel Corporation, 13 Eaton Avenue, Norwich, NY 13815, Fax: 607-337-5090, sgurke@knovel.com

Can vendors increase the use of their e-resources in academia by promoting them via social networks? We will share a year of experience using Facebook as a tool for promoting the use of the Knovel online reference collection in academia, specifically with chemistry students. Successful and not-so-successful tactics will be discussed, including discussion boards, special events, quizzes, course exercises, and surveys.

Survey results and statistics will be presented for at least one major engineering school.

2 Enhancing information resources and instruction with computational and chemical software
Jeremy R Garritano, Mellon Library of Chemistry, Purdue University, 504 W. State St., West Lafayette, IN 47907, jgarrita@purdue.edu

As the Chemistry Library begins to move journal subscriptions to electronic-only access and to experiment with e-books, physical space in the Library has been reallocated. One such transformation has allowed for installation of ten new computers within the Library. Not only do these workstations mirror current workstations throughout the rest of the University Libraries system, but they also have been enhanced with additional software for visualization, statistical analysis, and data manipulation. As our users deal with more and more data compared to text-based resources, adding these computational workstations appeared to be the next logical step. Besides being an open lab for any library patron, this new facility will also be used for small group instruction and demonstrations for new or enhanced library resources. This paper will describe the process of designing the lab, choosing the software, implementing the facility, and evaluating its use.

3 Issues and opportunities associated with federated searching
Grace Baysinger, Swain Library of Chemistry and Chemical Engineering, Stanford University Libraries, 364 Lomita Drive, Organic Chemistry Building, Stanford, CA 94305-5081, Fax: 650-725-2274, graceb@stanford.edu

Library catalogs and databases contain a wealth of information that is not available to Internet search engines such as Google. It can be difficult for users to identify which research tools to use and time-consuming for them to search each resource one at a time. Federated search tools make it possible to search multiple resources with one query. Several strategies have been developed to provide "one-stop shopping" but those dealing with multiple search interfaces face the biggest challenges. This talk will describe a project underway to develop federated searching prototypes on campus and will cover the viability of providing federated search services as well as the interest level by students and faculty in using them.

4 VIVO: Connecting the disciplines at Cornell
Leah R. Solla, Physical Sciences Library, Cornell University, 293 Clark Hall, Ithaca, NY 14853-2501, Fax: 607-255-5288, lrm1@cornell.edu

Interdisciplinary research is a high priority for Cornell, but the university operates within compartmentalized administrative structures. How can researchers and administrators identify potential collaborators, competitive faculty and student candidates and potential donors outside of their traditional peer groups? When the Cornell University Library became involved in the Genomics and New Life Sciences initiatives at Cornell University, our goal became to develop a tool to present the depth and breadth of research and scholarship at Cornell across the life sciences, independently of Cornell's administrative structure. The result is VIVO, an interactive web site that captures information about the people, programs, classes, publications, and facilities involved in life sciences research at Cornell University. This model has been so successful for this cross-section of the community that the project is now expanding to cover initiatives related to the physical sciences and engineering, social sciences and international research. VIVO: virtual life sciences library, http://vivo.cornell.edu/

5 Transforming the online catalog through faceted browsing
Andrea Twiss-Brooks, John Crerar Library, University of Chicago, 5730 S. Ellis Ave, Chicago, IL 60637-1403, atbrooks@uchicago.edu

In July 2006, the University of Chicago Library performed a user study that assessed the utility and usability of guided navigation for scholarly research. Based on the overwhelmingly positive results of that study, the Library embarked on a project to select and implement faceted browser technology. While the initial impetus was to extend the functionality of the library catalog, faceted browser (or guided navigation) technologies offer additional exciting possibilities for search and discovery. Metadata for locally created digital collections, archival finding aids, and other resources can be utilized by the faceted browser software. This presentation will cover the background leading up to the implementation, early observations about the University of Chicago experience with faceted browser technology, and some discussion of future additions to and developments of faceted browsing.

6 Improving the usability of the University of Rochester River Campus Libraries' web sites
Susan K. Cardinal, Carlson Science & Engineering Library, University of Rochester, Carlson Library, Rochester, NY 14627, scardinal@library.rochester.edu

At the University of Rochester River Campus Libraries, we believe that it is no longer enough to produce a functional web page or catalog without getting user feedback. We have embedded usability work into our web product design process. I will describe usability methods, some surprising results and how they impacted our designs.

7 Online social networks: Swiss Army information tools
Gerry McKiernan, Library, Iowa State University, 152 Parks, Ames, IA 50011, Fax: 515-294-5525, gerrymck@iastate.edu

As of July 2007, Facebook (http://www.facebook.com/), a social networking service launched in February 2004, had the largest number of registered users among college-focused sites. There are now more than an estimated 40 million users, an increase of more than 30 million in just over a year. As characterized by Wikipedia, a “social network service focuses on the building … [of] communities of people who share interests and activities, or who are interested in exploring the interests and activities of others …”


We believe that social networking services, such as Facebook, are not only excellent environments to foster and facilitate contact and communication among members of a local community, but also prime venues in which library and librarian services can be more actively and visibly promoted.

In this presentation, we will provide an overview of Facebook features and describe local and national library outreach projects using Facebook functionalities.

8 Progress toward the bioeconomy: An overview
Samantha Swann, Business Acquisition Editor, John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, United Kingdom, Fax: +44 1243 770432, SSwann@wiley.co.uk

This paper gives an introductory overview of the symposium and updates a previous paper covering the research and information landscapes surrounding sustainable fuels, chemicals and energy. Statistics pertaining to the increasing number of scientific articles in this area will be presented.

9 People are the bioeconomy: Social media for engaged information conversations
Gerry McKiernan, Library, Iowa State University, 152 Parks, Ames, IA 50011, Fax: 515-294-5525, gerrymck@iastate.edu

While a substantial portion of the primary literature on alternative energy is published in scholarly journals, significant research is also reported in conference publications, dissertations, reports, theses, white papers, and other “gray” literature. This presentation will discuss this literature and the efforts of the Iowa State University Library to identify and acquire key gray publications on the production, use, and impact of biofuels, to support university scientific, technical, and sociological research initiatives. Two alternative energy blogs established to promote core publications and resources to a world community will also be profiled. Major current and future alternative energy research projects at Iowa State will be described as well.

The presentation will conclude with speculation on the potential use of social networking services, such as 2collab, chemistry.org/exchange, and the Nature Network to facilitate and support university/industry/government communication, collaboration, and coordination of bioenergy projects.

10 Developments toward the knowledge-based bioeconomy
John Sime, Department of Biology, University of York, York, YO10 5YW, United Kingdom, john.sime@biosciencektn.com

The UK has years of experience in the application of biotechnology to industrial applications but, with the move to an economy that focuses increasingly on sustainable technologies and renewable resources, has been accelerating its efforts. There is also a need to play a significant part in European and global initiatives and ensure appropriate integration with these whilst recognizing that the national drivers may differ.

Support for existing technologies based on the application of the biosciences is augmented by the integration of different emerging areas of biotechnology, with the aim of developing biorefining beyond the immediate area of biofuels and of developing novel sources of products derived from the biosciences.

11 Feedstock to fuel: National Agricultural Library's guide to current research and impacts of cellulosic biofuel production
Michael S. Terborg, National Agricultural Library, National Agricultural Library, ARS USDA, 10301 Baltimore Avenue, Beltsville, MD 20705-2351, mterborg@nal.usda.gov

Ethanol production from cellulosic biomass materials has emerged as a hot topic of national priority over the past few years and will continue to grow in the future. Recent advances in cellulosic conversion to biofuels rest on a complex information foundation. The complexity derives from the multi-disciplinary nature of the work, reaching across fields as diverse as the environment, society, agriculture, economics, chemistry and technology. Case studies will explore the pathways and technologies currently in use to access these diverse information resources. Library initiatives on the horizon will showcase how a global information network for biofuels led by the National Agricultural Library would streamline the process for finding this complex information. Easy access to comprehensive information and tools is only possible in the 21st century through collaborative networks between providers and users.

12 Information resources: Finding one's way through the "maize" of the new bioeconomy
William W Armstrong, LSU Libraries, Louisiana State University, Baton Rouge, LA 70803, notwwa@lsu.edu

With the explosion of research and new technologies related to the emerging bioeconomy, keeping up with the burgeoning information resources essential to efficient growth and progress in this area has become increasingly difficult. This talk will seek to plot a course through this morass and highlight some of the more valuable information resources in this rapidly evolving arena.

13 Patenting the transition to the bioeconomy: Tools for searching
Edlyn S. Simmons, Intellectual Property & Business Information Services, Procter & Gamble Co, 5299 Spring Grove Ave, Cincinnati, OH 45217, Fax: (513)527-6854, simmons.es@pg.com

Innovations for the bioeconomy involve technologies from organic chemistry to organic farming and from industrial engineering to genetic engineering. Patents on these innovations are useful sources of information on technological developments and the institutions that develop them, and it is essential to search patent databases in order to confirm freedom to practice any technologies of interest.

Searching for patents related to the bioeconomy is complicated by the cross-disciplinary nature of most work in the field. This presentation will discuss appropriate patent databases, patent classification codes, and search techniques for finding patents covering biofuels, biopolymers, bio-feedstocks, and other aspects of the bioeconomy.

14 Avogadro's constant: A brief history
Carmen J. Giunta, Department of Chemistry and Physics, Le Moyne College, 1419 Salt Springs Rd, Syracuse, NY 13214-1399, Fax: 315-445-4540, giunta@lemoyne.edu

When Amedeo Avogadro hypothesized, nearly 200 years ago, that "the number of integral molecules in any gases is always the same for equal volumes," neither he nor anyone else had any idea what that number was. This presentation will examine efforts of later investigators to determine the number of atoms or molecules in a given quantity of matter. It will survey approaches to the problem ranging from Loschmidt's 19th-century application of kinetic molecular theory to present-day experiments involving silicon spheres, emphasizing early 20th-century work by Einstein, Perrin, and Millikan. It will also touch on relationships between Avogadro's constant and other constants and units, particularly the kilogram.

15 On the history of the Avogadro constant and implications for defining the mole
Ian Mills, Department of Chemistry, University of Reading, Reading RG6 6AD, United Kingdom, i.m.mills@reading.ac.uk

Currently, N_A is the number of entities in 12 g of carbon-12. We would like a new, simpler definition, integrating it with the redefinitions of the other base units. To ensure that current proposals to improve the International System of Units (the SI) are widely discussed and understood before any changes are made, such discussions should extend to the US. Under ACS discussion are the mole and its relation to the Avogadro constant. The gram atomic weight of a (pure) substance, later to be known as the mole, grew from the need to calculate the masses of reacting substances without any exact knowledge of the value of the Avogadro constant. This led to the concept of the quantity amount of substance, and its unit the mole, in the 1960s. It also led to the present definition of the mole. We now know the value of N_A very precisely, leading to recent suggestions that a simpler definition might be adopted, specifying the number of elementary entities that make up a mole, thus fixing the value of N_A exactly. The redefinition of all the SI base units, using the values of the fundamental constants of physics as references, will be discussed.

16 Proposing a clear, friendly, redefinition of the mole
Paul J. Karol, Department of Chemistry, Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh, PA 15213, pk03@andrew.cmu.edu

The mole, the kilogram, and Avogadro's constant are entangled in their current definitions and, unlike other base units, based on artifacts. A study underway by metrologists re-defines these using an exact definition of Planck's constant in conjunction with either the “Josephson constant” or “Klitzing constant” of physics, both unknown or opaque to chemists. Unit definitions play a quintessential role in the ability of beginning students through professional practitioners to understand the quantitative nature of chemistry. Through discussions within the ACS Committee on Nomenclature, Terminology and Symbols, a proposal exists to define the mole as Avogadro's number (not Avogadro's constant) of anything -- exactly 6.0221418 × 10²³ (an integer, no uncertainty!) -- from which the kilogram emerges by the long-standing premise that Avogadro's number of ¹²C has a mass of exactly 12 grams (no uncertainty). The proposal follows the philosophy behind the 1967 definition of the second, the time required for 9,192,631,770 periods of a specified ¹³³Cs radiation. The speed of light is similarly defined as a certain (integer) number of m/s.

17 Quantum electrical units and the new SI: Linking macroscopic to microscopic mass via the "electronic kilogram"
David B. Newell, Quantum Electrical Metrology Division, National Institute of Standards and Technology, 100 Bureau Drive, M/S 8171, Gaithersburg, MD 20899-8171, david.newell@nist.gov

The Consultative Committee for Units (CCU) advised the 23rd General Conference on Weights and Measures (CGPM, Sept 2007) that new definitions be adopted for the kilogram, ampere, kelvin and mole to fix the values of h, e, k and N_A respectively. Reports from numerous bodies were taken into consideration for this recommendation, including the Consultative Committee for Electricity and Magnetism (CCEM) that unambiguously supports redefining the kilogram to fix h and the ampere to fix e. Their primary argument for this change is that it would bring electrical metrology, which is now universally based on the use of the Josephson effect to measure electrical potential difference and the quantum Hall effect to measure electrical resistance, into line with the SI, through the use of defined values for e and h in the relations K_J = 2e/h for the Josephson constant and R_K = h/e² for the von Klitzing constant. Use of these quantum electrical standards to realize the new definition of the SI unit of mass via a watt balance, known as the “electronic kilogram,” and the implications for the base unit of mole will be discussed.
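
The two relations quoted in this abstract can be inverted to show why defined values of h and e make the Josephson and von Klitzing constants exact as well. The short derivation below is an illustrative sketch that uses only K_J = 2e/h and R_K = h/e² as given above; it assumes nothing further about the proposed definitions.

```latex
\begin{align*}
K_J &= \frac{2e}{h}, & R_K &= \frac{h}{e^{2}} \\[4pt]
K_J^{2} R_K &= \frac{4e^{2}}{h^{2}} \cdot \frac{h}{e^{2}} = \frac{4}{h}
  & &\Longrightarrow\quad h = \frac{4}{K_J^{2} R_K} \\[4pt]
K_J R_K &= \frac{2e}{h} \cdot \frac{h}{e^{2}} = \frac{2}{e}
  & &\Longrightarrow\quad e = \frac{2}{K_J R_K}
\end{align*}
```

Read in one direction, measured values of K_J and R_K determine h and e; read in the other, fixing h and e by definition fixes K_J and R_K with no uncertainty.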

18 Coordinating digital acquisitions
Janette B. Carver, Chemistry Physics Library, University of Kentucky, 150 Chem Phys Bldg, Lexington, KY 40506, jbcarv1@email.uky.edu

Have you wondered how merging three science libraries would affect acquisition processes, or how those processes change in the move from a world of buying one book at a time to a digital world where electronic books are often bought as part of packages rather than as single-item purchases? Acquiring digital items changes processes for everyone, from individual librarians to the purchasing department of the University. Learn about the experiences of the University of Kentucky Libraries from the Chemistry Physics Librarian's perspective.

19 Electronic science and technology books: Trends in acquisitions and access
Erja Kajosalo, MIT Libraries, Massachusetts Institute of Technology, 14S-134, 77 Massachusetts Ave, Cambridge, MA 02139-4307, Fax: 617-253-6365, kajosalo@mit.edu

Academic science and engineering libraries have many years of experience with acquiring electronic journals and making them easily accessible to their users. Over time, science and engineering scholars have demonstrated a strong preference for electronic editions of journals over print. The business models for acquiring electronic journals are widely understood and reasonably consistent. But the arrival of electronic books has created a complicated family of new acquisitions and access decisions. Electronic reference books form one branch of that family, information technology handbooks another, popular novels a third, historic imprints a fourth, and so on. Systematic publishing of scholarly electronic editions by science and engineering publishers is emerging as a new branch of the electronic book family. This presentation will discuss the many issues, including selection criteria, funding, cataloging, and marketing, that librarians face when acquiring scholarly science and engineering electronic books.

20 Management of e-journals and e-books: Information flow to the end users
Arun Kumar, Martin P. Brändle, Armin Müller, and Engelbert Zass, Informationszentrum Chemie Biologie Pharmazie, ETH Zuerich, HCI H 5.3, CH-8093 Zuerich, Switzerland

E-collections offer librarians a significant cost-saving benefit, since not only the shelf space but also the staffing resources to maintain large print collections are expensive. Thus, libraries are quickly migrating from print to electronic media. While the paper-based era for journals has practically come to an end, more efforts in terms of online availability and acquisitions are needed to enhance the role of e-books in the near future. The purpose of this work is to describe the strategies used for managing e-collections, with a focus on e-journals and e-books, at the Chemistry Biology Pharmacy Information Center, ETH Zurich. A much more important issue is to convey the availability of e-collections to the end users. The channels used for this objective are elaborated. Furthermore, the somewhat uncooperative attitude of publishers, shown in their unwillingness to supply improved bibliographic information or to correct corrupt content, is demonstrated.

21 Cheminformatics developments at RECCR: New tools, collaborations and outreach
Curt M. Breneman, Department of Chemistry / RECCR Center, Rensselaer Polytechnic Institute, 110-8th Street, Center for Biotechnology and Interdisciplinary Studies, Troy, NY 12180, Fax: 518-276-4887, brenec@rpi.edu, and N. Sukumar, Department of Chemistry and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute / RECCR Center, Troy, NY 12180-3590

Since its founding as part of the NIH / NHGRI ECCR program, the Rensselaer Exploratory Center for Cheminformatics Research (RECCR) has emphasized the development of novel descriptors and machine learning methods in an effort to improve the science of predictive cheminformatics. In order to accomplish this goal, the RECCR community formed (and continues to form) a number of test-bed collaborations to assess and extend the application of these new methods to novel areas. An important collaboration now exists with the SRI MLSCN Screening Center, which will serve as a model for ECCR/MLSCN technology transfer and scientific interactions.

22 Projects in the Michigan Alliance for Cheminformatic Exploration
Kerby Shedden, Department of Statistics, University of Michigan, Ann Arbor, MI 48104, Fax: 734-763-4676, kshedden@umich.edu, and Gus R. Rosania, Department of Pharmaceutical Sciences, University of Michigan College of Pharmacy, Ann Arbor, MI 48103

I will discuss some of the research and educational projects in the Michigan Alliance for Cheminformatic Exploration (MACE). MACE focuses on data-driven approaches for formulating mechanistic hypotheses about small molecule activities in living systems. Such hypotheses may allow mechanistic interpretation of cell-based assay results. We propose that the subcellular transport and localization of small molecules can be important determinants of their activities. To investigate this, we have developed a comprehensive database of associations between experimental measurements of small molecule activity and the expression of biomolecules involved in transport and localization. Moving beyond statistical associations, we have also developed a mechanistic modeling and systems analysis framework for characterizing the accumulation and distribution of small molecules in various subcellular compartments. To close the gap between these methods and cell-based screening data, we have developed novel machine vision methods for analyzing patterns of subcellular distribution of fluorescent probes in high-content screening data.

23 I don't care where my data and methods are: A web-service approach for distributed access to methods, data and models
Rajarshi Guha1, Geoffrey Fox2, Kevin E. Gilbert1, Marlon Pierce3, and David J Wild1. (1) School of Informatics, Indiana University, 1130 Eigenmann Hall, 1900 E 10th Street, Bloomington, IN 47406, rguha@indiana.edu, (2) Indiana University School of Informatics, Bloomington, IN 47408, (3) School of Informatics, Bloomington, IN 47408

In recent years cheminformatics has been enhanced by the public availability and accessibility of methods and of chemical and biological data. This talk will highlight recent progress on the development of a web-service-based infrastructure to provide uniform and distributed access to methods, data and models in a general manner. I will focus on the use of the infrastructure to provide access to a variety of statistical methods and predictive models, highlighting a number of applications based on these services. In addition to providing access to data, it is useful to add value to current data sources. I will present a shape-searchable, single-conformer, 3D derivative of PubChem allowing one to retrieve similarly shaped molecules in 10 to 30 sec depending on similarity cutoffs. I will discuss some approaches that will allow us to include arbitrary numbers of conformers yet still maintain reasonable query times.

24 Carolina ChemBench (C-ChemBench): A web-based cheminformatics expert system for the analysis and prediction of biological screening data
Tongan Zhao1, Christopher Grulke1, Berk Zafer1, Weifan Zheng2, Diane Pozefsky3, and Alexander Tropsha1. (1) Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina, CB # 7360, Beard Hall, School of Pharmacy, Chapel Hill, NC 27599-7360, Fax: 919-966-0204, tzhao@email.unc.edu, alex_tropsha@unc.edu, (2) Department of Pharmaceutical Sciences, Biomanufacturing Research Institute and Technology Enterprise (BRITE), North Carolina Central University, Carolina Exploratory Center for Cheminformatics Research (CECCR), 1801 Fayetteville Street, Mary M. Townes Science Complex, Room 1256, Durham, NC 27707, Fax: 919-530-6600, wzheng@nccu.edu, (3) Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, pozefsky@cs.unc.edu

The Carolina Exploratory Center for Cheminformatics Research (CECCR) has developed and deployed a prototypic cheminformatics web server called C-ChemBench (http://ceccr.ibiblio.org). It includes modules designed to address the needs of all constituent groups of chemical biology and drug discovery specialists, i.e., computational chemists (Model Development Module), biologists (Predictions Module), chemists (Library Design Module), and bioinformaticians (CECCR Base Module). We shall discuss several cheminformatics-specific data mining and knowledge discovery technologies (such as Quantitative Structure Activity Relationship Modeling) for biological assay data analysis and provide several successful examples of applications. Our technologies (that also rely on distributed computing) afford robust and validated models capable of accurate prediction of properties for molecules not included in the training sets. This focus on knowledge discovery and property forecasting brings C-ChemBench forward as the major data-analytical and decision support cheminformatics server in support of experimental chemical biology research.

25 ChemSpider: Building a structure-centric community for chemists
Anthony J. Williams, ChemSpider, 904 Tamaras Circle, Wake Forest, NC 27587, tony@chemspider.com

Scientists commonly find themselves overwhelmed by the amount of information accessible to them. The distribution of resources now includes the entire space of the worldwide web, access to primary databases such as CAS and, commonly, a plethora of internally developed systems. While the web has provided improved access to chemistry-related information, there has not been an online central resource allowing integrated chemical structure-searching of chemistry databases, chemistry articles, patents and web pages such as blogs and wikis. ChemSpider has built a structure-centric community for chemists by providing free access to an online database and collaboration tool for chemists. The online database offers an environment for curating the data on ChemSpider as well as for depositing chemical structures, analytical data and associated information, and provides a significant knowledge base and resource for chemists working in different domains. An overview of present and future capabilities will be given.

26 ChemModLab/ChemSpider: QSAR modeling and model-based searching
S. Stanley Young, NISS, PO Box 14006, Research Triangle Park, NC 27709, Fax: 919 685 9300, young@niss.org, and Jacqueline M. Hughes-Oliver, Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203, Fax: 919-515-7591, hughesol@stat.ncsu.edu

ChemModLab, written by the ECCR@NCSU consortium under NIH support, is a toolbox for fitting and assessing quantitative structure-activity relationships (QSARs). Its elements are: a cheminformatic front end that computes five types of molecular descriptors for use in modeling; a set of sixteen statistical methods for fitting models; and methods for validating the resulting model. These sixteen statistical methodologies comprise a comprehensive collection of approaches. ChemModLab can produce eighty QSAR models that can be used individually or as the basis for ensembles.
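
The abstract does not say how the figure of eighty models arises. One reading consistent with the stated counts is one model per pairing of a descriptor type with a statistical method (5 × 16 = 80). The sketch below enumerates such pairings under that assumption; the labels `descriptor_1`, `method_1`, and so on are hypothetical placeholders, since the abstract names neither the descriptors nor the methods, and nothing here reflects ChemModLab's actual interface.

```python
from itertools import product

# Placeholder labels only: the abstract gives the counts (five descriptor
# types, sixteen statistical methods) but not the names.
DESCRIPTOR_TYPES = [f"descriptor_{i}" for i in range(1, 6)]
METHODS = [f"method_{i}" for i in range(1, 17)]


def enumerate_models(descriptors, methods):
    """Return every (descriptor type, method) pairing as a list of tuples."""
    return list(product(descriptors, methods))


models = enumerate_models(DESCRIPTOR_TYPES, METHODS)
print(len(models))  # 5 * 16 = 80
```

Each tuple stands for one candidate model to fit and validate; an ensemble would then combine predictions from some subset of these pairings.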

ChemSpider is a web chemistry resource that has over twenty million electronic compounds. These compounds can be used in combination with models from ChemModLab for virtual screening. The resulting paradigm is that screening data is fed to ChemModLab, then the resulting QSAR models are used to virtually screen compounds available from ChemSpider. We present an application of the process and show active compounds from the screening set as well as compounds predicted to be active coming from the ChemSpider collection.

27 RoadRunner: A publicly available bioactivity database
Tudor I. Oprea1, Stephen L. Mathias1, Jeremy J. Yang1, and Cristian G. Bologa2. (1) Division of Biocomputing, Department of Biochemistry and Molecular Biology, University of New Mexico School of Medicine, MSC11 6145, University of New Mexico, Albuquerque, NM 87131, toprea@salud.unm.edu, (2) Division of Biocomputing, University of New Mexico School of Medicine, Albuquerque, NM 87131-0001

Roadrunner is a chemical database screening server from the New Mexico Molecular Libraries Screening Center, NM MLSC (http://screening.health.unm.edu/rrnmmlsc/). It was designed to address all Informatics needs from the Biology, Assay Development, Screening, Cheminformatics and Chemistry components of the NM MLSC. Roadrunner stores well and plate location, master plate volume, associated Screening statistics, and measured and computed biological and chemical properties, for each compound. Roadrunner supports post-HTS analyses (e.g., batch similarity searching via data fusion) and molecular property calculation. The Roadrunner database is implemented in PostgreSQL, an open-source object-relational database management system (http://www.postgresql.org/), and on CHORD from gNova (http://www.gnova.com/). All chemical information in Roadrunner is based on OECHem from OpenEye (http://www.eyesopen.com/). Key features of Roadrunner are simplicity (web-based) and the ability of authorized users to use it from any computer, running any operating system. Multiple field searches include chemical substance/compound searching that can be performed by substructure, chemical property, name/alias, similarity or bioactivity.

28 Cheminformatics in Open Notebook Science
Jean-Claude Bradley, Department of Chemistry, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, bradlejc@drexel.edu

The UsefulChem project, designed to publicly report ongoing research within a research group working on the development of anti-malarial and anti-tumor agents, will be described. The project makes use of free hosted tools as much as possible so that the infrastructure can be easily replicated by other research groups. InChIs, InChIKeys and compound names are used as tags on blog and wiki pages to facilitate indexing on common search engines. The handling of large libraries and interfacing with online databases are generally accomplished with SMILES lists. Substructure searching and annotation are handled by ChemSpider. JSpecView is used to manipulate JCAMP-DX spectra over a browser interface.


29 ChemXSeer: Cyber-tools for researchers in environmental chemistry
Karl T. Mueller1, Prasenjit Mitra2, C. Lee Giles2, Barbara J. Garrison1, James D. Kubicki3, Susan L. Brantley4, Bingjun Sun5, Ying Liu2, William J. Brouwer1, Shikha Nangia1, and Joel Z. Bandstra6. (1) Department of Chemistry, Penn State University, 104 Chemistry Building, University Park, PA 16802, Fax: 814-863-8403, ktm2@psu.edu, (2) College of Information Sciences and Technology, Penn State University, University Park, PA 16802, (3) Dept. of Geosciences, The Pennsylvania State University, University Park, PA 16802, (4) Department of Geosciences, EESI, The Pennsylvania State University, University Park, PA 16802, (5) Department of Computer Science and Engineering, Penn State University, University Park, PA 16802, (6) Center for Environmental Kinetics Analysis, Pennsylvania State University, University Park, PA 16802

One goal of environmental chemistry is to integrate experimental, analytical, and simulation results performed on systems from molecular to field scales. E-science and cyberinfrastructure have become crucial for scientific progress and we will report here on our development of the ChemXSeer architecture as a portal for academic researchers, especially in the area of environmental chemical kinetics. This system of tools integrates the scientific literature with experimental, analytical and simulation datasets and we intend to offer unique aspects of search not yet present in other scientific search services. For example, we will demonstrate tools for the extraction of tables, figures, equations and formulae from scientific documents enabling users to search on those fields. Ultimately, ChemXSeer intends to provide a wide range of features including full text search; author, affiliation, title and venue search; figure search; table search; formulae search; citation and acknowledgement search; and citation linking and statistics.

30 Can innovation from industry find broader application?
Fangqiang Zhu and Dimitris K. Agrafiotis, Johnson & Johnson Pharmaceutical Research & Development, L.L.C, 665 Stockton Drive, Exton, PA 19341, fzhu2@prdus.jnj.com

The interplay between academia and industry typically involves creative scientific ideas from the academic side and practical applications from the industrial side. However, innovative ideas also originate from industrial settings occasionally. The problem then arises how such innovation can benefit the broader scientific community. In this talk, we present one such example, where the pursuit of certain applications of a novel computational method generated in a commercial company may lie beyond its direct commercial interests. The situation poses several interesting challenges that require creative solutions.

31 Rewards and challenges of academic-industrial collaborations in the area of computational drug discovery
Alexander Tropsha, Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina, CB # 7360, Beard Hall, School of Pharmacy, Chapel Hill, NC 27599-7360, Fax: 919-966-0204, alex_tropsha@unc.edu

I shall discuss my experiences over the last ca. 15 years in collaborations with various pharmaceutical companies in the areas of computational drug discovery research and graduate education. The rewarding aspects of collaborations included access to real data and projects (certainly under strict confidentiality agreements), intellectually stimulating discussions with industrial colleagues, opportunities to test academic software in industrial settings, and networking that helped in a number of cases with the job placement of graduate students and postdocs. In the area of graduate education, my students benefited in many cases from industrial internships, lectures by industrial colleagues, and their service on students' advisory committees. Challenges included difficulties with data exchange, restrictions on publications, and instability of projects. Still, I shall advocate strongly in favor of finding practical solutions that help increase and improve research partnerships between industry and academia, especially in drug discovery.

32 Academic-industrial collaboration in chemoinformatics: Experiences from the UK
Valerie J. Gillet and Peter Willett, Department of Information Studies, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, United Kingdom, Fax: +44 (0) 114 2780 300, v.gillet@sheffield.ac.uk

The Chemoinformatics Research Group at the University of Sheffield has extensive experience of collaboration with industry over many years. Our collaborations have varied from the provision of software and databases via collaborative MSc and PhD projects through to fully funded post-doctoral positions within the research group. In this paper we will provide an overview of the different mechanisms through which collaboration has been achieved, highlight some success stories, and provide our view on the ingredients for success. We will finish with a look to the future.

33 University-industry collaborations: The good, the bad, and the ugly
GM. Maggiora, Pharmacology & Toxicology, College of Pharmacy & BIO5 Institute, 1703 E. Mabel Street, Tucson, AZ 85721, maggiora@pharmacy.arizona.edu

Having spent time in both universities and the pharmaceutical industry, I have seen a number of issues regarding collaborative research from both points of view. Over the last twenty years industrial and university research has changed significantly. This is not only due to revolutionary changes in basic science and technology, but also to changes in the roles that research is playing in both of these regimes and in the way that research is funded there. This raises a number of questions. Is there any value in industry-university collaborations? If so, is this value currently being realized? If not, why not? What role(s) are evolving industry and university research practices playing in current industry-university relationships? What can be done to strengthen industry-university collaborations? In the talk, I will address some of these questions and present my personal view on a number of important issues associated with industry-university collaborations.

34 Both sides now: An intimate perspective on collaborations
Robert D. Clark, Informatics Research Center, Tripos International, 1699 S. Hanley Rd., St. Louis, MO 63144, bclark@tripos.com

Tripos has engaged in several major collaborations over the last decade. In some of these, Tripos played the role of industry supporter and in others it was essentially the academic partner. Sometimes these collaborations have led to the development of commercial software products, sometimes they have advanced science in fairly basic ways, and sometimes they have done both, directly or indirectly. Experience garnered along the way is worth sharing inasmuch as it may shed light on what makes such collaborations succeed and what can make them fail.

35 Networking universities of applied science with small and medium size enterprises: New applications of semantic systems
René Deplanque, Professor, Director, FIZ-CHEMIE Berlin, Franklinstrasse 11, D-10587 Berlin, Germany, Fax: 49-30-39977133, deplanque@fiz-chemie.de

One of the main problems in the area of technology transfer is the different approach which is taken by universities as compared to small and medium-size companies.

Take, for example, the evaluation of the usefulness of a newly found substance. Whereas universities, when analyzing the value of a new compound, are mainly interested in its physical and chemical properties, small and medium-size companies are far more interested in the uses and market possibilities of the compound.

Because of these different approaches and views, it is very difficult to bridge the gap in understanding of priorities between the two groups.

In this talk, it will be shown how to combine various searches and find approaches within a mash-up server.

Knowing the difference in priorities it is possible to define a ranking which takes into consideration how a problem is approached and how results can be presented to be useful to both sides.

By combining semantic, graphical, full text and qualified list searches, it is possible to serve both approaches without asking the user to learn complex databank languages.

This should increase the possibility of new and successful projects in which the different points of view are to be used to strengthen the project approach.

36 Biofuels production: Process technical and economic characterizations
Richard L. Bain, Biorefinery Analysis Section, National Bioenergy Center, National Renewable Energy Laboratory, 1617 Cole Boulevard, Golden, CO 80401, Fax: 303-384-6363, richard_bain@nrel.gov

A number of technical options exist for biofuels production. A discussion of the major technical pathways will be given covering technology descriptions, biofuel yields, and process economics. Processes to be discussed include corn/wheat ethanol, sugar cane ethanol, cellulosic ethanol (both thermochemical and biochemical), biodiesel, and renewable diesel.

37 Renewable energy and forest biomass supply
John A. Stanturf, Southern Research Station Center for Forest Disturbance Science, US Forest Service Research & Development, Forestry Sciences Lab, 320 Green Street, Athens, GA 30602, Fax: 706-559-4317, jstanturf@fs.fed.us, Bryce J. Stokes, Resource Use Sciences Staff, US Forest Service Research & Development, Washington, DC 20250-1113, and Marilyn A. Buford, Forest Management Sciences, US Forest Service Research & Development, Washington, DC 20250-1115

Wood is a renewable source of transportation fuels, chemicals, heat, and power. Biobased products from wood often can be used to displace fossil fuel-based products. Wood is an abundant, renewable, home-grown cellulosic resource that can contribute significantly to meeting the national goal of displacing 20% of U.S. petroleum use within ten years (20-in-10). Our Nation's forests are a strategic asset in meeting this goal, and in achieving and enhancing U.S. energy security, economic opportunity, and environmental quality.

Of approximately 6% of U.S. energy needs currently being met through renewable sources, about 47% of that is biomass. Wood currently accounts for about 70% of the biomass energy produced and used in the U.S. According to a recent joint DOE-USDA report, Biomass as Feedstock for a Bioenergy and Bioproducts Industry: The Technical Feasibility of a Billion-Ton Annual Supply, wood residues and wastes alone can sustainably supply about 370 million dry tons per year. Additional analyses show that including feedstocks such as short rotation woody crops and expected advances in cellulosic ethanol conversion offer substantial opportunities for wood-based bioenergy. An overview of the issues and research needed to increase productivity and conversion efficiencies and reduce costs to meet demands for cellulosic material without significant changes in land use or disruption in fiber markets will be presented.
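The nested percentages quoted above can be combined into shares of total U.S. energy use. The following is only a back-of-the-envelope sketch using the approximate figures from the abstract; the variable names are illustrative.

```python
# Approximate shares quoted in the abstract (rounded figures, not exact data).
renewable_share_of_us_energy = 0.06   # ~6% of U.S. energy comes from renewables
biomass_share_of_renewables = 0.47    # ~47% of that renewable energy is biomass
wood_share_of_biomass = 0.70          # ~70% of biomass energy comes from wood

# Chain the shares to express biomass and wood as fractions of all U.S. energy.
biomass_share_of_us_energy = renewable_share_of_us_energy * biomass_share_of_renewables
wood_share_of_us_energy = biomass_share_of_us_energy * wood_share_of_biomass

print(f"Biomass: {biomass_share_of_us_energy:.1%} of U.S. energy")
print(f"Wood:    {wood_share_of_us_energy:.1%} of U.S. energy")
```

Under these rounded inputs, biomass works out to roughly 2.8% and wood to roughly 2.0% of total U.S. energy use.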

38 Developing the new lignocellulosic energy age
AJ. Ragauskas, School of Chemistry and Biochemistry, Institute of Paper Science and Technology, Georgia Institute of Technology, 500 10th Street. NW, Atlanta, GA 30332, Fax: 404-894-IPST, arthur.ragauskas@chemistry.gatech.edu

Since the beginning of the new millennium, we have witnessed an ever-increasing merger of technical, economical and societal demands for sustainable technologies. At the cornerstone of this green industrial revolution is the integrated biorefinery. This is a biomass processing facility that integrates our ability to tailor biomass productivity and processability with conversion processes to produce a range of fuels, power, food, materials and chemicals from biomass. It fully utilizes all components of biomass in proportions that maximize sustainable, economic development. As such, this vision seeks to develop a new “lignin-carbohydrate economy” that will initially supplement today's petroleum economy and, as these non-renewable resources are consumed, become the primary resource for fuels, chemicals and materials. This presentation will examine recent developments in the conversion of lignocellulosics into nano cellulosic composites, biofuels and enhanced conventional products and how all of these advances are contributing to the development of the New Lignocellulosic Age.

39 Policies and drivers to deliver a viable bioeconomy
David B. Turley, Agricultural and Rural Strategy, Central Science Laboratory, Sand Hutton, York, UK YO10 3BG, United Kingdom, Fax: +44-1904-462111, d.turley@csl.gov.uk

Many bio-based materials typically cost more to produce than their fossil-derived counterparts and others may have technical parameters less well suited to their mode of use. So, what incentive is there to encourage use and development of such materials? The rise of environmental concerns relating to protection of the natural environment, reduction of waste and the desire to reduce human impacts on greenhouse gas emissions has led to a renaissance of interest in the use of plant-derived bio-materials for both energy and material products. But how can cost barriers be overcome to encourage greater uptake and delivery of public good? What policies and fiscal incentives have been developed in industrialised economies to directly or indirectly encourage uptake of bio-based technologies and are they effective? Will developing biorefinery concepts, biotechnology and new public policy approaches offer opportunities to address some of these problems?

40 Challenges and strategies of a successful National Biofuels Program
Valerie Sarisky-Reed, Office of the Biomass Program, U.S. Department of Energy Office of Energy Efficiency and Renewable Energy, 1000 Independence Ave. NW, Washington, DC 20585, Valerie.sarisky-reed@ee.doe.gov

This paper will address the technical, political, and infrastructure barriers facing the development of a successful biofuels industry in the U.S., as well as the efforts the Department is taking to overcome them through its Biofuels Initiatives. This discussion will include a description of the goals of the President's Advanced Energy Initiative and 20 in 10 Initiative, current programmatic work, major research and development successes, and planning for the future.

41 Potential impact of climate change on the bioeconomy
Richard M. Cruse, Department of Agronomy, Iowa State University, 3212 Agronomy, Ames, IA 50011, rmc@iastate.edu

The United States Department of Agriculture and United States Department of Energy have established a vision for bioenergy, a vision that will require substantial increases in biomass production and harvest if renewable fuel targets are to be met. Assumptions in this vision do not consider changing climate patterns and these potential impacts on feedstock production volumes, risks associated with production processes, or threats to natural resources required by this industry to produce feedstock required for liquid fuels. This presentation will address the direct effects of changing climate on potential biofuel production and interactions between climate change and other critical biofuel supply chain components sensitive to weather. Different potential biofuel industry configurations and the sensitivity of each configuration to climate change will be discussed.

42 WITHDRAWN: Bioeconomy strategy for CSIRO
Alastair Robertson, Science Strategy and Investment, CSIRO, PO Box 93, North Ryde NSW 1670, Australia, Alastair.Robertson@csiro.au

Abstract not available.

43 Community-based collaborative drug discovery for neglected infectious diseases
Barry A. Bunin, Collaborative Drug Discovery, Inc, 1818 Gilbreth Road, Suite 220, Burlingame, CA 94403, Fax: 650-522-9498, bbunin@collaborativedrug.com, and Sylvia Ernst, Collaborative Drug Discovery, Inc, Burlingame, CA 94010

Case studies from scientists working in secure collaborative groups to rapidly develop drug candidates for commercial and humanitarian markets will be presented. The first case study involves overcoming drug resistance which is the major problem for malaria. New approaches that allow scientists working together to develop new drugs faster are desperately needed. The discovery of alternatives to Verapamil, a known chemosensitizer to overcome both tumor and malaria resistance, will be presented using novel collaborative drug discovery technologies to help specialists work together in a global network. A detailed example showing how chemosensitizers addressing chloroquine resistance can be identified combining results from the University of Cape Town (South Africa) with structurally related compounds from the University of California at San Francisco (USA) and similar FDA/Orphan (courtesy Dr. Lipinski) approved drug compounds will be presented. This new collaborative technology allows researchers to build up networks of technical experts around therapeutic or target areas thus facilitating discovery of new drug candidates. It allows scientists to speed up the research by simultaneously sharing unpublished data in the race to overcome drug resistance. The community-based platform is currently being used openly to help develop new treatments for neglected infectious diseases such as malaria, Chagas Disease, and African Sleeping Sickness and securely against commercial cancer targets.

44 Experiences with knowledge and data sharing at Lhasa Limited
Philip N. Judson, LHASA Ltd, Department of Chemistry, University of Leeds, Leeds LS2 9JT, United Kingdom, Fax: +44 (0) 113 343 6535, judson@dircon.co.uk

Lhasa Limited was originally set up by organisations wanting to share knowledge about chemical synthesis. Over the years, the sharing has broadened to cover toxicology, mammalian metabolism, and environmental biodegradation, and to include data as well as knowledge. In these fields, chemical and biological information are closely associated and chemoinformatics cannot be confined to the realm of chemistry. Problems arise over standardization of terms, methods of assessment, and interpretation of data. Issues over confidentiality and security have to be addressed and there are tensions between the needs of academic groups to publish and those of industrial groups to protect IPR. This talk will present examples of problems that have been solved and some technical challenges that remain.

45 Safe exchange of chemical information: Not "safe" enough?
Tudor I. Oprea, Division of Biocomputing, Department of Biochemistry and Molecular Biology, University of New Mexico School of Medicine, MSC11 6145, University of New Mexico, Albuquerque, NM 87131, toprea@salud.unm.edu

An ACS symposium dedicated to the safe exchange of chemical information (SECI) was held in San Diego in Spring 2005; of the sixteen speakers, eleven contributed papers to a special issue of J. Comput.-Aided Mol. Des. (JCAMD; Nov 2005). Despite the scientific enthusiasm, as well as the potential solutions proposed, the SECI initiative was not well received in industrial circles. Indeed, two years later, there is no follow-up from a business or scientific perspective. A number of the JCAMD papers offered solutions (and examples) and illustrated how SECI is possible. However, it was perception, not fact, that resulted in a general lack of interest from potential industrial partners. This paper will highlight some of the potential solutions, and off-the-record responses from potential industrial partners, with respect to the safe exchange of chemical information.

46 Using SemanticEye and FOAF to add value to the scientific collaboration process
Omer Casher, Information Architecture and Engineering, GlaxoSmithKline, New Frontiers Science Park, Third Avenue, CM19 5AW, Harlow, United Kingdom, omer.2.casher@gsk.com, and Henry S. Rzepa, Department of Chemistry, Imperial College London, London SW7 2AZ, United Kingdom

We introduce the idea of combining metadata embedded within electronic publishing resources with user profiles in Web 2.0 resources (blogs, wikis and social networking sites), to create a single entry point for enhancing scientific collaboration. Enabling such activities is one of the goals of Web 3.0, which represents the next evolutionary step in the Web cycle. It is predicated on the adoption of a Semantic Web approach to content management. Here, we present a proof of concept featuring two tools:

1. SemanticEye (DOI: 10.1021/ci060139e), a lightweight ontology of chemical electronic publishing metadata which enables the location of articles by the same author or containing the same molecule (as determined using InChI identifiers),

2. FOAF, the “friend-of-a-friend” RDF vocabulary which we propose be used for the social networking aspects.

FOAF allows semantic expression of an individual's publications, public collaborations and activities. FOAF aggregators are software agents which are used to build up a network of “friends” akin to the goals of social networks such as Open Notebook Science and LinkedIn, but in an automated fashion. Rather than rely on an individual creating and maintaining their own static FOAF, we propose a more dynamic approach to the FOAF metaphor, in which SemanticEye is used to output a FOAF serialization of its ontology by querying it with SPARQL, the RDF query language. FOAF information from other sources, such as our scientific seminar RDF/RSS database (DOI: 10.1021/ci0504115) and institutional or departmental digital repositories, can similarly be queried and then aggregated with the SemanticEye FOAF. This FOAF aggregation could feed into any social network which is RDF/FOAF compliant, and then be subjected to further SPARQL queries to create a semantic fusion between molecules, publications and scientific collaborations.
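As a rough illustration of what a generated FOAF serialization might look like, the sketch below turns article metadata into FOAF Turtle using only the Python standard library. It is not the SemanticEye implementation: the function names, the local `<#...>` identifiers, the placeholder DOI and author names, and the choice to link co-authors with `foaf:knows` are all assumptions made for this example. Only the FOAF namespace and the terms `foaf:Person`, `foaf:name`, `foaf:made` and `foaf:knows` come from the FOAF vocabulary itself.

```python
from itertools import combinations

FOAF_PREFIX = "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"


def person_iri(name):
    """Build a hypothetical local IRI for an author name."""
    return "<#" + name.lower().replace(" ", "_").replace(".", "") + ">"


def articles_to_foaf(articles):
    """Serialize {doi: [author names]} as FOAF Turtle.

    Assumption for this sketch: co-authors of an article "know" each other.
    """
    made = {}   # author -> set of DOIs they (co-)authored
    knows = {}  # author -> set of co-authors
    for doi, authors in articles.items():
        for author in authors:
            made.setdefault(author, set()).add(doi)
            knows.setdefault(author, set())
        for a, b in combinations(authors, 2):
            knows[a].add(b)
            knows[b].add(a)

    blocks = [FOAF_PREFIX]
    for author in sorted(made):
        parts = [f"{person_iri(author)} a foaf:Person",
                 f'    foaf:name "{author}"']
        for doi in sorted(made[author]):
            parts.append(f"    foaf:made <https://doi.org/{doi}>")
        for other in sorted(knows[author]):
            parts.append(f"    foaf:knows {person_iri(other)}")
        # Turtle: predicates of one subject are separated by ';' and end with '.'
        blocks.append(" ;\n".join(parts) + " .\n")
    return "\n".join(blocks)


# Placeholder metadata, not a real article.
print(articles_to_foaf({"10.0000/example.1": ["A. Author", "B. Author"]}))
```

A document of this kind could then be merged with FOAF data from other sources and queried with SPARQL, as the abstract proposes.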

47 Wavelet based search prefilters for spectral library matching
Barry K. Lavine, Nikhil Mirjankar, and Kadambari Nuguru, Department of Chemistry, Oklahoma State University, 107 Physical Science, Stillwater, OK 74078, Fax: 405-744-6007, bklab@chem.okstate.edu

There is renewed interest in library matching of IR data. However, a concern in the use of library spectra for identification is the degree to which a search is truly interpretative. Most search algorithms treat a spectrum as a set of points. Band shifting is not handled well and bands of low intensity are usually ignored. Using a search prefilter, all of these problems can be addressed. The development of a search prefilter for carboxylic acids is presented. Using the wavelet packet transform, library spectra are passed through two scaling filters: a high pass filter and a low pass filter. The decomposition process, which yields wavelet coefficients that represent the high and low frequency components of the signal, is iterated through successive wavelet packets until the required level of signal decomposition is achieved. A genetic algorithm for pattern recognition analysis identifies wavelet coefficients characteristic of the functional group.
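The iterated low pass / high pass splitting described above can be sketched as follows. The abstract does not name the wavelet, so this minimal example assumes the Haar filter pair purely for illustration; the function names are hypothetical, and the genetic algorithm step is not shown.

```python
import math


def haar_split(signal):
    """Split a signal into low pass (averages) and high pass (differences) halves."""
    s = math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return low, high


def wavelet_packet(signal, levels):
    """Full wavelet packet decomposition: every packet is split again at each level.

    Returns 2**levels packets in natural (not frequency) order.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    packets = [list(signal)]
    for _ in range(levels):
        next_packets = []
        for packet in packets:
            low, high = haar_split(packet)
            next_packets.extend([low, high])
        packets = next_packets
    return packets


# A flat "spectrum" puts all of its energy into the lowest-frequency packet.
for packet in wavelet_packet([1.0] * 8, levels=2):
    print(packet)
```

Unlike the ordinary discrete wavelet transform, which only re-splits the low pass branch, the packet transform re-splits both branches, so the coefficients passed to a feature selection step cover the whole frequency range at the chosen level.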

48 Fragment activity comparison tool
Stephen R Johnson1, Brian L. Claus1, Olafur Gudmundsson2, Paul A. Elzinga2, Gerry Everlof2, and Michael J. Hageman2. (1) Computer-Assisted Drug Design, Bristol-Myers Squibb, P.O. Box 4000, Princeton, NJ 08543-4000, stephen.johnson@bms.com, (2) Discovery Pharmaceutics, Bristol-Myers Squibb, Princeton, NJ 08543-4000

The optimization phase of a medicinal chemistry project typically progresses through the monitoring of activity changes that result from synthetic modifications of a lead compound. Over time, these molecular transformations have been applied repeatedly to countless different scaffolds directed against many different therapeutic targets. The fragment activity comparison tool (FACT) is an in-house developed computational tool to help guide the medicinal chemist through all known molecular transformations and their effect on activity. These molecular transformations were identified using a brute force search of internally synthesized compounds. Using several ADME/PK activities and properties as examples, this work highlights the potential impact of utilizing data mining to extract maximum value from a company's intellectual property.

49 Modeling a touch of freshness: Developing a QSPR model for amine-assisted perfume delivery in laundry detergent
David T. Stanton1, Johan Smets2, Marc Van de Walle2, An Pintens2, Sofie Van de Velde2, and Rafael Trujillo3. (1) Procter & Gamble, Miami Valley Innovation Center, 11810 East Miami River Road, Cincinnati, OH 45252, stanton.dt@pg.com, (2) Procter & Gamble, Strombeek-Bever, Belgium, (3) Procter & Gamble, Cincinnati, OH

It is difficult to understand the impact of all the possible interactions in the laundry process on the delivery of perfume ingredients from laundry detergent formulation to the cleaned and dried fabric. There are several partitioning events, some form of a drying event, and the eventual storage of the fabric prior to use. However, the ability to predict perfume delivery for this process can have a significant effect on the design of fragrances that are well delivered. We will present the results of work done using ADAPT to develop a quantitative structure-property relationship (QSPR) model for the delivery of perfume ingredients in a through-the-wash application.

50 QSAR at the undergraduate institution and a model of air-to-blood partition coefficients for small organic molecules
Nathan R. McElroy and Sean D. Smith, Department of Chemistry, Indiana University of Pennsylvania, 143 Weyandt Hall, 975 Oakland Avenue, Indiana, PA 15705, Fax: 724-357-2437, nathan.mcelroy@iup.edu

The majority of advances in chemoinformatics, both in basic theory and application, tend to come from research-oriented universities (i.e., Ph.D. granting institutions) and workgroups within industry (e.g., CADD groups). In smaller academic institutions, such as the primarily undergraduate or Masters-granting universities, scholarly activity and output in one's discipline remain important components in tenure and promotion, despite some obvious limitations. The teacher-scholar approach in chemoinformatics is discussed, with focus on modeling the air-to-blood partition coefficient (log K) in a small set of organic compounds. Data for 130 compounds with measured log K values in rats (range = -2.90 to 3.23; mean = 1.78 ± 2.3) and data for 66 compounds with measured log K values in humans (range = -0.36 to 4.52; mean = 3.14 ± 3.7) are encoded by molecular structure features and examined with Bayesian neural networks that employ automatic relevance determination.

51 Adapting in an ABCD world
Edward P. Jaeger, Information Technology, Research & Early Development, Johnson & Johnson Pharmaceutical Research & Development, L.L.C, 665 Stockton Drive, Suite 104, Exton, PA 19341, ejaeger@prdus.jnj.com

This talk will highlight recent developments in ABCD, an integrated drug discovery informatics platform developed at Johnson & Johnson Pharmaceutical Research & Development, L.L.C. ABCD is an attempt to bridge multiple continents, data systems and cultures using modern information technology, and provide scientists with tools that allow them to make informed, data-driven decisions. The first phase of ABCD focused on decision support (data warehousing, retrieval, analysis and visualization) and met with great success, becoming an indispensable tool for more than 1,200 users across all J&JPRD research sites. The system consists of two major components: a data warehouse, which combines data from multiple chemical and pharmacological transactional databases and is designed for supreme query performance, and a state-of-the-art application suite, which facilitates data upload, retrieval, mining, and reporting. Chemical intelligence, performance, and analytical sophistication lie at the heart of the new system, which was developed entirely in-house. ABCD has delivered on its promise of simplifying data assembly, delivery, comparison and decision-making. It has also driven business process change to create more consistent and better-documented data for discovery analysis. We have now embarked on the development of a new global transactional system that will replace the legacy operational data stores. This presents us with several compelling advantages: an ability to create a common ontology used across the transactional and decision support layers, a simpler, more streamlined and more robust ETL, and a radically different end-user experience through the use of a single, unified application front-end. Indeed, ABCD is the only system of its kind that will utilize a common framework for the entire discovery data life cycle, including processing, upload, mining, analysis, visualization and reporting.

52 Bias data fusion with turbo search to improve chemical similarity searching
Jenny Wan-Chen Chen, Department of Information studies, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, United Kingdom, Fax: 00 44 (0)114 278 0300, lip01jwc@shef.ac.uk, John Holliday, Department of Information Studies, University of Sheffield, Sheffield S1 4DP, and John Bradshaw, Daylight Chemical Information Systems Inc, Cambridge, CB3 0AX, United Kingdom

In this presentation, we first conclude that four coefficients (Forbes, Simple Matching, Tanimoto and Russell-Rao) are the most suitable coefficients to use in data fusion in the context of chemical similarity searching, due to the complementary nature of their individual performances. Second, we implement a systematic approach to find the best weightings for each of the four coefficients for use in data fusion. The approach uses the turbo similarity search methodology in the training and testing stages. All three fusion rules are studied: MIN, MAX and SUM. Our testing results show that, using MIN bias data fusion, an average improvement rate of 49.2% over the industry standard, the Tanimoto coefficient, can be achieved; using MAX bias data fusion with turbo search, this average improvement rate over Tanimoto is 24.5%; and using SUM bias data fusion with turbo search, an average improvement rate of 27.6% over Tanimoto can be achieved.
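To make the ingredients concrete, the sketch below computes the four coefficients on binary fingerprints and fuses the resulting rankings with the MIN, MAX and SUM rules. The coefficient formulas are the standard definitions for bit strings (a and b bits set in each molecule, c bits in common, d bits set in neither, n bits in total). Everything else is an assumption for illustration: fingerprints are modelled as sets of on-bit positions, similarities are converted to rank-based scores before fusion, and the weights simply multiply those scores. The authors' actual weighting scheme and the turbo search step are not described in the abstract and are not reproduced here.

```python
def counts(fp_a, fp_b, n_bits):
    """Return (a, b, c, d) for two fingerprints given as sets of on-bit positions."""
    a, b, c = len(fp_a), len(fp_b), len(fp_a & fp_b)
    d = n_bits - (a + b - c)  # bits set in neither fingerprint
    return a, b, c, d


# Standard binary similarity coefficients (fingerprints assumed non-empty).
COEFFICIENTS = {
    "tanimoto": lambda a, b, c, d, n: c / (a + b - c),
    "russell_rao": lambda a, b, c, d, n: c / n,
    "simple_matching": lambda a, b, c, d, n: (c + d) / n,
    "forbes": lambda a, b, c, d, n: c * n / (a * b),
}

FUSION_RULES = {"MIN": min, "MAX": max, "SUM": sum}


def rank_scores(similarities):
    """Convert raw similarities to rank-based scores in (0, 1]; the best gets 1.0."""
    n = len(similarities)
    order = sorted(range(n), key=lambda i: -similarities[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order):
        scores[idx] = 1.0 - rank / n
    return scores


def fused_search(query, database, n_bits, weights, rule="SUM"):
    """Rank database indices by weighted fusion of the four coefficient rankings."""
    per_coefficient = []
    for name, coeff in COEFFICIENTS.items():
        sims = [coeff(*counts(query, mol, n_bits), n_bits) for mol in database]
        per_coefficient.append([weights[name] * s for s in rank_scores(sims)])
    fuse = FUSION_RULES[rule]
    fused = [fuse(column) for column in zip(*per_coefficient)]
    return sorted(range(len(database)), key=lambda i: -fused[i])


query = {1, 2, 3, 4}
database = [{1, 2, 3, 4}, {1, 2, 9}, {7, 8}]
weights = {name: 1.0 for name in COEFFICIENTS}  # equal weights, i.e. unbiased fusion
for rule in FUSION_RULES:
    print(rule, fused_search(query, database, n_bits=16, weights=weights, rule=rule))
```

Fusing ranks rather than raw values is used here because the four coefficients have different ranges (Forbes, for instance, is not bounded by 1), so their raw scores are not directly comparable.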

53 Structure generation using reaction vectors
Hina Patel1, Valerie J. Gillet1, Beining Chen2, and Michael Bodkin3. (1) Department of Information Studies, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, United Kingdom, Fax: + 44 114 2780300, lip05hp@sheffield.ac.uk, (2) Department of Chemistry, University of Sheffield, Sheffield S3 7HF, United Kingdom, (3) Eli Lilly UK, Windlesham GU20 6PH

A number of de novo design tools have been described with the aim of generating novel molecules for drug design; however, they are limited in their ability to propose molecules which are synthetically feasible. Here we describe how reaction vectors can be used for the design of synthetically accessible novel molecules.

Broughton et al. [1] have recently described the reaction vector which captures the changes that take place at the reaction centre, without the need for complex reaction mapping procedures. The individual components of a reaction are described by vectors (such as atom pairs) and the overall reaction vector is generated using:

Reaction Vector = [Sum of product vectors] – [Sum of reactant vectors]

Here we show how reaction vectors can be used in reactions ranging from simple transformations involving, for example, a simple functional group substitution, to more complex multi-component reactions of the form (R1 + R2 → P1 + P2), to generate novel molecules for synthesis. Using a 'cleaned' reaction dataset and a set of reagents, we demonstrate the application of the algorithm to the design of known drugs from simple starting materials, via mixing and matching of reaction transforms and reactants.
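The reaction vector definition above amounts to simple arithmetic on descriptor counts, as the sketch below tries to show. It is a toy illustration, not the authors' implementation: molecules are represented by hand-written counts of heavy-atom bond types as a crude stand-in for atom-pair descriptors, and the function names are hypothetical. Converting a product descriptor vector back into an actual structure, which is the hard part of the method, is not attempted.

```python
from collections import Counter


def reaction_vector(reactants, products):
    """Reaction vector = sum of product vectors - sum of reactant vectors."""
    rv = Counter()
    for vec in products:
        rv.update(vec)      # add product feature counts
    for vec in reactants:
        rv.subtract(vec)    # subtract reactant feature counts (may go negative)
    return {feature: n for feature, n in rv.items() if n != 0}


def apply_reaction_vector(reactants, rv):
    """Add a reaction vector to new reactants; return None if it does not fit."""
    total = Counter()
    for vec in reactants:
        total.update(vec)
    total.update(rv)        # Counter.update adds counts, including negative ones
    if any(n < 0 for n in total.values()):
        return None         # a feature the reaction consumes is missing
    return {feature: n for feature, n in total.items() if n > 0}


# Training reaction: acetyl chloride + methylamine -> N-methylacetamide (+ HCl,
# which has no heavy-atom bonds and so contributes an empty vector).
acetyl_chloride = {"C-C": 1, "C=O": 1, "C-Cl": 1}
methylamine = {"C-N": 1}
n_methylacetamide = {"C-C": 1, "C=O": 1, "C-N": 2}
rv = reaction_vector([acetyl_chloride, methylamine], [n_methylacetamide, {}])
print(rv)

# Apply the same transform to propanoyl chloride + ethylamine.
propanoyl_chloride = {"C-C": 2, "C=O": 1, "C-Cl": 1}
ethylamine = {"C-C": 1, "C-N": 1}
print(apply_reaction_vector([propanoyl_chloride, ethylamine], rv))
```

In this toy case the unchanged parts of the molecules cancel, so the reaction vector records only the change at the reaction centre (one C-Cl feature lost, one C-N feature gained), and applying it to the new reactants gives the bond counts of N-ethylpropanamide.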

We also describe the incorporation of the method into KNIME (Konstanz Information Miner) [2], a platform for creating visual data flows, with the aim of developing an automated multi-objective application for de novo design.

[1] Broughton, H. B. et al. Methods for Classifying and Searching Chemical Reactions. United States Patent Application 367550, 25 Sept. 2003.

[2] www.knime.org

54 WITHDRAWN: Computing IUPAC names using Chemaxon nomenclature tools
Daniel Bonniot, ChemAxon, Budapest, Hungary, dbonniot@chemaxon.com

ChemAxon is providing tools to generate the IUPAC names of chemical structures. We illustrate the possibilities of these tools, using concrete examples covering different nomenclatures and complex cases. We consider the differences between traditional and preferred IUPAC nomenclature and the options to handle both. We present the different ways these tools can be used, from real-time, interactive naming of drawn structures to batch naming and automatically computed names in databases. Finally, we evaluate the naming rate and the quality of generated names using both human expert analysis and automated methods.

55 OSRA: Using open source optical structure recognition software to recover chemical information
Igor V. Filippov, Laboratory of Medicinal Chemistry, SAIC-Frederick, Inc., NCI-Frederick, 376 Boyles St, Frederick, MD 21702, igorf@helix.nih.gov, and Marc C. Nicklaus, Laboratory of Medicinal Chemistry, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, Frederick, MD 21702

Until recently most scientific and patent documents dealing with chemistry have described molecular structures either with systematic names or with graphical images of Kekulé structures. The latter method poses inherent problems to the automated processing that is needed when the number of documents ranges in the hundreds of thousands or even millions since such graphical representations cannot be directly interpreted by a computer. To recover this structural information which is otherwise all but lost we present an open source tool built on modern advances in image processing - OSRA. OSRA can read a document in any of the over 90 graphical formats parseable by ImageMagick - including GIF, JPEG, PNG, TIFF, PDF, PS - and generate the SMILES representation of the molecular structure images encountered within that document.

56 Pattern vectors for feature extraction in large scale datasets
Kailin Tang, Tongji University; Shanghai Center for Bioinformation Technology, Shanghai, China, tangkailin@hotmail.com, and Tonghua Li, Department of Chemistry, Tongji University, Shanghai, China

One of the main objectives of analyzing cancer-related gene expression data or SELDI-TOF data is to distinguish normal from tumor samples. Here a novel method combining pattern vectors and the kernel PLS (KPLS) algorithm is presented and applied to the recognition of expression patterns and biomarkers. In this method, numerical data are transformed into sequence-like data. The patterns of the tumor and normal samples are calculated separately. Pattern covers are then calculated to find the patterns that cover the whole data set, and the patterns are degenerated. KPLS is then used for classification. We compare our algorithm with other popular classification methods such as support vector machines. This study demonstrates the potential applications of this algorithm for tumor diagnosis and the identification of candidate targets for therapy.

57 Comprehensive framework of chemoinformatics
Johann Gasteiger, Computer-Chemie-Centrum and Institute of Organic Chemistry, University of Erlangen-Nuremberg, Naegelsbachstr. 25, Erlangen 91052, Germany, Fax: +49-9131-85 26566, Gasteiger@chemie.uni-erlangen.de

The term chemoinformatics first appeared 10 years ago but the field has a long history that can be traced back more than 40 years. Most of the applications of chemoinformatics are focused on drug design where it has established its importance in making – in concert with bioinformatics – drug design more efficient. However, chemoinformatics can assist in solving problems in all areas of chemistry and these applications will find increasing development in the future.

58 NIH Roadmap data: New possibilities for computer-aided drug discovery
Vladimir V. Poroikov1, Dmitry Filimonov1, and Marc C. Nicklaus2. (1) Russian Academy of Medical Science, Institute of Biomedical Chemistry, Pogodinskaya Str., 10, Moscow 119121, Russia, Fax: 007-095-245-0857, vladimir.poroikov@ibmc.msk.ru, (2) Center for Cancer Research, National Institutes of Health, National Cancer Institute, Frederick, MD 21702

The NIH Roadmap is paving the way to a "chemo-information superhighway" based on the wealth of data on structure and biological activity of hundreds of thousands of drug-like molecules. To transform these data into information & knowledge, special computational tools are needed. The computer program PASS (http://www.ibmc.msk.ru/PASS), which predicts over 3000 kinds of biological activity for chemical compounds with an average accuracy of ~95%, is presented as one such tool. We describe the results of PASS training based on the NIH Roadmap data, separately or combined with the standard PASS training set. This provides the possibility to select the most prospective compounds among other (e.g. commercial) sample collections available for testing in the appropriate assays. Predictions obtained with the standard version of PASS can be used for interpretation of NIH Roadmap screening data, e.g. to correlate them with particular pharmacotherapeutic applications. In general, analysis of NIH Roadmap data with PASS provides information about multiple structure-activity and activity-activity relationships, which could be further used for creating new more-effective and safer medicines.

59 Combining direct and indirect strategies in computer-assisted drug design
Ferran Sanz, Cristina Dezi, Jana Selent, and Manuel Pastor, Research Unit on Biomedical Informatics (GRIB), IMIM, Universitat Pompeu Fabra, Passeig Marítim 37-49, 08003 Barcelona, Spain, Fax: 34 93 224 0875, ferran@imim.es

We will review the potential and challenges of combining direct (target-based) and indirect (ligand-based) methodologies in computer-assisted drug design. The discussion will be exemplified with the study of a series of butyrophenones showing the pharmacological profile of atypical antipsychotics (Dezi, C. et al; J. Med Chem 2007; 50: 3242-55). The discussion will include aspects related to the use of receptor homology models in docking simulations and the use of the resulting ligand alignments in the development of 3D-QSAR models. We will also discuss the need to go beyond the affinities for a single biological target when developing structure-activity models, by addressing the multi-receptor pharmacological profiles responsible for the therapeutic action and the adverse side-effects.

60 Activity profile browsing using target affinity maps
G. Wolber1, Goekhan Ibis1, Fabian Bendix1, and Thierry Langer2. (1) Computer Science Group, Inte:Ligand GmbH, Mariahilferstrasse 74B/11, 1070 Vienna, Austria, wolber@inteligand.com, (2) Inte:Ligand GmbH, 2344 Maria Enzersdorf, Austria

Virtual screening using 3D pharmacophores has become an important and commonly used technique. Recent approaches combine sequential screens of several models against small libraries into a multi-target screening protocol. Mining results from cross-target screening is a work-intensive and explorative task. The presented framework visualizes and categorizes the results as activity maps that can be perceived quickly and directly. These maps can then be used to identify the activity scope of one molecule or a set of molecules at a glance. This allows unwanted biological activity to be identified and off-target effects to be minimized. Our activity maps are enhanced for interactive use with linking and brushing techniques that directly link molecule lists to target points on the map. The power of visualization and human exploration abilities are combined to solve the crucial task of mining drug candidates, quickly identifying those compounds that show the most promising activity profiles.

61 Molecular selectivity index for ligand based drug design
David Marcus and Amiram Goldblum, Department of Medicinal Chemistry, Hebrew University of Jerusalem, School of Pharmacy, Sudarsky Center for Computational Biology, Jerusalem 91120, Israel, mdavid@pob.huji.ac.il, amiram@vms.huji.ac.il

Lack of selectivity is a source of side-effects of drugs. Recently, we developed a "Molecular Bioactivity Index" (MBI) to predict the chance of any molecule to bind to a specific target, and extend it now to produce selectivity indexes to distinguish between similar targets. MBI is based on our algorithm that can find optimal solutions in highly complex combinatorial problems, such as feature selection and model building using 2D descriptors. The ensemble of best models can be used to form an index which scores a compound by its ability to pass as an active in each model. We have applied our methods to build models which could extract with high confidence compounds that are active on a certain target and not the others, by calculating their Molecular Selectivity Index (or MSI). This method was applied to investigate the selectivity among the Matrix Metalloproteinases (MMPs). We also screened several compound databases such as ZINC to locate putative selective compounds.

62 Properties of ensemble models for supervised learning
Jacqueline M. Hughes-Oliver, Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203, Fax: 919-515-7591, hughesol@stat.ncsu.edu

Ensemble models, where outputs from many individual models are combined to yield an overall conglomerate model, have gained popularity in multiple areas of chemistry. The method called Random Forests (RF) has been shown to be highly effective for predicting biological activity in many applications. RF is a family ensemble model because it uses base learners created from the same underlying mechanism, a recursive partitioning decision tree. While generally effective, RF can have poor performance when the training set is highly unbalanced. This is often the case for applications regarding quantitative structure-activity relationships, where the percent of active compounds can be very small. For such applications, we study the properties of family ensemble models and make recommendations for obtaining improved performance, especially when false negatives are considered to be much more costly than false positives.
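One generic way to account for unequal misclassification costs in a voting ensemble is to lower the fraction of votes needed to call a compound active. The Python sketch below is an illustration of that idea only, not the authors' recommendation. It assumes the vote fraction can be treated as a rough probability of activity and applies the standard decision-theoretic threshold cost_fp / (cost_fp + cost_fn). The function name, the votes, and the cost values are hypothetical.

```python
def ensemble_predict(votes, cost_fn=1.0, cost_fp=1.0):
    """Classify one compound from the 0/1 votes of the base learners.

    votes   : list of 0/1 predictions, one per base learner
    cost_fn : assumed cost of a false negative (missing an active)
    cost_fp : assumed cost of a false positive
    Returns True when the compound is called active.
    """
    fraction_active = sum(votes) / len(votes)
    # Standard cost-sensitive threshold; equals 0.5 when costs are equal
    threshold = cost_fp / (cost_fp + cost_fn)
    return fraction_active >= threshold

# Hypothetical votes from ten trees: only two of them say "active"
votes = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]

equal_costs = ensemble_predict(votes)                             # majority vote
fn_costly = ensemble_predict(votes, cost_fn=9.0, cost_fp=1.0)     # threshold 0.1
print(equal_costs, fn_costly)
```

With equal costs the compound is rejected by simple majority vote; when a false negative is assumed to cost nine times as much as a false positive, the same two votes are enough to flag it for testing.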

63 Ligand based virtual screening identifies CAR nuclear receptor activators and active opioid receptor molecules
Sharon D. Bryant1, Josh Dekeyser2, Curt Omiecinski2, Ewa Marczak2, and Lawrence H. Lazarus3. (1) Laboratory of Pharmacology and Chemistry, National Institute of Environmental Health Sciences (NIEHS), PO Box 12233, MD: B3-05, Research Triangle Park, NC 27709, bryant2@niehs.nih.gov, (2) Dept. of Veterinary Science, Center for Molecular Toxicology and Carcinogenesis, Pennsylvania State University, University Park, PA 16802, (3) Peptide Neurochemistry, NIEHS, Research Triangle Park, NC 27709

Ligand-based virtual screening was utilized to identify active molecules targeted for splice variants of the nuclear constitutive androstane receptor (CAR) and G-protein coupled opioid receptors. Pharmacophores were derived from known active molecules using LigandScout. CAR pharmacophores were generated based on shared or merged features of CITCO, meclizine, pregnane and clotrimazole. Models for opioid ligands were developed using pharmacophores of 2',6'-dimethyltyrosine (Dmt), 1,2,3,4-tetrahydroisoquinoline carboxylic acid (Tic) and low energy structures of Dmt-Tic-Bid (1H-benzimidazole-2-yl) derivatives. The pharmacophores were deployed as 3D-search queries using Catalyst to screen several chemical databases.

Two hits from the NCI database activated CAR2 in the biological assays. An opioid pharmacophore identified three active hits; one exhibiting high ƒÝ-affinity (KiƒÝ = 0.075 nM). Pharmacophore development and virtual screening methods offer a feasible and effective approach to identify unique molecules relevant for activating CAR splice variants and an alternative approach to identify pharmacophores for virtual screening when bioactive ligand conformations and receptor binding sites are unknown.

64 High performance robust datamining for cheminformatics
Geoffrey Fox1, Seung-Hee Bae1, Rajarshi Guha2, Marlon Pierce3, Xiaohong Qiu2, David J Wild2, and H. Yuan2. (1) Indiana University School of Informatics, Bloomington, IN 47408, gcf@indiana.edu, (2) School of Informatics, Indiana University, Bloomington, IN 47406, (3) School of Informatics, Bloomington, IN 47408

We describe a suite of parallel data mining algorithms applied to Cheminformatics and Bioinformatics. The algorithms are packaged as services that execute in parallel on multicore clusters, obtaining high efficiency on large problems. The main implementation language is C#, but the performance analysis includes other languages (C, C++). Initially the suite includes clustering, mixture models and a variant of GTM (Generative Topographic Mapping) for visualization. We use deterministic annealing, introduced by Durbin for TSP, Rose for clustering, and Ueda and Nakano for mixtures, to lessen the chance of being trapped in local minima. This approach allows the number of clusters and mixture components to be determined by the dataset rather than specified a priori. We present initial applications and detailed performance results.
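The idea behind deterministic annealing for clustering can be shown with a small, serial Python sketch. It is a simplified illustration, not the parallel C# implementation described in the abstract: it assumes one-dimensional data, a fixed number of centers `k` (the full method lets clusters split as the temperature drops), and a hypothetical geometric cooling schedule. All names and parameter values are assumptions.

```python
import math

def da_cluster(points, k, t_start=10.0, t_end=0.01, cooling=0.9, iters=20):
    """Deterministic-annealing clustering of 1-D points (simplified sketch).

    Centers start almost on top of each other at the data mean. At each
    temperature, points are softly assigned with Gibbs weights
    exp(-distance^2 / T) and centers are re-estimated. Lowering T
    gradually hardens the assignments.
    """
    mean = sum(points) / len(points)
    # Tiny symmetric offsets so the centers can separate
    centers = [mean + 1e-3 * (j - (k - 1) / 2) for j in range(k)]
    t = t_start
    while t > t_end:
        for _ in range(iters):
            sums = [0.0] * k
            weights = [0.0] * k
            for x in points:
                d = [(x - c) ** 2 for c in centers]
                d_min = min(d)  # subtract the minimum for numerical stability
                p = [math.exp(-(di - d_min) / t) for di in d]
                z = sum(p)
                for j in range(k):
                    w = p[j] / z
                    weights[j] += w
                    sums[j] += w * x
            centers = [sums[j] / weights[j] if weights[j] > 0 else centers[j]
                       for j in range(k)]
        t *= cooling
    return sorted(centers)

# Hypothetical data with two well-separated groups
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
centers = da_cluster(data, k=2)
print(centers)
```

Because the assignments are soft at high temperature, the result depends far less on the initial center positions than ordinary k-means does, which is the sense in which annealing lessens the chance of being trapped in a poor local minimum.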

65 Sugar crops as feedstocks for the biofuels industry
Edward P. Richard Jr., Thomas L. Tew, Robert M. Cobill, and Anna L. Hale, USDA-ARS Sugarcane Research Laboratory, 5883 USDA Road, Houma, LA 70360, erichard@srrc.ars.usda.gov

Sugar cane is one of the most efficient of the C4 grasses in converting sunlight into biomass – biomass that includes a high percentage of sugar that can be easily converted to ethanol with current technology. Three high fiber sugar cane varieties were released in 2007 as dedicated “energy cane” crops where energy, as opposed to raw sugar, is the desirable end product. Based on the average of four yearly fall harvests, the three varieties produced soluble solid and dry fiber yields of 13.0 and 15.9 Mg/ha, respectively. In an attempt to reduce storage costs at biorefineries, tall-growing sweet sorghums are being evaluated as a companion crop to sugar cane. When planted in the early spring and harvested in the late-summer prior to sugar cane harvest, soluble sugar and dry biomass yields of 8.1 and 15.7 Mg/ha were obtained with these sorghums. In an effort to further expand both the feedstock delivery season and the geographic range of adaptation of sugar cane, additional varieties of energy cane with higher levels of cold tolerance and higher fiber yields are being developed.

66 Modifying the corn genome to meet the US biofuel agenda
Mariam Sticklen, Department of Crop and Soil Sciences, Michigan State University, East Lansing, MI 48824, stickle1@msu.edu

In order to be converted into ethanol, lignocellulosic biomass must be pretreated to remove lignin and allow enzyme access to polysaccharides, which are then broken down into fermentable sugars – both costly processes. This talk documents the progress made in down regulating the crop lignin biosynthesis pathway to reduce the need for pretreatment, and presents results on modifying the corn genome to produce the enzymes needed to convert cellulose into fermentable sugar within the corn biomass. The enzymes used include the thermophilic Acidothermus cellulolyticus E1 endo-cellulase, the fungal Trichoderma reesei (CBH1) exo-cellulase, and the rumen microbial Butyrivibrio fibrisolvens H17c beta-glucosidase. The transgenic corn plants produce these enzymes only in their leaves and stalk, and store them in sub-cellular compartments. Compartmentalization of the hydrolysis enzymes prevents interference with cytoplasmic activities and also stops enzymes from breaking down the cell wall polysaccharides before the crop is harvested for conversion. It can also increase the level of production of enzymes in each plant.

67 Sugar yields from pretreatment and enzymatic hydrolysis of corn stover and poplar
Charles E. Wyman, Chemical and Environmental Engineering, University of California, Riverside, CA 92521, cewyman@engr.ucr.edu, Bruce E. Dale, Dept of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI 48824, Richard Elander, National Bioenergy Center, National Renewable Energy Laboratory, Golden, CO 80401, Mark Holtzapple, Chemical Engineering Department, Texas A&M University, College Station, TX 77843, Michael R Ladisch, Laboratory of Renewable Resources Engineering Department, Purdue University, West Lafayette, IN 47907-2022, Y. Y. Lee, Department of Chemical Engineering, Auburn University, Auburn University, AL 36849, Colin Mitchinson, Genencor International, Palo Alto, CA 94304, and John N. Saddler, Dean of Forestry, University of British Columbia, Vancouver, BC V6T 1Z4, Canada

Pretreatment is essential to high yields and low costs for biological processing of cellulosic biomass to fuels and chemicals. A team experienced in biomass hydrolysis formed a Biomass Refining Consortium for Applied Fundamentals and Innovation (CAFI) to develop comparative data for leading pretreatment options of ammonia expansion, aqueous ammonia recycle, controlled pH, dilute acid, lime, and sulfur dioxide steam explosion using shared feedstocks, enzymes, procedures, and analytical methods. Common sources of corn stover and poplar wood were employed as feedstocks to determine the effect of considerably different classes of cellulosic biomass on performance. Comparative data were developed on sugar recovery for each feedstock during pretreatment and after digestion of the solids produced using different loadings and combinations of enzymes supplied by Genencor International. Standard protocols were applied to close material balances based on these data and used to determine overall sugar release from each combination of pretreatment and enzymatic hydrolysis. All pretreatments recovered high yields of sugars from the hemicellulose and cellulose in corn stover, although high pH technologies tended to be somewhat more effective. However, yields were much more variable for applications of the same technologies to poplar wood, and significant performance differences were observed among these pretreatments for the same poplar variety from two different locations. Overall, these results show the importance of linking selection of pretreatment technology with the feedstock used.

68 Potential and perspectives of polymers produced by biotechnology
Alexander Steinbüchel, Institut für Molekulare Mikrobiologie und Biotechnologie, Westfälische Wilhelms-Universität Münster, Corrensstraße 3, Münster D-48149, Germany, Fax: +49-251-8338388, steinbu@uni-muenster.de

Bacteria are capable of synthesizing a wide range of biopolymers with interesting properties. Since they produce these biopolymers from renewable resources and since some of these biopolymers could replace synthetic polymers produced from fossil resources, they are of high value for the transition to the bioeconomy. Polyesters and polyamides produced by bacteria belong to such biopolymers. Polyhydroxyalkanoates (PHA) are relatively well investigated. They give thermoplastic or elastic water-insoluble materials that could be used for biodegradable and compostable packaging materials, various disposables, and resorbable materials in medicine and pharmacy. Depending on their composition, the material properties of PHAs resemble those of polypropylene or natural rubber. In contrast, polyamides are polymers that are soluble in water, at least at certain pH values. They are applicable as antibiotic substances or as superabsorbent materials, or could be used in detergent applications, where they are useful as a substitute for polyacrylic acid. Furthermore, biopolymers need not necessarily be biodegradable. Our laboratory recently developed a process to produce polythioesters (PTE) by microbial fermentation. A recombinant strain of Escherichia coli expressing a non-natural pathway was engineered for this purpose. So far, no enzyme or microorganism has been found that degrades PTEs or uses them as a carbon and energy source for growth. This is the first example of a persistent biopolymer. Considering that persistent polymeric materials are frequently used in construction, PTEs offer the possibility of also producing non-biodegradable materials from renewable resources.

69 Growth-arrested corynebacteria as whole-cell biocatalysts for biofuel production
Hideaki Yukawa, Microbiology Research Group, Research Institute of Innovative Technology for the Earth (RITE), 9-2 Kizugawadai, Kizugawa 619-0292, Japan, Fax: +81-774-75-2321, mmg-lab@rite.or.jp

Worldwide attention is currently focused on bioethanol production from the viewpoints of global warming prevention and energy security. However, feedstock for current bioethanol production processes comprises food crops, which will be in limited supply in the near future. Therefore, there is a pressing need to use abundant lignocellulosic biomass, some of it obtained from inedible parts of food crops, as demand for bioethanol grows.

Corynebacterium glutamicum has been widely used in industrial microbial production of amino acids and nucleic acids. We constructed ethanologenic C. glutamicum strains to demonstrate ethanol production. These strains were used in the RITE-bioprocess, which enabled the cells to produce various substances at high volumetric productivity under growth-arrested conditions, with cells of the strain packed in a reactor at high density. Growth-arrested conditions were implemented by oxygen deprivation. Now, we are trying to improve the process for ethanol production from mixed sugars containing hexose and pentose sugars derived from lignocellulosic biomass.

70 Anaerobic fermentation of glycerol in Escherichia coli: A new path to biofuels and biochemicals
Ramon Gonzalez, Departments of Chemical & Biomolecular Engineering and Bioengineering, Rice University, MS-362, P.O. Box 1892, Houston, TX 77251-1892, Fax: 713-348-5478, Ramon.Gonzalez@rice.edu

The production of chemicals and fuels via microbial fermentation has been largely based on the use of sugars as carbon sources. This trend could change in the near future due to the large surplus of glycerol generated as an inevitable by-product of biodiesel fuel production. Glycerol is not only abundant and inexpensive but also a highly reduced molecule, which offers the opportunity to produce fuels and (reduced) chemicals at yields higher than those obtained with the use of common sugars. Fully realizing this potential, however, would require the anaerobic metabolism of glycerol in the absence of external electron acceptors. Unfortunately, anaerobic fermentation of glycerol is restricted to a small group of microorganisms, many of them not amenable to industrial applications. For example, E. coli and S. cerevisiae, considered workhorses of modern biotechnology, are thought to metabolize glycerol only via respiration. However, we have discovered that E. coli can fermentatively metabolize glycerol when cultivated under appropriate conditions. We have demonstrated the fermentative nature of this process along with the role of different fermentative pathways. A novel trunk pathway responsible for glycerol conversion into glycolytic intermediates was identified. Based on our findings, we propose a new paradigm for the 1,3-PDO-independent fermentation of glycerol in enteric bacteria in which trunk and auxiliary pathways work in partnership to attain redox balance. Our current work focuses on using the knowledge base created by the aforementioned studies to engineer E. coli and other microorganisms for the production of fuels and chemicals from crude glycerol. We will present at the meeting our latest results in this area, including the development of biocatalysts for the production of ethanol, hydrogen, formic and succinic acids, among other products.

71 Chemoinformatics: Recognition through teaching
Alexandre Varnek, Laboratoire d’Infochimie, Louis Pasteur University, 4, rue B. Pascal, Strasbourg 67000, France, Fax: +33-3-88416104, varnek@chimie.u-strasbg.fr

Chemoinformatics is still not recognized as a scientific field by the ministries of education and other governing academic organizations in most European countries. One way to raise public awareness of it is to integrate chemoinformatics into chemistry curricula. Unlike most European universities, where the topics of chemoinformatics are part of courses in medicinal or pharmaceutical chemistry, the Louis Pasteur University of Strasbourg (ULP) teaches it as an individual discipline at the undergraduate (licence), graduate (master) and postgraduate (PhD program) levels. The “Introduction to Chemoinformatics” course (32h of lectures and tutorials) is delivered to 3rd-year students. At ULP students complete the master program in chemoinformatics in two years: three semesters of formal instruction and hands-on tutorials followed by a semester of industrial or academic training. Different aspects of teaching chemoinformatics as a distinct but integral component of the general chemistry curriculum will be discussed.

72 Graduate training in chemoinformatics at the University of Sheffield
Valerie J. Gillet, John Holliday, and Peter Willett, Department of Information Studies, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, United Kingdom, Fax: +44 (0) 114 2780 300, v.gillet@sheffield.ac.uk

The Chemoinformatics Research Group at the University of Sheffield has several decades of experience in training scientists to PhD level, many of whom now hold senior positions in the field throughout the world. In 2000, our teaching activities were extended to masters level provision with the introduction of the world's first MSc in Chemoinformatics through financial support provided by the UK government (the Engineering and Physical Sciences Research Council). The programme has been widely supported by industry through a variety of mechanisms. More recently, our activities have been extended to include undergraduate teaching in a move to integrate chemoinformatics teaching into mainstream chemistry teaching. We also run an intensive one week short course in Chemoinformatics which provides hands-on training for scientists already in employment. The high demand for this course indicates that there is a continuing need for basic training in the field. This paper will review our activities in these areas.

73 Molecular informatics: Research and learning
Jonathan M Goodman, Unilever Centre for Molecular Science Informatics, Cambridge University, Department of Chemistry, Lensfield Road, Cambridge CB2 1EW, United Kingdom, Fax: +44 1223 336362, J.M.Goodman@ch.cam.ac.uk, John BO Mitchell, Unilever Centre for Molecular Informatics, University of Cambridge, CB2 1EW Cambridge, United Kingdom, and Robert C. Glen, Department of Chemistry, Unilever Centre for Molecular Science Informatics, Cambridge CB2 1EW, United Kingdom

Research is the main focus of the Unilever Centre for Molecular Science Informatics at the University of Cambridge's Department of Chemistry. However, the centre also provides both undergraduate and postgraduate teaching in molecular informatics. The undergraduate course in molecular informatics is the most advanced course in the chemistry degree programme that is taken by all the students in the class. At this stage, the students have already had the opportunity to specialise in particular areas of chemistry, and will soon move on to specific research projects. We provide information about the molecular informatics tools available to them, as they move towards the research environment, and demonstrate how data they gather can be combined to provide knowledge that would be inaccessible without cooperative effort and suitable analysis.

74 Developing a cheminformatics education and teaching center for the Web 2.0 world
David J Wild, School of Informatics, Indiana University, Bloomington, IN 47408, djwild@indiana.edu

In this presentation we will describe our development of a cheminformatics teaching and research hub at Indiana University. The talk will focus on three areas: our current educational programs (including Distance Education), our research center focus on cyberinfrastructure for cheminformatics, and our opinions on the most inspiring educational and research topics for the future. In particular, we will discuss some of the technical and pedagogical opportunities that we are pursuing for integrating cheminformatics into the "Web 2.0" world and potential ways for engaging the next generation of chemistry students with informatics tools.

75 Open toolkits and applications for chemoinformatics teaching
Christoph Steinbeck, Univ. Tuebingen, WSI-RA, Sand 1, Tuebingen D-72076, Germany, er@doktor-steinbeck.de

Chemoinformatics has seen the emergence of a number of open software packages and open standards in the last few years, including molecular editors, molecular viewers, toolkits for the conversion of chemical data formats and fully fledged chemoinformatics libraries. These components constitute an opportunity for chemoinformatics education, allowing the teacher to provide students not only with abstract algorithms or concepts but with concrete implementations, which can be studied in source code and tested with real data.

Further, in times of notoriously small education budgets, the use of free software packages in courses keeps costs low. In this talk the author gives an overview of the use of existing open source chemoinformatics software in his own chemoinformatics teaching effort.

Projects discussed include chemical editors such as JChemPaint, 3D viewers like Jmol, chemoinformatics libraries (CDK and JOELib), conversion packages (OpenBabel) and integrated workbenches like Bioclipse.

76 Mounting an undergraduate Chemoinformatics course with free software
Joao Aires-de-Sousa, REQUIMTE and Department of Chemistry, New University of Lisbon, campus FCTUNL, 2829-516 Caparica, Portugal, jas@fct.unl.pt

An undergraduate introductory Chemoinformatics module is presented that provides the fundamentals of computer processing of chemical information. Learning Chemoinformatics reinforces basic chemical concepts (e.g. stereochemistry), develops the ability for multidisciplinary approaches (e.g. in QSAR studies), and promotes transferable competences related to data management.

The Cheminformatics module is part of the 3rd year program of the Applied Chemistry degree at Universidade Nova de Lisboa, Portugal – an EU first-cycle “Bologna degree”, recently certified by ECTN with the Eurobachelor® label. The module covers three main topics: the representation of molecular structures and reactions, molecular descriptors, and machine learning. It provides a hands-on learning approach, making use of free web services for the calculation of 3D models, molecular descriptors, regression analysis, and training of neural networks. It also uses free software for academic purposes, as well as free data sets. The teaching material is available at http://www.dq.fct.unl.pt/staff/jas/qc .

77 Structure-focused pharmacophore models for teaching and exploring protein-ligand interactions
Thierry Langer, Institute of Pharmacy, Department of Pharmaceutical Chemistry, University of Innsbruck, Innrain 52, Innsbruck A-6020, Austria, thierry.langer@uibk.ac.at, Gerhard Wolber, Inte:Ligand GmbH, 2344 Maria Enzersdorf, Austria, and Daniela Schuster, Department of Pharmaceutical Chemistry, Computer Aided Molecular Design Group, University of Innsbruck, Institute of Pharmacy, Innsbruck A-6020, Austria

Feature-based 3D pharmacophore models have proven to be highly valuable query tools for database mining in virtual screening application scenarios. They reflect in a transparent manner the interactions between ligands and their respective binding sites and can be visualized easily. Therefore, in teaching cheminformatics-related methods, such models are extremely useful for exploring and understanding binding interaction patterns quickly and easily. We have created LigandScout, a user-friendly pharmacophore-generating platform that allows users to intuitively generate, visualize, and manipulate 3D pharmacophore models starting with ligand-target complex structures. In addition to the 3D visualization, our program includes a sophisticated 2D depiction algorithm for displaying a projection of the binding interactions. Since the core application is written in Java, it can be included easily into web-based e-learning and teaching protocols such as the PharmXplorer platform.

78 Reaction prediction, classification, and retro-synthesis using a rule-based reaction expert system
Jonathan H. Chen, Qian-Nan Hu, and Pierre Baldi, Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, Irvine, CA 92697, chenjh@uci.edu

A rule-based reaction expert system, initially developed to support computer-based learning in organic chemistry, has been expanded in scope to address a broader set of chemoinformatics problems ranging from reaction prediction and discovery, to retro-synthetic analysis and combinatorial library design, to automated annotation and classification of large reaction databases. The current system comprises over 1,200 manually-curated reaction patterns written using the SMIRKS language with transformation rule extensions to enable robust predictions. Applications of this expert system will be demonstrated in: (1) reaction prediction, including complete mechanism diagrams, for interactive learning in organic chemistry and validation of synthesis plans; (2) retro-synthetic analysis, with automated combinatorial library design, available over the Web; and (3) automated annotation and classification of a large database containing about 4M chemical reactions. Select applications available at http://cdb.ics.uci.edu.

79 PowerMV: A free resource for viewing and manipulation of SD files
S. Stanley Young, NISS, PO Box 14006, Research Triangle Park, NC 27709, Fax: 919 685 9300, young@niss.org

PowerMV can be used as a stand-alone program, www.niss.org/PowerMV, or as a web service, http://eccr.stat.ncsu.edu/TSWeb/Default.aspx . PowerMV features include the following:

  1. Supports MDL SDF format.
  2. Displays molecules in multiple columns.
  3. Displays properties contained in the SD file in a table.
  4. Exports the table of molecule pictures and properties to Excel.
  5. Calculates three types of binary atom pair, fragment counts and continuous weighted Burden number molecular descriptors.
  6. Allows for similarity searching.
  7. Calculates drug-like properties such as LogP, PSA, MW, HBAs, HBDs, etc.
  8. Computes QSAR models, Least Angle Regression (LARS) and LASSO-2.
  9. Detects outliers using the tetrads method (Douglas Hawkins, et al.). (Code implemented by Andrew Wong.)
  10. Provides a novel robust singular value decomposition (RSVD) for large datasets with missing values or outliers.

PowerMV should be a useful tool for teaching chemical informatics methods.

80 NMR-based mixture analysis of juices and beverages using an integrated online cross-platform cheminformatics tool
Istvan Pelczer1, Michelle D'Souza2, and Gregory Banik2. (1) Department of Chemistry, Princeton University, Princeton, NJ 08544, Fax: 609-258-6746, ipelczer@princeton.edu, (2) Informatics Division, Bio-Rad Laboratories, Philadelphia, PA 19102

Education is most efficient if entertainment is in the mix, too. A real classroom exercise will be presented using an NMR-based direct mixture analysis of fruit juices that applies statistical and component-level methods and takes advantage of the highly integrated multiplatform capabilities of the KnowItAll U software from Bio-Rad Laboratories.

We will first run very high-sensitivity 1H and 13C NMR of freshly squeezed juices from cherry tomatoes and a variety of grapes. The data are then processed and analyzed statistically, demonstrating obvious clustering between species and a more refined distinction among the different grapes. Interactive tools help to identify characteristic components. This simple case study relates to outstanding recent developments in the field of metabolomics and metabonomics.

For more mature audiences the substances can be extended to all kinds of beverages; we have early results on beer ("beer-omics") and whisky. Such analyses have serious potential and extended implications in the food industry.

81 Chemist-librarian: The best of both worlds
F. Bartow Culp, Mellon Library of Chemistry, Purdue University, 504 West State Street, West Lafayette, IN 47907-2058, bculp@purdue.edu

In the Internet age, isn't the concept of a librarian outmoded? If easy and almost unlimited information access is available to anyone at the click of a mouse button, why should a chemist consider academic librarianship as a career? Well, there are many reasons, including excellent job prospects, a high degree of career satisfaction, plus the chance to be a central player in the current redefinition of how science is being done. In this age of high-entropy information, the unique combination of abilities that we chemist/librarians bring to our jobs gives us not only the power to organize and access chemical information, but it can also enhance the value of that information and improve the entire communication process itself. We will present examples of how chemist/librarians are integral participants in the advancement of both of their professions.

82 You know you're a chemical information searcher if
Brian M Bridgewater, Search and Analysis Services Group, Knowledge Center, Rohm and Haas Company, 727 Norristown Road, P.O. Box 904, Spring House, PA 19477, Fax: 215-641-7811, bbridgewater@rohmhaas.com

For a Ph.D. chemist, crossing over to a career in Chemical Information Search and Analysis may seem like a strange choice to some after spending nine years in academic and industrial lab-based R&D roles. However, looking back with the usual clarity of hindsight, signposts were there pointing the way all along. Do you get a big kick out of finding a really useful journal article for your lab colleagues? Are you still the go-to person for family trying to find that perfect kohlrabi recipe, even though they know you have no idea what a kohlrabi is? Have you had a heated discussion with a fellow researcher at one in the morning about who gets to use the only working library terminal for a literature search? If so, there just may be a searcher in you trying to get out.

In this presentation, I hope to show an example of how exploring the interests and activities that you enjoy and are particularly dedicated to in a traditional R&D role can be invaluable in identifying and transitioning into a satisfying and rewarding non-traditional career path that is tailor-made for you.

83 Information highway to drug discovery
Lynne P. Greenblatt, Chemical and Screening Sciences, Wyeth Research, CN-8000, Princeton, NJ 08543, Fax: 732-274-4850, greenbl1@wyeth.com

Over the past two decades, the amount of data generated in research has been increasing steadily and exponentially. The task of transforming this data into meaningful knowledge that can help the researcher to attain scientific and business goals faster and more efficiently has fallen to the relatively new discipline of informatics. Cheminformatics is the branch of informatics that deals with chemical structures and related information. Cheminformatics groups have developed many techniques and tools for analyzing chemical structures as they relate to chemical space, chemical properties, biological activity, ADME properties and so on. This talk will focus on the role of the Cheminformatics group in the drug discovery process at a major pharmaceutical research organization. The presenter will also discuss the career path she took to her current position, and touch on other careers in chemical information within the organization.

84 Path less traveled: From the periodic table to public relations
Janice E. Mears, Communications, Chemical Abstracts Service, 2540 Olentangy River Rd., Columbus, OH 43202, Fax: 614-447-3837, jmears@cas.org

A degree in chemistry or related sciences provides an excellent foundation for career paths in sales and marketing in chemical, pharmaceutical, and chemical information companies.

This paper will focus on the experiences of one individual's transition from working in a clinical laboratory into sales, marketing and communications, first at a Fortune 100 company and then at Chemical Abstracts Service (CAS). It will describe the skills and responsibilities of these non-traditional careers and outline how best to prepare for the transition. A discussion of CAS and its role as a global leader in scientific information will be intertwined with comments on other non-traditional careers at CAS.

85 Breaking news: Chemistry is everywhere
Ivan Amato, Chemical & Engineering News, 1155 16th St., NW, Washington, DC 20036, i_amato@acs.org

Besides encompassing the rules, theories, know-how and equipment by which our more technically-oriented neighbors transform the raw stuff of the world into the constructed landscapes in which we spend most of our time, chemistry is a language. And despite the truism that chemistry is everywhere in all of our lives, it is a language known almost exclusively by the sort of people who attend ACS meetings. A very small subpopulation of the journalism community has mastered enough of this language to serve as interpreters, translators, and prepared observers, who can relay to various publics those events, developments, concerns, outrages, discoveries, technologies, and other consequential happenings that emerge from the chemical enterprise. I am privileged to be among this group, but when I received my bachelor's degree in chemistry about 20 years ago, I did not realize that I was opening a way to a career with the luxurious requirement that you must never stop learning.

86 Careers in patent law: Going beyond the bench with your chemistry degree
Justin J. Hasford, Finnegan, Henderson, Farabow, Garrett and Dunner LLP, 901 New York Avenue, NW, Washington, DC 20001, Fax: 202.408.4400, Justin.Hasford@finnegan.com

This presentation will provide an overview of career opportunities in patent law, including an examination of the fields of patent prosecution, litigation, and licensing. Recommended and required programs of study will be discussed, and employment opportunities will be described, including options for attorneys, patent agents, and technical specialists. Most importantly, this presentation will explore and emphasize the ability to utilize your technical background in a maximally beneficial manner in the arena of legal services.

87 Application of novelty detection in virtual screening
Johann Gasteiger1, Dimitar Hristozov1, and Christof H. Schwab2. (1) Computer-Chemie-Centrum, University of Erlangen-Nuremberg, Erlangen 91052, Germany, Fax: 0049-9131-8526566, gasteiger@chemie.uni-erlangen.de, (2) Molecular Networks GmbH, D-91052 Erlangen, Germany

Novelty detection is used to identify patterns that do not belong to the space covered by a given data set. In ligand-based virtual screening, chemical structures perceived as novel lie outside the known activity space and can be discarded from further investigation. Compounds not perceived as 'novel' are suspected to share the activity of the query structures.

Four different ligand-based virtual screening scenarios including eight different biological targets and utilizing Self-Organizing Maps as a novelty detection device are presented: (1) prioritizing compounds for high-throughput screening; (2) selecting a number of active compounds from a large database; (3) assessing the probability that a given structure will be active; (4) selecting the most active structure(s) for a biological assay.

The performance of two techniques in a retrospective ligand-based virtual screening, similarity search with data fusion and novelty detection, is investigated. Three different structure representations, fingerprints, topological autocorrelation, and radial distribution functions, are compared.
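
The novelty-detection idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' Self-Organizing Map workflow: it swaps in a simpler nearest-neighbour distance test, and the two-dimensional descriptor vectors, the `threshold` value, and all function and compound names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_distance(candidate, known_actives):
    """Distance from a candidate to its closest known active."""
    return min(euclidean(candidate, ref) for ref in known_actives)


def split_by_novelty(candidates, known_actives, threshold):
    """Separate candidates into 'retained' (inside the known activity space)
    and 'novel' (outside it). The threshold is an assumed tuning parameter."""
    retained, novel = [], []
    for name, vector in candidates.items():
        if nearest_distance(vector, known_actives) <= threshold:
            retained.append(name)
        else:
            novel.append(name)
    return retained, novel


# Toy descriptor vectors; real descriptors would have many more dimensions.
known_actives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
candidates = {"cpd_A": (0.2, 0.1), "cpd_B": (5.0, 5.0), "cpd_C": (1.1, 0.2)}

retained, novel = split_by_novelty(candidates, known_actives, threshold=0.5)
print("retained:", retained)
print("novel (discarded):", novel)
```

In this toy data set, compounds close to a known active are kept as suspected actives, while the distant one is flagged as novel and set aside.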

88 Theory and practice of statistical significance for molecular similarity scores: When is a similarity score "significant"?
Pierre Baldi, Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, Irvine, CA 92697, Fax: 949 824-4056, pfbaldi@uci.edu, and Ryan W. Benz, School of Information and Computer Sciences, University of California, Irvine, Irvine, CA 92697-3445

One of the most fundamental tasks in bioinformatics or chemoinformatics is to search large databases for molecules that are “similar” to a given query, or set of queries. In bioinformatics, BLAST has become one of the workhorses of modern biology, allowing biologists to search sequence databases and retrieve a ranked list of hits associated with significance scores (“e-values”). In chemoinformatics, similarity and search algorithms for small molecules have also been derived but, surprisingly, a theory of when a small molecule hit is significant has not yet been developed. Here we develop and apply a theory of statistical significance for small-molecule similarity scores. As in the case of BLAST, significance is assessed against a random background model. Several tractable background models of randomness are introduced and a statistical theory of Z-scores and Extreme Value Distributions is derived for similarity scores, such as Tanimoto scores, with specific implications for practical searches.
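
As a rough illustration of scoring a Tanimoto similarity against a random background, the following Python sketch estimates a Z-score by sampling. The background model used here (independent bits switched on with a fixed `density`) is an assumption chosen for simplicity and is not necessarily one of the models in the talk; the fingerprint length, sample count, and function names are likewise hypothetical.

```python
import random
import statistics


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def random_fingerprint(n_bits, density, rng):
    """Assumed background model: each bit is on independently with probability `density`."""
    return {i for i in range(n_bits) if rng.random() < density}


def z_score(query, hit, n_bits=256, density=0.1, n_samples=2000, seed=0):
    """Z-score of the query/hit similarity relative to query/random similarities."""
    rng = random.Random(seed)
    background = [
        tanimoto(query, random_fingerprint(n_bits, density, rng))
        for _ in range(n_samples)
    ]
    mu = statistics.mean(background)
    sigma = statistics.pstdev(background)
    return (tanimoto(query, hit) - mu) / sigma


# Toy fingerprints sharing 20 of 30 distinct on-bits.
query = set(range(0, 25))
close_hit = set(range(5, 30))

z = z_score(query, close_hit)
print(f"Tanimoto = {tanimoto(query, close_hit):.3f}, Z-score = {z:.1f}")
```

A large positive Z-score indicates that the observed similarity is far above what the assumed random background produces; converting it to an e-value would additionally require the extreme value treatment mentioned in the abstract.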

89 Fast and accurate prediction of small-molecule 3-D structures
Ryan W. Benz, School of Information and Computer Sciences, University of California, Irvine, ORU Genomics and Bioinformatics, Irvine, CA 92697-3445, rbenz@uci.edu, and Pierre Baldi, Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, Irvine, CA 92697

Three-dimensional molecular representations are vital for a wide variety of computational chemistry and chemoinformatics methods. In this work, we present our latest efforts to improve upon existing computational 3D structure generation methods and produce an open, fast, and accurate 3D structure generator for small molecules. Our hierarchical method incorporates the use of rigid segment and torsion angle databases composed of nearly 2 million fragments from 400,000 experimental structures, along with fast numerical methods to cover gaps in the database when needed. This approach is robust and always returns a structure given a valid 2D input. The accuracy was assessed by comparing the RMSD between the generated and experimental structures, and was typically below 1 Å. In addition to a description of the method and its performance, we will also present new efforts to incorporate machine learning into our generation method.
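
The RMSD comparison mentioned above can be sketched in Python as follows. This is a minimal illustration that assumes the two structures list the same atoms in the same order and have already been superimposed; no alignment step is performed, and the coordinates (in Å) are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered sets of 3D coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical three-atom fragment: experimental vs. generated coordinates.
experimental = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
generated = [(0.0, 0.0, 0.3), (1.5, 0.4, 0.0), (1.5, 1.5, 0.0)]

value = rmsd(experimental, generated)
print(f"RMSD = {value:.3f} Å")
```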

90 Making 3-D structure searching easier
Robin Taylor, Cambridge Crystallographic Data Centre, 12, Union Road, Cambridge CB2 1EZ, United Kingdom, Fax: 44 1223 336033, taylor@ccdc.cam.ac.uk

Google has raised the bar: people now expect instant answers. For those working in chemical information, this is hard to achieve. At CCDC, we wish to facilitate the searching of small-molecule and protein crystal-structure databases. Three approaches have proved useful. Firstly, raw crystal-structure data can be pre-processed into derivative databases which provide access to tailored information (e.g. molecular geometries, intermolecular contacts) without the need for explicit substructure searching. Secondly, tight integration between databases and client applications makes crystal-structure data instantly accessible to users of those applications, without them needing to see the native database interface. Thirdly, the flexibility of 3D searching can be enhanced by providing scripting interfaces to the underlying database objects. A combination of these approaches makes 3D structure searching easy, accessible and powerful. Uses include geometrical validation of structural models, analysis of protein-ligand interactions, generation of pharmacophores, and investigation of the similarities and differences between crystal-structure polymorphs.

91 Chemical Structure Lookup Service (CSLS)
Markus Sitzmann1, Igor V. Filippov2, Wolf-Dietrich Ihlenfeldt3, and Marc C. Nicklaus1. (1) Laboratory of Medicinal Chemistry, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, 376 Boyles St, Frederick, MD 21702, sitzmann@helix.nih.gov, (2) Laboratory of Medicinal Chemistry, SAIC-Frederick, Inc., NCI-Frederick, Frederick, MD 21702, (3) Xemistry GmbH, D-35094 Lahntal, Germany

We present an overview of the recent advances of our Chemical Structure Lookup Service (CSLS), a web service allowing one to search for the occurrence of tens of millions of compounds in close to a hundred different commercial and public databases. CSLS combines pointers to structures in databases of different nature, such as purchasable screening samples; ligand, binding and crystal structure databases; bioassay databases; toxicological and environmentally relevant databases. CSLS is freely available at http://cactus.nci.nih.gov/lookup. We also discuss our CACTVS hashcode-based NCI/CADD Structure Identifiers, which are calculable for any small molecule. They provide an important search mechanism in CSLS, enabling it to function essentially as an “address book” of any small molecule. They are specifically designed to enable a fine-tunable yet rapid compound identification even in very large datasets. They can be set to be sensitive to a variety of chemical features such as tautomerism, different resonance structures drawn for a charged species, and presence or absence of certain fragments like counter ions, or to ignore any or all of these features. They are used in several of our web services, and tools to generate them are made freely available to the public.

92 Advances in industrial biotechnology and biorefining
Matthew T. Carr, Industrial and Environmental Section, Biotechnology Industry Organization, 1201 Maryland Ave SW, Suite 900, Washington, DC 20024, mcarr@bio.org

Recent advances in industrial biotechnology are revolutionizing the production of fuels, chemicals and consumer products from a range of renewable feedstocks. These new biobased products have the potential to greatly reduce U.S. dependence on imported petroleum, slash greenhouse gas emissions, and boost local economies in all 50 states. Numerous pilot cellulosic biorefineries are now in the planning and construction stages, and several biobased chemical platforms are now being commercialized. This presentation will highlight the technologies and commercial ventures that are forming the leading edge of the next industrial re-evolution.

93 When small is beautiful: SME scale engineering
R. Elaine Groom, QUESTOR Centre, The Queen's University of Belfast, David Keir Building, Stranmillis Road, Belfast BT9 5AG, United Kingdom, e.groom@qub.ac.uk

The agriculturally-based economy in the North and South of Ireland affords many opportunities in the area of bioenergy and energy from biomass. The QUESTOR Centre, a university/industry cooperative research centre using the US NSF model, has an established research programme working closely with SMEs (Small to Medium Enterprises) in this area.

The paper will discuss a number of projects in the area of biogas production, biofuel development and the production of new biomass-based fuels, which are being developed with SMEs in the region. It will discuss the advantages of process intensification and the production of small scale plant and how this has the potential for far-reaching benefits to the bioeconomy.

94 Current perspectives on licensing and technology transfer in technology industries
Andrew H. Berks, Hoffmann & Baron LLP, 6 Campus Dr, Parsippany, NJ 07054, andy@andyberks.org

Licensing and technology transfer often play a critical role in the development of products in technology industries. This talk will explore the purpose of licensing arrangements, how they work, and the results parties can expect to realize.

95 Cellulosic biofuels and Shell
Jean-Paul Lange, Shell Global Solutions International B.V, Badhuisweg 3, 1031 CM Amsterdam, Netherlands, jean-paul.lange@shell.com

Climate change, energy security and agricultural support are pushing governments around the world to stimulate renewable energies and biofuels. While in full development, ethanol and fatty methyl esters are revealing severe shortcomings, e.g., on food prices, limited CO2 savings and land degradation. Mankind is therefore turning to lignocellulosic residues as feedstock for 2nd generation biofuels.

Lignocellulose is difficult to convert and requires, therefore, new and complex manufacturing technologies. Shell is developing several of these technologies in partnerships with industries and universities.

Ethanol and fatty methyl esters might not be the biofuels of the future because of sub-optimal properties. Shell is exploring advanced 2nd generation biofuels that are fully compatible with present fuels and cars and, ideally, offer advantageous performance properties.

To be successful in the long run, the large-scale deployment of 2nd generation biofuels requires rapid improvements in production costs, strong government support to early movers and large investments in plants and infrastructure.

96 Cellulosic ethanol gets ready for prime time
Carlos Riva, Verenium Corporation, 55 Cambridge Parkway, Cambridge, MA 02142, carlos.riva@verenium.com

Mounting interest in advanced biofuels in the United States, including cellulosic ethanol, is driven by a “perfect storm” of economic, political and environmental trends. These factors include historically high petroleum prices; acute energy security concerns; and rising public consciousness of the profound threat posed by climate change. Verenium is a leading developer of cellulosic ethanol process technology. In February 2007 the company broke ground on the first true demonstration-scale cellulosic ethanol facility in the nation, located in southwestern Louisiana. This facility, with a nominal capacity of 1.5 million gallons, is scheduled for mechanical completion in early 2008. It is designed to produce ethanol from a variety of agricultural residues and specially-bred energy crops. This presentation will provide an overview of Verenium's cellulosic ethanol technology, describe some of the lessons learned through this first-of-a-kind demonstration project, and address some of the remaining challenges that must be met if the vision of cost-competitive, low-carbon biofuels is to be translated into commercial reality.

97 Building a cellulosic biofuels industry from the ground up: Tennessee Biofuels Initiative
Kelly Tiller, Director of External Operations, Office of Bioenergy Programs, University of Tennessee, 2506 Jacob Drive, Knoxville, TN 37996-4570, ktiller@tennessee.edu, and Timothy Rials, Director of R&D, Office of Bioenergy Programs, University of Tennessee, Knoxville, TN 37996-4570

Tennessee is leading an ambitious effort to develop a cellulosic biofuels industry. With more than $300 million in state, federal, and private funding, Tennessee is poised to succeed. The centerpiece is construction and operation of a 200 TPD cellulosic biorefinery, supplied by locally grown switchgrass and wood chips, scheduled to be completed by mid-2009. A significant focus is developing a supply chain for providing sufficient, reliable, and sustainable dedicated energy crop feedstocks such as switchgrass. The integrated project is designed to answer a variety of research questions, ranging from feedstock supply systems to pre-treatment and conversion technology approaches and systems to product distribution and marketing. The project is led by the University of Tennessee Institute of Agriculture, with primary partners Mascoma Corporation and the Oak Ridge National Laboratory. The integrated systems approach of the Biofuels Initiative is designed to foster development of a large scale commercial biofuels industry.

98 Challenges in commercializing production of fuels from cellulosic biomass
Charles E. Wyman, Chemical and Environmental Engineering, University of California, Riverside, CA 92521, cewyman@engr.ucr.edu

Ethanol and other fuels derived from abundant, low cost cellulosic biomass including agricultural and forestry residues, portions of municipal solid waste, and dedicated woody and herbaceous crops offer substantial environmental, economic, and strategic benefits. The need now is to commercialize biofuels technologies that are competitive to realize their impressive benefits. However, the cost for first-of-a-kind applications is much higher than projected by initial estimates or for later plants because allowances for a possible drop in performance with scale-up, extra equipment to ensure process operability, and extra contingencies to pay for unforeseen delays are introduced to ensure financial coverage. These additional costs can impede or prevent a new biofuels industry from emerging. Thus, a challenge is to capitalize on opportunities to reduce costs and improve the ability to accurately project the performance of large-scale plants to accelerate the commercialization and reduce the cost layers for new biofuels. Nonetheless, government assistance will likely be vital to launch the first few projects and lay the foundation for rapid growth of a competitive industry that will have major societal benefits.

99 Analyzing large chemical substance answer sets in SciFinder: Techniques for comprehensive retrieval and subsequent exploration
Anthony J. Trippe, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202-1505, Fax: 614-447-5443, atrippe@cas.org

This presentation will demonstrate new developments in analyzing large (greater than 5,000) substance answer sets when using SciFinder from CAS. Utilizing framework analysis in conjunction with Tanimoto similarity comparisons, it is now possible to analyze broad structure searches which are designed to be as comprehensive as possible. Previously, broad searches presented difficulties with regard to interpretation of the results, but with the new techniques presented it is now possible to isolate the areas of interest in a comprehensive search while discarding the tangential areas.

100 Challenges in structure searching: Tools, knowledge, and experience
Lora Burgess, CAS, 2540 Olentangy River Road, Columbus, OH 43202, lburgess@cas.org

Many of the challenges in structure searching lie with the level of expertise of the searcher. Knowledge of database content and coverage, as well as availability or accessibility of quality tools determine the type and effectiveness of the structure search that can be done. CAS offers different tools for users with different needs, from SciFinder® products that offer easy to use access to CAS data, to STN, which offers access to many databases containing chemical structure information along with a high level of control over answer retrieval. These tools are especially designed to take full advantage of the unique organization and content of the CAS Registry, the most authoritative substance database, including 33 million records for small molecules. Challenges arise in user education and selecting the appropriate tools and content for the question at hand.

101 Overcoming eccentricities of inorganic and organometallic substructure searching
Judith Currano, Chemistry Library, University of Pennsylvania, 3301 Spruce St. 5th Floor, Philadelphia, PA 19104-6323, Fax: 215-898-0741, currano@pobox.upenn.edu

Databases are extremely inconsistent in their methods of representing bonds and coordinations between metals and ligands in organometallic substances, making comprehensive and accurate substructure searches quite difficult. Sometimes, coordination is represented by a single linkage from the ligand to the metal; at other times, each coordinating atom is shown to “bond” to the metal center; and some databases go so far as to draw metals and ligands as separate components, without giving any indication of their connection. Understanding the eccentricities of the various information sources goes a long way towards gaining maximal retrieval from an inorganic or organometallic substructure search. The peculiarities of several key structure and reaction databases are discussed, and tips and techniques are given for using structure in combination with other techniques to find structurally similar inorganic and organometallic substances.

102 Comparing chemical structure searching in multiple structural databases
Donald Walter, Customer Training, Thomson Scientific, Suite 250, 1800 Diagonal Rd, Alexandria, VA 22314, Fax: 703 519 5838, Don.Walter@Thomson.com, and Bob Stewart, Corporate Markets/Dialog, Thomson Scientific, Philadelphia, PA 19104-3302

There are several different ways of searching the patent and technical literatures using chemical structure searching. In Thomson Scientific systems alone, there are several ways of searching the same databases on different systems. How do the results compare? This talk will explore several ways of searching, and will compare and contrast searches done on several platforms.

103 Comparing Merged Markush Service and Marpat search results: Two case studies
Joseph M. Terlizzi, Questel, Inc, 81 Pierrepont St., Brooklyn, NY 11201, jterlizzi@questel.orbit.com

CAS' Marpat database on STN and the Thomson/INPI Merged Markush Service (MMS) on Questel are two well-known and highly esteemed online services for searching for Markush structures in patents. This paper will describe how two Markush structures can be interpreted from the source patent and entered into each system, and how and why results in each system can vary. Limitations and advantages in each system will be discussed and the power of understanding and using both systems will be encouraged.

104 Follow the price and flow of chemicals using worldwide trade statistics
Bob Stewart, Corporate Markets/Dialog, Thomson Scientific, 3501 Market Street, Philadelphia, PA 19104-3302, bob.stewart@thomson.com

A huge amount of valuable information can be inferred from worldwide trade statistics. Applications for this data include assessing market share, tracking competition, monitoring trade flow, identifying trading partners, monitoring price fluctuations and tracking the movement of products around the globe. Official trade statistics are available from many countries, but data can often be deduced for countries that do not report official statistics. This presentation will show examples of many of these applications with a focus on the chemical industry.

105 Role and practice of price assessment in chemical markets
Stephen Burns, ICIS, 3355 West Alabama, Suite 700, Houston, TX 77098

Chemical markets are among the least transparent of major world commodity markets. This presentation explores the role of benchmark contract settlements; methodologies for price assessment, and the use of assessments in long-term supply contracts; factors influencing the volume of spot transactions; and the developing role of regulators as market liquidity increases.

106 Untangling the chemical information pricing web
Janette B. Carver, Chemistry Physics Library, University of Kentucky, 150 Chem Phys Bldg, Lexington, KY 40506, jbcarv1@email.uky.edu, and Patricia Kirkwood, University of Arkansas Libraries, University of Arkansas, Fayetteville, AR 72701-4002

Need to know quick and easy resources in which to look for chemical pricing information for your undergraduate chemical engineering students? This presentation encompasses pricing resources that are easy to find and use. These range from free web sites to subscription-based resources. A sample of the resources includes: ICIS, C&E News, The Engineer, Knovel, Compendex and several government resources. Resources discussed stem from answering project and general pricing information questions asked by chemical engineering students.

107 Information needs during the chemical engineering senior design sequence at the University of Arkansas
Robert R. Beitle, Department of Chemical Engineering, University of Arkansas, 3202 Bell Engineering Center, Fayetteville, AR 72701, Fax: 479.575.7926, rbeitle@engr.uark.edu

Students in the chemical engineering program at the University of Arkansas are required to complete a senior design experience that focuses on process development. Students are expected to develop a preliminary process design, capital cost estimate, and manufacturing cost estimate for a large scale manufacturing facility that produces a chemical or biochemical commodity such as poly(ethylene), gasoline, or ethanol. Student success requires providing reasonable estimates that are up to date, requiring access to a wide variety of literature that includes vendor specific information, chemical data, thermodynamic data, and industry-specific information. Access to information resources is provided through key interactions between chemical engineering and the engineering librarian. In the presentation, we identify information needed and discuss how students are introduced to the process which leads to the collection of technical data on their process. Students are shown both successful and unsuccessful strategies in order to experience both pitfalls and triumph.

108 Teaching students to use company and industry information during their job search
Jeremy R Garritano, Mellon Library of Chemistry, Purdue University, 504 W. State St., West Lafayette, IN 47907, jgarrita@purdue.edu

Libraries and educational institutions provide access to a wide variety of resources that are useful in investigating potential employers, preparing for interviews, and predicting trends in industry. However, this information is often not found in the traditional scientific and technical resources that many chemistry and chemical engineering students are familiar with. This paper will present a number of strategies for using company and industry resources to look at financial information, industry developments, and opinions of current employees, among other attributes, to better help students prepare themselves for the job market.

109 Conformational selection revealed by flexible-ligand flexible-protein docking
Zunnan Huang, Department of Chemistry and Biochemistry and Center for Nanoscience, University of Missouri-Saint Louis, One University Boulevard, Saint Louis, MO 63121, Fax: 314-516-5342, zn_huang@yahoo.com, and Chung F. Wong, Department of Chemistry and Biochemistry and Center for Nanoscience, University of Missouri-St. Louis, St. Louis, MO 63121

We present a new flexible-ligand flexible-protein docking model in which the protein can adopt conformations between two extreme protein structures observed experimentally. By testing this model on the docking of four diverse ligands to protein kinase A, we found that the ligands were able to dock properly to the protein by selecting their preferred conformations. Essential protein movement, such as that involving the glycine-rich loop, was found to be important for the ligands to move from the surface of the protein into the binding site. By imposing relatively soft conformational restraints on the protein during docking, one can reduce computational costs yet permit the essential conformational changes that are required for these inhibitors to dock successfully to the protein. We also introduced a new measure, in addition to the usual root-mean-square deviation (RMSD), to judge the agreement of a docked structure with the experimental one. With a flexible protein model, RMSD is sensitive to the way that the two protein structures are superimposed before the RMSD between the ligand structures is calculated. We found that evaluating the correlation between intra- and intermolecular distance matrices provided a useful additional check.
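The distance-matrix check described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of the Pearson coefficient, and the restriction to ligand-protein (intermolecular) distances are assumptions made for illustration. The point it shows is that distances between atoms do not change under rigid motion of the whole complex, so no superposition step is needed.

```python
import math

def distance_matrix(atoms_a, atoms_b):
    """Flattened matrix of Euclidean distances between two lists of (x, y, z) points."""
    return [math.dist(a, b) for a in atoms_a for b in atoms_b]

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def intermolecular_correlation(lig_dock, prot_dock, lig_exp, prot_exp):
    """Correlate ligand-protein distances of a docked pose with those of the
    experimental complex. Atom order must match between the two structures."""
    docked = distance_matrix(lig_dock, prot_dock)
    reference = distance_matrix(lig_exp, prot_exp)
    return pearson(docked, reference)
```

Passing the ligand coordinates as both arguments of `distance_matrix` gives the intramolecular counterpart; a value near 1 indicates that the docked pose reproduces the experimental contact geometry.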

110 Classification models for hERG inhibitors by counter-propagation neural networks
Khac-Minh Thai, Emerging Field Pharmacoinformatics, Department of Medicinal Chemistry, University of Vienna, Althanstrasse 14, Vienna A-1090, Austria, Fax: +43-1-4277-9551, thaikhacminh@univie.ac.at, and Gerhard F. Ecker, Department of Pharmaceutical Chemistry, University of Vienna, Vienna A-1090, Austria

Inhibition of hERG channels prolongs the ventricular action potential and correspondingly the QT interval, with the risk of torsade de pointes arrhythmias resulting in sudden cardiac death. Therefore, early prediction of the hERG K+ channel affinity of drug candidates is becoming increasingly important in the drug discovery process. Counter-propagation neural networks (CPG-NN) were applied to design computational models for classification and prediction of hERG blockers based on a dataset of compounds taken from the literature and several sets of molecular descriptors, including physicochemical parameters, VSA, and 2D- and 3D-SIBAR descriptors. The CPG-NN with a 3-dimensional output layer combined with a set of 11 hERG-relevant descriptors gave excellent results, especially in classifying compounds with IC50 values in the range of 1-10 µM. The total accuracy values obtained for the training and test sets are 0.93-0.95 and 0.83-0.85, respectively. In each class of hERG compounds, the GH scores achieved are satisfactory, with 0.89 to 0.97 for the training set and 0.74 to 0.87 for the test set. This model provides possible strategies for improving the performance of predicting and classifying compounds having hERG IC50 values in the range of 1-10 µM.

We gratefully acknowledge financial support by the Austrian Science Fund (grant # L344-N17) and the ASEA-Uninet in co-operation with the Austrian Council for Research and Technology Development and the Austrian Academic Exchange Service (ÖAD).

111 Discovery and applications of power-laws in organic chemistry
Ryan W. Benz1, S. Joshua Swamidass2, and Pierre Baldi2. (1) School of Information and Computer Sciences, University of California, Irvine, ORU Genomics and Bioinformatics, Irvine, CA 92697-3445, rbenz@uci.edu, (2) Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, Irvine, CA 92612

Chemical fingerprints are the basis for molecular similarity search methods used in most chemical database systems, including Chemical Abstracts Service, PubMed, and ChemDB. Understanding the statistical properties of these fingerprints is crucial for developing new and improved chemical search techniques. Through the study of fingerprints from large chemical databases, we have discovered that the distributions of several combinatorially extracted fingerprint features, such as labeled paths and trees, follow power-law distributions. These power-laws can be used to generate more realistic probabilistic models for fingerprints. We have also found that the power-laws can be leveraged to produce highly efficient compression schemes for chemical fingerprints. These compression schemes losslessly encode fingerprints in approximately 300 bits, or 1/3 the size of typical, lossy compressed fingerprints. Using these lossless representations, the exact similarity scores between pairs of molecules can be computed, leading to improved recall of drug-like molecules using similarity search methods.
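As a rough illustration of what it means for feature frequencies to follow a power law, the sketch below estimates the exponent from rank-frequency data by fitting a straight line in log-log space. This is an assumed, deliberately simple estimator with a hypothetical function name; the abstract does not say how the authors fitted their distributions, and least squares on log-log data is known to be crude for noisy real-world counts.

```python
import math

def fit_power_law(ranks, freqs):
    """Fit freq ~ prefactor * rank ** (-exponent) by least squares on
    (log rank, log freq). Returns (exponent, prefactor).
    All ranks and frequencies must be positive."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # A decaying power law has a negative log-log slope; report it as a positive exponent.
    return -slope, math.exp(intercept)
```

Here `ranks` would be the rank of each fingerprint feature when sorted by how often it occurs, and `freqs` the corresponding occurrence counts.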

112 Does being "lazy" in a smart way help?
Shaillay Kumar Dogra and Kalyanasundaram Subramanian, Cheminformatics, Strand Life Sciences Pvt. Ltd, No. 237, Sir C. V. Raman Avenue, Raj Mahal Vilas, Bangalore, India, Fax: +91-80-23618996, shaillay@strandls.com

"Lazy" learning, or instance-based learning, defers the learning of any model function until the actual example whose value needs to be predicted is presented. The k-nearest neighbor (kNN) method is one such "lazy" method and is popular in QSAR modeling (k=3,5). Some recent results also indicate that the accuracy of prediction is higher if a test compound has more neighbors in the training set that was used to build the QSAR model. The decision to call a compound a 'neighbor' is based on a pre-decided threshold of some similarity measure. Thus, in one case we set the number of neighbors to a fixed value, while in the other we determine the neighbors dynamically from a pre-fixed similarity cutoff. Here, we propose integrating the two methods by determining 'k' as a function of a given similarity cutoff value. We investigate whether this makes for better learning.
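One way to picture the proposed integration is the sketch below, in which k is set to the number of training compounds whose similarity to the query meets the cutoff, clamped to a fixed range. The Tanimoto measure on sets of fingerprint bits, the clamping bounds `k_min` and `k_max`, the default cutoff, and the unweighted mean are all assumptions for illustration, not details taken from the abstract.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two sets of 'on' fingerprint bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0

def adaptive_knn_predict(query, training, cutoff=0.6, k_min=1, k_max=5):
    """Predict a value for `query` from `training`, a non-empty list of
    (fingerprint_set, value) pairs. k is the number of training compounds with
    similarity >= cutoff, clamped to [k_min, k_max]. Returns (prediction, k)."""
    scored = [(tanimoto(query, fp), value) for fp, value in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    n_similar = sum(1 for sim, _ in scored if sim >= cutoff)
    k = max(k_min, min(k_max, n_similar))
    k = min(k, len(scored))  # never ask for more neighbors than exist
    neighbors = scored[:k]
    prediction = sum(value for _, value in neighbors) / k
    return prediction, k
```

Returning k alongside the prediction makes it easy to examine whether predictions backed by more neighbors are in fact more accurate.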

113 How do I know my model is telling me the right thing?
Shaillay Kumar Dogra, Cheminformatics, Strand Life Sciences, 237, C V Raman Avenue, Raj Mahal Vilas, Bangalore 560080, India, shaillay@strandls.com, and Kalyanasundaram Subramanian, Cheminformatics, Strand Life Sciences Pvt. Ltd, Bangalore, India

Computational models such as those derived using QSAR are used to predict properties of drug-like compounds. Various procedures and tests are adopted during model building to ensure that the models being learnt are 'good'. However, the story does not end here. When these models are used to predict values for 'unknown' compounds, there are distinct ways to gauge whether the predicted values can be trusted. Such 'confidence measures' could be based on statistical or algorithmic methods, or could use chemical-similarity-based measures to determine the applicability of the model to the 'unknown' compound, both quantitatively and qualitatively. We shall discuss some of these methods here. Determining the 'expected' accuracy of prediction is of great practical value, especially for QSAR models that were derived from training data belonging to some chemical space but, after deployment, are being used to predict values for all sorts of compounds.
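A very simple applicability check of this kind is sketched below: it records the range of each descriptor in the training set and reports what fraction of a new compound's descriptors fall outside those ranges. This bounding-box test is only one assumed example of a confidence measure, chosen for brevity; the abstract does not say which methods are discussed, and the function names are hypothetical.

```python
def descriptor_ranges(training_rows):
    """Per-descriptor (min, max) over the training set.
    `training_rows` is a list of equal-length numeric descriptor tuples."""
    columns = zip(*training_rows)
    return [(min(col), max(col)) for col in columns]

def out_of_range_fraction(row, ranges):
    """Fraction of descriptors in `row` lying outside the training ranges.
    0.0 means fully inside the training bounding box; higher values suggest
    the model is being extrapolated and its prediction deserves less trust."""
    outside = sum(1 for x, (lo, hi) in zip(row, ranges) if x < lo or x > hi)
    return outside / len(ranges)
```

A deployment could report this fraction next to each predicted value, or withhold predictions when it exceeds a chosen threshold.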

114 On the problem of imbalanced datasets
Shaillay Kumar Dogra and Kalyanasundaram Subramanian, Cheminformatics, Strand Life Sciences Pvt. Ltd, No. 237, Sir C. V. Raman Avenue, Raj Mahal Vilas, Bangalore, India, Fax: +91-80-23618996, shaillay@strandls.com

Whether one can learn good QSAR models from given data depends, among other factors, on the nature of the data. For example, when training classification models, imbalanced datasets are a particular problem. This is especially relevant when the class of interest is the one that is under-represented in the training examples. Real-world data on which predictions need to be made may show an even sparser distribution, making feedback and subsequent model corrections rare. Even complex modeling algorithms are sensitive to class imbalance, and their performance is thus compromised. Methods for tackling class imbalance range from simple solutions, such as under- or over-sampling the class of interest, to complex ones that involve tweaking feature selection and modeling algorithms towards learning a better representation of the imbalanced class. We shall discuss some of these measures here and report our findings.
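The simplest of the remedies mentioned above, random over-sampling, can be sketched as follows. The function name and the fixed seed are assumptions for illustration; the sketch duplicates randomly chosen examples of each under-represented class until every class is as large as the largest one, and it is not presented as the procedure the authors evaluated.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Return copies of (samples, labels) in which every class has as many
    examples as the largest class, by duplicating randomly chosen members
    of the smaller classes. Original examples are kept unchanged."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels
```

Over-sampling should be applied to the training split only; duplicating examples before the data are split would leak copies of training compounds into the test set and inflate the apparent accuracy.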