Abstracts, 229th ACS National Meeting
San Diego, CA, March 13-17, 2005

Titles link to slides when available. Please note: Presentations given at CINF symposia have been posted to the CINF website with express permission granted by the authors who retain the original copyright. These presentations are for information purposes only and cannot be further disseminated without the author's prior written permission.

CINF 1:  "The importance of being Ernest": Why gathering and cleaning all the relevant data matters for patent analysis
Anthony J. Trippe, Science IP/Chemical Abstracts Service, 2540 Olentangy River Rd., Columbus, OH 43210, atrippe@cas.org

More and more in the process of making critical business decisions, technical patent and non-patent information is used as a means to determine competitive position and formulate company strategy on technical subjects. The importance of having all the relevant data available for analysis and having that data normalized so accurate statistics can be generated cannot be overstated.

The purpose of this talk will be to examine the requirements for ensuring that, as much as possible, all of the pertinent data, whether from patent or non-patent sources, has been gathered. Further, this presentation will examine the pitfalls of performing an analysis on an incomplete data set or on a collection which has not been cleaned and normalized. Specific examples from the author's personal experience will be shared.
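As a rough illustration of why normalization matters for the statistics mentioned above, the sketch below counts patents per assignee before and after a simple name cleanup. The assignee names, the suffix list, and the `normalize` helper are all hypothetical and are not taken from the talk; real cleanup would also need to handle mergers, subsidiaries, and misspellings.

```python
import re
from collections import Counter

# Hypothetical raw assignee strings as they might appear in patent records.
RAW_ASSIGNEES = [
    "Acme Chemical Co.",
    "ACME CHEMICAL COMPANY",
    "Acme Chemical",
    "Beta Labs Inc",
    "Beta Labs, Inc.",
]

# Assumed list of legal-form suffixes to strip; extend as needed.
_SUFFIXES = {"co", "company", "inc", "corp", "corporation", "ltd"}

def normalize(name):
    """Lower-case, drop punctuation, and strip trailing legal-form suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while words and words[-1] in _SUFFIXES:
        words.pop()
    return " ".join(words)

raw_counts = Counter(RAW_ASSIGNEES)
clean_counts = Counter(normalize(name) for name in RAW_ASSIGNEES)

print("distinct assignees before cleanup:", len(raw_counts))
print("distinct assignees after cleanup: ", len(clean_counts))
print(clean_counts.most_common())
```

Without the cleanup step, each spelling variant is counted as a separate organization, which would understate the position of the largest filers.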

CINF 2:  Patent analysis: The technical intelligence professional’s adjustable spanner
Robert A Stembridge, Global Marketing Services, Thomson Scientific, 14 Great Queen Street, London, United Kingdom, bob.stembridge@thomson.com

The use of patent analysis in tracking the development and evolutionary trends within a technology is a vital component in the technical intelligence professional's toolbox. However, as with any specialist tool, a degree of knowledge and experience goes a long way towards using the tool safely.

Using a case study approach, we will explore some of the issues in using patent information and illustrate how valuable technical intelligence can be derived from judicious use of patent analysis.

CINF 3:  Technology oriented competitive intelligence: A primer
Bruce Mason, Research, Development & Technical Services, CIBA Vision Corporation, 11460 Johns Creek Parkway, Duluth, GA 30097, Fax: 1-678-415-7467, bruce.mason@cibavision.novartis.com

Competitive or business intelligence comes in many shapes and flavors. Think of a business function and place the term in front of the word intelligence and you have described a facet of competitive intelligence. For example, marketing intelligence, sales intelligence, distribution intelligence, manufacturing intelligence, and human resource intelligence are but a few. Technical or technology intelligence is given significant emphasis in many organizations because it historically has been linked to R&D, research trends, scientific breakthroughs and innovation. In other words, technology oriented competitive intelligence encompasses how a competitor does things, e.g., develops new products or services, manages processes, responds to scientific advancements that impact its industry, and interacts with its customers and suppliers. An overview of technology oriented competitive intelligence, what it is, who is served, analytical tools, and frameworks for assessing competitors' technology will be discussed.

CINF 4:  Rapid technology intelligence process
Alan L. Porter, R&D, Search Technology, Inc, 4960 Peachtree Industrial Blvd, Norcross, GA 30071-1580, Fax: 770-263-0802, aporter@searchtech.com

Technical intelligence conveys value when it affects decision processes. Too many managers and professionals have come to disregard technical intelligence because it has been too slow to provide timely guidance. This is changing. I indicate how combining five key features enables rapid technology intelligence processes (RTIP):
• Immediate desktop access to science & technology database search results
• Standardized sets of technology questions
• Templates of “innovation indicators” to answer specific questions to profile a technology or an organization
• Analytical software tuned by macros to clean those data and generate results pertinent to the question at hand in seconds
• Wizards to guide the user to the right answers, presented in the right form, for the target audience.

RTIP answers certain technical questions in minutes. It provides essential empirical evidence to inform strategic technology decisions in hours. These capabilities can, and will, change the very nature of technology management.

CINF 5:  PatGen DB: A consolidated genetic patent database platform
Richard JD Rouse, PatentInformatics, Inc, PO Box 948586, La Jolla, CA 92037, rjdrouse@patentinformatics.com

Patent information is voluminous. According to the 2003 United States Patent and Trademark Office (USPTO) annual report, the office received 333,452 applications; this amounts to about 913 applications a day. Compared to the wealth of online resources covering genomic, proteomic and derived data, the scientific community is rather underserved when it comes to patent information related to genetic sequences. Here we describe PatGen DB, an integrated database containing data from bioinformatic and patent resources. This resource is an open-ended service designed to enable customized searching and database compilation. Features of PatGen DB can be searched at http://www.patgendb.com where bibliography, taxonomy and sequence search tools are provided.

CINF 6:  Globalization trends measured via patent analysis
Anthony F. Breitzman Sr., CHI Research, Inc, 10 White Horse Pike, Haddon Heights, NJ 08035, Fax: 856-546-9633, abreitz@chiresearch.com

There have been a lot of discussions of globalization and outsourcing of jobs in manufacturing industries, but virtually no discussion concerning globalization of R&D. Trends in patent activity show increased globalization in R&D that is noticeable in patents. In the last 10 years, large companies have increased R&D efforts outside their home countries. As an example, General Electric has always had facilities all over the world, but until very recently virtually all of its R&D was done in the US. In 1994, 96% of its patents were invented in the US; in 2004, that number is 86%. The numbers are similar for many US companies. In Japan we see the same thing. In 1994, 99.5% of Canon's US patents were invented in Japan. In 2004, that number is down to 94% and the number for Sony is down to 85%. This has huge implications in terms of jobs, since a 5% decline in patents invented in the US translates to a $300 billion+ drop in GDP, but it also has implications for competitive intelligence, counter-intelligence, etc. In this study we examine US and EP patents in order to analyze these trends. Questions we consider include: Is globalization of R&D occurring? Is it a positive or negative for US companies? That is, is the US a net exporter or importer of R&D jobs in recent years? Finally, we take a similar look at companies in Japan and Europe and attempt to answer the same questions.
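The home-country percentages quoted above are, in essence, a ratio computed per company and year. A minimal sketch of that calculation follows, assuming each patent record carries a single inventor country; the records, the company name `ExampleCo`, and the `home_share` helper are hypothetical and are not data or methods from the study. Real patents often list inventors in several countries, which would require a fractional or first-inventor counting rule.

```python
from collections import defaultdict

# Hypothetical records: (assignee, year, inventor country).
# These rows are invented for illustration only.
PATENTS = [
    ("ExampleCo", 1994, "US"),
    ("ExampleCo", 1994, "US"),
    ("ExampleCo", 1994, "US"),
    ("ExampleCo", 1994, "DE"),
    ("ExampleCo", 2004, "US"),
    ("ExampleCo", 2004, "IN"),
    ("ExampleCo", 2004, "US"),
    ("ExampleCo", 2004, "CN"),
]

def home_share(records, assignee, home_country):
    """Return {year: fraction of the assignee's patents invented at home}."""
    total = defaultdict(int)
    home = defaultdict(int)
    for name, year, country in records:
        if name != assignee:
            continue
        total[year] += 1
        if country == home_country:
            home[year] += 1
    return {year: home[year] / total[year] for year in sorted(total)}

shares = home_share(PATENTS, "ExampleCo", "US")
for year, share in shares.items():
    print(f"{year}: {share:.0%} of patents invented in the US")
```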

CINF 7:  Assembling the information mosaic
Donald Walter, Customer Training, Thomson Scientific, 1725 Duke Street Suite 250, Alexandria, VA 22314, Fax: 703 519 5838, Don.Walter@Thomson.com

Technical Intelligence requires information from many sources and disciplines, and of many types. This talk will focus on the integration of patent, technical and business information as raw material for analysis. Case studies will show how the information mosaic can be assembled into an informative picture.

CINF 8:  Analyzing and presenting chemical structural information in support of competitor or technology assessment
Kerry G. Stanley, Science IP, Chemical Abstracts Service, 2540 Olentangy River Rd, Columbus, OH 43202-1505, Fax: 614-447-5627, kstanley@cas.org

In many areas of research any truly diligent review of the technological landscape will require an assessment of a structural profile. This may be most evident in exploring SAR relationships in the pharmaceutical industry but may apply to other areas as well; for instance, an analysis of the monomeric components imparting specific properties to a class of polymers. This talk will present several case studies where an "R-group Analysis" of a class of chemical structures will provide insights into the chemical approaches explored by differing competitors within a research area. Similar analytical approaches may be used to showcase the similarities, and most importantly, the differences in a class of compounds otherwise viewed only at the individual substance level. For an organization interested in innovating in a crowded art space this type of analysis is useful for identifying which organizations have covered what structural modifications to a central class of compounds or around a specific scaffold.

CINF 9:  Start-up companies and chemical informatics: A professional service provider's perspective
Robert D. Feinstein, Kelaroo, Inc, 312 S. Cedros Ave., Suite 320, Solana Beach, CA 92075, rdf@kelaroo.com

Kelaroo integrates and enhances the drug discovery efforts of companies through a combination of cheminformatics products and professional services. We have worked with dozens of start-ups and other small drug discovery companies. Most small companies face similar challenges in terms of balancing basic research needs, budget constraints and resource issues. However, start-ups typically strive for novel research techniques that may not conform to commercially available software and database solutions. We will present our perspective on how start-ups and other small drug discovery companies can best prioritize and implement solutions to their cheminformatics needs. Examples will include commercial and custom systems for reagent management and procurement, library enumeration, compound registration and archival, and biological data management.

CINF 10:  Developing an hepatotoxicity database
James Kelly, Amphioxus Cell Technologies, Inc, 11222 Richmond Ave, Suite 180, Houston, TX 77082, jkelly@amphioxus.com

Amphioxus Cell Technologies has developed a series of tools for high throughput hepatotoxicity testing. These tools are intended to be used early in the drug discovery process so that structure-toxicity relationships can be developed along with SAR, allowing compounds to be optimized for toxicity and activity simultaneously. It became clear that in order for this system to be truly useful, we needed to develop a database of compounds that had been screened through the assays. This would allow our customers to place the results in the context of other known toxins and structurally related compounds. We set about screening several thousand known compounds in each of seven assays at multiple concentrations. We quickly realized that our then-current information resources were insufficient. We needed a system that could group the results according to chemical structure and that would allow structural searches within the database, yet we had only rudimentary knowledge of chemistry-based software. With the help of MDL Information Systems, we were able to implement a relatively sophisticated system quickly and inexpensively without the addition of substantial information technology resources.

CINF 11:  Battling the data avalanche: A chemical data management solution for the start-up company
Antony Williams, Advanced Chemistry Development, 90 Adelaide Street West, Suite 600, Toronto, ON M5H 2L3, Canada, tony@acdlabs.com

The pharmaceutical and chemical industries are well acquainted with the challenges of managing various forms of chemical data across an organization. These challenges are augmented when considering the plight of start-up companies, whose monetary and human resources are often severely compromised relative to the need to manage the volumes of chemical data they are generating.

This talk will discuss the emergence of a novel database software system designed for standardizing and consolidating chemical information company-wide. The software integrates chemical structures with images, reaction diagrams, documents, and text in a manner that is customizable to the user, and thus is malleable to the specific data management needs of an organization. Databases that are built in this system are searchable by chemical structure, sub-structure, text, and other user-defined data fields. The databases can be distributed via thick client or shared across an organization via a web interface. Such databases are easily accessible by all beneficiaries in the company, and can be connected to commercial tools for physical property and spectroscopy prediction, systematic nomenclature generation, and analytical data management (for example, NMR, MS, IR, UV, HPLC, and GC).

CINF 12:  Integrating ISIS/Host RCG databases with other applications
Mark Runyan, Richard Sandstrom, Julie Myhre, Alex Tulinsky, and Ambrogio Oliva, Cell Therapeutics, Inc, 501 Elliott Avenue West, Suite 400, Seattle, WA 98119, mrunyan@ctiseattle.com

Our group deploys a variety of cheminformatics and biological database management software at CTI. Most data originates with the registration of new chemical entities in an ISIS/Host Relational Chemical Gateway (RCG) database; therefore we must integrate ISIS/Host RCG databases with a variety of other systems which manage related information. We look for the simplest and most direct method of integration, but our decisions are always guided by the requirements and capabilities of the third party application. Common methods are direct access and replication, both of which rely on the underlying framework of the Oracle relational database on which ISIS/Host RCG is based. Specific integration techniques and associated implementation details will be discussed in the context of CTI's Scientific Systems environment, which includes ISIS/Host, IDBS ActivityBase, Chemical Computing Group's Molecular Operating Environment, and other third party applications.

CINF 13:  Capturing and aggregating large-scale discovery data in a start-up environment
Susan M. Baxter, National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505, smb@ncgr.org, Jacquelyn Fetrow, Departments of Physics and Computer Science, Wake Forest University, and Stephanie J. Reisinger, ProSanos Corporation

GeneFormatics' founding target identification technology was ideally suited as a platform for lead discovery and evolved into an in-house, centralized application for matching small molecules with human protein tyrosine phosphatase targets. The large-scale approach taken by GeneFormatics (GFI) for target and lead discovery required relational databases and applications to automate workflow, to compute and update large amounts of sequence information regularly, to manage intellectual property, and, importantly, to reliably and quickly deliver information to customers. The major challenges faced by GFI were the integration of disparate, genomic-scale databases, and the rapid development of an automated work-flow to manage and analyze the data. To solve this, GeneFormatics used a multidisciplinary team of research scientists, who articulated the short and long-term needs, and professionally trained software and database engineers, who quickly translated those needs into useful and validated software applications.

CINF 14:  Mobilizing published data to make informed drug discovery decisions
Russ Hillard, Marketing, Elsevier MDL, 14600 Catalina Street, San Leandro, CA 94530, russ@mdli.com

High throughput techniques in chemistry and biology generate ever-increasing volumes of chemical structures, physical properties, and bioassay data. Much of this data is indexed in databases, posted on web sites or divulged in patents, conferences, journals and reviews.

The researchers' challenge is to extract actionable information – without being experts in locating data sources and using multiple search applications. Typical questions concern which chemical series to explore, the synthesis or modification of compounds, purchasing of starting materials, pharmacological profiles, metabolic liabilities or toxic properties, and safety issues.

This presentation focuses on using Elsevier MDL's DiscoveryGate service to answer such questions. DiscoveryGate delivers key chemistry-related databases from a variety of sources including CrossFire Beilstein, ChemInform, MDL Available Chemicals Directory, MDL Drug Data Report, MDL Toxicity and MDL Metabolite. It is linked by reaction type to major reference works on chemical synthesis. Researchers can view cited papers or patents from licensed electronic repositories such as ScienceDirect. We will compare the use of DiscoveryGate with other non-integrated sources and discuss searching workflows and strategies.

CINF 15:  The Vault, ArQule’s dry compound archive
Rebecca J. Carazza, Research Informatics, ArQule Inc, 19 Presidential Way, Woburn, MA 01801, rcarazza@arqule.com

ArQule's strategy in January 2002 required that we leverage our scientific excellence and resources to support our transition into a recognized R&D organization with compounds in clinical development. With this, we identified the need to change from a solution phase, plate based storage of compounds with limited characterization to a fully managed dry compound archive with increased characterization of compounds to support compound identification as well as legal needs. In less than four months' time, with equipment costs under $30K, ArQule defined and implemented robust new processes, including software and hardware systems to manage dry compounds. The new processes included preparing and characterizing, submitting, storing and handling, requisitioning and dispensing of dry compounds that have been synthesized as singletons or in high-throughput production.

CINF 16:  Extracting knowledge and delivering data: From the analytical laboratory to the chemist's desktop using web-enabled technologies
Antony John Williams, Scientific Development, Advanced Chemistry Development, 90 Adelaide Street West, Suite 600, Toronto, ON M5H 3V9, Canada, Fax: 416-368-5596, tony@acdlabs.com

Walk-up or open-access laboratories have dramatically impacted the ability of a small organization to support the analytical needs of its chemists. Commonly, skilled professionals assume the duty of laboratory manager as well as skilled technical consultant. As part of this responsibility one challenge is the distribution of data from the instruments to the chemist as well as providing enabling technologies to extract full value from the data. Open-access laboratories are heterogeneous in nature, requiring that data from a series of techniques can be distributed in a homogenizing fashion. The world-wide web has certainly assumed the primary mantle of electronic communication nowadays and would be assumed to be an ideal solution for analytical data dissemination as well as management and distribution of the extracted knowledge. This talk will detail technical approaches for the delivery of heterogeneous analytical data, including integrated chemical structures, to an organization.

CINF 17:  What do they want from me? A chemistry librarian explores liaison needs and desires
Beth Thomsett-Scott, Reference and Information Services, University of North Texas Libraries, P.O. Box 305190, Denton, TX 76226, Fax: 940-565-3695, bscott@library.unt.edu

Have you ever wondered what a liaison librarian does? What their role is in providing services to an active chemistry department in an academic library? What do the faculty members, students and staff want from the library? This session will answer many of your questions!

Three years ago, I became a Chemistry Liaison Librarian at the University of North Texas. My last chemistry course was in 1986! The challenges and thrills that have occurred since then will be presented. Lessons learned and recommended preparations will be discussed. Survey results and comments from chemistry faculty on the traits and skills they desire in chemistry librarians and what they want from a chemistry liaison librarian will be offered. Examples and advice from other practicing chemistry librarians will be included to provide a well-rounded information session.

CINF 18:  Opportunity knocks: Chemical information careers in industry
David A. Breiner, Technical Information Center, Cytec Industries Inc, 1937 West Main Street, Stamford, CT 06904, Fax: 203-321-2985, david.breiner@cytec.com

Working in an industrial information center requires a vast array of skills and talents, and can be an extremely rewarding and challenging career. Whether searching online databases, designing educational webpages, or conducting training sessions, today's information professionals must understand their customers' needs first and foremost. The rapidly changing technology landscape requires information professionals to proactively deliver valuable solutions and services that drive productivity for their organization. Simply stated, they must get the right information to the right people at the right time. Therefore, developmental opportunities must always be sought to gain the necessary experience to be successful in industry.

This presentation will reflect on a 14-year career in chemical information ranging from sales to management. Highlighted experiences will include working as an account representative, searching chemical and patent literature, training end-users, building websites, and managing a technical information center. Lessons learned and career strategies will also be shared.

CINF 19:  From lab chemist to patent searcher: Why, what, and how
Randall K. Ward, Science & Maps, Brigham Young University, Harold B. Lee Library 2320, Provo, UT 84602, Fax: 801-422-0466, randy_ward@byu.edu

If one is a practicing lab chemist and is looking at different career options, patent/information searching is one to seriously consider. In the form of questions, this presentation will specifically cover three aspects of becoming a patent searcher. First explored will be “Why would one want to be a patent searcher?” Most of the observations in this section come from years of personal experience. Secondly, “What does a patent searcher do?” This section will cover the kinds of work involved as well as a typical “day in the life of . . .”. The third question is “How would one become a patent searcher?” In this section, some common threads in the career progression to patent searching will be explored as well as the author's own personal path. Interspersed within the presentation will be slightly liberal doses of advice on career planning from the author's own experience.

CINF 20:  Chemical information careers at U.S. GOCO research laboratories
Diane M. Kozelka, Rio Rancho, NM 87144

Technical information specialists are continually challenged when helping their customers find the right answer for that obscure question. When working for a government-owned, contractor-operated (GOCO) facility, that usually occurs every week! The technical information needs of a GOCO technical library are very similar to any other technical library, with one large exception -- classified requests. I will discuss the unique resources that a GOCO technical library has access to (especially since 2001), and which resources are available to non-governmental organizations as well. Additional information about the US GOCO labs will be mentioned, if time permits.

CINF 21:  Chemical information in not-for-profit nirvana
Anne T. O'Brien, Creative Connections, 15 Crest Drive, Tarrytown, NY 10591-4305, Fax: 914-631-5241, ronanne@attglobal.net

Foundations, societies, public radio and television, hospitals, high schools, public health, emergency, and world service organizations have purpose. They serve society and each of us. What is particularly challenging, especially demanding, most rewarding for a chemical information professional working in these environments? Which individual human traits are needed? What are the unusual opportunities? What is uniquely compelling about working in these surroundings? Why do individuals choose the non-profit sector? The presentation will use examples from well-known organizations to initiate discussion of the financial, human, technical, time-pressure, and career development challenges – and the potent corresponding rewards – of serving in such settings.

CINF 22:  So you are thinking of becoming an online information entrepreneur
Alan Engel, Paterra, Inc, 526 N Spring Mill Road, Villanova, PA 19085-1928, Fax: 610-527-2041, aengel@paterra.com

The financial and technical barriers to becoming an online information vendor are as low as they have ever been. The sci-tech information market is broken and in need of innovation. Open Access and other initiatives are roiling the waters and making raw information materials increasingly available. Is it time to contribute your talents to the fray as an online information entrepreneur? The author will provide pointers drawn from 18 years of experience as an independent consultant, translator and online information vendor.

CINF 23:  Careers in science writing and publishing
Lynne Friedmann, Freelance Science Writer, P.O. Box 1725, Solana Beach, CA 92075, Fax: 858-793-1144, lfriedmann@nasw.org

To individuals who love science but not necessarily lab work, science writing sounds appealing as a career alternative. But it's a highly competitive field that requires specialized training and in many cases the mind-set of a small-business owner. People who write about science for a living fall into two broad categories: 1) science journalists who are staff reporters for news organizations or freelance writers who write for magazines and the Web, and 2) science writers who find work as public information officers for universities, government science agencies, and research institutions or as public relations professionals for industry. In the publishing arena, technically trained individuals work as acquisition editors for major publishing houses or university presses. Nonfiction book writers author original works, co-author/edit books with other scientists, or "ghost write" manuscripts. The common denominator in all these endeavors is communicating science in an accurate yet compelling manner. Training requirements, science-writing programs, lifestyle issues, and strategies for entering the field and building a science-writing career will be discussed.

CINF 24:  Career opportunities in computational chemistry and computer-assisted drug design
J. Phillip Bowen, Center for Drug Design, Department of Chemistry and Biochemistry, University of North Carolina at Greensboro, 401 New Science Building, PO Box 26170, Greensboro, NC 27402-6170, Fax: 336-334-5402, jpbowen@uncg.edu

Computer-based methods have changed the world, particularly scientific research. Computational chemistry may be defined as the use of theory and computer technology to calculate molecular structures, properties, and related effects. Today computational chemistry methods are widely used in industrial and academic settings throughout the world to gain insight into chemical and biochemical problems at the molecular level. Over the years the uses of computer-based methods in drug design have been successful in predicting biological activity. With the increasing awareness of the power of computational chemistry, new career opportunities have emerged. This presentation will focus on discussing career options in computational chemistry.

CINF 25:  Sharing chemical information without sharing chemical structure
Lingling Shen1, Karl M. Smith2, Brian B. Masek2, and Robert S. Pearlman1. (1) Laboratory for the Development of CADD Software, University of Texas, College of Pharmacy, Austin, TX 78712, Fax: 512-471-7474, shenl@list.phr.utexas.edu, bob.pearlman@optive.com, (2) Optive Research, Inc

There are various reasons for which scientists might want to share measured and/or calculated properties or “descriptors” of chemical compounds without revealing the actual chemical structures of those compounds. However, there is growing concern that, using emerging software technology, the chemical structures could be deduced from the chemical information which is shared.

We will briefly describe software technology which, unless precautions are taken, can indeed be used to deduce chemical structures from chemical descriptors. We will also discuss how the ability to deduce structure depends upon which descriptors or which combinations of descriptors are used. Lastly, we will suggest a simple but very effective mechanism by which chemical information (descriptors) can be shared in a manner which enables the desired use of the information but which thwarts efforts to deduce the corresponding chemical structures.

CINF 26:  How to reveal without revealing
Ruben Abagyan1, Eugene Raush2, and Levon Budagyan2. (1) Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road TPC-28, La Jolla, CA 92037, abagyan@scripps.edu, (2) R&D, Molsoft LLC

Safe exchange of data associated with a chemical compound, along with the essential descriptors of the compound but without revealing its structure, is highly desirable. Solving this problem may dramatically expand the public knowledge base on physico-chemical and biological properties of compounds. We present statistical analysis of the difficulty of deciphering the chemical structure and make recommendations on how to modify this process to make it more robust and safe.

One idea is to add artificial numerical noise to the descriptors to the degree which can be tolerated by the property prediction methods. For example, knowing the molecular mass of a compound to four-to-five decimal places is sufficient to derive the molecular formula (still not the structure), while knowing the molecular mass to 1-to-10 dalton accuracy makes cracking the formula next to impossible. At the same time, the druggability rules may easily tolerate that 1-to-10 dalton uncertainty in the mass value.

The deciphering complexity depends strongly on the initial conditions of the task. There are two radically different situations, namely, searching among the ~20 million available/known compounds, or searching among a virtually infinite number of the theoretically possible compounds. We demonstrate that recognizing a compound from a database of available compounds using a set of descriptors is a relatively easy but not always unambiguous task. However, finding a non-available theoretical compound using rounded or distorted numerical descriptors, as well as finite length chemical fingerprints is practically impossible.
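The molecular-mass example in this abstract can be explored with a small enumeration. The sketch below is not the authors' method: it simply lists every C/H/N/O atom-count combination whose monoisotopic mass falls within a chosen tolerance of a target, using caffeine (C8H10N4O2) as an arbitrary test compound. The element set, the hydrogen cap `max_h`, and the function names are assumptions made for illustration, and no valence or plausibility filter is applied, so many hits are not chemically sensible.

```python
# Monoisotopic masses in daltons for a deliberately small element set.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

def formula_mass(c, h, n, o):
    """Monoisotopic mass of the formula C_c H_h N_n O_o."""
    return c * MASS["C"] + h * MASS["H"] + n * MASS["N"] + o * MASS["O"]

def formulas_within(target, tol, max_h=40):
    """List (C, H, N, O) counts whose mass lies within tol of target."""
    hits = []
    limit = target + tol
    for c in range(int(limit // MASS["C"]) + 1):
        after_c = limit - c * MASS["C"]
        for n in range(int(after_c // MASS["N"]) + 1):
            after_n = after_c - n * MASS["N"]
            for o in range(int(after_n // MASS["O"]) + 1):
                for h in range(max_h + 1):
                    if abs(formula_mass(c, h, n, o) - target) <= tol:
                        hits.append((c, h, n, o))
    return hits

target = formula_mass(8, 10, 4, 2)        # caffeine, C8H10N4O2
sharp = formulas_within(target, 0.0005)   # mass known to about 4 decimals
blurred = formulas_within(target, 5.0)    # mass blurred by several daltons

print("candidates at +/-0.0005 Da:", len(sharp))
print("candidates at +/-5 Da:     ", len(blurred))
```

Comparing the two counts gives a feel for how quickly the candidate list grows once the reported mass is blurred, which is the effect the proposed noise is meant to exploit.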

CINF 27:  Reverse engineering chemical structures from molecular descriptors: How many solutions?
Jean-Loup Faulon, William M. Brown, and Shawn Martin, Computational Biology Dept, Sandia National Laboratories, P.O. Box 969, MS 9951, Livermore, CA 94551, Fax: 925-924-3020, jfaulon@sandia.gov

Physical, chemical and biological properties and activities are the ultimate information of interest for chemical compounds. Regardless of the information sharing system one designs, this system should allow for the calculation of such properties and activities. Molecular descriptors that map structural information to activities and properties are obvious candidates for information sharing. In this talk we examine to what extent the sharing of chemical descriptors is safe, by computing how many structures in the chemical universe match a given set of descriptor values. Specifically, we examine several classical 2D descriptors (from the CODESSA software package) and molecular fragments (signature descriptors) for various properties including log P and IC50. We first select sets of descriptors that provide meaningful QSARs for the chosen properties. Next, we stochastically search (using a bond swapping algorithm, JCICS 1996, 36, 731) and deterministically count and enumerate (JCICS 2003, 43, 721) the compounds matching the selected descriptors.

CINF 28:  Possibilities for transfer of relevant data without revealing structural information
Omoshile O. Clement and Osman F. Guner, 9685 Scranton Rd, Accelrys Inc, San Diego, CA 92121-3752, omoshile@accelrys.com

In this paper, we will discuss how we approached the problem of keeping structural information proprietary in the early years of predictive ADME/Tox model development. At that time, scientists in the industry wanted to evaluate the predictive models, but were not willing to share their structures. At the same time, the commercial model developers were willing to run the scientists' structures through the model, but they were not willing to reveal which descriptors were important for a particular predictive model. We developed a process where the scientists could calculate a broad range of commercially available public descriptors and forward this property file instead of the structures. Meanwhile, the model developer could extract those descriptors that are used in the predictive model, run the model and pass the results back to the scientist. We will discuss the pros and cons of such an approach. We propose to address questions such as: Can proprietary structural information be compromised from descriptors in ADME/Tox models? And can ADME/Tox predictions be made purely from descriptors without the need for explicit knowledge of chemical structures, proprietary or otherwise?

CINF 29:  Screens as a secure descriptor of chemistry space
Nikolay Osadchiy and Sergey Trepalin, Department of Chemoinformatics, ChemDiv, Inc, 11558 Sorrento Valley Rd, San Diego, CA 92121, Fax: 858-794-4931, no@chemdiv.com

Chemical structure provides an exhaustive description of a compound, but it is often proprietary and thus an impediment to the exchange of information. An effective representation of the structural properties of a chemical library can be made with Screens, a set of substructures pertaining to this library. We define a Screen as a structural fragment centered on an atom, with a radius of N bond lengths between the central atom and the atoms maximally remote from it. Screens, and their occurrence frequencies, are gathered with each atom used in turn as a center, for each compound in the library. Using the Screens descriptor of a library, we can assess its similarity to another library and select compounds which enrich its chemistry space or, alternatively, fill its voids. While providing a relevant description of the compounds, the descriptor conceals real structures and can facilitate the exchange of sensitive information. A case study of Screen descriptor applications at ChemDiv will be presented.
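
A rough sketch of the idea follows, under simplifying assumptions that are not from the abstract: molecules are heavy-atom graphs given as an element list plus a neighbor table, a Screen is summarized only by the sorted element symbols in each bond shell around the center (not a full substructure), and library similarity is taken as a Tanimoto coefficient over the sets of observed Screens. All function names are hypothetical.

```python
from collections import Counter, deque

def screen_key(elements, neighbors, center, radius):
    """Summarize the fragment around `center` as sorted element symbols per bond shell."""
    depth = {center: 0}
    queue = deque([center])
    while queue:  # breadth-first walk, stopping at `radius` bonds from the center
        atom = queue.popleft()
        if depth[atom] == radius:
            continue
        for nxt in neighbors[atom]:
            if nxt not in depth:
                depth[nxt] = depth[atom] + 1
                queue.append(nxt)
    shells = [[] for _ in range(radius + 1)]
    for atom, d in depth.items():
        shells[d].append(elements[atom])
    return tuple(tuple(sorted(shell)) for shell in shells)

def library_screens(library, radius=1):
    """Count Screen occurrences with every atom of every compound used as a center."""
    counts = Counter()
    for elements, neighbors in library:
        for center in range(len(elements)):
            counts[screen_key(elements, neighbors, center, radius)] += 1
    return counts

def tanimoto(screens_a, screens_b):
    """Set-based similarity of two Screen collections (an assumed measure)."""
    keys_a, keys_b = set(screens_a), set(screens_b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 1.0

ethanol = (["C", "C", "O"], {0: [1], 1: [0, 2], 2: [1]})
dimethyl_ether = (["C", "O", "C"], {0: [1], 1: [0, 2], 2: [1]})
ether_screens = library_screens([dimethyl_ether])
ethanol_screens = library_screens([ethanol])
```

Only the Screen counts would be exchanged in such a scheme; the element lists and neighbor tables stay with the library owner.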

CINF 30:  Why relevant chemical information cannot be exchanged without disclosing structures
Dmitry Filimonov and Vladimir V. Poroikov, Russian Academy of Medical Science, Institute of Biomedical Chemistry, Pogodinskaya Str., 10, Moscow 119121, Russia, Fax: 007-095-245-0857, dmitry.filimonov@ibmc.msk.ru, vladimir.poroikov@ibmc.msk.ru

For the usual confidential exchange of information between two or more persons, traditional cryptographic means can be applied. It is easy to show that any meaningful (relevant) information about chemical structures can be used to search for either a particular compound itself or its close analogues. Since the meaningful information is represented by different descriptors, a set of these descriptors can be used as a fingerprint to search for a particular molecule itself or for molecules with a particular property. The success of recognition depends only on the number of descriptors used. However, this information may not be enough for appropriate QSAR/QSPR investigations. Some case studies based on the analysis of NCI and MDDR databases will be presented.

CINF 31:  Are topomers a useful representation for “safe exchange of chemical information”?
Richard D. Cramer, Chief Scientific Officer, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144, Fax: 314-647-9241, dcramer@tripos.com

Encoding molecules into a useful but non-structurally-revealing representation is a difficult problem. Different applications will require different representations. However, for applications involving biological or other shape-related effects, topomer properties have several relevant and particularly well-characterized behaviors. Thus topomers exemplify a relatively specific candidate structure encoding, whose strengths and weaknesses as a useful representation for “safe exchange of chemical information” may be instructive to consider.

CINF 32:  The perfect storm: Electronic publishing and the Internet
Stephen R. Heller, Physical and Chemical Properties Division, NIST, Gaithersburg, MD 20899-8380, srheller@nist.gov

The frenzy of Open Access has come to the publishing scene in the past 1-2 years like a major storm. With each month come new activities in this area. Much is being said and written about Open Access, with very strong proponents for and against Open Access.

Organizations that fail to recognize and confront technological and market changes often tend to lose their positions, if not their organizations. History is replete with such examples. In the 18th century, power looms replaced the handloom weavers. In the early 20th century, the horse and buggy industry gave way to automobiles. In the late 20th century, the airplane replaced the train and boat for long-distance travel. Now, at the start of the 21st century, the technology of the Internet is threatening the way in which the 3+ century old scientific publishing industry, and the libraries which subscribe to scholarly publications, have done business for many decades.

In this presentation the author promises to provide many facts, many extreme opinions, and no solutions.

CINF 33:  Scientific and technological data in society
René Deplanque, FIZ CHEMIE Berlin, Franklin Str. 11, 10583 Berlin, Germany, deplanque@fiz-chemie.de

The use of scientific data has changed over the years. In the past, very large databases, both bibliographical and factual, were built up as large archiving and retrieval systems for published data. In recent years a concentration process has taken place in databank production, and hardly any new database has entered the market. The use of databases today is commonplace and they are accepted tools within the scientific working process. But with the advance of the Internet, evolving Grid technology and the open access initiatives, new ways of handling and distributing data will change the functions and applications of information systems. Whereas the user of yesterday was satisfied by finding the appropriate publication, for today's user of information systems the direct application of information within the scientific process is of greatest importance. Networking of computers to process immense amounts of experimental data, networking of experiments, and easy, inexpensive access to full-text publications are changing the scientific community. This talk will give an overview of where we are and what we have to expect next, and how this will affect the everyday work of the scientist.

CINF 34:  Open access and the Chemical Semantic Web
Peter Murray-Rust, Unilever Centre for Molecular Informatics, University of Cambridge, University Chemical Laboratory, Lensfield Road, CB2 1EW Cambridge, United Kingdom, Fax: +44-1223-763076, pm286@cam.ac.uk, and Henry S. Rzepa, Department of Chemistry, Imperial College London

We have developed the Chemical Semantic Web so that computers can understand primary publications and act upon them. An autonomous machine could read and understand an issue from J. Med. Chem., extract the information, run high-throughput computations and systematize the results leading to new scientific insights.

For robots the most exciting and most tractable part of scientific publications are formalized presentations of data (e.g. analytical proof of synthesis) and supplemental data (e.g. crystallography and spectra). We argue that these are "facts" under the Berne Copyright convention and therefore re-usable without hindrance. For many decades humans have manually abstracted articles and produced compilations and we argue that robots can do the same to great communal benefit. However it appears that some publishers now see a journal as a database and may regard chemically-aware robots as unacceptable under their license terms.

The public Semantic Web currently depends on the complete absence of barriers to the re-use of information. Robots cannot currently negotiate license agreements, log on to sites, or make micropayments. We see Open Access, especially to data, as an exciting opportunity to transform chemical informatics and provide a global knowledge base. We shall present arguments that funders, researchers, editors and readers should promote a model of publication for Open Data.

We shall provide online demonstrations of the power and potential of the Chemical Semantic Web based on Open Access to primary publications.

CINF 35:  RDF-based molecular relationships, the Semantic Web and the future of scientific publishing
Henry S. Rzepa, Department of Chemistry, Imperial College London, South Kensington Campus, London SW7 2AY, United Kingdom, h.rzepa@imperial.ac.uk, Omer Casher, Information Architecture and Engineering, GlaxoSmithKline, and Peter Murray-Rust, Unilever Centre for Molecular Informatics, University of Cambridge

We describe an XML/RDF model developed to improve the classification and (open) accessibility of chemical information within the de facto output of electronic journals. This model enhances the Adobe Extensible Metadata Platform (XMP), an RDF vocabulary which can be readily embedded in text documents such as SVG or CML (Chemical Markup Language), or in a variety of binary documents which support it, such as PDF or JPEG. Molecular structures for given journal articles are represented as unique INChI identifiers and embedded in electronic articles as part of the XMP. By extracting this XMP from multiple related articles and managing it with an RDF repository, expandable lightweight Chemical Ontologies, fine-tuned to a scientist's research needs, can be auto-generated. The use of Semantic Web technologies to link the Chemical Ontology with related resources on the Web is explored. Here, using INChIs as the nodes for establishing the relationships provides a "semantically intuitive" alternative to text-based relationship mapping.
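
To illustrate what using identifiers as relationship nodes might look like, here is a minimal sketch that stores subject-predicate-object triples as plain Python tuples rather than in a real RDF repository. The article labels and the predicate name `ex:mentionsCompound` are hypothetical placeholders, not part of XMP or of the authors' vocabulary; the identifier strings are standard InChI strings for methane, ethanol and water.

```python
from collections import defaultdict

MENTIONS = "ex:mentionsCompound"  # hypothetical predicate name

triples = [
    ("article:A", MENTIONS, "InChI=1S/CH4/h1H4"),
    ("article:A", MENTIONS, "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
    ("article:B", MENTIONS, "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
    ("article:C", MENTIONS, "InChI=1S/H2O/h1H2"),
]

def related_articles(triples):
    """Link every pair of articles that share a compound identifier node."""
    by_compound = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == MENTIONS:
            by_compound[obj].add(subject)
    links = set()
    for identifier, articles in by_compound.items():
        ordered = sorted(articles)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                links.add((first, second, identifier))
    return links

links = related_articles(triples)
```

Because the link is made through the identifier itself, two articles are related even if they name the compound differently in their text.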

CINF 36:  Movement toward open access: Why new models of research communication are inevitable
Ann J. Wolpert, Director of Libraries, Massachusetts Institute of Technology, 14S-216, 77 Massachusetts Avenue, Cambridge, MA 02139, awolpert@mit.edu

Advances in computing and communications technologies over the past decade have introduced significantly disruptive technologies into both the conduct of research and traditional systems of research reporting and scholarly communication. The open access “movement” developed as a response to two separate phenomena. First, researchers and educators began to use and appreciate the power of new computational and communications technologies in their research, teaching, and collaboration. Second, these same researchers and educators became aware that control over the record of published research was moving into the proprietary hands of publishers who did not always share their values, and that such control might well stifle scientific progress and diminish learning opportunities in the 21st century. Publishers, scientists, librarians, and universities need to move beyond the current narrow debate about the sustainability of 20th century publishing models. Scientists and educators will not turn back from the advantages of new computing and communication technologies. It is time to devise new models of scientific publishing that support the larger interests of research and education.

CINF 37:  Open access and the BERLIN DECLARATION: The MPG strategy
Robert Schlögl, Department of Inorganic Chemistry, Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, Berlin 14195, Germany, Fax: +49-30-8413-4401, acsek@fhi-berlin.mpg.de, and Theresa Velden, Heinz Nixdorf Zentrum für Informationsmanagement in der Max-Planck-Gesellschaft

The Internet drives a transformation of the scientific discovery and dissemination processes. It is currently used as a multifunctional tool to support the traditional work flows. The vision of MPG is to integrate the Internet into the scientific work flows. The realisation of “e-science” that is attempted by research institutions world-wide requires creative solutions on several levels of legal, organisational and technical dimensions. Open access, the unlimited, free and immediate access to all materials of scholarly interest, is a cornerstone of e-science. Many components of e-science already exist today as disciplinary or island solutions. A key effort is needed to link and exchange their information content over national, institutional and disciplinary borders.

CINF 38:  Open reader access, a better business model? A view from the STM-Association
Pieter Bolman, International Association of Scientific, Technical & Medical Publishers, The Hague, Netherlands, bolman@stm.nl

The STM-Association is a global organisation and is emphatically 'business model neutral'. STM's main concern in the Open Access debate is that new business models are sustainable in such a way that continuity and enhancement of access for researchers, scholars, and practitioners are guaranteed, that they attract innovation, and that the publishing system maintains its independence from any national government. We will apply these criteria when examining the current status of both Open Access Publishing per se and open access via the 'self-archiving' route.

CINF 39:  Springer Open Choice: evolution, not revolution
Derk Haank, Chief Executive Officer, Springer Science+Business Media, Heidelberger Platz 3, Berlin 14197, Germany, derk.haank@springer-sbm.com

In response to Open Access, Springer is now letting its authors decide: They can choose between the traditional publishing model and an additional new model, Springer Open Choice. In the latter model, it is the authors and not the users who pay for publishing quality and service. The paper is then accessible via the Internet free of charge to anyone interested. This would make things cheaper for libraries, but it also means that funds would be diverted. Scientists and researchers now have an opportunity to show how serious they are about wanting Open Access. We're prepared to experiment.

CINF 40:  Secure statistical analyses on distributed databases
S. Stanley Young1, Alan Karr2, and Ashish P. Sanil2. (1) Bioinformatics, National Institute of Statistical Sciences, PO Box 14006, Research Triangle Park, NC 27709, genetree@bellsouth.net, (2) NISS

A principal reason for sharing chemical data is to conduct analyses of the combined data that are more powerful and informative than analyses of the individual databases. The impediments to "full" sharing are well known: proprietary information, the scale of the data, and even the reluctance to disclose who "owns" particular data points. Trusted third parties, whether human or machine, are not seen as feasible strategies.

We show how computer science concepts known as secure multi-party computation (specifically, secure summation) can be used to perform two important classes of statistical analyses--regression and recursive partitioning--for "horizontally partitioned" data. That is, the databases contain the same attributes (for example, chemical descriptors) for different sets of compounds. The basis of the methods is secure sharing of data summaries that are sufficient to conduct the analyses (indeed, they are known as sufficient statistics). We also note how secure database query techniques can be used to deal with "duplicate" compounds that may be in more than one of the databases.
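
As a hedged illustration, the sketch below simulates a textbook ring-style secure summation in a single process and uses it to pool the sufficient statistics of a one-descriptor linear regression. The abstract does not specify the protocol details, so the masking scheme, the fixed-point `SCALE`, the `MODULUS`, the function names, and the toy data are all assumptions made for this example.

```python
import random

MODULUS = 2**61 - 1   # assumed to exceed any true scaled total
SCALE = 10**6         # fixed-point scaling so values can be summed as integers

def secure_sum(private_values):
    """Simulate ring-based secure summation: one private number per party."""
    encoded = [round(v * SCALE) % MODULUS for v in private_values]
    mask = random.randrange(MODULUS)          # known only to the first party
    running = (mask + encoded[0]) % MODULUS   # first party passes on a masked value
    for value in encoded[1:]:                 # each later party adds its own value
        running = (running + value) % MODULUS
    total = (running - mask) % MODULUS        # first party removes the mask
    if total > MODULUS // 2:                  # map back to a signed number
        total -= MODULUS
    return total / SCALE

def local_statistics(rows):
    """Sufficient statistics for y = a + b*x, computed inside one database."""
    return [len(rows),
            sum(x for x, _ in rows), sum(y for _, y in rows),
            sum(x * x for x, _ in rows), sum(x * y for x, y in rows)]

def pooled_regression(databases):
    """Fit the regression from securely summed statistics, never from raw rows."""
    per_party = [local_statistics(rows) for rows in databases]
    n, sx, sy, sxx, sxy = (secure_sum(column) for column in zip(*per_party))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Three hypothetical owners holding (descriptor, property) pairs.
databases = [[(1, 3), (2, 5)], [(3, 7)], [(4, 9), (5, 11)]]
intercept, slope = pooled_regression(databases)
```

In a real deployment each party would see only the masked running total passed to it. Such a scheme needs at least three participants, since with two parties each could recover the other's value from the total.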

The techniques will be illustrated with applications to real data.

CINF 41:  Encoding molecular structures as ranks of models: A new, secure way for sharing chemical data and development of ADME/T models
Igor V. Tetko, Institute of Bioorganic & Petrochemistry, Kiev, Ukraine and Institute for Bioinformatics, Neuherberg D-85764, Germany, itetko@vcclab.org

For a lead compound to become a drug, it has to possess a number of important ADME/T properties, e.g. favorable lipophilicity and solubility. A poor ADME/T profile may cause a drug candidate to fail during the late stages of development. Some companies have experimental databases of such properties. Sharing these data could yield much better models for the whole community, but the proprietary value of chemical structures is a major impediment to doing this. Recently we developed the ALOGPS program (http://www.vcclab.org). It can incorporate user-specific data and dramatically improve its prediction ability for similar series of compounds. The external molecules are represented in it as ranks of 64 neural network models, i.e. as an array of 64 numbers where each number is in the [0,63] range. Such a representation makes it impossible to disclose the underlying chemical structures and allows secure sharing of corporate data.
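
One plausible reading of the rank encoding is sketched below: the outputs of an ensemble of models for a single molecule are replaced by their rank order, so that 64 models yield 64 integers in the range 0 to 63. This is an interpretation for illustration only; the function name and the tie-breaking rule are assumptions, and the actual ALOGPS procedure may differ.

```python
def rank_encode(predictions):
    """Replace each model output by its rank among all model outputs (0 = smallest)."""
    # Ties are broken by model index so that every rank is used exactly once.
    order = sorted(range(len(predictions)), key=lambda i: (predictions[i], i))
    ranks = [0] * len(predictions)
    for rank, model_index in enumerate(order):
        ranks[model_index] = rank
    return ranks

# Hypothetical outputs of a three-model ensemble for one molecule.
example = rank_encode([0.3, 0.1, 0.2])
```

The encoded vector keeps only the ordering of the model outputs, not their values, which is what would be shared in place of the structure.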

CINF 42:  Open access, open minds
Andrea Twiss-Brooks, University of Chicago, John Crerar Library, 5730 S. Ellis Ave, Chicago, IL 60637-1403

Discussions of open access publishing are characterized by highly charged rhetoric and nearly religious fervor. Proponents of open access are highly visible, and often occupy what appears to be a moral high ground. Publishers are coming under significant pressure from government authorities, scientific communities, and other parties to move to open access models of publishing journals. Libraries and their institutions are caught in the middle, wanting to support what is best for scientific communication, while coming to grips with the organizational and financial implications of transition to new publishing models. At this time, even the most highly touted open access publishing efforts should be considered experiments. Open access publishing carries both risks and benefits for these various stakeholders. This presentation will attempt to identify major risks and benefits of open access publishing for libraries and their organizations and the data needed by those organizations to make responsible decisions regarding open access.

CINF 43:  Wide road to open access
Nicholas R. Cozzarelli, Department of Molecular and Cell Biology, University of California, Berkeley, 16 Barker Hall MC 3204, Berkeley, CA 94720-3204

Scientific publishing is undergoing a revolution, but thus far chemical journals have stayed on the sidelines. I suggest that they start with releasing back content six months after publication, preferably at PubMed Central. I think the financial loss will be minimal and the gain to chemists, from students to professionals, will be enormous. The ACS runs many of the best journals in chemistry. It is an organization with a proud past and should now play a leadership role in shaping the improved access to the scientific literature. Many others would follow their lead. I will also discuss additional aspects of Open Access that are followed by the journal I edit, the Proceedings of the National Academy of Sciences.

CINF 44:  Chemistry journals: A modest proposal
Steven M. Bachrach, Department of Chemistry, Trinity University, 1 Trinity Place, San Antonio, TX 78212, Fax: 210-999-7569, sbachrach@trinity.edu

Solutions to the journals crisis have coalesced around a small number of options: open access, preprint archives, embargo periods, consortia arrangements. These efforts focus on the concern of ever-rising costs of STM journals. While I will briefly suggest that enhanced publication is the real publication revolution awaiting the STM world, I will offer a proposal for re-positioning of the journal components amongst the interested parties (authors, universities and chemical industries, publishers, and the abstracting/indexing services) that preserves their value-added roles yet allows for potentially cheaper dissemination of information.

CINF 45:  Open access publication: One editor’s perspective
Lawrence J. Marnett, Biochemistry, Vanderbilt University School of Medicine, 23rd Ave at Pierce, Nashville, TN 37232-0146, Fax: 615-343-7534, larry.marnett@vanderbilt.edu

Electronic publishing has had a dramatic impact on scientific publishing. The speed of submission, review, and access is significantly improved and the number of libraries subscribing to packages of journals produced by single publishers has increased. Most institutional subscriptions provide unlimited access to all users within the institution's network. However, for individuals not affiliated with an institution, electronic access to a range of journals is very uneven. Multiple proposals have been made to provide unlimited access at no charge to articles 6-12 months after their publication. Implementing open access is a desirable goal but it presents significant challenges to scientific publishers, particularly those affiliated with non-profit societies. The presentation will focus on some of the key issues as seen through the eyes of an editor of an American Chemical Society journal.

CINF 46:  Publishing implications of open archiving proposals: An examination of academic chemistry research funding sources
George S. Porter, Caltech Library System, 1-43, Pasadena, CA 91125-4300, Fax: 626-431-2681, george@library.caltech.edu

Speculation is currently rife about the possible impact of the National Institutes of Health (NIH) proposed mandate for open archiving of all NIH-funded research. The speculation making the rounds is routinely devoid of data, which seriously undercuts one's ability to judge the probability of any projected future for the STM publishing industry and scholarly communication. Similar initiatives have been proposed by the Parliament Science & Technology Committee and by the Wellcome Trust, a charitable source of funding for biomedical research.

We reviewed the funding sources acknowledged by authors from six leading US chemistry departments (Caltech, Harvard, MIT, Stanford, Yale, and UCSD) in their journal articles published in 2004. In addition, a corresponding survey was conducted of the journal articles produced from Oxford and Cambridge universities.

Alternative Open Access models include the “author pays” Open Access journal concept. The same analysis of funding sources and publication frequency could be used to project the additional costs associated with the dissemination of research results within this model and the funding sources which might be expected to cover those fees. An analysis was prepared of the declared funding sources in the research articles of PLoS Biology, PLoS Medicine, and 4 BMC titles for the period 2003-2004. These were compared with the funding sources acknowledged in a month's worth of research articles from Nature, Science, and JAMA, and a single issue of PNAS, JACS, and Chemical Communications. We attempt to discern whether the authors' funding sources influence their choice of journal in which to publish.

CINF 47:  Practical use of scientific and engineering information at United Technologies and Hamilton Sundstrand
Suzanne Cristina, Information Research, Hamilton Sundstrand, 1-3-BC38, One Hamilton Road, Windsor Locks, CT 06096, suzanne.cristina@hs.utc.com

Corporations conduct numerous engineering, scientific, and business projects each year. Increasingly, the output of this research is in electronic formats, including documents, datasets, and databases. This technical intelligence is stored in a variety of systems, such as document management systems or records management systems. However, in many corporations, technical intelligence is generally hard to discover and reuse, especially after the project is completed. This presentation will cover how United Technologies is taking a basic business driver and utilizing it to create, develop, sustain and reuse technical information throughout the corporation.

CINF 48:  Aqueous solubility prediction using 7,000 compounds
Paulius J. Jurgutis, Andrius Sazonovas, and Pranas Japertas, Pharma Algorithms, Inc, 591 Indian Road, Toronto, ON, Canada, jurgutis@ap-algorithms.com

Aqueous solubility of a compound can be characterized by multiple means. For example, consider the "intrinsic SW" vs. "characteristic SW", SW in pure water vs. SW in buffer, SW of free electrolytes vs. SW of salts, SW by dissolution vs. SW by precipitation, etc. Different types of solubilities can be described by different superpositions of three factors - crystallization, solvation, and ionization. Provided that the influence of ionization can be estimated from pKa calculations, solvation and crystallization remain the most important factors. Most frequently they are crudely estimated by the following expression: - log SW ≈ log P + mpo, where mpo is melting point divided by 100. For hydrophilic compounds with log P < 0 this equation produces deviations of up to 6 log units. Similar deviations are also observed in any other computational models that neglect fine crystallization effects caused by certain structural ensembles. In this work we describe a new method that automatically captures these ensembles, producing accurate SW estimations for new compounds. Predictive algorithm development was performed using an off-hand approach that is available in Auto-Builder (a new software package from Pharma Algorithms). The resulting algorithm is based on the analysis of characteristic SW values for 7,000 compounds, most of which were crystalline electrolytes. Predictions include characteristic SW values in un-buffered water, 95% confidence intervals of SW predictions, and solubility in buffer under pH 2 - 12. The obtained calculations can be very useful in practical estimation of SW from different experimental assays.
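
The crude expression quoted above is simple enough to write down directly. The sketch below is only that baseline, not the authors' new method; the function name is hypothetical, and the melting point is assumed to be in degrees Celsius, which the abstract does not state.

```python
def crude_neg_log_sw(log_p, melting_point):
    """Crude baseline: -log SW is roughly log P + (melting point / 100)."""
    return log_p + melting_point / 100.0

# Hypothetical compound with log P = 2.0 and a melting point of 150.
estimate = crude_neg_log_sw(2.0, 150.0)
```

For this hypothetical compound the two terms contribute 2.0 and 1.5, so the estimate of -log SW is 3.5. The abstract's point is that this two-term form can miss by several log units when crystallization effects are not captured by the melting point term alone.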

CINF 49:  Estimation of estrogen receptor binding affinity using theoretical molecular descriptors
Denise Mills1, Subhash C. Basak1, and Douglas M. Hawkins2. (1) Center for Water and the Environment, Natural Resources Research Institute, University of Minnesota, 5013 Miller Trunk Hwy, Duluth, MN 55811, Fax: 218-720-4328, dmills@nrri.umn.edu, (2) School of Statistics, University of Minnesota

Calf estrogen receptor binding affinity was modeled using the quantitative structure-activity relationship approach for a set of 46 compounds consisting of 2-phenylindoles and 5,6-dihydroindolo[2,1-a]isoquinolines. Molecular descriptors based solely on chemical structure were partitioned into three classes based on level of complexity and demand for computational resources. The topostructural descriptors encode information strictly on the adjacency and connectedness of atoms within a molecule, while the topochemical descriptors encode chemical information such as atom and bond type in addition to topological information. The geometrical or 3-dimensional indices encode three-dimensional aspects of molecular structure. For comparative purposes, three regression methods were used, namely ridge regression (RR), partial least squares (PLS), and principal components regression (PCR). Results indicated that RR generally outperforms PLS and PCR, and acceptable models were obtained from the use of the topochemical descriptors alone.

CINF 50:  Alchemist Club at Missouri Western State College
Janessa M Hovey, Jessica M McKinzie, Cindy M Peters, LeeAnn M Schuster, Alexa Cook, Shellney A Oehlert, and Michael B Mears, Alchemist Club, School, 4525 Downs Drive, St. Joseph, MO 64507, jmh7742@mwsc.edu

The Alchemist Club at Missouri Western State College has been on the campus since the 1980s. The Chapter even received an Outstanding Chapter award from the American Chemical Society in 1985-1986. Over the course of the past few years, the Alchemist Club has decreased in numbers of members. One of the major goals for this year was to get the club back rolling with activities on Campus and in the Community. Activities for the year have included a booth at the Campus Family Day, a float for Homecoming, participating in Super Science Saturday and hosting a Boy Scout Workshop. In doing so, the club now has many new members and may even have the largest numbers in the history of our Chapter.

CINF 51:  Application of rough set theory to structure-activity relationships
Joachim Petit, Pharmacology and Toxicology, University of Arizona - College of Pharmacy, 1703 E. Mabel street, PO Box 210207, Tucson, AZ 85721-0207, Fax: 520 626 2466, petit@pharmacy.arizona.edu, and Gerald M Maggiora, Department of Pharmacology and Toxicology, University of Arizona

Rough set theory (RST), developed more than 25 years ago by Pawlak, provides a powerful means for organizing and analyzing data. RST is a set-based method that uses equivalence relations to group objects with similar attributes into indiscernibility classes, which are the basis for the development of decision rules. The present work focuses on an application of RST to structure-activity relationships. A brief introduction to RST will be presented along with an example of how it can be applied to develop decision rules from structure-activity data.
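
The grouping step can be sketched in a few lines. The toy table, attribute names, and activity labels below are invented for illustration; the lower and upper approximations follow the standard rough set definitions rather than anything specific to this talk.

```python
from collections import defaultdict

def indiscernibility_classes(table, attributes):
    """Group objects whose values agree on every chosen attribute."""
    classes = defaultdict(set)
    for name, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(name)
    return list(classes.values())

def approximations(classes, target):
    """Lower: classes entirely inside target. Upper: classes overlapping target."""
    lower, upper = set(), set()
    for group in classes:
        if group <= target:
            lower |= group
        if group & target:
            upper |= group
    return lower, upper

# Hypothetical structure-activity table.
table = {
    "m1": {"has_OH": True, "aromatic": True},
    "m2": {"has_OH": True, "aromatic": True},
    "m3": {"has_OH": False, "aromatic": True},
    "m4": {"has_OH": False, "aromatic": False},
}
active = {"m1", "m3"}
classes = indiscernibility_classes(table, ["has_OH", "aromatic"])
lower, upper = approximations(classes, active)
```

Here m1 and m2 cannot be told apart by the two attributes yet differ in activity, so only m3 is certainly active (lower approximation) while m1, m2 and m3 are possibly active (upper approximation). A certain decision rule can be read off the lower approximation only.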

CINF 52:  Canonicalized systematic nomenclature in chemoinformatics
Jeremy J Yang, OpenEye Scientific Software, 3600 Cerrillos Road, Suite 1107, Santa Fe, NM 87507, Fax: 505.473.0833, jj@eyesopen.com

A fundamental task of chemistry is identifying distinct chemical entities. In chemoinformatics, species must be specified rigorously to facilitate unambiguous expression of chemical data and knowledge. A theoretically equivalent task is determining the equality of two molecules. However, the meaning of sameness or identity depends upon the context or hierarchical chemical level of abstraction, for example, whether stereochemistry or tautomerism is considered. An important subset of this problem can be addressed by graph theory, which applies well to valence models for covalently bonded molecules. Algorithms generating canonical (unique) identifiers for chemical graphs exist and are available. However, due to the multiple contexts mentioned, a single algorithm is not sufficient to solve all problems. This study reviews some existing canonicalization methodology and describes new methods implemented by the chemoinformatics library OEChem and other OpenEye tools.
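
The link between canonical identifiers and molecular equality can be shown with a deliberately naive sketch. It tries every atom ordering and keeps the smallest description, which is only feasible for tiny graphs and is not the method used by OEChem or any production toolkit. Bond orders, hydrogens, stereochemistry and tautomerism are ignored, and the function name is hypothetical.

```python
from itertools import permutations

def canonical_form(elements, bonds):
    """Smallest (labels, edges) description over all atom renumberings."""
    n = len(elements)
    best = None
    for perm in permutations(range(n)):  # perm[old_index] = new_index
        labels = [None] * n
        for old, new in enumerate(perm):
            labels[new] = elements[old]
        edges = tuple(sorted(tuple(sorted((perm[a], perm[b]))) for a, b in bonds))
        candidate = (tuple(labels), edges)
        if best is None or candidate < best:
            best = candidate
    return best

# Heavy-atom graphs: ethanol numbered two different ways, and dimethyl ether.
ethanol_a = canonical_form(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = canonical_form(["O", "C", "C"], [(0, 1), (1, 2)])
ether = canonical_form(["C", "O", "C"], [(0, 1), (1, 2)])
```

Two inputs describe the same graph exactly when their canonical forms are equal, so the two ethanol numberings agree while the ether differs despite having the same formula.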

CINF 53:  Data publication @ source via the open archive initiative
Simon J. Coles1, Jeremy G Frey1, Michael B. Hursthouse1, Leslie A Carr2, and Christopher J Gutteridge2. (1) School of Chemistry, University of Southampton, Southampton, United Kingdom, Fax: 442380596723, S.J.Coles@soton.ac.uk, (2) School of Electronics and Computer Science, University of Southampton

A crystallography-based exemplar for open archive publication of scientific data will be presented.

Advances in instrumentation and computation have caused an explosion of scientific data. However, this has not resulted in the expected growth of scientific databases and the reason for this can be clearly identified as a publication bottleneck. As a result of this situation, the user community is deprived of valuable information, and the funding bodies are getting a poor return for their investments!

Unlike other disciplines the chemical sciences have been reluctant or slow to embrace the 'preprint concept'. This poster outlines a pre-print procedure for the rapid and effective dissemination of structural information to the scientific community (eCrystals) which removes the lengthy peer review process that hampers traditional publication routes, but provides an alternative mechanism. eCrystals is built on a concept developed in the computer science community whereby an author may reveal archives of information to the public. eCrystals makes available all raw, derived and results data from a crystallographic experiment via a searchable and hierarchical system. Bibliographic and chemical metadata items, which are associated with the data, are published through standard protocols and therefore immediately and globally disseminated.

Hence scientific data may be disseminated in such a manner that anyone wishing to utilise the information may access the entire archive of data related to it and assess its validity and worth. Recent advances in developing this approach to openly publish ANY form of chemical, or indeed scientific, data will also be presented.

CINF 54:  Designing libraries from HTS data: Hot fragments and activity models
Carolyn M. Barker and James E Mills, Molecular Informatics, Structure and Design, Pfizer Global R&D, Ramsgate Road (ipc 636), Kent CT13 9NJ, Sandwich, United Kingdom

Parallel chemistry and high-throughput screening (HTS) are an integral part of drug discovery. HTS is routinely used to identify novel chemical series, but the data, as a whole, are rarely used to drive compound design. This paper demonstrates that mining HTS data is key to designing information-rich libraries. We highlight the application and success of an array of new library design approaches, for example, multiple-target activity models and mining HTS data at the fragment level (existing in available monomers). Key issues, such as how to optimise multiple dimensions (primary and secondary pharmacology, ADMET and physical properties), will be discussed.

CINF 55:  Hierarchical quantitative structure-toxicity relationship (Hi-QSTR) modeling of aquatic toxicity and mutagenicity
Denise Mills, Subhash C. Basak, and Brian D. Gute, Center for Water and the Environment, Natural Resources Research Institute, University of Minnesota, 5013 Miller Trunk Hwy, Duluth, MN 55811, Fax: 218-720-4328, dmills@nrri.umn.edu

Two toxicity endpoints were modeled using the hierarchical quantitative structure-toxicity relationship (Hi-QSTR) method, namely aquatic toxicity, LC50, for a set of 69 benzene derivatives and mutagenicity for a set of 95 aromatic and heteroaromatic amines. With the hierarchical approach, we begin with the least complex descriptors, the topostructural (TS), which encode information strictly about the adjacency and topological distances between atoms in a molecule. The topochemical (TC) descriptors encode chemical information, such as bond and atom type, in addition to information about molecular topology. The geometrical (3D) descriptors are more complex yet, encoding three-dimensional aspects of molecular structure. Finally, the quantum chemical (QC) descriptors encode electronic information. In particular, we were interested to see whether the addition of the quantum chemical descriptors, which are more demanding in terms of computational resources, results in significant model improvement. Marginal improvement in model quality was obtained upon the addition of such descriptors.
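
The hierarchical idea described above, adding descriptor classes in order of complexity and checking whether each addition improves the model, can be sketched in Python as follows. This is an illustration only, not the authors' Hi-QSTR implementation: the use of ridge regression, the helper names `solve`, `ridge_r2` and `hierarchical_r2`, and all numbers are assumptions. The data are synthetic toy values rather than results from the study, the 3D class is omitted for brevity, and training R^2 stands in for the cross-validated statistics a real study would use.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_r2(columns, y, lam=1e-6):
    """Fit ridge regression on mean-centered data; return training R^2."""
    n = len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    xc = [[v - sum(col) / n for v in col] for col in columns]
    p = len(xc)
    # Normal equations with the ridge penalty lam on the diagonal.
    gram = [[sum(xc[i][k] * xc[j][k] for k in range(n)) + (lam if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(xc[i][k] * yc[k] for k in range(n)) for i in range(p)]
    w = solve(gram, rhs)
    rss = sum((yc[k] - sum(w[i] * xc[i][k] for i in range(p))) ** 2
              for k in range(n))
    tss = sum(v * v for v in yc)
    return 1.0 - rss / tss


def hierarchical_r2(classes, y):
    """Add descriptor classes one at a time and record R^2 after each step."""
    used, results = [], []
    for name, cols in classes:
        used.extend(cols)
        results.append((name, ridge_r2(used, y)))
    return results


# Synthetic toy descriptors for six hypothetical molecules (not real data).
ts = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # stands in for a topostructural index
tc = [1.0, 0.0, 1.0, 0.0, 0.0, 1.0]   # stands in for a topochemical index
qc = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2]   # stands in for a quantum chemical value
y = [2 * a + 3 * b for a, b in zip(ts, tc)]  # toy endpoint built from TS and TC

results = hierarchical_r2([("TS", [ts]), ("TC", [tc]), ("QC", [qc])], y)
```

Because the toy endpoint is constructed from the TS and TC columns only, the fit improves when TC is added and has essentially nothing left to gain from the QC column.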

CINF 56:  MGE: A model generating engine and its applications
Sabine Schefzick, Discovery Technology (Scientific Computing), Pfizer Global R&D, 2800 Plymouth St., Bldg.28/G-131W/G-9, Ann Arbor, MI 48105, Fax: 734-622-2782, sabine.schefzick@pfizer.com, and Mary Bradley, Discovery Technology (Scientific Computing), Pfizer Inc

Abstract text not available.

CINF 57:  Mutagen/non-mutagen classification of congeneric and diverse sets of chemicals using computed molecular descriptors: A hierarchical approach
Denise Mills1, Subhash C. Basak1, Douglas M. Hawkins2, and Brian D. Gute1. (1) Center for Water and the Environment, Natural Resources Research Institute, University of Minnesota, 5013 Miller Trunk Hwy, Duluth, MN 55811, Fax: 218-720-4328, dmills@nrri.umn.edu, (2) School of Statistics, University of Minnesota

Ridge linear discriminant analysis was used to classify a diverse set of 508 mutagens/non-mutagens, as well as three structurally homogeneous subsets, viz., 260 monocyclic carbocycles and heterocycles, 192 polycyclic carbocycles and heterocycles, and 124 aliphatic alkanes, alkenes, and alkynes. Software programs including POLLY, Triplet, Molconn-Z, Sybyl, and MOPAC were used to calculate a large and diverse set of theoretical molecular descriptors. Subsequently, the descriptors were divided into hierarchical classes based on level of complexity and demand for computational resources. Results indicate that inclusion of the more complex descriptors does not lead to a significant increase in model quality. In addition, correct classification rates for the relatively homogeneous subsets are comparable to those obtained for the entire set of 508 diverse compounds, indicating that the diverse set of theoretical descriptors is capable of representing the diversity of structural features present in the data set.

CINF 58:  NMR spectral invariants as numerical descriptors for diastereomers and enantiomers
Ramanathan Natarajan and Subhash C. Basak, Center for Water and the Environment, Natural Resources Research Institute, University of Minnesota, 5013 Miller Trunk Hwy, Duluth, MN 55811, Fax: 218-720-4328, rnataraj@nrri.umn.edu

Topostructural or topochemical invariants derived for a molecule from its molecular graph (hydrogen included or suppressed) based on the edge count or information content can differentiate structural isomers. However, they are incapable of differentiating geometrical isomers because the 3-D orientations of the atoms in a molecule are not considered in their computation. Although the next-generation indices, geometrical indices such as the 3-D Wiener index, can account for molecular volume etc., they cannot distinguish diastereoisomers. Although Schulz et al. have made attempts along these lines, the indices they created could not be applied in SAR modeling. NMR, a powerful tool in the hands of chemists, can differentiate diastereoisomers of a compound because it sees the three-dimensional disposition, or “environment”, of the protons in a molecule; that is to say, it has a higher-dimensional perception of a molecule than that of a chemist who tries to visualize it using a molecular graph. We use this higher-dimensional perception to convert NMR spectra into invariants. The new spectral invariant thus generated can differentiate diastereoisomers. The ability of NMR to differentiate diastereomeric compounds has been used to assign the absolute configuration of several organic compounds after reaction with chiral derivatizing agents. We have shown how the 1H-NMR data of such derivatives can be used to calculate spectral invariants for the enantiomers of 1) chiral alcohols from their esters with 2-methoxy-2-(1-naphthyl)propionic acid and 2) α-chiral carboxylic acids from their esters with ethyl 2-hydroxy-2-(9-anthryl) acetate.
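
The limitation of graph-based invariants noted at the start of this abstract can be illustrated with the ordinary (2-D) Wiener index, the sum of topological distances over all atom pairs. The Python sketch below is illustrative only and is not the NMR spectral invariant described by the authors; the function name `wiener_index` and the bond-list input are assumptions, hydrogens are suppressed, bond orders are ignored, and the graph is assumed to be connected.

```python
from collections import deque


def wiener_index(n_atoms, bonds):
    """Sum of shortest-path bond counts over all atom pairs (connected graph)."""
    adjacency = {i: [] for i in range(n_atoms)}
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    total = 0
    for start in range(n_atoms):
        # Breadth-first search gives topological distances from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # each pair was counted from both ends


# Structural isomers of C4H10 have different carbon skeletons.
n_butane = wiener_index(4, [(0, 1), (1, 2), (2, 3)])
isobutane = wiener_index(4, [(0, 1), (0, 2), (0, 3)])

# cis- and trans-2-butene share one hydrogen-suppressed graph,
# so a purely topological index cannot tell them apart.
cis_2_butene = wiener_index(4, [(0, 1), (1, 2), (2, 3)])
trans_2_butene = wiener_index(4, [(0, 1), (1, 2), (2, 3)])
```

The two structural isomers receive different values (10 and 9), whereas the geometrical isomers necessarily receive the same value because the input graph carries no 3-D information.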

CINF 59:  Partition of solvents–co-solvents of nanotubes: Proteins and cyclopyranoses
Francisco Torrens, Institut Universitari de Ciencia Molecular, Universitat de Valencia, Dr. Moliner-50, EI-1-38, Burjassot (Valencia) 46100, Spain, Fax: 34-96-354-3156, Francisco.Torrens@uv.es

The main contribution to the water-accessible surface area of lysozyme helices is the hydrophobic term, while the hydrophilic part dominates in the sheet. This is related to the 1-octanol-, cyclohexane- and chloroform-water partition coefficients P_o-ch-cf of the helices, which are greater than those of the sheet. The analysis of atom-group partial contributions to log_P_o-ch-cf allows local maps to be built. The molecular lipophilicity pattern differentiates among helices, sheet and binding site. For a given atom, log_P is sensitive to the presence of other atoms. The contributions of C_70-a-c atoms to log_P are slightly greater than those of the d-e atoms, which correlates with the distances from the nearest pentagon. (10,10) is the favourite single-wall carbon nanotube (SWNT), presenting consistency between a relatively small aqueous solubility and great P_o-ch-cf. Efforts to use fullerenes-SWNTs in therapeutic applications are re-evaluated.

CINF 60:  Prediction of biologic partition coefficients and binding affinities using QSAR models
Denise Mills1, Moiz M. Mumtaz2, Hisham A. El-Masri2, Douglas M. Hawkins3, and Subhash C. Basak1. (1) Center for Water and the Environment, Natural Resources Research Institute, University of Minnesota, 5013 Miller Trunk Hwy, Duluth, MN 55811, Fax: 218-720-4328, dmills@nrri.umn.edu, (2) Computational Toxicology Laboratory, Division of Toxicology, Agency for Toxic Substances and Disease Registry, (3) School of Statistics, University of Minnesota

For contaminants, toxicological data are usually not available to conduct health risk assessments. In such cases, ATSDR and other federal agencies often recommend the use of surrogate values obtained from computational tools such as quantitative structure-activity relationship (QSAR) techniques and physiologically based pharmacokinetic (PBPK) modeling. In an ongoing effort to develop alternative toxicity assessment methods, we have applied QSAR to compute: 1) tissue:air partition coefficients, including fat:air, liver:air, and muscle:air, for a group of 46 low molecular weight volatile organic compounds (VOCs); 2) blood:air partition coefficients for a set of 39 VOCs; and 3) aryl hydrocarbon (Ah) receptor binding affinity for a set of 34 dibenzofurans. The structural descriptors consisted of four classes based on increased level of complexity and computational demand: topostructural (TS), topochemical (TC), geometrical (3D) and quantum chemical (QC)