#225 - Abstracts

ACS National Meeting
April 23-27, 2003
New Orleans, LA

8:35 1 The ACS Committee on Professional Training's Library Survey: Is there a future for modern chemical information as a central component of education in chemistry?
Jeanne E. Pemberton, Department of Chemistry, University of Arizona, 1306 E. University Blvd, Tucson, AZ 85721, Fax: 520-621-8248, pembertn@u.arizona.edu

In the fall of 2000, the ACS Committee on Professional Training undertook a survey of all approved chemistry programs to ascertain the current situation with respect to library and chemical information resources and their accessibility to students. The results indicate a wide range of expenditures for chemical information between institutions of different size, and suggest a growing problem of affordability of modern chemical information resources, especially at institutions that offer only bachelor's and master's degrees in chemistry. The current state of chemical information resources at ACS-approved institutions based on the results of this survey will be presented, and the serious issues and concerns that these results raise will be discussed.

9:05 2 Evolving doors of access to ACS Web Editions.
Dean J. Smith, Sales & Marketing, American Chemical Society, 1155 16th Street NW, Washington, DC 20036, Fax: 202-872-6005, d_smith@acs.org - SLIDES

Since the inception of the Web Editions in 1998, ACS Publications has continually revised its pricing models to address the needs of all customers. As a result, the cost per article of ACS publications has steadily decreased over the years. Subscription prices for ACS journals have traditionally been as much as 40% less than the competition while providing the highest quality of chemical research. The Option B pricing model for Web Editions is a consortia-based approach allowing the widest range of access across the largest number of institutions, from small to large. In the first two years of its existence, the ACS did not charge a consortia entry fee for institutions without ACS journals. The ACS has since taken an innovative approach and experimented with a number of entry fees for schools without any purchasing history. In 1999, ACS Publications completed a pilot study with UCAIR institutions to measure usage levels at 33 small colleges. These findings, together with an in-depth analysis of market penetration at small colleges conducted by an outside consultant, have presented options for providing low-cost alternatives.

9:35 3 Serving academia: Adapting to the needs of scientific students and faculty.
Craig Stephens, Manager, North American Sales and Customer Support, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202-1505, cstephens@cas.org

A major component of the CAS mission is to meet the needs of the academic community. CAS has introduced many new products and programs in recent years to meet these needs, with features, terms, and conditions adapted to the requirements of campus-wide access to scientific information at institutions large and small. The widespread and rapid acceptance of SciFinder Scholar by universities around the world exemplifies the features and requirements that make for successful research at B.S.-, M.S.-, and Ph.D.-granting institutions. How CAS assessed and met these needs, drawing on the input of the user community and our own experience, will also be discussed.

10:05 4 Affordable tools for teaching undergraduates at small institutions and community colleges.
Patricia Kirkwood, Science Librarian, Pacific Lutheran University, Mortvedt Library, Tacoma, WA 98447, Fax: 253-535-7315, kirkwope@plu.edu - SLIDES

Community colleges and small undergraduate institutions that routinely graduate fewer than 20 chemistry majors a year are at a disadvantage when it comes to teaching chemical information literacy: the major resources are simply too expensive to be available. At $10,000-plus per shared seat for SciFinder Scholar, more than $5,000 for Chemical Abstracts Student Edition, and Web of Science priced even higher, smaller institutions cannot justify the expenditure either by the amount of research funding they receive or by the number of students served. The librarian (or a faculty member, if a librarian is unavailable) must teach information skills using the products that are available for little or no cost. Information products such as the NIST WebBook, the CRC Handbook of Chemistry and Physics, General Science Abstracts, Basic BIOSIS, JSTOR, online encyclopedias, and full-text databases available through various vendors are the resources to consider. In this presentation, I will review resources that are affordable and propose a plan for teaching the basic chemical information literacy skills required by the CPT with the available tools, rather than instruction that focuses on specific information tools and platforms. A well-designed program that uses affordable tools can teach lifelong information literacy skills and prepare students well for the work world or graduate school.

10:35 5 How much is enough? CPT Guidelines and chemical information access in research universities.
David Flaxbart, Chemistry Library, University of Texas at Austin, Welch Hall 2.132, Austin, TX 78713, flaxbart@uts.cc.utexas.edu - SLIDES

While research libraries have not had much difficulty in the past meeting CPT library guidelines, the changing formats of chemical information and the soaring costs associated with them have brought an array of new challenges to even the largest and best-funded libraries. The CPT Library Survey of 2000 revealed a number of disturbing trends in the adequacy and affordability of access to required information, affecting all types of institutions. This presentation will outline some of the issues faced at a large ARL library, and examine some of the possible solutions.

11:05 6 Chemical information and chemical informatics literacy at a research university.
Gary D. Wiggins, Chemistry Library, Indiana University, 800 E. Kirkwood Avenue, Chemistry Building Room C003, Bloomington, IN 47405-7102, Fax: 812-855-6611, wiggins@indiana.edu - SLIDES

The Department of Chemistry at Indiana University offers four one-hour chemical information/informatics courses on the undergraduate level and two three-hour courses on the graduate level. Most of the courses have been taught via teleconferencing across two campuses during the past two years, with some lectures delivered from England in one graduate course. A mix of free and commercial software and databases is used in the courses. Methodology, software, and cost figures will be presented.

1:35 7 Green chemistry: Sustaining a high-technology civilization.
Terrence J. Collins, Department of Chemistry, Carnegie Mellon University, 4400 Fifth Ave., Mellon Institute, Pittsburgh, PA 15213-2683, Fax: 412-268-1061, tc1u@andrew.cmu.edu - SLIDES

Because we do not live in a sustainable civilization, sustainability has become the single most important idea for universities in the next century. Chemists are principal custodians of the technological challenges of sustainability. As quickly as possible, we must learn how to develop the research and educational programs that will be essential for steering our communal thinking and our technology base in sustainable directions. In chemical research, three areas stand out as vital: the invention of more efficient technologies for converting solar energy to electrical or chemical energy, the replacement of polluting chemical technologies with economical non-polluting substitutes, and the development of renewable feedstocks for the chemical industry. These areas will be briefly sketched, with emphasis on pollution reduction. Chemistry-oriented books and materials that are available, are becoming available, or that one hopes will become available to address the sustainability dilemma will be discussed.

2:20 8 Biological engineering: From blue roses to space suits.
Cory Craig, Physical Sciences and Engineering Library, University of California, Davis, One Shields Avenue, Davis, CA 95616, Fax: 530/752-4719, cjcraig@ucdavis.edu - SLIDES

Biological engineering is the application of engineering principles to biological and medical problems. Biological engineering research has played a key role in the development of innovations as diverse as artificial limbs, space suits, and synthetic vaccines. Evidence of the breadth and cross-disciplinary nature of the discipline is found in its many subfields, including biomedical engineering, biochemical engineering, environmental engineering, and agricultural engineering. This talk will outline the development and history of biological engineering, provide an overview of the types of research conducted in its different areas, and identify major breakthroughs and current challenges in the field. This overview will help information professionals provide better reference assistance to library patrons in this exciting and evolving area of research.

2:50 9 Nano nonet: Nine things chemistry librarians need to know about nanoscience.
F. Bartow Culp, Mellon Library of Chemistry, Purdue University, West Lafayette, IN 47907, bculp@purdue.edu - SLIDES

The hot fields of nanoscience and nanotechnology deal with structures and materials in the 1-100 nm dimensional scale. While the concept has been discussed for over thirty years, only recently have experimental breakthroughs made reality out of theory. The purpose of this session is to give an overview of these fields, and to emphasize points of particular interest to chemistry librarians, including terminology and information resources.

3:20 10 Chemoinformatics, cheminformatics, chemical informatics: What is it?
Gary D. Wiggins, and Wendie Shreve, Chemistry Library, Indiana University, 800 E. Kirkwood Avenue, Chemistry Building Room C003, Bloomington, IN 47405-7102, Fax: 812-855-6611, wiggins@indiana.edu - SLIDES

The terms "chemoinformatics," "chemiinformatics," "cheminformatics," and "chemical informatics" are all used to describe a broad array of computer techniques and applications to solve chemistry problems. We will look at the areas that comprise chemical informatics by examining the topics in existing textbooks and other secondary sources. The identified topics will be mapped to the graduate courses in the chemical informatics program at Indiana University.

3:50 11 Combinatorial materials research: Opportunities, challenges, and successes.
Laurel A. Harmon, Striatus, 8703 Webster Hills Rd., Dexter, MI 48130, Fax: 734-661-0409, lharmon@striatus.com

The introduction of combinatorial methods into materials discovery creates new challenges for both laboratory methods and informatics. This talk highlights applications in which combinatorial methods are being successfully applied to materials, issues that arise in combinatorial materials research, and informatics strategies that are being developed. Despite many similarities, there are fundamental differences between high-throughput approaches to drug discovery and to materials discovery. Materials research imposes new requirements for data management, data storage, and data analysis, and new strategies for experiment planning are required to navigate high-dimensional experimental spaces effectively. Different modes of combinatorial experimentation (mapping, screening, and optimization) are outlined, with examples drawn from current high-throughput combinatorial materials research.

1:30 12 Recent developments in OpenURL (SFX) linking at the University of Chicago.
Andrea B. Twiss-Brooks, John Crerar Library, University of Chicago, 5730 S. Ellis, Chicago, IL 60637, Fax: 773-702-7429, atbrooks@midway.uchicago.edu

SFX (from ExLibris) is a linking technology based on the OpenURL protocol for creating customized links among diverse information products. The University of Chicago Library implementation of SFX to provide better management of electronic resources and improved service to the scholarly community is described.

The Library defined its electronic collection, and constructed rules that guide SFX in creating context-sensitive links. These customized, context-sensitive links use web-transportable packages of metadata to connect users to resources and services. Links to resources are dynamically generated to provide information about all available online copies. In addition, SFX services have been configured to include searches of rich print collections and additional information about journals.

Current and future developments described include an OpenURL generator/DOI resolver tool, a dynamically generated comprehensive online journal A to Z list, and additional SFX services such as automated interlibrary loan request generation.
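The context-sensitive linking described above can be illustrated with a small sketch: a source encodes citation metadata as key/value pairs appended to the institution's resolver address, and the resolver (SFX, in this implementation) applies local rules to decide which services to offer. The resolver base URL and citation values below are hypothetical, and the key names follow the early OpenURL 0.1 convention rather than any particular library's configuration.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; a real installation would use the
# institution's own SFX (or other OpenURL resolver) base URL.
RESOLVER_BASE = "http://sfx.example.edu/sfx_local"

def make_openurl(base, **metadata):
    """Build an OpenURL 0.1-style link from citation metadata.

    The source encodes the citation as key/value pairs; the resolver
    parses them and applies local rules to decide which full-text,
    print-holdings, or interlibrary-loan services to offer.
    """
    return base + "?" + urlencode(metadata)

link = make_openurl(
    RESOLVER_BASE,
    genre="article",
    issn="0002-7863",  # J. Am. Chem. Soc.
    volume="125",
    issue="1",
    spage="1",
    date="2003",
)
```

Because the metadata travels with the link, the same citation can resolve differently at different institutions, which is what makes the links context-sensitive.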

2:00 13 Preserving data: The role of databases in future scientific discovery.
John Rumble Jr., Standard Reference Data, NIST, 100 Bureau Drive MS 2310, Gaithersburg, MD 20899-2310, Fax: 301-926-0416, john.rumble@nist.gov

A wide variety of methods have been used to save and preserve scientific data over thousands of years. The physical nature of these media, and the inherent difficulty of sharing physical media with others who need the data, have been major barriers to advancing research and scientific discovery. The information revolution has changed this in significant ways: ease of availability, breadth of distribution, size and completeness of data sets, and documentation. As a consequence, scientific discovery itself is changing now and, perhaps even more dramatically, in the future. In this talk I will review some historical aspects of data preservation and the use of data in discovery, and I will offer some speculations on how preserving data digitally might revolutionize scientific discovery.

2:30 14 Knowledge discovery in a database of biochemical pathways.
Johann Gasteiger1, Martin Reitz1, and Oliver Sacher2. (1) Computer-Chemie-Centrum and Institute of Organic Chemistry, University of Erlangen-Nuremberg, Naegelsbachstr. 25, Erlangen 91052, Germany, Fax: +49-9131-85 26566, Gasteiger@chemie.uni-erlangen.de, (2) Molecular Networks GmbH

A database of biochemical reactions has been built on the basis of the Biochemical Pathways poster originally produced by Boehringer Mannheim (now Roche Diagnostics). Each structure is represented by a connection table including stereochemical information, and reactions are coded by the bonds broken and made in the course of the reaction. It will be shown how models of the transition states of biochemical reactions can be developed and compared with inhibitors of enzymes. Furthermore, the information allows a similarity measure for reactions to be defined and compared with the standard enzyme classification.
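The bonds-broken/bonds-made coding mentioned above can be sketched as simple set arithmetic over atom-mapped structures. The atom labels and the set-based representation here are illustrative inventions for the sketch, not the database's actual connection-table format.

```python
# Bonds are modeled as frozensets of atom labels; the labels are
# hypothetical atom-map numbers that identify the same atom on the
# reactant and product sides of a mapped reaction.

def reaction_center(reactant_bonds, product_bonds):
    """Return the (broken, made) bond sets of an atom-mapped reaction."""
    broken = reactant_bonds - product_bonds
    made = product_bonds - reactant_bonds
    return broken, made

# Toy example: the C1-O2 bond is broken and a C1-O3 bond is made,
# while the C1-O1 bond persists unchanged.
reactant_bonds = {frozenset(["C1", "O1"]), frozenset(["C1", "O2"])}
product_bonds = {frozenset(["C1", "O1"]), frozenset(["C1", "O3"])}
broken, made = reaction_center(reactant_bonds, product_bonds)
```

Two reactions that yield the same broken/made bond pattern share a reaction center, which is one natural starting point for the kind of reaction-similarity measure the abstract describes.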

3:00 15 Dynamic data evaluation: Algorithm development and analysis for thermodynamic properties of pure organic compounds.
Vladimir V. Diky1, Robert D. Chirico1, Xinjian Yan1, Randolph C. Wilhoit2, and Michael Frenkel1. (1) Thermodynamics Research Center (TRC), National Institute of Standards and Technology (NIST), Mailstop 838.00, 325 Broadway, Boulder, CO 80305, diky@boulder.nist.gov, (2) Texas Experimental Engineering Station, Texas A&M University System

Traditional critical data evaluation is an extremely time- and resource-consuming process, involving extensive manpower for data collection, mining, analysis, fitting, etc. Furthermore, it must be performed far in advance of need, with the result that a significant part of the existing recommended data has never been used. The concept of "dynamic" data evaluation, developed by TRC at NIST, requires large electronic databases (such as the TRC Source data system) capable of storing essentially all of the raw/observed experimental data known to date, with descriptions of relevant metadata and defined uncertainties. In combination with expert-system software, this system allows recommended property values (with uncertainties) to be produced dynamically, or 'to order.' Aspects of the implementation of dynamic data evaluation will be discussed, including thermodynamic consistency between related properties, selection of fitting equations, use of estimated properties, and uncertainty propagation. The output of the software being designed includes complete sets of thermodynamic property data with reliable uncertainties for any organic compound, including hypothetical ones.
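One building block of any such evaluation engine is the statistical combination of independent measurements that carry stated uncertainties. The sketch below shows a standard inverse-variance weighted mean with propagated uncertainty; it is a generic textbook illustration, not TRC's actual algorithm, and the boiling-point values are invented.

```python
import math

def combine(measurements):
    """Inverse-variance weighted mean and its propagated uncertainty.

    measurements: list of (value, uncertainty) pairs from independent
    determinations of the same property. Smaller uncertainties get
    proportionally larger weights.
    """
    weights = [1.0 / u ** 2 for _, u in measurements]
    total = sum(weights)
    mean = sum(w * v for w, (v, _) in zip(weights, measurements)) / total
    return mean, 1.0 / math.sqrt(total)

# Three invented boiling-point determinations (K) with stated uncertainties.
value, unc = combine([(353.25, 0.05), (353.20, 0.10), (353.30, 0.20)])
```

Note how the combined uncertainty is smaller than the best single measurement's, which is the incentive for storing all raw data with defined uncertainties rather than a single pre-selected value.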

3:30 16 A self-organizing algorithm for extracting the intrinsic dimensionality of large high-dimensional data.
Dimitris Agrafiotis, and Huafeng Xu, Research Informatics, 3-Dimensional Pharmaceuticals, Inc, 665 Stockton Drive, Exton, PA 19341, Fax: 610-458-8249, agrafiotis@3dp.com

We present stochastic proximity embedding (SPE), a novel self-organizing algorithm for producing meaningful underlying dimensions from proximity data. SPE attempts to generate low-dimensional Euclidean embeddings that best preserve the similarities between a set of related observations. The embedding is carried out using an iterative pairwise refinement strategy that attempts to preserve local geometry while maintaining a minimum separation between distant objects. Unlike previous approaches, our method can reveal the underlying geometry of the data without intensive nearest neighbor or shortest-path computations, and can reproduce the true geodesic distances of the data points in the low-dimensional embedding without requiring that these distances be estimated from the data sample. More importantly, the method scales linearly with the number of points, and can be applied to very large data sets that are intractable by conventional embedding procedures. The advantages of the algorithm are illustrated using examples from the molecular diversity and conformational analysis literature.
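A minimal sketch of the basic SPE update described above (random pair selection, pairwise refinement of the embedded distance toward the input proximity, with a decaying learning rate) might look like the following. Function and parameter names are mine, and this omits the neighborhood-radius rule the full algorithm uses to approximate geodesic distances.

```python
import math
import random

def spe(proximities, dim=2, n_steps=20000, lam=1.0, lam_min=0.01, seed=0):
    """Stochastic proximity embedding, basic form (after Agrafiotis & Xu).

    proximities: symmetric n x n matrix of target distances.
    Returns a list of n points in `dim` dimensions whose pairwise
    distances approximate the input proximities.
    """
    rng = random.Random(seed)
    n = len(proximities)
    coords = [[rng.random() for _ in range(dim)] for _ in range(n)]
    decay = (lam_min / lam) ** (1.0 / n_steps)  # geometric learning-rate schedule
    eps = 1e-9
    for _ in range(n_steps):
        i = rng.randrange(n)
        j = rng.randrange(n)
        if i == j:
            continue
        d = math.dist(coords[i], coords[j])
        r = proximities[i][j]
        # Move the pair so their embedded distance shifts toward r.
        c = lam * 0.5 * (r - d) / (d + eps)
        for k in range(dim):
            delta = c * (coords[i][k] - coords[j][k])
            coords[i][k] += delta
            coords[j][k] -= delta
        lam *= decay
    return coords

# Toy input: exact pairwise distances among the corners of a unit square,
# which is perfectly embeddable in two dimensions.
SQRT2 = 2 ** 0.5
square = [
    [0.0, 1.0, SQRT2, 1.0],
    [1.0, 0.0, 1.0, SQRT2],
    [SQRT2, 1.0, 0.0, 1.0],
    [1.0, SQRT2, 1.0, 0.0],
]
embedded = spe(square)
```

Each step touches only one pair, which is why the cost per step is constant and the method scales to data sets far beyond what all-pairs embedding procedures can handle.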

4:00 17 From gene to lead: An architecture for cooperative drug discovery.
Stephen A. Baum, Accelrys Inc, 9685 Scranton Road, San Diego, CA 92121, Fax: 858-799-5100, sbaum@accelrys.com, and Shikha Varma, 9685 Scranton Rd, Accelrys Inc

The challenges and rewards of multi-disciplinary cooperation in drug discovery efforts are well appreciated. While the rewards of collaborative research are enormous, knowledge management of the data generated by chemists, biologists, and modelers remains a challenge. Even in today's networked computing environments, incompatible file formats, disparate data sources, and a lack of methods and application integration, among other factors, stall the transfer and use of information between scientists of differing disciplines. Applying data generated by colleagues working in different scientific disciplines with divergent tools and methods can provide fresh perspectives in drug discovery. Even when results are successfully transferred, the inability to understand how, when, and why data were generated often calls into question the utility and credibility of that information in subsequent studies. Discovery Studio applications address many of the challenges of multi-disciplinary collaboration by providing an underlying multi-tiered system architecture that facilitates information and file transfer, data management, data mining, and methods integration across scientific disciplines. This discussion will describe the Discovery Studio architecture as well as automated electronic data capture, reporting, and promotion of data to a shared project workspace for cooperative drug discovery. A virtual pharmaceutical drug discovery scenario illustrates how computational methods can be applied within one environment, from gene to lead.

2:15 17 BTEC Informatics solutions for pharmacogenomics in a high-throughput environment: Discovery Studio GeneAtlas and AtlasStore
Lisa Yan, Azat Badretdinov, Mikhail Velikanov, Yin Yu, Michael Pu, Steven Potts, and Sándor Szalma. Accelrys Inc, 9685 Scranton Road, San Diego, CA 92121

With the vast number of protein sequences determined by genomic sequencing projects, there is an emerging need for high-throughput methods to determine their function and prioritize targets. In the target characterization process it is also important to understand the effects of variation in these sequences and structures. GeneAtlas is an automated, high-throughput pipeline for protein annotation using sequence similarity detection, homology modeling, and fold recognition methods. An optimized protocol will be discussed that uses PSI-BLAST and SeqFold to search for homologous structures in the PDB database, MODELER to build 3D models for the sequences based on the template structures, and Profile-3D/Verify to assess the quality of the model structures.

Many functional features are identified from the 3D model of the protein, including active sites, ligand binding sites, and functional motifs. Furthermore, from the multiple sequence alignment of the protein family and the model structure, the residues that are essential for protein function or important for the selectivity of ligand binding are identified by the evolutionary trace method. All model structures and 3D annotations are stored in a relational database, AtlasStore, and queries can be searched and analyzed through Accelrys' Discovery Studio Modeling graphical interface. The AtlasStore interface can visualize and analyze the effect of single-nucleotide polymorphisms on protein structures. The Discovery Studio environment also enables researchers to deploy Accelrys' lead identification and discovery tools. The integration of these tools in a knowledge management and decision system makes the high-throughput application of the pharmacogenomics paradigm possible.

8:30 18 Chemical handbooks: Glorious past, questionable future
F. Bartow Culp, School of Library Science, Purdue University, Mellon Library of Chemistry, West Lafayette, IN 47907 - SLIDES

Chemistry handbooks are almost as old as modern chemistry itself. For nearly two centuries, compilations of chemical information such as Gmelin, Beilstein and Landolt-Boernstein have organized the diffuse primary literature to make facts easily available to the chemist. The increasing size and complexity of the chemical literature in the 20th century signaled the demise of the comprehensive nature of such efforts, but the notion and reality of the handbook persists today. While their current formats and even definitions have changed over time, modern handbooks share some core characteristics: They are selective in scope, labor intensive to prepare, and costly to purchase. In order to appeal to the new consumers of chemical information, some publishers have converted their handbooks into electronically searchable products, while others have held to primarily print versions. It is reasonable to question whether, in the coming age of disembodied journals and deconstructed texts, there will even be a place for the classically organized handbook. The purpose of this talk will be to review briefly the history of chemistry handbooks, to look critically at their present incarnations, and to propose some means of their survival in the brave new world of the Internet, e-books, and metadata.

2:45 18 BTEC Integrating chemical structures, biological activity fingerprints, and gene expression profiling for drug discovery
Leming Shi, Zhenqiang Su, Aihua Xie, Chenzhong Liao, Wei Qiao, Dajie Zhang, Zhibin Li, Zhiqiang Ning, Weiming Hu, and Xianping Lu. Chipscreen Biosciences, Ltd, Research Institute of Tsinghua University, Suite C301, Shenzhen 518057, Guangdong, China

Chipscreen Biosciences, Ltd. (www.chipscreen.com), a drug discovery company specializing in novel small-molecule therapeutics for type II diabetes, osteoporosis and menopause syndrome, benign prostatic hyperplasia, and cancer, has developed a proprietary chemical genomics approach to accelerate the discovery of new medicines from its collections of natural products, traditional Chinese medicines, and synthetic chemical libraries. Central to Chipscreen's drug discovery platform is its capability to integrate computer-aided drug design, medicinal chemistry, parallel multi-target high-throughput screening, global gene expression profiling, and informatics to advance the drug discovery process rapidly and effectively. To meet these needs we have developed an integrated biochemoinformatics software system to store and analyze various types of experimental data, including chemical structures, biological activity fingerprints, and gene expression profiles. Applications of the system in our internal drug discovery projects will be presented.

9:00 19 CRC Handbook of Chemistry and Physics: From paper to web
Fiona Macdonald, CRC Press, 23 Blades Court, Deodar Road, London SW15 2N, United Kingdom, Fax: +44 20 8871 3443, fmacdonald@crcpress.com, and David R. Lide, Editor - SLIDES

In print for nearly 90 years, the CRC Handbook of Chemistry and Physics has become an institution in many laboratories worldwide. Today the demands are for instant desktop access, the most current information, plus sophisticated search and display facilities. This talk will focus on the challenges encountered in getting the 'Rubber Bible' on the web, the latest version of which is now available at http://www.hbcpnetbase.com/.

3:30 19 BTEC Design of experiments based on pharmacogenomics analysis of databases containing gene expression profiles and compound activities to help elucidate molecular mechanisms
Chihae Yang1, Paul Blower1, Robert W. Brueggemeier2, and Jeanette A. Richards2. (1) LeadScope, Inc, Columbus, OH 43212, (2) Division of Medicinal Chemistry & Pharmacognosy, College of Pharmacy, The Ohio State University, 500 West 12th Avenue, Columbus, OH 43210

High-throughput genomic studies are producing large databases of molecular information on cancers and other cell and tissue types. Hence, linking these accumulating data to the drug discovery process becomes a real possibility. Despite the introduction of a new paradigm and methodologies, however, this large amount of information and significant investment has not led to dramatic increases in drug discovery productivity. In past work, we correlated the gene expression profiles of the NCI 60 cell lines with compound activity patterns in the same cell lines. Genes in specific biological-process pathways were correlated with certain chemical scaffolds, and these associations were used to build molecular hypotheses. Gene selection was carried out using a gene hierarchy built on annotations from the Gene Ontology Consortium; the hierarchical classification based on biological process was used to differentiate gene expression patterns of various cell types. The chemical scaffolds are built by extracting common cores of the compounds responsible for the initial correlations with gene expression profiles. These scaffolds are then used to probe the genes within certain pathways. The set of selected genes and the compounds provides a foundation for building hypotheses based on molecular mechanisms. Concentrating on the data mining results from breast cancer cell lines, the design of further experiments to test the molecular hypotheses for pairs of compounds and genes will be discussed.
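The core correlation step described above, matching a gene's expression profile against a compound's activity pattern across the same cell lines, reduces in its simplest form to a Pearson correlation between two vectors. The profiles below are invented toy data, not NCI-60 values, and the function name is mine.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical profiles across five cell lines: expression level of one
# gene and growth-inhibition activity of one compound, line by line.
gene_profile = [2.1, 0.4, 1.8, 0.2, 1.5]
compound_profile = [5.9, 4.1, 5.6, 3.9, 5.2]
r = pearson(gene_profile, compound_profile)  # high r suggests a linked pair
```

Screening all gene/compound pairs this way yields the candidate associations that the pathway hierarchy and chemical scaffolds then filter into testable hypotheses.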

9:30 20 The next 100 years: The evolution of The Merck Index toward a fully electronic publication
Irwin Schreiman1, Barbara Solomon1, Jonathan Brecher1, and Ann Smith2. (1) Informatics, CambridgeSoft Corporation, 100 CambridgePark Drive, Cambridge, MA 02140, ischreiman@cambridgesoft.com, jbrecher@cambridgesoft.com, (2) Merck & Co. Inc

Among printed reference works, The Merck Index stands out for its integrity, detail, and longevity. We faced many challenges when converting this treasure trove of information to electronic form. Throughout the conversion process, we focused on the importance of data quality; the well-structured data used to produce the print version eased the creation of the electronic version. We have found that when undertaking a project such as this, it is also important to understand the objectives. What data needs to be searchable, and therefore included in a database, versus what can be presented in a static file format (such as a PDF)? Questions such as "Which fields must be searchable independently?" and "How should numerical searching work (ranges, significant figures, etc.)?" must be addressed. Finally, presentation versus accessibility must be gauged to ensure a successful end-user experience.

4:00 20 BTEC Pharmacogenomics: When chemical abstracts is not enough
Philip Barnett and Claudia Lascar. Science/Engineering Library, City College of New York (CUNY), Convent Avenue at 138th Street, New York, NY 10031

One of the challenges in pharmacogenomics research is locating and retrieving all the relevant literature on any topic within pharmacogenomics and the intertwined discipline of pharmacogenetics. Analysis of subject coverage reveals the indexing and abstracting services most needed in these two disciplines. Unlike most fields of chemistry, where Chemical Abstracts suffices for most searches, this source contains less than half of the research literature in pharmacogenomics and pharmacogenetics. Supplementing a Chemical Abstracts search with the always-free PubMed still does not recover all the literature in this field. Several databases (Science Citation Index, Biosis, Pascal, Embase, International Pharmaceutical Abstracts, and Derwent Biotechnology Abstracts) all contain unique references not included in the others. Even an often-overlooked database, Cancerlit, has some unique material. All of these sources must be searched to recover the literature of pharmacogenomics and pharmacogenetics, a highly interdisciplinary research area. While searching on the root terms "pharmacogenomic or pharmacogenetic" retrieves most of the relevant literature, specific topics within pharmacogenomics must often be searched with a strategy tailored to the exact subject sought. Citation analysis and subject-coverage examination reveal the field's most relevant journals, the ones most needed by researchers in pharmacogenomics and pharmacogenetics.

10:00 21 Science of Synthesis/Houben-Weyl: Conversion of a major reference work in organic synthetic chemistry (print) into an interactive, highly accessible electronic product
Dr. M. Fiona Shortt de Hernandez, Georg Thieme Verlag, Ruedigerstrasse 14, D-70469 Stuttgart, Germany, Fax: 0049-711-8931777, fiona.shortt@thieme.de - SLIDES

Houben-Weyl (http://www.houben-weyl.com/) is an indispensable treatise for every synthetic chemist, serving the scientific community with a critical selection of synthetic methods. The project was established in 1909 and comprises over 140 volumes covering all aspects of synthetic organic chemistry. The fifth edition, which carries on the tradition of Houben-Weyl but adds new features such as safety information, scope, and comparison of methods, was launched in 2000 under the name Science of Synthesis (http://www.science-of-synthesis.com/). This series is edited by D. Bellus (Basel, Switzerland), E. N. Jacobsen (Cambridge, USA), S. V. Ley (Cambridge, UK), R. Noyori (Nagoya, Japan), M. Regitz (Kaiserslautern, Germany), P. J. Reider (Thousand Oaks, USA), E. Schaumann (Clausthal-Zellerfeld, Germany), I. Shinkai (Tsukuba, Japan), E. J. Thomas (Manchester, UK), and B. M. Trost (Stanford, USA). Science of Synthesis will be published in a total of 48 volumes and will contain ca. 150,000 reactions. It has been designed using new workflows and production techniques as well as XML technology, so that the work is available not only in book format but in electronic format as well. The electronic product, developed in collaboration with InfoChem and an international advisory board, is available as an Intranet or Internet solution offering powerful text, substructure, and reaction searching. The Houben-Weyl archive is now available in digital format as well, so an intuitive electronic guide (designed by Thieme Publishers) can be used to access over 100 years of invaluable information.

10:30 22 The knovelized e-reference
Robert R. Brand, knovel Corporation, 33 Main Street, Newtown, CT 06470, rbrand@knovel.com - SLIDES

As classical reference works and handbooks decline with the disappearance of library shelf space, they are being supplanted by their digital representations, delivered to networks worldwide.

This paper will demonstrate in concrete terms the vision of reference books, handbooks, and their paperless cousin, the database-only version. The new representation should be interactive and deep-searchable (IDS) and accessible on the Internet: the e-Reference.

Essential to the new emerging model are the aggregation of similar data, maximized portability of customized data sets, mobility of data elements across classical handbooks, and keyword searching via a common interface. New data-element interactivity will be presented in detail, along with near-future elements.

11:00 23 Building a virtual reference collection in chemistry
Patricia Kirkwood, Science Librarian, Pacific Lutheran University, Mortvedt Library, Tacoma, WA 98447, Fax: 253-535-7315, kirkwope@plu.edu - SLIDES

So your library offers reference on demand through the web. Great! Now you can get what you need without leaving the lab. But when you chat with the librarian, you find that the table you need to work with is only available in the library, and it's way too big to fax. So now you still have to find time to go to the library. Is this a common complaint? Why develop a virtual reference service if you don't have a virtual reference collection to support it? Electronic reference resources such as handbooks, encyclopedias, and dictionaries are becoming more available and much more usable. However, so far, these tools don't get used very much. What can the library do to create a basic electronic reference collection in the sciences? After the collection is chosen, how do users find out about it and figure out how to use it? Of course, there are always more questions than answers as this work is done. How should librarians and publishers/vendors work together to make sure the licenses and the technology combine to provide a valuable and usable resource for the chemist?

11:30 24 The next step at major reference works
Claudia Pick, Peter Loew, Josef Eiblmaier, and Hans Kraut, InfoChem GmbH, Landsberger Straße 408, D-81241, München, Germany, Fax: +49 89 5 80 38 39, Claudia.Pick@infochem.de - SLIDES

Digitization is now standard, and most of the primary chemistry literature is already available online. The next step is the digitization of major reference works: secondary literature that combines the information of the primary literature with the expertise of highly trained scientists to create validated review articles. Digitized major reference works, however, are offered in widely differing qualities and scales. A new dimension in the digitization of major reference works is the addition of structure and reaction searching and, moreover, a single system that allows global searching across several major reference works at a time. InfoChem GmbH has cooperated with publishing houses such as John Wiley & Sons, Springer-Verlag, and Thieme-Verlag in the design and development of Internet and Intranet versions of electronic major reference works. The software used in these web versions (e-EROS from John Wiley, Science of Synthesis from Thieme, CAC from Springer) was developed exclusively by InfoChem and allows, among other things, the retrieval of structures, reactions, and text in several major reference works at the same time.

8:30 25 Capitalizing on the value in in vitro hepatotoxicity data
Philippa R.N. Wolohan, and Robert D. Clark, Research, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144, Fax: 314-647-9241, pwolohan@tripos.com

Pre-clinical decision making in the hit or lead generation phase of drug development is routinely made based on in vitro cellular screening studies. In order to make salient predictions of toxicological properties in man it is critical to fully understand the in vivo relevance of models based on such cellular assays. Assessing the predictive value of such models is no simple task and becomes an even more pertinent issue when we make the leap into in silico modeling of such systems. We will present an evaluation of hepatotoxicity data in a human cell line, discuss the biological and statistical concepts associated with interpreting this data and our strategy for factoring these considerations into the design of our in silico models. Understanding the natural limitations of training models on in vitro data allows us to better determine the appropriate level of confidence to have in predictions made from such models in a rational drug discovery setting.

9:00 26 Computational models for predicting chemical toxicity
Julie E. Penzotti, and Gregory A. Landrum, Rational Discovery LLC, 555 Bryant St. #467, Palo Alto, CA 94301, Penzotti@RationalDiscovery.com

Despite major advances in the field of toxicology, safety assessment remains a costly challenge in chemical development. Computational approaches to identify compounds that are hazardous to human health or the environment are of great interest. These approaches can be used early in the development process to select compounds likely to have fewer toxicity liabilities and to prioritize toxicity studies in risk assessment. Because multiple (possibly unknown) mechanisms can lead to the same toxicity endpoint, algorithms for toxicity prediction must be capable of handling multiple patterns of activity and modes of action. We have developed a unique ensemble approach for building computational models to screen large numbers of chemical structures for toxicological properties. A major strength of our method is its ability to provide a confidence level for each prediction that can be used to identify compounds which require further testing. Our approach and its application to modeling toxicological endpoints will be presented.

9:30 27 Facing database mining challenges in ecotoxicity
Jacques R. Chretien1, Marco Pintore1, Nadège Piclin1, Frederic Ros2, and Emilio Benfenati3. (1) BioChemics Consulting, Centre d'Innovation, 16, rue Leonard de Vinci, Orleans cedex 2 45074, France, Fax: + 33 2 38 41 72 21, jacques.chretien@univ-orleans.fr, (2) University of Orleans, CBI / Chemometrics & BioInformatics, (3) Department of Environmental Health Sciences, Istituto di Ricerce "Mario Negri"

New DBM tools, based on Genetic Algorithms and Fuzzy Logic, were developed and applied to large data sets of toxic chemicals in order to establish general Structure-Activity Relationships (SAR). Several salient examples will be shown, underlining the possibilities and limitations of the proposed procedures. These examples deal with three biological models: (i) a series of 235 pesticides studied on rats, (ii) the same series studied on trout, and (iii) a series of 568 chemicals studied on fathead minnow. Good-prediction levels of 75% for test sets support the particular interest of these DBM tools in the area of ecotoxicity, given the high variability affecting the experimental procedures. The importance of a powerful strategy for Molecular Experimental Design (MED), based on supervised self-organizing maps (sup-SOM), will be underlined as a way to handle chemical diversity and the real predictive power of any predictive model relative to large chemical databases. (We acknowledge financial support from the European Commission: project IMAGETOX, HPRN-1999-00015.)

10:00 28 In silico methodologies for predictive evaluation of toxicity based on integration of databases
Chihae Yang, LeadScope, Inc, Columbus, OH 43212, cyang@leadscope.com, and Ann Richard, National Health & Environmental Effects Research Lab, U.S. EPA

The ability to accurately “predict” toxicity with in silico methods is increasingly emphasized as industry moves toward efficient up-front screening to reduce late stage attrition. However, current methods for structure-based toxicity estimation are not yet satisfactorily predictive. Reasons include the intrinsically complex nature of chemically induced toxicity and the lack of data from which the “models” or “predictions” are derived. Although toxicity information is publicly available, most of these databases are not optimized for building structure-toxicity relationships. The relationship between quality of data and prediction model accuracy intensifies the need for improved access to quality toxicity information. This paper describes collaboration between an EPA-sponsored public initiative, DSSTox (Distributed Structure Searchable Toxicity) database network, and a private sector effort, LIST (LeadScope In Silico Tox) focus group. Both are working towards improved data access and the integration of disparate data formats from various data sources. DSSTox is promoting SDF format for toxicity databases inclusive of chemical structures, whereas LIST is developing controlled vocabularies and mapping the data fields of SDF and XML schema. Improving prediction capability by integration of data to enhance chemical space, a shared goal of the DSSTox and LIST initiatives, will be discussed. This abstract does not reflect EPA policy nor does mention of trade names indicate EPA endorsement.

10:30 29 Mining molecular fragments with MoFa: Finding relevant substructures in sets of molecules
Michael R. Berthold1, Heiko Hofer1, and Christian Borgelt2. (1) Research, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144, Fax: 314 647 9241, berthold@tripos.com, (2) University of Magdeburg

We present an algorithm to find fragments in a set of molecules that help to discriminate between different classes of, for instance, activity in a drug discovery context. Instead of carrying out a brute-force search, our method generates fragments by embedding them in all appropriate molecules in parallel and prunes the search tree based on a local order of the atoms and bonds, which results in substantially faster search by eliminating the need for frequent, computationally expensive reembeddings and by suppressing redundant search. We prove the usefulness of our algorithm by demonstrating the discovery of activity-related groups of chemical compounds in the National Cancer Institute's HIV-screening dataset.

1:30 30 Compressed Chemical Markup Language for compact storage and inventory applications
M Karthikeyan, Deepak Uzagare, and S Krishnan, Information Division, National Chemical Laboratory, Dr. Homi Bhabha Road, Pune 411008, India, Fax: +91-20-5893973, karthi@ems.ncl.res.in

The CML representation is well documented; however, its size compared with other existing file formats makes it prohibitive for many applications. If a suitable tool is developed to store CML in compressed form, without loss of information or freedom of use, it will encourage the user community to apply CCML in their applications. At NCL we have developed a methodology for encoding chemical structures as compressed CML generated by popular structure-drawing programs such as JME. The CCML format contains SMILES and/or equivalent data, along with coordinate information for the atoms, for generating chemical structures in plain text format. Each structure generated by JME, whether standalone or by virtual means, can be stored in this format for efficient retrieval, as it requires about one tenth or less of the size of the actual CML file, since the SMILES describes the interconnectivity of the molecule. The CCML format is compatible with automated inventory applications.
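The space saving claimed above can be illustrated with a generic lossless-compression sketch (using the standard-library zlib, not the authors' actual CCML tool; the molecule record below is a made-up, simplified CML-style fragment):

```python
import zlib

# Hypothetical minimal CML-style record: SMILES plus 2D coordinates,
# the two pieces of information the abstract says CCML retains.
cml = """<molecule id="ethanol">
  <atomArray>
    <atom id="a1" elementType="C" x2="0.0" y2="0.0"/>
    <atom id="a2" elementType="C" x2="1.0" y2="0.5"/>
    <atom id="a3" elementType="O" x2="2.0" y2="0.0"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a2 a3" order="1"/>
  </bondArray>
  <string title="smiles">CCO</string>
</molecule>
"""

compressed = zlib.compress(cml.encode("utf-8"), 9)   # maximum compression level
restored = zlib.decompress(compressed).decode("utf-8")

assert restored == cml          # round trip is lossless, as CCML requires
print(len(cml.encode("utf-8")), len(compressed))
```

The repetitive tag vocabulary of XML compresses well, which is why a compressed-CML scheme can approach the compactness of a bare SMILES string while keeping the full markup recoverable.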

2:00 31 New chemical information interchange standards based on CML: A submission for the Object Management Group
Mitchell A Miller1, Scott S. Markel1, Juan C. Esteva2, and Wendy L. Sharp3. (1) LION bioscience, 955 Ridge Hill Lane, Midvale, UT 84047, mitchell.miller@lionbioscience.com, (2) Department of Computer Information Systems, Intelligent Solutions / Eastern Michigan University, (3) Intelligent Solutions

A new standard for chemical information interchange is presented. This standard is based on Chemical Markup Language (CML) but has been extended to support multiplexed structures - multiple isomers, tautomers and conformers for a given compound - as well as chemical searching across a variety of database types.

2:30 32 Novel applications of XML in chemistry
Peter Murray-Rust, Unilever Centre for Molecular Informatics, Cambridge University, UK, Lensfield Road, CB2 1EW Cambridge, United Kingdom, pm286@cam.ac.uk, and Henry S. Rzepa, Chemistry, Imperial College

Following from our concept of the datument (an integration of data+document in XML) we present here a review of a wide range of chemical concepts in active information objects. We shall demonstrate how these can be used by machines as well as read by humans. Current use of chemical information in human hands is both expensive and highly error-prone and robot chemists will act as "information prosthetics" to carry out routine or high-throughput e-chemistry. The semantics of XML chemistry can be engineered so that actions and interpretations can be determined from reference dictionaries. When datuments replace conventional articles they can be used for many tasks such as extraction of data, control of instruments or running calculations. This forms the infrastructure of a semantical chemical GRID, where knowledge is available without ontological impedance. These chemical XML resources have been layered on the emerging global technologies of peer-to-peer communications and Web Services. As these develop towards the Berners-Lee web-of-trust, the security and authentication protocols are fundamentally integrated into modern chemical e-communication.

3:00 33 The family of XML languages in chemistry
Henry S. Rzepa, Chemistry, Imperial College, London, United Kingdom, rzepa@ic.ac.uk, and Peter Murray-Rust, Unilever Centre for Molecular Informatics, Cambridge University, UK

XML supports chemistry through a family of interoperating XML languages which support a wide range of core concepts. The design is modular and extensible. Chemical Markup Language (CML) describes molecules and crystal structures, including their complete electronic description and flexible representation in connection tables. Physical data are represented in Scientific Technical Medical Markup Language (STMML), which supports a wide range of numeric data types and scientific units. Intensive properties of substances are described through a PropertyType library in the SELFML system. Reactions, along with mechanisms, stoichiometry and associated physical quantities, are managed by CMLReact. Computational Chemistry Markup Language (CCML) supports all aspects of the input, control and analysis of chemical computation. These are represented in XML Schemas which can validate the structure, vocabulary, datatypes and values within chemical documents. With the addition of XSLT stylesheets, complex chemical concepts can be encapsulated as machine-enforceable rules, leading to a major increase in the quality and reusability of chemical information.
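As a rough illustration of the connection-table idea that CML encodes (a hypothetical, minimal fragment, not an excerpt from the CML schema itself), a molecule marked up this way can be read by any generic XML parser:

```python
import xml.etree.ElementTree as ET

# Invented minimal CML-like fragment: water as atoms plus a connection table.
cml = """<molecule id="water">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>
"""

root = ET.fromstring(cml)
# Map atom ids to element symbols, and collect bonds as id pairs.
atoms = {a.get("id"): a.get("elementType") for a in root.iter("atom")}
bonds = [tuple(b.get("atomRefs2").split()) for b in root.iter("bond")]
print(atoms)   # {'a1': 'O', 'a2': 'H', 'a3': 'H'}
print(bonds)   # [('a1', 'a2'), ('a1', 'a3')]
```

Because the markup is ordinary XML, the same document can be validated against a schema or transformed with XSLT without any chemistry-specific software, which is the interoperability argument the abstract makes.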

4:30 34 Open Meeting. Committees on Publications and on Chemical Abstracts Services
Robert J. Massie, Director, Chemical Abstracts Service, American Chemical Society, 2540 Olentangy River Road, Columbus, OH 43202-1505, Fax: (614) 447-3713, rmassie@cas.org, and Robert D. Bovenschulte, American Chemical Society Publications Division, 1155 16th Street NW, Washington, DC 20036, Fax: (202) 872-6060, rbovenschulte@acs.org

8:00 35 Web-based tools for cheminformatics and drug design
Marc C Nicklaus1, Wolf-Dietrich Ihlenfeldt2, Johannes H. Voigt3, and Frank Oellien2. (1) Laboratory of Medicinal Chemistry, National Cancer Institute, National Institutes of Health, Building 376, Boyles Street, Frederick, MD 21702, Fax: 301-846-6033, mn1@helix.nih.gov, (2) Computer Chemistry Center, Institute of Organic Chemistry, University of Erlangen-Nuremberg, (3) Laboratory of Medicinal Chemistry, National Cancer Institute - Frederick Cancer Research and Development Center, National Institutes of Health

We present a collection of web-based services at http://cactus.nci.nih.gov, useful for drug design and cheminformatics in general. Among others, we present the Enhanced NCI Database Browser (http://cactus.nci.nih.gov/ncidb2) for searching in 250,000 compounds and a large number of calculated properties, including hundreds of predicted biological activities; the GIF Creator for Chemical Structures (http://cactus.nci.nih.gov/services/gifcreator/), a tool to generate GIF and PNG images of chemical structures from 2D or 3D input files in many different formats, and with numerous rendering options; the Online SMILES Translator (http://cactus.nci.nih.gov/services/translate/), a service that converts SMILES strings into Unique SMILES, and converts between SMILES, SDF, PDB, MOL and other formats, including, if applicable, multi-structure files; the Online Pseudorotation Tool (http://cactus.nci.nih.gov/Pseurot/), which calculates pseudorotation parameters as used in the fields of nucleoside/nucleotide chemistry, and correctly recognizes and processes DNA and RNA, both single and double strand, nucleoside analogs with non-standard sugars, nucleosides/nucleotides complexed with proteins and other tough cases; the Self-Organized Map (SOM) of Compounds Tested in the NCI anti-HIV Screen (http://cactus.nci.nih.gov/services/som_qsar/), a self-organizing map (SOM) of 42,000 AIDS-screened compounds clustered by structure similarity, onto which the user can map compounds from the 42k AIDS set, predefined datasets, or even one's own compounds, and which allows the user to run searches in the whole NCI Open Database, starting from seed compounds in the SOM; and the NCI Screening Data 3D Miner (http://cactus.nci.nih.gov/services/3DMiner/), a service employing VRML for visualization and data mining in the NCI's 60 cell line anti-tumor screening data.

8:00 36 Marked photoconductivity enhancement of poly(2,5-dialkoxy-p-phenylene vinylene)-perylene derivative composites film upon annealing
Wei Feng Sr.1, Haifeng Yu2, Yaobang Li1, Akihiko Fujii3, and Katsumi Yoshino3. (1) Department of Chemical Engineering, Institute of Polymer Science and Engineering, Tsinghua University, Beijing 100084, China, Fax: 86-10-62770304, weifeng@tsinghua.edu.cn, (2) Department of Chemical Engineering and School of Materials Science and Engineering, Tsinghua University, (3) Department of Electronic Engineering, Graduate School of Engineering, Osaka University

Phase separation in composite films comprising poly(2,5-dialkoxy-p-phenylene vinylene) (ROPPV) and a perylene derivative (PV), which show photoinduced charge transfer and photovoltaic performance, has been investigated. The changes in morphology and molecular reorientation occurring in the composite films upon annealing were studied using SEM. Upon annealing, PV microcrystallites 8-10 microns in size, lying parallel to the substrate surface, can be obtained. Annealing improved the photovoltaic performance of ITO/CP-PV/Al Schottky-type solar cells, which can be attributed to the formation of an electron-conducting PV crystal network. Preliminary studies indicate that the morphological structure of the CP-PV composite film has an important influence on its photovoltaic properties.

8:00 37 Quantitative structure-activity relationship study of histone deacetylase inhibitors
Aihua Xie1, Chenzhong Liao1, Boyu Li1, Zhibin Li1, Zhiqiang Ning1, Weiming Hu1, Xianping Lu1, Jiaju Zhou2, and Leming Shi1. (1) Chipscreen Biosciences, Ltd, Research Institute of Tsinghua University, Suite C301, Shenzhen 518057, Guangdong, China, Fax: +86-755-26957291, aihxie@chipscreen.com, lmshi@chipscreen.com, (2) Institute of Process Engineering, Chinese Academy of Sciences

Histone deacetylases play a critical role in gene transcription and have become a novel target for the discovery of drugs against cancer and other diseases. During the past several years there have been extensive efforts in the identification and optimization of histone deacetylase inhibitors (HDACIs) as novel anticancer drugs. We have identified, collected, and verified the structural and biological activity data for more than 100 compounds and performed an extensive QSAR study on this comprehensive data set by using various QSAR and classification methods. The predictive QSAR model reached an R2 of 0.80 and leave-one-out cross-validated R2 of 0.75. The overall rate of correct prediction of the classification model is around 95%. The computational models have been used in our internal projects on the design and optimization of HDACIs. The advantages and limitations of the models will be discussed.
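The leave-one-out cross-validated R² quoted above (often written q²) can be sketched as follows; the one-descriptor linear model and the data points are invented for illustration and are not taken from the talk:

```python
# Hedged sketch of leave-one-out (LOO) cross-validated R^2 for a simple
# one-descriptor linear QSAR model. All numbers below are made up.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def loo_q2(xs, ys):
    """LOO q^2 = 1 - PRESS / SS_tot: each point is predicted by a model
    trained on all the other points."""
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a * xs[i] + b)) ** 2
    my = sum(ys) / len(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - press / ss_tot

x = [0.5, 1.1, 1.9, 2.4, 3.2, 4.0, 4.8, 5.5]   # hypothetical descriptor values
y = [1.0, 2.3, 3.8, 5.1, 6.2, 8.1, 9.7, 10.8]  # hypothetical activities
print(round(loo_q2(x, y), 3))
```

Because each prediction comes from a model that never saw the predicted compound, q² is usually somewhat lower than the fitted R² (0.75 vs 0.80 in the abstract), and a large gap between the two is a standard warning sign of overfitting.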

8:00 38 Ultrafast optical Kerr effect of poly{(thiophene-2,5-diyl)[(2-carboxy-4'-N,N-dimethylamino)azobenzylidene]} (PTCMAABE)
Wei Feng Sr.1, Wen-hui Yi2, Haifeng Yu3, and Hong-cai Wu2. (1) Department of Chemical Engineering, Institute of Polymer Science and Engineering, Tsinghua University, Beijing 100084, China, Fax: 86-10-62770304, weifeng@tsinghua.edu.cn, (2) School of Electronics and Information Engineering, Xi'an Jiaotong University, (3) Department of Chemical Engineering and School of Materials Science and Engineering, Tsinghua University

Poly{(thiophene-2,5-diyl)[(2-carboxy-4'-N,N-dimethylamino)azobenzylidene]} (PTCMAABE) was synthesized. Its time-resolved optical Kerr effect (OKE) was investigated with femtosecond laser pulses at 790 nm. Only an ultrafast component of the OKE of PTCMAABE is observed; its dephasing time is 92.7 fs, which is attributed to π-electron-cloud distortion occurring upon non-resonant excitation. The second hyperpolarizability γ and the third-order nonlinear optical susceptibility χ(3) of PTCMAABE were also determined by transient OKE. The results show that PTCMAABE exhibits large off-resonant nonlinearities, with γ as large as 1.66×10⁻³² esu per structural unit and χ(3) as large as 4.24×10⁻¹⁰ esu for the material.

8:00 39 Use of Barnard and Daylight fingerprints in ligand-based virtual screening
S. Kuen Yeap1, Mike Snarey1, and Cesare Federico2. (1) Molecular Informatics, Structure and Design, Pfizer Global R&D, Ramsgate Road (ipc 636), Kent CT13 9NJ, Sandwich, United Kingdom, Fax: 44 1302 658463, yeap_sk@sandwich.pfizer.com, (2) Department of Chemistry, UMIST

Large pharmaceutical companies possess millions of proprietary compounds that are regularly screened for activity against targets of interest. By pre-selecting compounds by computational means, virtual screening aims to maximise the chance of finding a hit early in the screening programme.

In ligand-based virtual screening, 2-D fingerprints are often used to screen for ligands similar to one or more leads. Barnard fingerprints record the presence or absence of predefined structural features; Daylight fingerprints are hashed, folding all fragments present in the dataset into a fixed-length bit string. A comparative analysis of these fingerprints has yet to be reported.
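The distinction can be sketched in a simplified way (hypothetical feature dictionary and fragment hashing, not Barnard's or Daylight's actual software); similarity between two bit-string fingerprints is conventionally scored with the Tanimoto coefficient:

```python
# Illustrative sketch only: structural keys vs. hashed fingerprints,
# with fingerprints represented as sets of on-bit indices.

def tanimoto(fp1, fp2):
    """Tanimoto similarity: |intersection| / |union| of on-bits."""
    union = len(fp1 | fp2)
    return len(fp1 & fp2) / union if union else 0.0

# Structural keys: each bit has a fixed, predefined meaning (made-up dictionary).
FEATURES = {"aromatic_ring": 0, "carboxyl": 1, "amine": 2, "halogen": 3}

def key_fp(features):
    return {FEATURES[f] for f in features}

# Hashed fingerprint: arbitrary fragment strings folded into a short bit string,
# so unrelated fragments can collide on the same bit.
def hashed_fp(fragments, nbits=64):
    return {hash(frag) % nbits for frag in fragments}

lead = key_fp(["aromatic_ring", "carboxyl"])
hit = key_fp(["aromatic_ring", "carboxyl", "amine"])
print(round(tanimoto(lead, hit), 2))  # 0.67
```

Folding a hashed fingerprint to shorter lengths increases bit collisions, which is one intuition for why fingerprint length affects ranking performance in comparisons like the one reported here.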

Barnard and Daylight fingerprints of several lengths were computed for an MDDR dataset comprising 3000 known actives belonging to six activity classes and another 7000 compounds selected at random. The compounds were ranked against selected leads using the Daylight Nearest Neighbour algorithm. Analysis of the top-ranking lists shows that: (i) relative performance of the fingerprints is: Daylight 2K folded

These observations support the use of as many structurally diverse leads as are available, and the consensus application of these fingerprint methods for ligand-based virtual screening. The order of fingerprint efficacy was also found to apply to the clustering of drug datasets. Barnard 4K outperformed the shorter fingerprints as judged by medicinal chemists’ intuition.

8:35 40 Patent fundamentals for nonexperts
Edlyn S. Simmons, SourceOne-Business Information Services, Procter & Gamble Co, 5299 Spring Grove Ave., Cincinnati, OH 45217, Fax: 513-627-6854, simmons.es@pg.com

A patent is both a legal document and a technical publication, subject to national laws and precedent, international conventions, and traditional forms of expression, as well as the conventional language of scientific and technical writing. When searching patent databases and evaluating search results, the legal aspects of patents should always be considered. This presentation provides an overview of the fundamental elements of patent law and patent documentation. The role of priority under the Paris Convention, the timeline for patent filing, examination and grant, the interpretation of patent claims, and the nature of the patent monopoly are discussed.

9:05 41 Patent information resources for nonexperts
Andrew H. Berks, Merck & Co, 126 E. Lincoln Ave RY60-35, Rahway, NJ 07065-0900, Fax: 732-594-5832

A brief overview of information sources for locating patent information will be presented, including important resources on services such as Questel, STN, and Dialog, and low or no cost web-based resources. Chemical structure databases covering patents will be discussed.

9:55 42 Patent term, expiries, and extensions
Stephen R. Adams, Magister Ltd, Crown House, 231 Kings Road, Reading RG1 4LS, United Kingdom, Fax: +44 118 929 9516, stevea@magister.co.uk

The basic period of patent monopoly enjoyed by the patent holder is fixed by national law, but it is only recently that worldwide standards have begun to develop, notably as a result of the WTO TRIPS Agreement. The actual period obtained, however, is subject to additional variables such as national requirements for annuity payments, transitional legislation and term extensions. These factors will be discussed, with special reference to legal status information sources.

10:25 43 Patentability, infringement, or validity: What kind of search?
Barbara A. Hurwitz, Barbara Hurwitz, consultant, 36 Waverly Street, Portland, ME 04103, Fax: 207-228-6418

The most common type of patent search is a patentability search, that is, a search for the prior art. Freedom to Operate searches (FTO) are basically infringement searches where in-force patent claims are searched to confirm that a new product or process will not cause an infringement problem. Validity searches are performed when the client wishes to invalidate a patent that is currently in force. The scope of a search is determined by which of these searches is needed. Scope refers both to the time period to be covered and the kinds of databases to be searched.

10:55 44 Demystifying the patent-search process
Randall K. Ward, Harold B. Lee Library, Brigham Young University, 2320 HBLL, Provo, UT 84602, Fax: 801-422-0466, randy_ward@byu.edu, and Barbara J. Ikeler, Novartis Pharmaceuticals Corp

To many, patent searching may have an aura of sophistication that is intimidating and can make one less than confident in attempting such searches. The authors will try to give confidence to the non-expert regarding some of the basics of patent searching, though it is supposed that patent-searching experts could find many caveats and exceptions to the information presented herein. Of course it is crucial to use judgment as to when to consult an expert, but the intent here is to share concepts that may (in many instances) provide adequate search results. Presented will be some databases (with examples and characteristics) necessary in basic patent searching, such as Derwent World Patents Index, Chemical Abstracts, and INPADOC. Lightly touched on will be tools and systems useful in patent searching, such as STN, DIALOG, and MicroPatent.

11:25 45 Patents sell, but who's searching? The rise of the nonexpert in the patent-searching arena
Katharine Hancox, Product Development Group, Chemistry Division, Thomson Derwent, 14 Great Queen Street, Holborn, London, United Kingdom, katharine.hancox@derwent.co.uk, and Gez Cross, Product Development Group, Chemistry Division, Thomson Derwent

Patent searching - a mystifying art practised by seasoned Information Professionals equipped with an array of complex search strategies and command languages, prohibited to the non-expert searcher for reasons of complexity and cost. As organisations continue to realise the value of patent information across a wider range of business functions, patent searching has evolved to encompass the end-user as well as the specialised searcher.

Critical to meeting the needs of this ever-growing user community is enabling powerful, simple methods to search, navigate and analyse patent data without detracting from the value of the content. In this paper we will use case studies on a particular platform, the Derwent Innovations Index, to show how web-based services are evolving to meet these needs, delivering flexible searching, alerting and personalisation with links to citing and cited patents, full-text patent sources and related scientific literature.

1:20 46 History of the DARC system
Jacques-Emile Dubois, University Denis Diderot, ITODYS, Paris 75005, France, dubois@paris7.jussieu.fr

Chemistry communication patterns changed dramatically in the 60s and 70s. Digital computing required novel languages and codes. Employing topology to describe and handle molecules was the basic paradigm of the DARC (Description, Acquisition, Retrieval, and Correlation) System. New, original and coherent concepts for the identification, retrieval and correlation of structures were developed for technological and academic needs. DARC implementation profited greatly from cooperation with nongovernmental institutions, e.g. ACS, CAS, and IUPAC, as well as private industry, to adapt, hone and harmonize its original tools and strategies. Mike O'Hara contributed with enthusiasm and competence both to the classic DARC products and to generic DARC and DARC/MARKUSH for patents. Human, pedagogical, societal and political facets of the DARC story enliven this history of the past 50 years.

2:00 47 Substance handling at Chemical Abstracts Service
W. Fisanick, Research and New Product Development, Chemical Abstract Service, 2540 Olentangy River Road, P. O. Box 3012, Columbus, OH 43210, Fax: 614-447-3813, wfisanick@cas.org

Since the advent of the Chemical Registry in 1965, Chemical Abstracts Service (CAS) has developed and used a variety of approaches and techniques for the handling of chemical substances. These approaches and techniques involve the representation, registration, and search and retrieval of chemical substances. Representation aspects include the use of 2D and 3D structures along with nomenclature and molecular property surrogates. Registration aspects include the use of special structuring conventions. Search and retrieval aspects include exact, substructure, and generic structure search capabilities. A key strategy is the development of a continuum in the representation of, and access to, both specific and generic substances. This paper will review the key approaches and techniques used by CAS for substance handling.

2:25 48 Creating the MARPAT file: Practical and philosophical issues in patent analysis and database building
David E. Connolly, Dept. 56 - Synthetic and Polymer Chemistry, Chemical Abstracts Service, Columbus, OH 43210, dconnolly@cas.org

Most users of patent information see only the output of searches from the CAS MARPAT file. This presentation will explore what goes into creating the MARPAT file and the challenges that database builders face in taking complex Markush structures from the literature and turning them into useful and meaningful information for customers. These include selection of Markush structures from patents, interpreting confusing or contradictory patent language, and translating it into MARPAT coding that complements the CAS Registry File. The presentation will include data and observations on trends in the Markush literature.

2:50 49 Back for the future 2: Cool codes, marvelous Markush, and hot interfaces
Gez Cross, Product Development Group, Chemistry Division, Thomson Derwent, 14 Great Queen Street, Holborn, London, United Kingdom, Fax: +44 207 344 2911, gez.cross@derwent.co.uk, and Katharine Hancox, Product Development Group, Chemistry Division, Thomson Derwent

Throughout his career, Mike O’Hara was involved with training in and development of innovative search systems for chemical and patent data, notably the CAS online and Markush DARC systems, which provide structure searching of the CA Registry and the Derwent- and INPI-owned MMS file. He continually sought improvements to the systems he worked with to enable better retrieval and relevance for his clients.

In the 2002 Skolnik Award symposium, this author spoke about potential improvements in the Derwent patent files to enable older code-based data to be searched more easily by new and existing users, particularly in conjunction with structure searches. This paper will seek to honor Mike’s memory by reporting on progress in implementing these improved search capabilities, including recent and forthcoming improvements to the searching of Markush structures from patents.

3:15 50 Tips and tricks for searching MMS
Sandy Burcham, Service Is Our Business, Inc, 111 Lincoln Terrace, Norristown, PA 19403-3317, Fax: 610-630-0863, cass123@earthlink.net

This paper will cover strategies developed to get the most from the MMS search system, along with some little-known facts about its content. All of these tips and tricks can be attributed to Mike O'Hara: some were learned from him directly, and others were learned answering customers' questions while covering his phone.

3:40 51 Marketing chemical information in a research organization
David S. Saari, Library Information Center, Schering-Plough Research Institute, 2015 Galloping Hill Road, Kenilworth, NJ 07033, Fax: 908-740-7015, david.saari@spcorp.com

Michael O'Hara was constantly involved in marketing activities. To honor Mike's contribution to the chemical information community, this paper offers suggestions on how to market information resources and services in a research organization. The objectives of marketing are to transfer knowledge, initiate actions, establish habits, and influence opinions. Marketing communications should include both the features and the benefits of the resource or service. Information professionals must create opportunities to build relationships with current and potential clients.

4:05 52 Protys: A full-text English index of new Japanese patents
Alan Engel, Paterra, Inc, 526 N Spring Mill Road, Villanova, PA 19085-1928, Fax: 610-527-2041, aengel@paterra.com

Protys™ is a new Internet database that provides full-text English indexing of new Japanese patents. This SDI-targeted database covers four weeks of Japanese Kokai with a one-week lag from publication. The profile-centered user interface allows users to maintain multiple search profiles and easily track profile runs against database updates. Proximity operators allow co-occurrence searches by sentence, paragraph, claim(s), and section (experimental, description of drawings, etc.). Bibliographic search fields include the full set of JPO-applied classification schemes, including F-terms and the 'facet' extension to the IPC and FI systems. ('Facets' allow searching, for example, by pharmacological class.) Display options include a term-in-context display that shows the basic information (bibliography, abstract, and front-page image) plus only those portions of the document that match the full-text search criteria.
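For readers unfamiliar with proximity operators, the sentence-level case can be sketched in a few lines. This is an illustrative toy, not Protys' implementation; the sample text and segmentation rule are invented:

```python
# Toy sentence-level proximity search: two terms match only when they
# co-occur within the same sentence. Real systems segment text far more
# carefully (abbreviations, claim boundaries, etc.).

def sentence_proximity_match(text, term_a, term_b):
    """True if term_a and term_b appear in the same sentence of text."""
    sentences = [s.strip() for s in text.split(".")]
    return any(term_a in s and term_b in s for s in sentences)

doc = "The catalyst was heated. The polymer was mixed with the catalyst."
print(sentence_proximity_match(doc, "polymer", "catalyst"))  # True
print(sentence_proximity_match(doc, "polymer", "heated"))    # False
```

Paragraph-, claim-, or section-level operators work the same way over coarser segmentations of the document.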

8:00pm 53 Postprocessing of merged Markush service results
Joseph M Terlizzi, Questel-Orbit, 8000 Westpark Drive, McLean, VA 22102, jterlizzi@questel.orbit.com

The Merged Markush Service (MMS) was designed for searching generic and specific chemical structures indexed in the Derwent World Patent Index (DWPI) and INPI’s PHARM database on Questel-Orbit. Compound number results from a search appear in two different vendor formats and can only be searched in these bibliographic databases separately. The Questel-Orbit MEM feature can be used to combine results and display these records. Various vendor software packages, such as Questel-Orbit’s Imagination and STN Express, can be used for MMS searching and for extraction of these compound numbers to PHARM and DWPI. This presentation will compare these vendor software packages, along with Questel-Orbit’s QWEB interface, for transfer of MMS data to Questel-Orbit’s bibliographic databases. MEMing techniques for combining results, display of images, ease of use, and post-transfer to BizInt Smart Charts will all be explored and evaluated.

8:00 54 A life preserver for the data flood
Gregory A. Landrum, Erik Evensen, Julie E. Penzotti, and Santosh Putta, Rational Discovery LLC, 555 Bryant St. #467, Palo Alto, CA 94301, Landrum@RationalDiscovery.com

Recent years have seen great advances in high-throughput screening; HTS systems capable of handling hundreds of thousands (or even millions) of compounds are now routinely used in drug discovery. Flexible new tools are needed to allow chemists to wade through the flood of HTS data without drowning in it. Beyond providing an interface to screening data, these data triage tools will facilitate the development of new insights via efficient mining of results.

We have developed a system, built upon an established computational drug discovery platform, which enables data mining using tools such as similarity searching and hierarchical clustering; construction of pharmacophore- and/or shape-based models; and application of a variety of machine-learning methods for building predictive models. The system, accessible via GUI and scripting interfaces, is usable by both bench and computational chemists. Here we present an overview of the system and its application to mining the NCI AIDS dataset.
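A minimal sketch of the similarity-searching component of such data triage, for illustration only (not Rational Discovery's platform; the fingerprints and compound IDs are invented):

```python
# Similarity searching over binary fingerprints, the simplest of the
# data-mining tools the abstract mentions. Fingerprints here are toy
# Python sets of "on" bit positions.

def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprint bit sets."""
    if not fp_a and not fp_b:
        return 0.0
    inter = len(fp_a & fp_b)
    return inter / (len(fp_a) + len(fp_b) - inter)

def similarity_search(query_fp, library, threshold=0.5):
    """Return (id, similarity) pairs above threshold, best hit first."""
    hits = [(cid, tanimoto(query_fp, fp)) for cid, fp in library.items()]
    return sorted([h for h in hits if h[1] >= threshold],
                  key=lambda h: h[1], reverse=True)

library = {
    "cmpd-1": {1, 2, 3, 4},
    "cmpd-2": {1, 2, 3, 9},
    "cmpd-3": {7, 8, 9},
}
print(similarity_search({1, 2, 3, 4}, library))
```

Hierarchical clustering of an HTS deck can be built on the same pairwise similarity function.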

8:30 55 AIMS: Array information management system
David S. Hartsough, Informatics and Modeling, ArQule, Inc, 19 Presidential Way, Woburn, MA 08101, dhartsough@arqule.com

High-throughput parallel synthesis places demands on chemical tracking and registration systems that are not present in a single-compound synthesis environment. This presentation will describe ArQule’s integrated Array Information Management System (AIMS), which manages the workflows and processes associated with parallel synthesis. Specific features to be presented include tools for array layout, tracking of reagent and product locations, product culling and reformatting tools based upon analytical characterization, and process-monitoring reports that allow tracking of project timelines and workflow. Powerful analysis and query capabilities that allow managers and scientists to track and evaluate their work will also be presented.

9:00 56 DirectedDiversity® informatics: A status report
Victor S. Lobanov, and Dimitris K. Agrafiotis, 3-Dimensional Pharmaceuticals, Inc, 665 Stockton Dr., Suite 104, Exton, PA 19341, Fax: 610-458-8249, victor.lobanov@3dp.com

Having good-quality compounds in a high-throughput screening deck can significantly increase the odds of finding good leads and minimize the chance of project failures due to ADMET liabilities. When combinatorial chemistry is the primary source of novel chemical entities, compound selection becomes a daunting task due to the sheer number of possibilities. We have developed efficient algorithms and software systems to automate the analysis of combinatorial libraries and the selection of compounds for a variety of purposes (e.g. diverse libraries for screening, focused libraries for lead optimization, etc). The system is optimized for maximum performance on a desktop computer, and allows complex library analysis and planning experiments to be carried out in nearly interactive time frames.

9:30 57 ASPECT: A LIMS system for characterization of combinatorial libraries
Brian Deneau, Informatics and Modeling, ArQule Inc, 19 Presidential Way, Woburn, MA 01801, bdeneau@arqule.com

Informatics is a powerful tool for supporting the synthesis and characterization of combinatorial libraries for drug discovery. The interface between Cheminformatics and LIMS is a particularly important area that can provide great benefit to both synthetic and analytical chemists. The ASPECT system is a LIMS system that supports characterization of ArQule's combinatorial libraries. ASPECT is fully integrated with ArQule's AIMS system that supports the production of lead generation and lead optimization libraries. This presentation will cover the tracking and analysis capabilities that ASPECT offers for both the synthetic and analytical chemists. The capabilities of the ASPECT system for streamlining library characterization and analysis will also be described.

10:00 58 GeminiChemistry: Automating rapid analog synthesis
John Brohan, Automation Consulting, Traders Micro, 317 Barberry Place, Dollard des Ormeaux, Montreal, QC H9G 1V3, Canada, Fax: no Fax, jbrohan@videotron.ca, and Rejean Fortin, Medicinal Chemistry, Merck Frosst & Co Canada

Many of the constraints encountered in chemistry are handled directly by GeminiChemistry:

  • Spreadsheets define the "Unit of Work," which is a rack of vials. Each vial can have a different volume.
  • These spreadsheets are the link to the software for managing the library synthesis.
  • The eight tips can be divided into four organic and four aqueous, with separate solvent paths and wash stations.
  • The reaction blocks support refluxing, chilling, or heating.
  • Non-surface-sensing liquids are dealt with directly by computing the surface position.
  • Stir bars are handled directly; they affect the surface height and the lowest point of the pipette’s movement.
  • Inert atmospheres are managed in the reaction blocks. Argon can also dry out tubing before aspirating pyrophoric liquids.
  • Several steps are linked together in a Job.

10:30 59 High-throughput chromatographic method selection and structure verification
Michael McBrien, and Eduard Kolovanov, Advanced Chemistry Development, 600-90 Adelaide W, Toronto, ON M5H 3V9, Canada, michael@acdlabs.com

The advent of LCMS structure verification for high-throughput and walk-up laboratories has led to the development of so-called “generic” chromatographic methods. These methods are designed to be applicable to given groups of samples such that reasonable chromatographic performance is observed without considering individual samples. The problem with this approach is that no one method can apply to all circumstances; the result can be inadequate sample retention, carryover into subsequent samples, and/or instrument downtime. MS data provide molecular weight information, but a correct mass is often incorrectly taken to indicate the correct compound. Advanced Chemistry Development has developed algorithms that predict retention times for new compounds under generic conditions. The approach selects a training set based on structure-similarity searches combined with physicochemical parameters in order to predict retention times. These predictions are used as the basis for selecting between generic methods for each sample in a set, and the retention times can subsequently be used as a structure-verification filter to supplement other structure verification tools such as mass spectrometry.
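The similarity-based retention-time idea can be caricatured with a nearest-neighbour sketch. This is purely illustrative, not ACD's algorithm; the descriptors, thresholds, and all numeric values below are invented:

```python
# Predict a retention time for a new compound from the most similar
# members of a training set, then use the prediction to choose between
# two hypothetical generic methods.

def predict_rt(query_desc, training, k=2):
    """k-nearest-neighbour retention-time estimate using Euclidean
    distance over an invented two-component descriptor vector."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(training, key=lambda t: dist(query_desc, t[0]))[:k]
    return sum(rt for _, rt in nearest) / k

# (descriptor vector, observed retention time in minutes) -- invented
training = [((1.2, 1.8), 3.1), ((2.5, 2.2), 5.4), ((0.4, 1.1), 1.9)]
rt = predict_rt((1.0, 1.7), training)
method = "fast-gradient" if rt < 4.0 else "long-gradient"
print(rt, method)
```

A measured retention time far from the prediction would then flag the sample for closer structure-verification scrutiny.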

9:35 60 REACTOR: Software system for reagent selection, analysis, and inventory management
Daniel A Gschwend, Research Informatics, ArQule Inc, 19 Presidential Way, Woburn, MA 01887, gschwend@arqule.com

Elegant algorithms for library design have been published that incorporate a wide variety of factors to be considered in reagent selection. However, all of this work will be for naught if the original set of reagents considered in the virtual library is not amenable to automated high-throughput synthesis. This presentation will describe the REACTOR reagent selection and inventory management system developed at ArQule to address this problem. Features of REACTOR will be presented that address the chemical suitability of reagents, historical information regarding reagent utility and vendor reliability, and current inventory status. Powerful analysis and query capabilities that enable chemists to use these and other reagent properties in their reagent selection and design will be described.

10:05 61 Chemoinformatics tools for combinatorial chemistry
M Karthikeyan, S Krishnan, and Deepak Uzagare, Information Division, National Chemical Laboratory, Dr. Homi Bhabha Road, Pune 411008, India, Fax: +91-20-5893973, karthi@ems.ncl.res.in

Chemoinformatics plays a major role in the drug discovery process by eliminating poor choices early and helping to focus on good candidates. The development of chemoinformatics tools to streamline combinatorial chemistry research is presented, including automation technology for encoding and decoding chemical structures using commercial barcodes for inventory and search applications. An in-house, structure-based electronic laboratory notebook (D-LAN) for chemistry and allied fields, developed to preserve organizational knowledge and to support intellectual property activities, is also presented. A combinatorial chemistry interface module guides, collects, and stores experimental data in a proper format, along with structural information and predicted properties, easing inventory management and the reproducibility of research. A virtually generated, very large molecular collection with predicted physicochemical properties of organizational interest, and its interface with combinatorial chemistry research, is explored.

1:05 62 Combinatorial informatic systems at the NIST Combinatorial Methods Center
Cher H. Davis1, Wenhua Zhang1, Alamgir Karim1, Eric J. Amis2, and Michael J. Fasolka1. (1) Polymers Division, National Institute of Standards and Technology, 100 Bureau Dr, MS 8542, Gaithersburg, MD 20899, Fax: 301-975-4924, (2) Polymer Division, National Institute of Standards and Technology

Combinatorial methods involve automated sample-array preparation, computer-driven characterization and analysis, and overwhelming amounts of data. In order to coordinate this automation and accommodate this data load, an informatics project has been established at the NIST Combinatorial Methods Center (NCMC). The core of this informatics effort is a scientific database system. This database will provide a central and secure environment for data storage that is specifically geared to accept materials research data from a variety of sources, is structured for scientific aims, and allows for selective, intelligent retrieval of data through scientist-mediated and automated routes. By interfacing the database with instrumentation and data analysis tools throughout the NCMC laboratories, it will help implement a longer-range design-of-experiments (DOE) plan. With these connections established, the system will enhance the design and refinement of complex experiments; help streamline, document, and organize research activities; enable the seamless cross-correlation of data sets produced across materials disciplines; and facilitate new experiments based upon such comparisons. Because data handling and analysis routines will be automated, time-consuming data maintenance chores will be eliminated. As the database content grows, it will increasingly serve as a library useful for new experimental design, providing feedback for experimental refinement and dramatically reducing time spent on trial and error. The NCMC database system is being built upon open-source code and will be supported by web-based interface software. In this presentation, we will discuss the details and logistics of our growing project, including protocols we are developing to standardize experiments, data, and procedures, to make them more easily accommodated by the database and more comparable to each other.

1:35 63 Data-management system for catalyst discovery via combinatorial techniques
George Fitzgerald, Jorg Hill, Georg Lowenhauser, Joe Tucker, and Michael J. Doyle, Accelrys, 9685 Scranton Rd., San Diego, CA 92121, Fax: 858 458 0136, gxf@accelrys.com

Long established in the pharmaceutical industry, combinatorial techniques are rapidly becoming de rigueur in the development of new materials. However, owing to the variety of elements, wide range of synthesis techniques, and general lack of detailed structural information, data management can be far more challenging than for organic molecules. We have initiated a project using high throughput techniques to develop new catalytic materials for the reduction of NOx in automotive exhaust, with the goal of meeting the "Tier II" emissions standards that will be effective in 2007.

While synthesis, screening, data analysis, and modeling all play roles in the catalyst discovery process, we will focus on analysis and modeling. In particular, we will discuss the ability of the data management system to: (i) incorporate any user-defined processing operation in synthesis; (ii) store, retrieve and analyze data in a way meaningful to the experimental end-user; and (iii) support non-collocated teams.

2:05 64 Data storage and evaluation tools for high-throughput experimentation applied to heterogeneous catalysis
Wolfgang Strehlau, hte Aktiengesellschaft, Kurpfalzring 104, Heidelberg 69123, Germany, Fax: +49 (0) 6221 7497 134, Wolfgang.Strehlau@hte-company.de

High-throughput experimentation (HTE) is the rapid completion of two or more experimental stages in a concerted and integrated fashion. It typically comprises four interconnected stages: “Design”, “Make”, “Test” and “Model”. This cycle applies equally to the discovery and development of drugs, heterogeneous catalysts, or other materials. The data relating to and produced by all of these operations are housed in the MatInformatics system. The “Design” step leverages various computational tools, such as factorial design and other design-of-experiment (DOE) protocols, the evaluated results of past rounds of experiments, information already available from other sources, and the insights and intuition of the project team. DOE tools support the choice of which experimental points to sample in a complex parameter space: full coverage of the parameter space defined by just the compositional dimensions of a multi-element inorganic system would require an infinite number of experiments, so the practitioner needs to decide (i) how many experiments to perform and (ii) at what increments each variable is sampled. The DOE tools aid the design process, with a value that can increase as understanding of the parameter space accumulates in successive iterations through the HTE cycle. The catalyst testing profiles defined in the Design stage are typically applied in a parallel reactor system (the “Test” stage). For data evaluation and mathematical modeling, as well as for search strategies dedicated to reducing the number of data sampling points, a variety of techniques are currently under discussion; the successful use of each of these mathematical approaches depends on the specific problem to which the algorithm is applied. Predicting which data evaluation algorithm will meet the research goals most economically, and how the interaction between DOE tools and evaluation algorithms can be implemented most efficiently, is difficult. The presentation illustrates some of the design and evaluation tools mentioned above by means of practical examples derived from recent research programs.
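The combinatorial-explosion argument behind DOE can be made concrete with a toy full-factorial enumeration. This is an illustration only, not hte's MatInformatics system; the level counts are arbitrary:

```python
# Full-factorial coverage of a parameter space: the experiment count is
# levels ** variables, which explodes as sampling gets finer.
from itertools import product

def factorial_design(levels_per_variable, n_variables):
    """All combinations when each variable is sampled at the given
    number of discrete levels."""
    return list(product(range(levels_per_variable), repeat=n_variables))

# 4 compositional variables sampled at 5 levels each -> 5**4 = 625 points;
# the same space at 100 levels per variable would need 10**8 experiments.
coarse = factorial_design(5, 4)
print(len(coarse))  # 625
```

Fractional and iterative designs exist precisely to avoid enumerating this full grid.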

2:35 65 Rational design: An alternative to the combinatorial explosion
François Gilardoni, Alasdair Graham, Ben McKay, and Brown Brown, Avantium Technologies B.V, Zekeringstraat 29, 1014 BV, Amsterdam, Netherlands, Fax: +31 (0)20 586 8085, Francois.Gilardoni@avantium.nl

Automated and parallel methods are rapidly growing in chemical process research and development, initiated by the extensive implementation of combinatorial techniques in medicinal chemistry. Combinatorial chemistry libraries are generated by systematic permutation of the structural parameters of constituent building blocks. The diversity of these libraries cannot be exploited even with very high-throughput experimental platforms. Avantium has developed an alternative, called “Rational Design” (Figure 1), that maximizes diversity in the least number of experiments to create a performance-based model. This cost-effective approach for high throughput experimentation combines clustering, molecular modelling, statistical design of experiment and multivariate statistics. A model correlates properties, or descriptors, of a catalyst or a formulation component and process conditions, to its end performance, without requiring a complete mechanism. Avantium is actively involved in implementing further developments to render this technique faster, more cost-effective and integrated into an HTE platform, with tools and techniques to design “Rational Libraries”.

3:15 66 Increasing the efficiency of high-throughput experimentation by use of experimental design and data-analysis techniques
Arne L. Ohrenberg, and Andreas Schuppert, Bayer Technology Services, Bayer AG, Leverkusen D-51368, Germany, Fax: +49-214-3064801, arne.ohrenberg.ao@bayertechnology.com

Experimental design is a powerful method to improve the efficiency of high-throughput experimentation (HTE) for discovering new materials, drugs, or catalysts. The parameter space of screening experiments is usually high-dimensional, and the variables may be discrete. The response surface of the screened systems can be very rugged, characterized by smooth planes as well as steep and narrow ascents of abundant suboptima. These conditions make the exclusive use of classical statistical experimental design and data analysis inappropriate. Evolutionary strategies, neural networks, and data mining may be an efficient alternative. Using various examples, we show the practical benefit of design strategies that combine different techniques. The selection of methods depends on the nature of the respective HTE problem. An optimal design strategy makes HTE more efficient and reduces research costs and time to market. Furthermore, the early application of a design strategy enables reliable statements about the feasibility of the research project.

3:45 67 Iterative experiment design
Steven G. Schlosser1, Alan J. Vayda1, Erik J. Erlandson1, Maureen Bricker2, Ralph Gillespie2, and J. W. Adriaan Sachtler2. (1) NovoDynamics, Inc, 123 N. Ashley, Ann Arbor, MI 48104, Fax: 734-205-9101, steve@novodynamics.com, (2) UOP LLC

Materials discovery is an iterative process: experiments are designed, materials are synthesized and tested, analyses are performed, and new experiments are designed. The combinatorial approach and specialized high-throughput equipment speed up the process but do not solve the fundamental problem of how to drive the discovery process. Traditional statistical experiment design approaches are better suited to single experiments, and traditional analysis approaches are not easily linked to experiment design tools. This paper describes a new approach that utilizes highly integrated experiment design and predictive modeling tools. The experiment design tool features an optimal coverage algorithm for placement of experimental points within complex multidimensional regions of interest while taking into account previously tested points. The predictive modeling tools operate on data with arbitrary point placement and do not require regular placement along the various axes. These integrated tools have been applied to the discovery of heterogeneous catalysts.
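One simple form of coverage-oriented placement that accounts for previously tested points is a greedy maximin selection. The sketch below is an assumption-laden illustration, not NovoDynamics' algorithm; the candidate grid and prior point are invented:

```python
# Greedy maximin design: repeatedly pick the candidate point farthest
# from everything chosen so far (including previously tested points),
# spreading new experiments over the unexplored region.

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def greedy_maximin(candidates, previous, n_new):
    """Select n_new candidates, each maximizing its minimum distance
    to all previously tested and already-selected points."""
    chosen = list(previous)
    selected = []
    candidates = list(candidates)
    for _ in range(n_new):
        best = max(candidates,
                   key=lambda c: min((dist(c, p) for p in chosen),
                                     default=float("inf")))
        selected.append(best)
        chosen.append(best)
        candidates.remove(best)
    return selected

cands = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
print(greedy_maximin(cands, previous=[(0.0, 0.0)], n_new=2))
# [(1.0, 1.0), (0.0, 1.0)]
```

The greedy heuristic is one of several ways to approximate optimal coverage; exact maximin placement is itself a hard optimization problem.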

4:15 68 Machine-learning models for high-throughput materials discovery
Gregory A. Landrum, and Julie E. Penzotti, Rational Discovery LLC, 555 Bryant St. #467, Palo Alto, CA 94301, Landrum@RationalDiscovery.com

In order for any model building methodology to be useful in high-throughput materials discovery, it is essential that it be both flexible enough to handle the complexity of the problems at hand and fast enough to not create a bottleneck in the discovery process. Machine-learning techniques satisfy both of these criteria.

We have developed an ensemble approach to model building which provides both high accuracy and confidence estimates for each prediction. The flexibility and efficiency of our approach have been validated on a number of materials, catalysis, and life-science problems.

Here we present an interpretable machine-learning model for the prediction of ferromagnetism in binary transition metal alloys, and the results of applying our ensemble approach to the prediction of Tc values in superconductors and Tg values in polymers. We will also discuss the selection of descriptor sets which enable high computational throughput for these problems.
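The vote-margin notion of per-prediction confidence in an ensemble can be sketched as follows. This is a toy committee of threshold rules, not the authors' models; the descriptor and thresholds are invented:

```python
# Ensemble prediction by majority vote: the winning class is the
# prediction, and the fraction of agreeing votes serves as a simple
# confidence estimate.

def ensemble_predict(models, x):
    """Return (majority label, fraction of votes for that label)."""
    votes = [m(x) for m in models]
    top = max(set(votes), key=votes.count)
    return top, votes.count(top) / len(votes)

# toy committee: classify as "active" if an invented descriptor exceeds
# each member's threshold
models = [lambda x, t=t: x > t for t in (0.2, 0.4, 0.6, 0.8, 1.0)]
label, confidence = ensemble_predict(models, 0.7)
print(label, confidence)  # True 0.6
```

A near-unanimous vote (confidence near 1.0) flags predictions worth acting on; a split vote flags candidates for follow-up experiments.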

4:45 69 Computer-aided discovery of compounds with combined mechanism of pharmacological action in large chemical databases
Alexey A. Lagunin1, Oleg A. Gomazkov1, Dmitrii A. Filimonov2, Nina I. Solovyeva2, and Vladimir V. Poroikov2. (1) V.N. Orekhovich Institute of Biomedical Chemistry of Rus. Acad. Med. Sci, Pogodinskaya Str., 10, Moscow 119121, Russia, Fax: (7-095) 245-0857, alex@ibmh.msk.su, (2) Institute of Biomedical Chemistry of Russian Academy of Medical Science

The prediction of biological activity spectra for substances has been studied as a tool for finding compounds with dual mechanisms of action in large chemical and combinatorial databases. Biological activity spectra of substances, including pharmacological effects, mechanisms of action, mutagenicity, carcinogenicity, teratogenicity, and embryotoxicity, are predicted by the computer program PASS (http://www.ibmh.msk.su/PASS) on the basis of their structural formulae. Relationships between pharmacological effects and molecular mechanisms of action are identified with the computer program PharmaExpert. The data on mechanism-effect relationships and the predicted biological activity spectra allow the user to quickly select compounds with a possible combined mechanism of action causing a specific pharmacological effect. The search for potential antihypertensive compounds with dual molecular mechanisms of action in databases of commercially available compounds (AsInEx and ChemBridge, about 200,000 compounds in total) is presented as an example of high-throughput computer-aided drug discovery. Four substances, potential inhibitors of angiotensin-converting enzyme (ACE) and neutral endopeptidase (NEP), were selected. Experimental testing confirmed that these compounds are inhibitors of ACE and NEP, with IC50 values in the range of 10^-5 to 10^-9 M.
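The final selection step, keeping only compounds predicted active against both mechanisms, reduces to a simple filter over predicted activity probabilities (Pa). The sketch below is illustrative, not the PASS/PharmaExpert software; all compound IDs, mechanism labels, and scores are invented:

```python
# Filter a set of per-compound activity predictions, keeping compounds
# whose predicted probability of activity (Pa) exceeds a cutoff for BOTH
# target mechanisms.

def dual_mechanism_hits(predictions, mech_a, mech_b, pa_min=0.7):
    """Compound IDs whose Pa meets pa_min for both mechanisms."""
    return [cid for cid, pa in predictions.items()
            if pa.get(mech_a, 0.0) >= pa_min and pa.get(mech_b, 0.0) >= pa_min]

predictions = {
    "cmpd-1": {"ACE inhibitor": 0.91, "NEP inhibitor": 0.84},
    "cmpd-2": {"ACE inhibitor": 0.88, "NEP inhibitor": 0.35},
    "cmpd-3": {"ACE inhibitor": 0.42, "NEP inhibitor": 0.77},
}
print(dual_mechanism_hits(predictions, "ACE inhibitor", "NEP inhibitor"))
# ['cmpd-1']
```

Run over a 200,000-compound database, such a filter narrows the candidates to the handful worth synthesizing and testing.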

8:30 70 Citation linking: How important is it?
Suzanne Fedunok, Coles Science Center, New York University Bobst Library, 70 Washington Square South, New York, NY 10012, Fax: 212-995-4283, suzanne.fedunok@nyu.edu

Conventional wisdom holds that the more citation linking available to a reader of a scientific paper, the better. Publishers' representatives are quoted as saying "the publisher with the most links wins." This paper reports on a study of a small set of chemistry journals to determine the proportion of cited references that are hyperlinked and their importance to the understanding of the paper.

9:00 71 Global submission and validation of experimental thermodynamic data using Guided Data Capture software: Benefits to authors, journals, and data users
Robert D. Chirico1, Vladimir V. Diky1, Randolph C. Wilhoit2, and Michael Frenkel1. (1) Thermodynamics Research Center (TRC), National Institute of Standards and Technology (NIST), Mailstop 838.00, 325 Broadway, Boulder, CO 80305, Fax: 303-497-5044, chirico@boulder.nist.gov, (2) Texas Experimental Engineering Station, Texas A&M University System

Guided Data Capture software (GDC) has been developed by TRC at NIST for mass-scale abstraction from the literature of experimental thermophysical and thermochemical property data. As of September 2002, the Editorial Board of the Journal of Chemical and Engineering Data established a new policy for submission and dissemination of experimental data with use of the GDC software at its core. Following the peer-review process, authors are requested to download and use the GDC software to capture the experimental property data accepted for publication. The output file from the GDC software is submitted directly to TRC. After additional consistency tests, the files are converted into an XML-based format (ThermoML) with software developed at TRC. Upon publication of the manuscript, ThermoML files are posted on the TRC Web site for unrestricted public access. Discussions are in progress for implementation of this process with other major journals in the field. Key features of the GDC software will be discussed together with benefits derived to authors, journals, and data users.

9:30 72 Concept of metadata in scientific publications and the way from data to information
Horst Bögel, Department of Chemistry, Martin-Luther-University Halle, Kurt-Mothes-Str. 2, Halle 06120, Germany, Fax: 49-345-5527664, boegel@chemie.uni-halle.de

The amount of scientific data is increasing rapidly in many areas, e.g. in chemistry and the biosciences. New computer and communications technologies give us online access to worldwide data collections, databases, and full-text online journals, and we can search the World Wide Web. Most of these resources can be accessed from the desk in our office, without the delay of ordering documents, so we are able to solve problems in much shorter periods of time. Sometimes, however, we get a huge number of hits of quite similar data, and it is not easy to find the information we were searching for. A few questions should be raised: (1) Do these documents have a convenient structure for efficient and successful searching? (2) Do they contain the original data in a representation suitable for efficient re-use? (3) Do they support automatic data transfer into databases and archives? (4) Do they support the generation of multipurpose interoperability of data? There are several models for associating resources and metadata. In text documents on the Web, descriptive information is most commonly embedded in the documents using the META tags of the Hypertext Markup Language (HTML). These metadata can be created by the authors themselves or by the publisher. Creating and managing metadata is often labour-intensive, and semi-automated procedures are in development. Usually the data and the metadata are combined and transported over the Web to the user, and the browser on the client side of the network displays the data in a given layout. Publishers may want to use metadata in order to make the contents of their restricted resources and services visible to searchers. The Extensible Markup Language (XML) makes it possible to separate content from layout, which is collected in the Document Type Definition (DTD) and Cascading Style Sheets (CSS). XML takes a modular approach; an application is built from components. Chemical Markup Language (CML) and others (MathML) are useful for representing specific content and provide a more universal infrastructure for publishing. At the moment, XML is increasingly widely accepted as an information infrastructure.

10:00 73 Now that everything can be published, should we really publish everything?
Anthony W. Czarnik, Sensors for Medicine and Science, Inc, 12321 Middlebrook Road STE 210, Germantown, MD 20874, awczarnik@s4ms.com

If the term "publication" literally means 'making information public,' then the digital age heralds a time when anyone can publish anything at any time. When everything CAN be published, what SHOULD be published? The arbiters of these decisions, editors, have a more important role today than yesterday. They must decide not 'what is available to read' but rather 'what is important to read.' There is a big difference, and while the ultimate power of editors is now lessened, the responsibility upon them is increased. Editors who 'profess' what their professions believe, as codified by professional organizations, will have influence proportional to that of the brand of the organization. Publication in the digital world will necessarily become a more pluralistic process.

    10:30 74 Chemistry journals: How I want to read in 2012.
    Steven M. Bachrach, Department of Chemistry, Trinity University, 715 Stadium Drive, San Antonio, TX 78212, Fax: 210-999-7569, sbachrach@trinity.edu

    Electronic media offer an opportunity to radically restructure the way chemists communicate. As of 2003, the majority of STM publishers have only scratched the surface of their potential. In this talk, I will present a vision of the future of the chemistry journal, highlighting how technological innovations will dramatically enhance the information content of the chemistry article, enabling scientists to communicate more effectively and assimilate information more efficiently.

    10:00 75 Some stumbling blocks on the road to publishing chemistry on the web.
    David P Martinsen1, Lorrin R Garson1, and Joseph E. Yurvati2. (1) ACS Publications, American Chemical Society, 1155 16th Street NW, Washington, DC 20036, d_martinsen@acs.org, (2) Journal Publishing Operations, American Chemical Society

    The publication of chemistry on the web allows a number of features to be included that are impossible to render in the print version. However, using these web-enhanced objects is not without difficulty. This paper will examine some of the problems encountered in receiving electronic documents, chemical structure files, animations, and VRML files, and the impact of these on both the review process and the publication process. The implications of publishing these new types of objects for the long-term archiving of manuscripts will also be addressed.

    2:00 76 Nanoworld in chemical abstracts.
    Felix S Sirovski, Laboratory of Fine Organic Synthesis, Zelinsky Institute of Organic Chemistry, 47 Leninsky pr, 119991 Moscow, Russia, Fax: 7-095-135-5328, sirovski@gol.ru, Nadezhda Krukovskaya, Information Department, Zelinsky Institute of Organic Chemistry, and Valentina Efremenkova, Methodological Department, VINITI

    The intensive exploration of the nanoworld began at the end of the last century. The works of Kroto and Iijima gave rise to a “Sturm und Drang” in the field of carbon nanomaterials, as illustrated by the graph below. This communication is devoted to the peculiarities of the indexing of nanotechnology-related work in CA.

    2:30 77 Building an Internet chemistry business.
    Scott G. Hutton, ChemNavigator, Inc, 6126 Nancy Ridge Drive, San Diego, CA 92121, Fax: 858-625-2377, shutton@chemnavigator.com

    ChemNavigator is a growing company providing chemistry and cheminformatics services facilitated greatly by the Internet. Founded in 1999 as a pure e-commerce chemistry company, ChemNavigator has learned a great deal about what works and what does not work for chemistry businesses on the Internet. This presentation will cover the key business and technical issues that ChemNavigator has learned play critical roles in the success of a chemistry business on the Web. Both ChemNavigator's perspective and those of its key clients will be covered.

    78 Tools of Research course for chemistry graduate students.
    Patricia Muisener, and Katherine M. Whitley, Department of Chemistry, University of South Florida, 4202 E. Fowler Avenue, Tampa, FL 33620, Fax: 813-974-1733, muisener@chuma1.cas.usf.edu, kwhitley@lib.usf.edu

    Chemistry graduate students at the University of South Florida can prepare for many aspects of their research careers in this new required course. Co-instructors are the Chemistry Department's Assistant Chair for Graduate Concerns and the Chemistry Librarian. The first semester integrates training in information retrieval using specialized databases and web sites, in protecting intellectual property, in locating funding sources, in writing grant proposals and journal articles, in reviewing their colleagues' work, in oral presentation, and in discussion for journal clubs. Guest speakers included a former NSF program officer, a sponsored research specialist, a patents specialist, and an officer from the Institutional Review Board. The second semester exposes the students to the major instrumentation resources they will need for their research. Each week, expert guest lecturers discuss and demonstrate an instrumental technique. A familiar comment from many professors is "I wish I'd had a course like that when I was a graduate student!"

    3:30 79 Classification of mass spectra using fuzzy logic inference engine.
    Jill R. Scott, Timothy R. McJunkin, and Paul L. Tremblay, Department of Chemistry, Idaho National Engineering and Environmental Laboratory, 2525 N. Fremont Ave., MS 2208, Idaho Falls, ID 83415, Fax: 208-526-8541, scotjr@inel.gov

    Previously, we automated our imaging internal laser desorption Fourier transform mass spectrometer. Mass spectral data are acquired at a rate of approximately 7200 files/hour. Manual analysis of these files would take a trained operator several weeks; therefore, we developed an inference engine to automate the data analysis. The inference engine software is a fuzzy logic expert system designed to simulate the analysis that a human operator would perform. The cues that a human operator uses to classify mass spectra have been encapsulated in the fuzzy rule base. The inference engine can analyze 7200 files in approximately 20 minutes and prepare the output in a format suitable for any commercial graphics program. A second inference engine helps refine the rule base for mass spectral assignment by gathering statistics on ions that are not currently part of the rule base but may be candidates for making it more robust.
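The fuzzy-rule approach the abstract describes can be illustrated with a small sketch. The membership function, the m/z targets, and the class labels below are invented for illustration; the actual inference engine encodes an operator's cues in a much richer rule base.

```python
# Illustrative sketch of fuzzy-logic classification of a mass spectrum.
# The rules and m/z targets are hypothetical, not from the INEEL system.

def triangular(x, left, peak, right):
    """Triangular membership function; returns a degree of truth in [0, 1]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def peak_intensity(spectrum, mz, tol=0.5):
    """Largest intensity found within +/- tol of a target m/z."""
    return max((i for m, i in spectrum if abs(m - mz) <= tol), default=0.0)

def classify(spectrum):
    """Score each class by its rule's degree of truth; pick the best."""
    # Rule: "sodium-bearing" if there is a strong peak near m/z 23.
    na = triangular(peak_intensity(spectrum, 23.0), 0.1, 1.0, 1.5)
    # Rule: "potassium-bearing" if there is a strong peak near m/z 39.
    k = triangular(peak_intensity(spectrum, 39.0), 0.1, 1.0, 1.5)
    # A small constant score lets "unknown" win when no rule fires strongly.
    scores = {"sodium-bearing": na, "potassium-bearing": k, "unknown": 0.2}
    return max(scores, key=scores.get)

# A toy spectrum as (m/z, intensity) pairs.
spectrum = [(23.0, 0.9), (39.1, 0.05), (56.0, 0.3)]
print(classify(spectrum))  # -> sodium-bearing
```

Because each rule yields a graded degree of truth rather than a hard yes/no, borderline spectra can be flagged rather than silently misclassified, which is the kind of operator judgement a fuzzy rule base is meant to capture.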