#220 - Abstracts
ACS Chemical Information Division (CINF)
Fall, 2000 220th National Meeting
Washington, DC (August 20-24)
A. H. Berks, Program Chair
Grand Hyatt -- Franklin Square
|Virtual High-Throughput Screening - Receptor-based approaches - I|
|O. F. Güner, Organizer; M. Waldman, Presiding|
Progress toward a protein-ligand scoring function for fast docking.
Marvin Waldman, Paul Kirchoff, Jeff Jiang, and C.M. Venkatachalam, Molecular Simulations Inc, 9685 Scranton Road, San Diego, CA 92121, email@example.com
A novel scoring function for protein-ligand interactions, developed to predict the binding affinity of drug candidate compounds docked into an active site, will be presented. The function has been parameterized using a wide variety of protein-ligand systems and can predict binding affinities for systems on which other scoring functions perform poorly. Although it uses far fewer parameters than previously developed scoring functions, it achieves equivalent accuracy with improved parameter transferability. The function is rapid to compute while accounting for steric or van der Waals interactions, complementary electrostatic interactions, and desolvation effects. Comparison with prior literature scoring functions will be presented along with examples illustrating applications to compound prioritization and selection.
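The abstract names the physical terms the function accounts for but not their functional forms. As a rough, hypothetical sketch of how an additive pose score over those three kinds of terms might be assembled (this is not the MSI function; the 12-6 steric form, the 4r distance-dependent dielectric, the contact-count desolvation penalty and all constants and weights are illustrative assumptions):

```python
import math
from dataclasses import dataclass

COULOMB_CONST = 332.0        # kcal*A/(mol*e^2), conventional constant for partial charges
DESOLV_CONTACT_CUTOFF = 4.5  # A, hypothetical contact distance for the desolvation term
DESOLV_PER_CONTACT = 0.1     # hypothetical cost per buried contact per unit |charge|


@dataclass
class Atom:
    xyz: tuple[float, float, float]
    charge: float  # partial charge (e)
    radius: float  # van der Waals radius (A)


def score_pose(ligand, protein, w_vdw=1.0, w_elec=1.0, w_desolv=1.0, cutoff=8.0):
    """Return per-term energies and a weighted total for one docked pose (lower is better)."""
    vdw = elec = desolv = 0.0
    for la in ligand:
        for pa in protein:
            r = max(math.dist(la.xyz, pa.xyz), 0.1)  # guard against r == 0
            if r > cutoff:
                continue
            # Steric term: 12-6 form whose minimum (-1) sits at the sum of the radii.
            ratio = (la.radius + pa.radius) / r
            vdw += ratio ** 12 - 2 * ratio ** 6
            # Electrostatics with a distance-dependent dielectric eps(r) = 4r.
            elec += COULOMB_CONST * la.charge * pa.charge / (4.0 * r * r)
            # Crude desolvation: each close contact around a charged ligand atom costs energy.
            if r < DESOLV_CONTACT_CUTOFF:
                desolv += DESOLV_PER_CONTACT * abs(la.charge)
    total = w_vdw * vdw + w_elec * elec + w_desolv * desolv
    return {"vdw": vdw, "elec": elec, "desolv": desolv, "total": total}
```

In a real function the weights would be fitted against measured affinities over a training set of complexes; keeping the number of such weights small is one route to the parameter transferability the abstract emphasizes.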
Computational geometry analysis of protein-ligand complexes.
Alexander Tropsha and Jun Feng, School of Pharmacy, University of North Carolina at Chapel Hill, CB # 7360, Beard Hall, Chapel Hill, NC 27599-7360, Fax: 919-966-0204, firstname.lastname@example.org
Delaunay tessellation (DT) has been applied to a diverse set of high-resolution X-ray structures of ligand-receptor complexes. Using nonhydrogen atom coordinates, DT generates an aggregate of space-filling irregular tetrahedra with atoms as vertices. Tetrahedra formed naturally at the ligand-receptor interface incorporate both ligand and receptor atoms. Four-body statistical potentials and a scoring function for ligand-receptor recognition have been developed using various numbers of atom types for both ligands and receptor proteins. Preliminary studies indicate that this novel scoring function affords a high correlation between experimental and calculated binding constants: the correlation coefficient is 0.7 for 57 diverse complexes and 0.85 for 16 serine protease-inhibitor complexes. This scoring function can be successfully used to rank compounds resulting from high-throughput docking studies.
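The scoring step can be sketched as follows, assuming the tetrahedra have already been computed (for example from the `simplices` attribute of `scipy.spatial.Delaunay` applied to heavy-atom coordinates). The log-likelihood form q = log10(f_obs / p_exp), with p_exp taken from a random-mixing model of atom-type frequencies, is a common way to build such four-body potentials and is an assumption here; the type labels and function names are illustrative.

```python
import math
from collections import Counter


def quad_key(simplex, atom_types):
    """Order-independent key: the sorted atom types at a tetrahedron's four vertices."""
    return tuple(sorted(atom_types[i] for i in simplex))


def four_body_potentials(simplices, atom_types):
    """q = log10(f_obs / p_exp) for each observed atom-type quadruplet."""
    counts = Counter(quad_key(s, atom_types) for s in simplices)
    total = sum(counts.values())
    freq = {t: c / len(atom_types) for t, c in Counter(atom_types).items()}
    potentials = {}
    for key, c in counts.items():
        # Number of distinct orderings of this multiset of types: 4! / prod(n_t!).
        orderings = math.factorial(4)
        for n in Counter(key).values():
            orderings //= math.factorial(n)
        p_exp = orderings * math.prod(freq[t] for t in key)
        potentials[key] = math.log10((c / total) / p_exp)
    return potentials


def interface_score(simplices, atom_types, is_ligand, potentials):
    """Sum potentials over tetrahedra that contain both ligand and receptor atoms."""
    score = 0.0
    for s in simplices:
        if {is_ligand[i] for i in s} == {True, False}:
            score += potentials.get(quad_key(s, atom_types), 0.0)
    return score
```

In practice the potentials would be derived from a training set of complexes and then applied to new poses; with this sign convention a higher interface score means a more favorable, more frequently observed set of contacts.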
Putting the horse before the cart: Analysis and optimization of structure-based virtual screening protocols.
Andrew C. Good, Daniel L. Cheney, William E. Harte, Yi Li, Stanley R. Krystek, Donna A. Bassolino, John S. Tokarski, Terry R. Stouch, Yaxiong Sun, Malcolm E Davis, Deborah Loughney, Jonathan S. Mason, and Doree F. Sitkoff, Structural Biology and Modeling, Bristol-Myers Squibb, 5 Research Parkway, Wallingford, CT 06492, Fax: 203-677-7702, email@example.com
There has been much research into the development of new scoring functions for structure-based virtual screening. While some advances have been made in improving virtual screening results through this approach, in general progress has been limited. Here we highlight results obtained from DOCK studies designed to improve virtual screening via analysis of the screening phases that occur before scoring. Conformational analysis studies were undertaken on diverse PDB ligands using a variety of techniques. Search methods were compared on their ability to reproduce conformers close to the bioactive structure at the sampling levels typically employed in virtual screening. In addition, five different target proteins, each with an associated set of active compounds, were used to analyze the effect of docking variables such as ligand flexibility, site point definition and node sampling levels. The ranking of these active compounds, when combined with a set of ~10,000 "noise" compounds, was used to compare screening enrichment levels and hence to better determine optimum DOCK search paradigms. The results of these studies are discussed and their implications for the direction of future virtual screening research are highlighted.
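The abstract does not say which enrichment statistic was used. A common choice is the enrichment factor at a given fraction of the ranked list: the hit rate among the top-scoring compounds divided by the hit rate expected from random selection. A minimal sketch, assuming lower scores are better (as with DOCK energy scores):

```python
def enrichment_factor(scores, is_active, fraction=0.01, higher_is_better=False):
    """EF = (actives in top fraction / size of top fraction) / (all actives / all compounds)."""
    n = len(scores)
    n_actives = sum(is_active)
    if n == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    # Rank compound indices from best to worst score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=higher_is_better)
    n_top = max(1, int(fraction * n))
    hits = sum(1 for i in order[:n_top] if is_active[i])
    return (hits / n_top) / (n_actives / n)
```

An EF of 1 means the protocol does no better than random; comparing EF values across search settings (flexibility, site points, sampling levels) on the same seeded set is one way to pick the better protocol.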
Virtual Optimization of Chemical Libraries using Genetic Algorithm.
Alfonso Pozzan1, Andrew Leach2, Aldo Feriani1, and Mike Hann2. (1) Medicinal Chemistry Computational Chemistry, GlaxoWellcome S.p.A., v. A. Fleming 2, 37135 Verona, Italy, (2) Computational Chemistry, GlaxoWellcome, Gunnels Wood Road, Stevenage Herts SG5 2NY, UK. Email: firstname.lastname@example.org
One of the essential points in combinatorial library design concerns the selection of the monomers to be used as building blocks for the combinatorial synthesis of the final molecules. Currently, public databases like the ACD contain many thousands of molecules suitable as monomers for reactions under combinatorial chemistry conditions. Considering that the number of available monomers is increasing and that combinatorial chemistry technology is giving access to more and more chemical reactions, one of the major tasks for library design is to select the best set of monomers out of a large number of potential reactants. For this reason we have developed in house a program called VOLGA (Virtual Optimization of chemical Libraries using Genetic Algorithm), which has allowed us to optimize the design of a wide class of chemical libraries by choosing among different fitness functions. When VOLGA was planned, particular attention was paid to obtaining a program that could use any fitness function defined by the user. Fitness functions that have been successfully used to date include 3D pharmacophore fitting, 2D similarity/dissimilarity measures, drug-like profiles and QSAR-derived models. The program allows optimization of libraries ranging from a few tens up to 10,000 molecules. Optimization can start from potentially huge virtual libraries of a few thousand to several million molecules (i.e. all those that could be generated by combinatorial explosion of all the reactants considered in the design model). The aim of this paper is to critically analyze the different methods and scoring functions that have been used, along with details of how classical GA theory was adapted to optimize combinatorial libraries. Advantages and drawbacks of this method are discussed.
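VOLGA itself is not described in enough detail to reproduce, but the general pattern (a population of candidate monomer selections, a pluggable user-defined fitness function, and selection, crossover and mutation adapted so that each reaction position always holds a fixed number of distinct monomers) can be sketched as follows. The encoding, operators, elitism and parameter values are illustrative assumptions.

```python
import random


def random_individual(pools, sizes, rng):
    """One candidate library: a sorted subset of monomers for each reaction position."""
    return [sorted(rng.sample(pool, k)) for pool, k in zip(pools, sizes)]


def crossover(a, b, rng):
    """Uniform crossover: each position's monomer subset is inherited whole from one parent."""
    return [list(x) if rng.random() < 0.5 else list(y) for x, y in zip(a, b)]


def mutate(individual, pools, rate, rng):
    """With probability `rate` per position, swap one chosen monomer for an unused one."""
    out = []
    for chosen, pool in zip(individual, pools):
        chosen = list(chosen)
        if rng.random() < rate:
            unused = [m for m in pool if m not in chosen]
            if unused:
                chosen[rng.randrange(len(chosen))] = rng.choice(unused)
        out.append(sorted(chosen))
    return out


def optimize_library(pools, sizes, fitness, pop_size=30, generations=60,
                     mutation_rate=0.3, seed=0):
    """Maximize a user-supplied fitness(individual) over per-position monomer subsets."""
    rng = random.Random(seed)
    population = [random_individual(pools, sizes, rng) for _ in range(pop_size)]

    def tournament():
        # Binary tournament selection on the current population.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    for _ in range(generations):
        children = [mutate(crossover(tournament(), tournament(), rng), pools, mutation_rate, rng)
                    for _ in range(pop_size - 1)]
        population = [best] + children  # elitism: never lose the best library found so far
        best = max(population, key=fitness)
    return best
```

Because `fitness` is an arbitrary callable, any of the scoring schemes listed above (pharmacophore fit, similarity, drug-likeness, QSAR predictions) could in principle be plugged in by evaluating the products implied by each candidate selection.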
Use of Markush Structure Analysis Techniques for Rapid Processing of Large Combinatorial Libraries.
John M. Barnard1, Geoff M. Downs1, and Robert D. Brown2. (1) Barnard Chemical Information Ltd, 46 Uppergate Road, Stannington, Sheffield, S6 6BX, United Kingdom, (2) MSI Molecular Simulations Inc., 9685 Scranton Road, San Diego, CA 92121-3752. Email: email@example.com
A Markush structure is an extremely compact way of representing a large virtual combinatorial library, in which common parts of the individual product molecules are shown only once. Using extended versions of algorithms originally developed for storage and retrieval systems for Markush structures from chemical patents, we have written software to generate structural fingerprints for the molecules in a library by direct analysis of a Markush representation. This can speed up the analysis process by orders of magnitude compared with approaches based on enumeration of the individual molecules, and the program can be linked to routines for fast clustering of library members and calculation of numerical diversity measures. The principles behind the algorithms used will be described, and results obtained using the software for analysis of libraries will be presented. Issues concerning the optimisation of Markush representations for this type of analysis will be discussed (especially where "non-regular" libraries and variable scaffolds are involved), and current work on building such representations from input based on sequences of reactions and precursor molecules will be described. Opportunities for use of these techniques for rapid generation of additional descriptor types will also be mentioned.
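Why avoiding enumeration pays off can be illustrated with a deliberately simplified case. If every fingerprint feature lies wholly within the scaffold or wholly within one R-group (real Markush fingerprinting must also handle features that span attachment points, which this sketch ignores), the number of products setting each bit follows from per-position counts alone. The brute-force enumeration is included only as a check; the function names are hypothetical.

```python
import math
from itertools import product


def bit_counts_markush(scaffold_bits, rgroup_bits):
    """Count products containing each bit without enumerating the library.

    scaffold_bits: set of bits set by the common scaffold.
    rgroup_bits: one list per R-group position, each a list of bit sets (one per substituent).
    Assumes no fingerprint feature spans a scaffold/R-group attachment point.
    """
    n_products = math.prod(len(pos) for pos in rgroup_bits)
    all_bits = set(scaffold_bits).union(*(fp for pos in rgroup_bits for fp in pos))
    counts = {}
    for bit in all_bits:
        if bit in scaffold_bits:
            counts[bit] = n_products
        else:
            # A product lacks the bit only if every chosen substituent lacks it.
            lacking = math.prod(sum(1 for fp in pos if bit not in fp) for pos in rgroup_bits)
            counts[bit] = n_products - lacking
    return n_products, counts


def bit_counts_enumerated(scaffold_bits, rgroup_bits):
    """Reference answer by explicit enumeration (cost grows with the product of list sizes)."""
    counts = {}
    for combo in product(*rgroup_bits):
        for bit in set(scaffold_bits).union(*combo):
            counts[bit] = counts.get(bit, 0) + 1
    return counts
```

The first function's cost scales with the total number of substituents rather than with their product, which is the source of the orders-of-magnitude gap for large libraries.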
Penalty-biased diversity. Design of diverse, drug-like libraries.
Moises Hassan and Marvin Waldman. Molecular Simulations Inc., 9685 Scranton Road, San Diego, CA 92121. Email: firstname.lastname@example.org
Diverse libraries in which molecules are restrained to exhibit properties similar to those of known drugs are expected to yield a higher percentage of active compounds in lead discovery programs, and those actives are expected to prove more suitable as viable drug candidates. Diverse, drug-like libraries are designed by optimizing R-group fragments to simultaneously maximize the molecular diversity and minimize a penalty function based on the specified properties of the products. Two types of penalties are implemented. The first uses property ranges, penalizing molecules when their calculated descriptors are outside desired ranges. The second is based on a property distribution (profile) of the library, penalizing a library when the profile for a given property differs from the desired one. Several applications of this approach to library design are presented, including biasing libraries to satisfy Lipinski-like rules, focusing libraries to exhibit properties found in molecules with a specific biological activity, designing libraries that exhibit a desired property profile (such as a uniform molecular weight distribution to facilitate identification by mass spectrometry), and combinations of these approaches.
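The two penalty types can be sketched as simple functions of the calculated descriptors. The squared-deviation forms, the histogram comparison and the single weight trading penalty against diversity are illustrative assumptions, not the published formulation.

```python
def range_penalty(molecules, ranges):
    """Sum of squared excursions outside the desired [lo, hi] range for each property."""
    total = 0.0
    for mol in molecules:
        for prop, (lo, hi) in ranges.items():
            value = mol[prop]
            if value < lo:
                total += (lo - value) ** 2
            elif value > hi:
                total += (value - hi) ** 2
    return total


def profile_penalty(molecules, prop, bin_edges, target_fractions):
    """Squared difference between the library's property histogram and a desired profile.

    Values outside [bin_edges[0], bin_edges[-1]) are not counted in any bin.
    """
    counts = [0] * (len(bin_edges) - 1)
    for mol in molecules:
        value = mol[prop]
        for i in range(len(counts)):
            if bin_edges[i] <= value < bin_edges[i + 1]:
                counts[i] += 1
                break
    n = len(molecules)
    return sum((c / n - t) ** 2 for c, t in zip(counts, target_fractions))


def objective(diversity, penalty, weight=1.0):
    """Quantity to maximize: diversity reduced by the weighted penalty."""
    return diversity - weight * penalty
```

Since properties such as molecular weight and logP differ in scale, a practical version would normalize each excursion (for example by the width of its range) before summing, so that one property does not dominate the penalty.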
Lead-Hopping and Library-Hopping by Topomer Shape Similarity Searching of Vast Virtual Libraries.
Katherine Andrews-Cramer and Richard D. Cramer. Tripos Inc., 1699 South Hanley Road, St. Louis, MO 63144. Email: email@example.com
Using the ChemSpace™ technology, seven libraries containing 3.8 × 10¹² virtual molecules were searched, using query structures that were chosen from each of 34 articles published in the Journal of Medicinal Chemistry in 1998, in order to represent a diverse set of lead structures active toward different known targets. The results of the searches will be considered from several perspectives: 1) How often are similar structures identified? 2) Are hits which are both novel and intuitively convincing obtained? 3) How shape similar are the hits to the query structure, when an alternative shape assessment tool is used? 4) What can be said about the potential biological activity of the hits, based on those found in the literature? 5) From the hitlists, can libraries be designed for lead follow-up synthesis which are amenable to high-throughput synthesis and combinatorial chemistry?
Creating maximal diversity in a HTS screening library: a statistical approach.
Jan T. Pedersen, Anne Marie Munk Joergensen, and Peter Faester Nielsen. Acadia Pharmaceuticals, Fabriksparken 58, Glostrup, DK-2600, Denmark. Email: firstname.lastname@example.org
The Acadia in-house HTS screening library currently contains ~120,000 compounds. We have attempted to build a library with maximal diversity and increased information content for receptor screening and profiling. The library contains both a diverse set of compounds from the "known" chemical space and a large set of common drugs. The diversity measures that we use are based solely on structure (2D) and physicochemical properties. A fast graph-theoretical comparison algorithm is used to evaluate structural similarities and structural properties. Distributions of these properties and the correlation between different distributions are used to compare and evaluate compound collections that are potentially included in the screening library. We have used a conditional probability formalism, where a "random library" is the common reference state for comparison of libraries and evaluation of their diversity. We have evaluated this library in a large number of GPCR screenings and analyzed the data using phylogenetic clustering of the screening hits. The phylogenetic clustering uses the formalism of phylogenetic comparison from sequence analysis. The basis of the phylogenetic tree is in this case not sequence similarity but similarity of the chemical graphs. This appears to be a simple and efficient way to evaluate large numbers of screening hits and to identify unique HTS hits. The basis of the phylogenetic clustering will be outlined and demonstrated on a recent dataset. It will also be demonstrated how this evaluation method can be used to automatically classify clusters of HTS hit structures according to known drugs.
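As a minimal stand-in for the phylogenetic clustering described above, one can build a UPGMA tree (a classic distance-based method from sequence phylogenetics) from pairwise Tanimoto distances between fingerprint bit sets. The fingerprint representation and the choice of UPGMA are assumptions made here; the abstract's method compares the chemical graphs themselves.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity of two fingerprint bit sets."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def upgma(names, fingerprints):
    """Build a UPGMA tree (nested tuples) from pairwise Tanimoto distances."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (subtree, size)
    dist = {(i, j): tanimoto_distance(fingerprints[i], fingerprints[j])
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters (first pair wins on ties).
        (i, j), _ = min(dist.items(), key=lambda item: item[1])
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        # Size-weighted average distance from the merged cluster to each remaining one.
        merged = {k: (n_i * dist[tuple(sorted((i, k)))] + n_j * dist[tuple(sorted((j, k)))])
                     / (n_i + n_j) for k in clusters}
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        for k, d in merged.items():
            dist[(k, next_id)] = d  # k < next_id, so pair keys stay ordered
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

The naive pair search makes this quadratic per merge, which is fine for illustration; large hit lists would call for a more efficient tree-building routine.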
A simple method to simultaneously increase diversity and favorably enrich the content of chemical libraries.
Ryan T. Koehler, Steve L. Dixon, and Hugo O. Villar. Computational Chemistry Laboratory, Telik Inc., 750 Gateway Blvd, South San Francisco, CA 94080. Email: email@example.com
To streamline pharmaceutical discovery, chemical libraries employed for routine screening should be both diverse and enriched with "drug-like" compounds. We describe a simple new algorithm for simultaneously addressing both objectives, providing a means of strategic compound selection to expand screening libraries. The algorithm exploits differences in descriptor distributions associated with different chemical libraries to identify those additional compounds that are most different from compounds currently comprising a screening library and most similar to compounds comprising a library to be emulated. Tests with publicly available compound databases (ACD, CMC, NCI) demonstrate method behavior and effectiveness. Results of spiking experiments, in which "drug-like" CMC compounds are spiked into sets of ACD compounds then ranked for selection, are presented. The algorithm performs substantially better than random. Our algorithm is general in principle, operating with any set of descriptors, similarity measure, and specification of reference libraries.
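One simple way to exploit differences in descriptor distributions is to score each candidate by the log-ratio of its binned descriptor density in the library to be emulated over that in the current screening library, summed over descriptors as if they were independent. High scores then mean "rare in what we have, common in what we want". This is a hypothetical sketch, not the published algorithm; the binning, pseudocounts and independence assumption are choices made here.

```python
import math
from collections import Counter


def bin_index(value, edges):
    """Histogram bin for value; out-of-range values are clamped to the end bins."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return 0 if value < edges[0] else len(edges) - 2


def binned_freqs(library, edges_per_desc, pseudocount=1.0):
    """Smoothed per-descriptor bin frequencies for a library of descriptor tuples."""
    freqs = []
    for d, edges in enumerate(edges_per_desc):
        n_bins = len(edges) - 1
        counts = Counter(bin_index(mol[d], edges) for mol in library)
        total = len(library) + pseudocount * n_bins
        freqs.append([(counts[b] + pseudocount) / total for b in range(n_bins)])
    return freqs


def rank_candidates(candidates, current, target, edges_per_desc):
    """Order candidates by log(p_target / p_current), summed over descriptors."""
    f_cur = binned_freqs(current, edges_per_desc)
    f_tgt = binned_freqs(target, edges_per_desc)

    def score(mol):
        total = 0.0
        for d, edges in enumerate(edges_per_desc):
            b = bin_index(mol[d], edges)
            total += math.log(f_tgt[d][b] / f_cur[d][b])
        return total

    return sorted(candidates, key=score, reverse=True)
```

A spiking experiment of the kind described above would mix known "drug-like" compounds into the candidate pool and check how highly they are ranked relative to random ordering.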
Integrated Informatics for Library Design and Analysis.
Tim Mitchell, Cambridge Combinatorial Ltd., The Merrifield Centre, Rosemary Lane, Cambridge CB1 3LQ United Kingdom. Email: firstname.lastname@example.org
The Atlas Informatics system developed by Cambridge Combinatorial is a set of integrated tools to support library design, control of automation for synthesis, analysis and purification, registration and reporting. Most of these processes are designed to be performed by chemists at their desktops. In the cases where specialised skills are required, data and information exchange is designed to be seamless. The library design process involves template and precursor selection, virtual library enumeration, registration and profiling. Precursor selection is critically dependent on the amount of information available about the target biological receptor. Diversity assessment can be used in both precursor and product selection, but it is usually far more productive to profile the library in terms of descriptors of physico-chemical properties (e.g. logP, hydrogen bonding, potential toxicity, solubility). If the library is being designed around a pharmacophore, then the pharmacophore content of the virtual library also needs to be confirmed. Most importantly, the computational design of a library has to be compatible with the practical considerations of synthesis and analysis: fully enumerated libraries, 96-well format, etc. The Atlas Informatics system provides tools for the profiling of a virtual library by a wide range of descriptors. Profiling of the virtual library products allows for the rapid identification of desirable and undesirable monomers and the rapid optimisation of focussed library design.
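Using product profiles to flag monomers can be sketched by aggregating a pass/fail property filter over all virtual products that contain each monomer. The data layout and the example filter are assumptions for illustration, not features of the Atlas system.

```python
from collections import defaultdict


def monomer_pass_rates(products, passes):
    """Fraction of each monomer's products that satisfy a property filter.

    products: iterable of (monomer_tuple, properties) pairs, one per virtual product,
              where monomer_tuple[i] is the monomer used at position i.
    passes:   predicate taking a properties dict and returning True/False.
    Returns {(position, monomer): pass_rate}.
    """
    seen = defaultdict(int)
    ok = defaultdict(int)
    for monomers, props in products:
        good = bool(passes(props))
        for position, monomer in enumerate(monomers):
            seen[(position, monomer)] += 1
            ok[(position, monomer)] += good  # adds 1 if the product passes, else 0
    return {key: ok[key] / seen[key] for key in seen}
```

Monomers with low pass rates are candidates for removal from a focussed design, since most products they contribute to fall outside the desired property window.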
Convention Center Room 220
|Integration of Primary and Secondary Literature on the WWW|
|C. Huber, Organizer, Presiding|
The new chemical information environment.
Harry F. Boyle, Product Marketing, ACS Chemical Abstracts Division, P.O. Box 3012, Columbus, OH 43210 and Susan A. Barclay, New Product Development, ACS Publications Division, 1155 16th Street, NW, Washington, DC 20036. Email: email@example.com
The evolution of indexes, computer services, and web technology has made it progressively easier for scientists to browse published information broadly or pinpoint specific items of interest. Traditional print publishers and patent offices are offering electronic versions of their documents and traditional information providers are acting as document aggregators. A working alliance of the ACS Publications and CAS divisions, other scientific journal publishers, patent offices, and the STN partner organizations is now opening a path to the next level of literature exploration and acquisition. A new environment is taking shape, largely unconstrained by the traditional boundaries that separate one publisher from another, primary sources from secondary ones, and in-house holdings from external sources. This paper will discuss the emergent chemical information environment of the future.
Linking between content providers: the ISI experience.
Chris Leonard, New Product Development, Institute for Scientific Information, 3501 Market Street, Philadelphia, PA 19104. Email: firstname.lastname@example.org
A researcher in the digital environment expects that relevant information regardless of its location or structure will be brought to the desktop. In response to this expectation, content providers have created partnerships that facilitate links between various forms of information from different organizations. Still in the initial phases, these alliances are the basis of integrated content in digital libraries. ISI Links is one of these partnership initiatives. With the Web of Science® forming the basis for effective retrieval and navigation, ISI is working with publishers to build links to a variety of content including full text, chemical structures and patent information. Consequently, ISI has faced some of the significant issues that all content providers face in building the linked environment. What elements affect the success of linking content? Which partnerships will add value to proprietary content? What impact do format considerations, data transfer processes, and administrative issues have on linking?
Creating and maintaining dynamic links between database citations and their corresponding fulltext files.
Margery Tibbetts, California Digital Library, 1111 Franklin Street, Oakland, CA 94607-5200. Email: email@example.com
This presentation will discuss how links are being created and maintained between articles in the California Digital Library (CDL) hosted databases and their corresponding full text files. The CDL maintains some full text files of its own but the focus of this presentation will be on linking to full text files maintained at the publisher's site. Issues to be covered in detail include how the CDL linking system is designed, the various linking algorithms used (SICI, DOI, etc.), access issues, and some of the problems we have encountered while developing the system. The early experience of the CDL with article images and the general architecture of the CDL system will be covered briefly.
LitLink: dynamic linking of the primary and secondary literature.
Steven Young, MDL Information Systems, Inc., 14600 Catalina Street, San Leandro, CA 94577. Email: firstname.lastname@example.org
Dynamic linking of the primary and secondary literature offers many advantages over static linking. Static linking normally requires that all the necessary linking information be stored and maintained in a centralized database, and the databases used for static linking are typically limited to selected sources of the primary and secondary literature. Dynamic linking offers the advantage of interlinking any primary or secondary literature source. LitLink, an example of a dynamic linking system, takes a citation as input, automatically generates a query from it, and submits that query to the appropriate literature sources. Approaches and applications utilizing LitLink as an electronic article broker will be presented.
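The general idea of dynamic linking (turning a citation into queries at request time instead of storing links in advance) can be sketched with the standard library. Only the `https://doi.org/` resolver is real; the OpenURL-style resolver address and the citation field names are hypothetical placeholders, and this is not LitLink's implementation.

```python
from urllib.parse import quote, urlencode

# Hypothetical resolver endpoint; a real deployment would point at its own link server.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def citation_to_links(citation):
    """Build candidate full-text links for a citation dict at query time."""
    links = []
    if citation.get("doi"):
        # DOIs resolve through the public DOI proxy.
        links.append("https://doi.org/" + quote(citation["doi"], safe="/"))
    fields = {
        "genre": "article",
        "aulast": citation.get("author_last"),
        "title": citation.get("journal"),
        "volume": citation.get("volume"),
        "spage": citation.get("first_page"),
        "date": citation.get("year"),
    }
    # Drop empty fields so the query only carries what the citation actually provides.
    query = {k: v for k, v in fields.items() if v}
    links.append(RESOLVER_BASE + "?" + urlencode(query))
    return links
```

Because nothing is stored, adding a new literature source means adding one more query template rather than rebuilding a central link database.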
Integrating primary and secondary literature - patents versus journals.
Breda F. Corish, Product Development, Derwent Information, Holbrook House, 14 Great Queen Street, London, WC2B 5DF, United Kingdom and Jeff Clovis, New & Corporate Products, ISI (Institute for Scientific Information), 3501 Market Street, Philadelphia, PA 19104. Email: email@example.com
Derwent Information and ISI specialise in the creation of value-added secondary databases focusing on patents and journals, respectively. Both companies see the same demand for access via standard Web browser technology to their value-added secondary databases with seamless linking to the corresponding primary level data. In meeting this need, a common problem lies in the fact that not all primary data sources are available in a suitable electronic format. From a commercial perspective, providing access to primary patent documents is relatively simple as this material is already in the public domain. For journals, this is complicated by the need to have separate business agreements with each of the primary publishers who hold journal copyright. These topics will be explored with reference to: ISI's "Web of Science" (WOS); links from WOS to journal full text; links from WOS to "Derwent Innovations Index" (DII); development plans for linking DII to patent fulltext.
Authors' e-mail addresses and URLs should be added to Chemical Abstracts.
Shu-Kun Lin, firstname.lastname@example.org, http://mdpi.org/lin/, MDPI, Molecular Diversity Preservation International, Sangergasse 25, Basel CH-4054 Switzerland. Email: email@example.com
It is suggested that CAS add authors' e-mail addresses, if available in the original publications, to the Chemical Abstracts entries. Authors' URLs or website addresses could also be included. These may be treated as an important part of a full address. MDPI's journals Molecules (http://mdpi.org/molecules) and Entropy (http://mdpi.org/entropy) publish authors' e-mail addresses, URLs, and telephone and fax numbers, in addition to their full postal addresses. An e-mail address is normally concise and particularly useful, and should be included in abstracts. Including e-mail addresses would make it much more convenient for readers to request reprints, contact the authors, or start discussions. An old e-mail address may remain usable even after its owner moves to a new institution. E-mail is very fast, and it is the least expensive means of communication. Here, I have successfully put my e-mail firstname.lastname@example.org and URL http://www.mdpi.org/lin/ in the author's address of this abstract and hope the moderators do not delete them. Some other arguments and a summary of the discussions on the CHMINF-L mailing list (CHMINF-L@LISTSERV.INDIANA.EDU, http://listserv.indiana.edu/archives/chminf-l.html) during February 1999 will be presented.
Convention Center Room 217
|Combinatorial Chemical Information|
|R. Snyder, Organizer; T. Wright, Presiding|
Novel methods for assessing and comparing the diversities of chemical libraries.
Robert S. Pearlman1, Xiao C. Wang2, Ying Su2, and Michael Green2. (1) Laboratory for Molecular Graphics and Theoretical Modeling, University of Texas, College of Pharmacy, Austin, TX 78712, (2) Trega Biosciences, Inc., 9880 Campus Point Drive, San Diego, CA 92121. Email: email@example.com
Standard methods for assessing diversity and comparing libraries are based on nearest-neighbor statistics computed using Tanimoto "distances" between molecular fingerprints. However, such distance-based methods yield relatively crude information. We will present several cell-based methods which offer substantial advantages. We will introduce the concepts of "library fingerprints" and "library vectors." We will indicate how the distributions of compounds in two libraries can be compared using the well-known Carbo and Hodgkin indices computed from library vectors and we will also indicate how library fingerprints can be compared using the Tanimoto index and novel binary forms of the Carbo and Hodgkin indices. Finally, we will introduce the concept of "fraction overlapped" as an ideal and intuitive approach for library comparisons.
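The indices named above have standard continuous forms. The sketch below treats a library vector as per-cell compound counts over a shared cell grid and a library fingerprint as the set of occupied cells. The binary Carbo and Hodgkin expressions shown are simply what the continuous formulas reduce to for 0/1 vectors, and `fraction_overlapped` uses one plausible definition (the share of A's occupied cells also occupied by B); both are assumptions rather than the authors' exact definitions.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def carbo(a, b):
    """Carbo index of two library vectors (per-cell compound counts)."""
    return _dot(a, b) / math.sqrt(_dot(a, a) * _dot(b, b))


def hodgkin(a, b):
    """Hodgkin index of two library vectors."""
    return 2 * _dot(a, b) / (_dot(a, a) + _dot(b, b))


def library_fingerprint(vector):
    """Set of occupied cells (cells containing at least one compound)."""
    return {i for i, count in enumerate(vector) if count > 0}


def tanimoto_binary(fa, fb):
    return len(fa & fb) / len(fa | fb)


def carbo_binary(fa, fb):
    return len(fa & fb) / math.sqrt(len(fa) * len(fb))


def hodgkin_binary(fa, fb):
    return 2 * len(fa & fb) / (len(fa) + len(fb))


def fraction_overlapped(fa, fb):
    """Hypothetical definition: fraction of A's occupied cells also occupied by B."""
    return len(fa & fb) / len(fa)
```

Unlike the symmetric indices, the fraction overlapped is asymmetric, which matches the intuitive question "how much of library A is already covered by library B?".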
Combinatorial library design and diversity analysis.
Xiao Chuan Wang1, Ying Su1, and Mike Green2. (1) Computational Chemistry, Trega Biosciences Inc., 9880 Campus Point Dr., San Diego, CA 92121, (2) Chemistry, Trega Biosciences Inc. Email: firstname.lastname@example.org
Combinatorial chemistry is speeding up the process of drug discovery. How can we design a drug-like combinatorial library with good diversity? How should we compare two libraries to avoid redundancy in library production and to increase the potential of finding active compounds from Trega libraries? To answer these questions we have designed and developed a strategy called Quasi-Virtual-Library-Cherry-Picking (QVLCP) to assist Trega chemists in library production and to ensure intra-library diversity. In collaboration with Prof. Robert Pearlman, we have applied a new measure, the percentage of overlapped cells (POC) within All Drug Space, in our inter-library diversity analysis to compare how different one library is from others and to design new libraries. Finally, we have developed the Trega Diverse Bundle strategy to generate subsets of each library that are representative of the full diversity present in the library.
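A cell-based overlap measure of this kind can be sketched by partitioning a low-dimensional chemistry space into a grid of cells and comparing the sets of cells each library occupies. The grid bounds, bin count, and the optional restriction to a reference set of cells (standing in for "All Drug Space") are assumptions for illustration.

```python
def cell_of(point, lows, highs, n_bins):
    """Grid cell (tuple of bin indices) for a descriptor point; edges clamp to end bins."""
    cell = []
    for x, lo, hi in zip(point, lows, highs):
        i = int((x - lo) / (hi - lo) * n_bins)
        cell.append(min(max(i, 0), n_bins - 1))
    return tuple(cell)


def occupied_cells(library, lows, highs, n_bins):
    """Set of cells containing at least one compound of the library."""
    return {cell_of(p, lows, highs, n_bins) for p in library}


def percent_overlapped_cells(lib_a, lib_b, lows, highs, n_bins, reference_cells=None):
    """Percentage of cells occupied by library A that library B also occupies.

    If reference_cells is given, only cells in that set are considered.
    """
    a = occupied_cells(lib_a, lows, highs, n_bins)
    b = occupied_cells(lib_b, lows, highs, n_bins)
    if reference_cells is not None:
        a &= reference_cells
        b &= reference_cells
    if not a:
        return 0.0
    return 100.0 * len(a & b) / len(a)
```

A low value means library A mostly occupies cells that B does not, which is the situation worth favoring when deciding whether a proposed library adds new coverage.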
Integrated structural, synthetic, and analytical combichem informatics.
David Chapman, Afferent Systems, Inc., 1550 Bryant Street, Suite 760, San Francisco, CA 94103. Email: email@example.com
Combinatorial synthesis produces a deluge of data, including compound structures, synthetic protocols, sample information such as vessel locations and synthetic history, and analytical information (spectra and chromatograms). I will describe a chemistry knowledge base system that integrates, organizes, and makes sense of these divergent data types, and interfaces with both synthetic and analytical instruments.
SLIMS - A web-based solution for sample, structure and spectral management.
Antony John Williams, Val Kulkov, and Alexey Karezin. Advanced Chemistry Development, 133 Richmond Street West, Suite 605, Toronto, ON M5H 2L3, Canada. Email: firstname.lastname@example.org
Laboratory Information Management Systems (LIMS) are essential to allow corporate-wide access to analytical information. A number of efforts have been made over the years to implement flexible LIMS but, in general, these have failed to provide the flexibility of interface and features required in analytical and R&D environments that need access to molecular structures and graphics-intensive spectral displays. We have developed SLIMS, a web-based system for managing sample, spectral and associated molecular structure information. This user-friendly system links a unique sample identifier to sample information, a chemical structure, associated spectra and final reports of analysis. This full-featured sample manager allows desktop access to sample information as well as access to a structure database for accessing historical reference data. The system has been configured to allow full integration with desktop helper applications, including standard desktop structure drawing packages and spectral display packages. We will report on our continued advances in this area.
Combinatorial Chemistry: integration with the research environment.
Maurizio Bronzetti, MDL Information Systems, 14600 Catalina Street, San Leandro, CA 94577. Email: email@example.com
The adoption of high-speed technologies in genomics, chemistry and biology has pushed research organizations to explore new ways of capturing data, organizing results and samples, avoiding duplication of effort and emphasizing data rationalization. Personal and team productivity, together with economics and patent regulation, are often the criteria that drive these changes. Combinatorial chemistry especially has challenged data management by causing a proliferation of chemistry, samples and analytical data: chemistry (classical and combinatorial) must be captured correctly and consistently if the data are to be searched and mined later. Moreover, the relationships among reactants, samples, products, batches, side products and protocols should be carried along with the synthesis experiment through purification and analysis. This presentation will introduce a new scalable system designed to manage compound libraries and classical synthetic experiments in the context of multiple project teams.
Convention Center Room 220
|Alternative Careers in Chemistry|
|A. Twiss-Brooks, Organizer, Presiding|
Employment and marketability: ACS Career Services and you.
Jean A. Parr, Department of Career Services, American Chemical Society, 1155 16th Street, NW, Washington, DC 20036. Email: firstname.lastname@example.org
No one can accurately predict tomorrow's economy, but recent data about careers in chemistry tell us that: the market will remain tight; chemists will make more frequent job changes; chemists will apply their knowledge and skills to a wider range of professions and industries. This presentation will discuss these trends and offer recommendations for staying marketable. ACS career services, designed to help members address these issues, will be outlined with special emphasis on the new online job service available to members.
Strategic partnering for knowledge management.
Suzanne P. Cristina, UTC Information Network - Hamilton Standard, United Technologies, One Hamilton Road, Windsor Locks, CT 06096. Email: email@example.com
Chemists and chemical engineers will realize significant time and cost savings by partnering with information professionals (librarians) to facilitate knowledge management within their organizations. Information professionals can leverage expertise in information organization and use their understanding of database content and structure to identify the information needs of their organizations and make specific recommendations for internal and external databases to be shared over the organization's Intranet. As an integrated partner on project/research teams, the information professional is able to contribute proactively rather than reactively, anticipating and analyzing information specific to a research project. The roles of the Research Analyst, Information Manager and Knowledge Analyst will be described in detail as well as the strategic significance of the MLS (Master of Library Science) combined with a technical/chemistry degree. Several Intranet projects will also be discussed.
The study of applied organic chemistry in graduate school and at a remote university.
Forrest S. Schultz, Chemistry Department, University of Wisconsin-Stout, Jarvis Hall, SW303D, Menomonie, WI 54751. Email: firstname.lastname@example.org
This presentation explores an alternative pathway for the graduate study of organic chemistry. In particular, the study of the interface between organic chemistry, materials chemistry, and chemical engineering will be presented. Career opportunities and possibilities will be presented. The presentation will also discuss the importance of online information when different fields of study are brought together. The necessity of online information for an applied chemist at a remote university will be explored.
Managing dynamic chemical information environments in industry.
Keith P. Schreiber, Business Information Services, Procter & Gamble, Ivorydale Technical Center, 5299 Spring Grove Ave., Cincinnati, OH 45217. Email: email@example.com
Combinatorial chemistry, proliferation of chemical publications, accelerated product development cycles - industrial R&D is increasingly dependent on effective use of information. Chemical information professionals draw upon expertise in chemistry and in information tools and environments to maximize this effective information use. The result: personal involvement across a vast array of projects, an opportunity to work with a tremendous variety of individuals, and participation in one of the most dynamic fields around at a pivotal moment of the information age.
Look! Up in the sky! It's a chemist! It's a librarian! It's both!
F. Bartow Culp, Mellon Library of Chemistry, Purdue University, West Lafayette, IN 47907-1538. Email: firstname.lastname@example.org
In the Internet age, isn't the concept of a librarian outmoded? If easy and almost unlimited information access is available to anyone at the click of a mouse button, why should a chemist consider librarianship as a career? There are lots of reasons, including excellent job prospects, a high degree of career satisfaction, plus the chance to be a central player in the current redefinition of how science is done. The fundamental skills of a librarian have always been the ability to organize knowledge and make it available to others. And for most of the history of the science, those same skills were an integral part of a chemist's profession. In this age of high-entropy information, the felicitous combination of abilities that chemist/librarians bring to their jobs does not simply have the power to organize and access chemical information; it can also enhance the value of that information and improve the entire communication process itself. We will present examples of how chemist/librarians are integral participants in the advancement of both of their professions.
From laboratory to law office: a career as a patent attorney.
Anita Varmas, Foley, Hoag & Eliot LLP, One Post Office Square, Boston, MA 02109. Email: PXC@FHE.COM
The field of patent law provides opportunities to chemists seeking a career outside of the laboratory that allows them to utilize and apply their scientific knowledge. Opportunities exist to practice in the Patent Office, in companies and in law firms. Many of the skills that scientists use in their scientific endeavors translate well to the practice of law, including organizational and analytical skills and problem-solving ability. This presentation will address the various types of opportunities available in this exciting field, and some tips for determining whether it may be right for you.
|4:05||Division Business Meeting|
|4:30||Open Meeting: Committees on Publications and on Chemical Abstracts Service|
Convention Center Room 217
|Combinatorial Chemical Information|
|R. Snyder, Organizer; R. Delmendo, Presiding|
Automated laboratories for high-density microplate screening: Merging novel and traditional technologies.
Franz E. Leichtfried, Robocon GmbH, Davidgasse 85 - 89, Vienna A-1100 Austria. Email: email@example.com
Over the last three years, a trend toward assay miniaturization, driven by the desire to identify drug leads faster and at a lower cost per test, has gained considerable momentum. High-throughput screening in 384-well microplates, which was unheard of just a few years ago, has now become routine in automated screening laboratories generating up to 100,000 data points per day. Such laboratories use novel readers and other workstations that have adapted standard technology to the 384-well plate format. As screeners move to even higher microplate well densities and novel "screening chip" technologies, nanoliter pipetting devices, imaging readers, and other novel devices will have to be merged with instruments that have traditionally been used for automated microplate processing. In some cases, traditional and novel methods can both be applied to reach the same goal. This paper examines how traditional and new technology building blocks can be fitted into automated laboratories that reach data outputs of a quarter million data points per 24 hours and beyond.
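To put the quoted throughput figures in plate-handling terms, the short calculation below converts a daily data-point target into plates per day and per hour. It is a back-of-the-envelope sketch that assumes one data point per well and ignores control wells and handling overhead. The 1536-well density is an illustrative assumption; the abstract mentions only "even higher" densities.

```python
import math


def plates_needed(data_points_per_day: int, wells_per_plate: int) -> dict:
    """Plates per day (rounded up) and per hour needed to hit a daily target.

    Assumes one data point per well and no control wells (hypothetical).
    """
    plates_per_day = math.ceil(data_points_per_day / wells_per_plate)
    return {
        "plates_per_day": plates_per_day,
        "plates_per_hour": plates_per_day / 24,
    }


for target in (100_000, 250_000):          # figures quoted in the abstract
    for density in (384, 1536):            # 1536 is an illustrative assumption
        result = plates_needed(target, density)
        print(f"{target:>7} points/day, {density}-well: "
              f"{result['plates_per_day']} plates/day "
              f"(~{result['plates_per_hour']:.1f} plates/hour)")
```

Even under these generous assumptions, a quarter million points per day in 384-well format means handling a plate roughly every two minutes around the clock.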
Application of Version Spaces to the Analysis of High-Volume Structure-Activity Relationships.
George S. Cowan, Jr. and C. John Blankley, Sr. Parke-Davis Pharmaceutical Research, 2800 Plymouth Road, Ann Arbor, MI 48105. Email: firstname.lastname@example.org
In the new world of high volume screening and combinatorial chemistry, new methods are needed to rapidly analyze the results of such assays for SAR information. We describe the application of version spaces, a machine learning methodology described originally by Mitchell, et al. (1978, 1997) to this problem. This method organizes all possible "concepts" that agree with a given set of data. In this case, data is taken to be biological activity and concepts are taken to be sets of structural fragments associated with active compounds. Concept models are expressed as fingerprint-like bit strings with three possible values for each bit: 1=required present, 0=required absent, and #="don't care". Training is done on both active and inactive compounds and unknowns can then be compared to the various concept models to see if they are examples of active or inactive molecules. The fragments responsible for the classification are also identified.
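The ternary concept strings described above can be illustrated with a minimal sketch. It shows how a concept matches a 0/1 fragment fingerprint, and how the most specific concept covering all actives (the "S" boundary, computed with the classic Find-S generalization step) is checked against inactives. This is not the authors' implementation: a full version space also maintains the general ("G") boundary, which is omitted here, and the four-bit fingerprints are invented for illustration.

```python
def matches(concept: str, fingerprint: str) -> bool:
    """True if a 0/1 fingerprint satisfies a ternary concept string.

    '1' = fragment required present, '0' = required absent, '#' = don't care.
    """
    if len(concept) != len(fingerprint):
        raise ValueError("concept and fingerprint lengths differ")
    return all(c == "#" or c == f for c, f in zip(concept, fingerprint))


def most_specific_concept(actives: list[str]) -> str:
    """Find-S: generalize the first active just enough to cover all actives."""
    concept = list(actives[0])
    for fp in actives[1:]:
        for i, bit in enumerate(fp):
            if concept[i] != "#" and concept[i] != bit:
                concept[i] = "#"  # bit disagrees across actives: stop caring
    return "".join(concept)


def is_consistent(concept: str, actives: list[str], inactives: list[str]) -> bool:
    """A concept is consistent if it covers every active and no inactive."""
    return (all(matches(concept, a) for a in actives)
            and not any(matches(concept, i) for i in inactives))


# Hypothetical 4-bit fingerprints: bit i = presence of structural fragment i.
ACTIVES = ["1101", "1100", "1110"]
INACTIVES = ["0101", "1000"]

concept = most_specific_concept(ACTIVES)
required = [i for i, c in enumerate(concept) if c != "#"]  # fragments driving the classification
print(concept, is_consistent(concept, ACTIVES, INACTIVES), required)
print("unknown 1111 predicted active:", matches(concept, "1111"))
```

The positions that remain fixed at 1 or 0 correspond to the fragments responsible for the classification.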
High throughput screening software tools for analytical spectroscopy.
Antony John Williams, Advanced Chemistry Development, 133 Richmond Street West, Suite 605, Toronto, ON M5H 2L3 Canada. Email: email@example.com
High-throughput screening by mass spectrometry and tubeless NMR has become the technique of choice for the analysis of combinatorial libraries. Coupling automation with flow NMR and MS technology now allows spectra to be acquired from a combinatorial plate in only a few hours. Routine acquisition of such large amounts of data can indeed increase the throughput of these analyses, but it can also produce an inordinate amount of data with no convenient way to track it. Since the chemist can often suggest the structure expected in each vial on the plate, it is appropriate to relate the experimental spectra to those predicted for that structure. The development of software for databasing MS and NMR spectral curves associated with molecular structures, and the application of NMR prediction algorithms to compare experimental and predicted spectra, will be discussed.
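A minimal sketch of the comparison step follows: for each well it pairs the experimental chemical-shift list with the list predicted for the expected structure and flags wells whose mean deviation exceeds a tolerance. Pairing sorted lists is a deliberately naive assignment strategy, and both the 2 ppm tolerance and the peak lists are assumptions for illustration, not ACD's algorithm or data.

```python
from statistics import mean


def shift_deviation(experimental: list[float], predicted: list[float]) -> float:
    """Mean absolute deviation (ppm) after pairing sorted peak lists."""
    if len(experimental) != len(predicted):
        raise ValueError("peak counts differ; assignment needs a richer model")
    return mean(abs(e - p) for e, p in zip(sorted(experimental), sorted(predicted)))


def flag_wells(plate: dict, tolerance: float = 2.0) -> list[str]:
    """Return IDs of wells whose spectrum does not fit the expected structure."""
    flagged = []
    for well, (experimental, predicted) in plate.items():
        try:
            deviation = shift_deviation(experimental, predicted)
        except ValueError:
            flagged.append(well)  # peak-count mismatch is suspicious in itself
            continue
        if deviation > tolerance:
            flagged.append(well)
    return sorted(flagged)


# Hypothetical plate: well -> (experimental shifts, predicted shifts), in ppm.
PLATE = {
    "A1": ([14.1, 22.8, 31.9, 63.0], [14.0, 23.0, 32.0, 62.5]),
    "A2": ([14.1, 22.8, 128.5, 170.2], [14.0, 23.0, 32.0, 62.5]),
    "A3": ([14.1, 22.8], [14.0, 23.0, 32.0]),
}
print(flag_wells(PLATE))
```

Only the flagged wells then need a chemist's attention, which is the point of relating the experimental spectra to predictions.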
Managing Combinatorial Data in Excel.
Harold Helson and Michael Swartz. CambridgeSoft Corporation, 100 Cambridge Park Drive, Cambridge, MA 02140
Most combinatorial chemists manage their experiments with spreadsheet programs such as Microsoft Excel. In most cases, these chemists must perform chemistry tasks such as the enumeration of product molecules outside Excel and then import the data into Excel. CambridgeSoft has developed solutions that provide combichem data management directly inside Excel. This makes it possible for users to integrate combichem-specific chemistry intelligence directly into the spreadsheet experiment managers they have already developed. This presentation will focus on a sample application that shows users how they can manage their combichem data directly inside an Excel application that in turn integrates with ChemDraw, the drawing program preferred by most chemists.
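The enumeration step mentioned above amounts to forming every combination of building blocks and laying the products out in plate order. The sketch below does that and writes Excel-readable CSV. It does not use CambridgeSoft's or ChemDraw's APIs, it tracks only building-block IDs rather than product structures, and the IDs and the 96-well layout are hypothetical.

```python
import csv
import io
import itertools
import string


def enumerate_library(components: dict, plate_rows: int = 8, plate_cols: int = 12) -> list[dict]:
    """One row per product: plate number, well, and the building block used at each position."""
    rows = []
    combos = itertools.product(*components.values())  # full combinatorial enumeration
    for index, combo in enumerate(combos):
        plate, pos = divmod(index, plate_rows * plate_cols)
        well = f"{string.ascii_uppercase[pos // plate_cols]}{pos % plate_cols + 1}"
        rows.append({"plate": plate + 1, "well": well, **dict(zip(components, combo))})
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize rows as CSV text that a spreadsheet can open directly."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical building blocks for a two-component library.
COMPONENTS = {
    "amine": ["AM-001", "AM-002", "AM-003"],
    "acid": ["AC-101", "AC-102"],
}
library = enumerate_library(COMPONENTS)
print(to_csv(library))
```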
Intelligent data visualization for large sets of chemical structures and property data
Glenn J. Myatt, Paul E. Blower, Jr., Kevin P. Cross, and Wayne P. Johnson. Research and Development, Columbus Molecular Software, Inc., Business Technology Center, 1275 Kinnear Rd., Columbus, OH 43212. Email: firstname.lastname@example.org
Combinatorial chemistry and high-throughput screening have dramatically increased the number of compounds that are made and tested for biological activity, and the speed at which this is done. We will present a chemically intelligent data visualization computer program designed to process and categorize the large volumes of chemical and biological screening data being generated. The program organizes sets of structures using a taxonomy of approximately 20,000 familiar drug-like features, such as heterocycles and topological pharmacophores. The sets of structures are presented graphically, for example as histogram bars that represent the number of structures in each set. Sets containing a statistically high number of active structures are highlighted, suggesting that their common structural feature is highly correlated with biological activity. This presentation will introduce the program and demonstrate its application to drug discovery, combinatorial library design, and diversity analysis.
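The abstract does not say how a "statistically high" number of actives is judged. One common choice, assumed here, is a hypergeometric tail probability: the chance of seeing at least the observed number of actives in a feature set of that size if actives were scattered at random. The counts in the example are invented.

```python
from math import comb


def enrichment_p_value(total: int, total_actives: int, set_size: int, set_actives: int) -> float:
    """P(X >= set_actives) for X ~ Hypergeometric(total, total_actives, set_size)."""
    upper = min(set_size, total_actives)
    tail = sum(
        comb(total_actives, k) * comb(total - total_actives, set_size - k)
        for k in range(set_actives, upper + 1)
    )
    return tail / comb(total, set_size)  # exact integer counts, one final division


# Hypothetical screen: 1000 compounds, 50 actives; a feature set of 20 compounds
# (about 1 active expected by chance) contains 8 actives.
p = enrichment_p_value(1000, 50, 20, 8)
print(f"p = {p:.2e}")
```

Sets whose p-value falls below a chosen threshold would be the ones highlighted in the display.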
Convention Center La Nouvelle Ballroom B/C
|Sci-Mix Poster Session|
|A. H. Berks, Organizer, Presiding|
|7:00 PM - 9:00 PM|
Information services on the intranet: where we are and where we want to go.
A Web-Based Engineering Chemistry Database.
Collaborative electronic notebook systems: A technical knowledge management paradigm beyond LIMS, Groupware, and the Web.
Electronic laboratory notebook systems for R&D and testing labs: Status of creation and acceptance in industry.
Patents in Combinatorial Chemistry.
ChemIDplus: an experimental public chemical information and structure search system.
Perlita M. Liwanag, Vera W. Hudson, and George F. Hazard, Jr. Division of Specialized Information Services, National Library of Medicine, Bldg. 38A, Rm. 3N-315A, 8600 Rockville Pike, Bethesda, MD 20894. Email: email@example.com
ChemIDplus is a web-based search system that provides access to structure and nomenclature authority files used for the identification of chemical substances cited in National Library of Medicine (NLM) databases. ChemIDplus also provides structure searching and direct links to many biomedical resources at NLM and on the Internet for chemicals of interest. The database contains over 349,000 chemical records, of which some 56,000 include chemical structures. ChemIDplus is searchable by Name/Synonym, CAS Registry Number, Molecular Formula, Classification Code, Locator Code, and Structure. The Locator Codes are hyperlinked at the substance level to biomedical databases at NLM and on the Internet and to the NLM Superlist compilation of chemical substances of interest to federal and state regulatory agencies. Ease of navigation from the system's Locator Display Page to the other web sites and back is a characteristic feature of ChemIDplus. In addition to data queries, the system provides three types of structure queries: Substructure Search, Similarity Search, and Exact Structure Search. ChemIDplus facilitates structure searching with two options that eliminate the need to draw structure queries, allowing novice users to take advantage of its structure searching capability. One option, "Use Structure for Query", pastes the structure retrieved by a previous query into the Structure Input Box, while the other, "Use Structure for Similarity", starts an immediate search for similar structures. The database is maintained using ISIS(tm)/Host and Oracle® and uses Chemscape Server(tm) to integrate the retrieval of structural data with related textual information.
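The abstract does not state which similarity measure ChemIDplus uses. As an assumption, the sketch below uses the widely used Tanimoto coefficient on sets of structural features to rank a small database against a query structure. The feature keys are toy labels, not real fingerprint bits.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """|A intersect B| / |A union B|; two empty feature sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_search(query: frozenset, database: dict, threshold: float = 0.5) -> list:
    """Return (name, score) pairs at or above threshold, best first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [(name, score) for name, score in hits if score >= threshold]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Toy feature sets standing in for real structure fingerprints.
DATABASE = {
    "benzene": frozenset({"aromatic_ring", "c6"}),
    "toluene": frozenset({"aromatic_ring", "c6", "methyl"}),
    "phenol": frozenset({"aromatic_ring", "c6", "hydroxyl"}),
    "ethanol": frozenset({"hydroxyl", "methyl", "c2"}),
}
query = DATABASE["toluene"]  # cf. reusing a retrieved structure as the query
for name, score in similarity_search(query, DATABASE):
    print(f"{name}: {score:.2f}")
```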
Bioinformatics in the CAS databases.
Leo W. Collins, Eva M. Hedrick, and Anish Mohindru. Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202. Email: firstname.lastname@example.org
Bioinformatics is generally regarded as the information side of genomics research. Demand for bioinformatics has increased dramatically in recent years owing to the progress of the Human Genome Project and other projects with the express objective of determining gene sequences. Since 1907, CAS has abstracted and indexed the scientific literature, including references and literature from genomic and other biological sources. More than 37% of the abstracts in Chemical Abstracts come from biochemical sources. In addition, more than 18% of the records in the CAS Registry File are biosequences collected from the journal literature, patents, and the Human Genome Project. This vast collection of biosequences and related biological information makes the CAS databases a valuable source of information for biotechnology research and process development. This presentation will use examples to illustrate the comprehensive content of the biosequences, patents, and related information in the CAS databases.
Modular chemical descriptor language (MCDL) and unique structure representation.
Michael N. Burnett, A. C. Buchanan, III, and Andrei A. Gakh. Chemical and Analytical Sciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, P.O. Box 2008, Oak Ridge, TN 37831-6197. Email: email@example.com
Several approaches exist for representing molecular structures with linear descriptors, such as the IUPAC and ACS nomenclature systems and the more computer-oriented Daylight SMILES system. All of these require relatively complex rules to create unique descriptors. A new simplified modular system has been developed for representing molecular structures uniquely. Molecules are described by their structural fragments (1st module) and the connectivity of these fragments (2nd module), and, if needed, a module providing the stereochemistry is included. For example, the unique descriptor of R-2-bromopentane CH3CHBrCH2CH2CH3 is CBrH;2CHH;2CHHH[2,4;3;5]/SA:1,Br,H,4,2/. The simplicity of the approach arises from its use of simple ASCII ordering (dictionary order in English) to prioritize structural features in place of complicated rules on the relative priorities of functional groups. Additional information about the molecule, such as atom coordinates and physical properties, can be included in the descriptor as a set of supplemental non-unique modules. [This research was sponsored by the U.S. Department of Energy Initiatives for Proliferation Prevention (IPP) program.]
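The sketch below reproduces the first two modules of the R-2-bromopentane example, `CBrH;2CHH;2CHHH[2,4;3;5]`. Fragment labels are the centre atom followed by its terminal atoms in ASCII order, and distinct labels are sorted in ASCII order with a count prefix for repeats. Reading the connectivity module as "for each fragment, its higher-numbered neighbours" is our interpretation of the published example, not a rule stated in the abstract. The canonical numbering of equivalent fragments and the stereo module (`/SA:.../`) are not implemented; fragment numbers are assigned by hand here.

```python
from collections import Counter


def fragment_label(centre: str, attached: list[str]) -> str:
    """Centre atom followed by its terminal atoms in ASCII order, e.g. 'CBrH'."""
    return centre + "".join(sorted(attached))


def fragment_module(labels: list[str]) -> str:
    """ASCII-sorted distinct labels joined by ';', with a count prefix for repeats."""
    counts = Counter(labels)
    return ";".join(
        label if counts[label] == 1 else f"{counts[label]}{label}"
        for label in sorted(counts)  # Python str ordering compares code points (ASCII)
    )


def connectivity_module(n_fragments: int, bonds: list[tuple[int, int]]) -> str:
    """For each fragment, its higher-numbered neighbours (our assumed reading)."""
    higher = {i: [] for i in range(1, n_fragments + 1)}
    for a, b in bonds:
        higher[min(a, b)].append(max(a, b))
    rows = [",".join(map(str, sorted(higher[i]))) for i in range(1, n_fragments + 1)]
    while rows and rows[-1] == "":
        rows.pop()  # trailing fragments with no higher-numbered neighbours are omitted
    return "[" + ";".join(rows) + "]"


# 2-bromopentane, carbons C1..C5. Fragments numbered in sorted-label order:
# 1 = C2 (CBrH), 2 = C3 (CHH), 3 = C4 (CHH), 4 = C1 (CHHH), 5 = C5 (CHHH)
labels = [
    fragment_label("C", ["H", "Br"]),      # C2
    fragment_label("C", ["H", "H"]),       # C3
    fragment_label("C", ["H", "H"]),       # C4
    fragment_label("C", ["H", "H", "H"]),  # C1
    fragment_label("C", ["H", "H", "H"]),  # C5
]
bonds = [(4, 1), (1, 2), (2, 3), (3, 5)]  # C1-C2, C2-C3, C3-C4, C4-C5
descriptor = fragment_module(labels) + connectivity_module(len(labels), bonds)
print(descriptor)  # CBrH;2CHH;2CHHH[2,4;3;5]
```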
Implementation of Chinese drug database searching system.
Min He and Jiaju Zhou. Laboratory of Computer Chemistry, Institute of Chemical Metallurgy, Chinese Academy of Sciences, Beijing, 100080, China. Email: firstname.lastname@example.org
Chinese drugs have played an important role in treating disease and protecting health in China since ancient times. Over the past thousand years, their use has generated a great deal of information, which is scattered across many categories of Chinese drug literature and books. Because Chinese drugs have lacked scientific study, their modes of action remain unclear. To support the scientific study of Chinese drugs, we have developed a Chinese drug database searching system (CDDBSS). The system runs on Windows NT, with Microsoft SQL Server 6.5 as the database management system (DBMS). The Chinese drug database consists of four parts: (1) the main information needed for mechanistic studies of Chinese drugs, such as physical and chemical properties, pharmacology, and clinical application data; (2) chemical components; (3) molecular structures; and (4) bio-activity data. All of this information can be searched in user-specified modes. 3D chemical structures are transferred as chemical VRML files.
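The four-part organization described above maps naturally onto a relational schema. The sketch below shows one possible layout and a typical cross-part query. It uses SQLite rather than SQL Server so that it is self-contained, and the table names, columns, and placeholder records are hypothetical, not CDDBSS's actual design.

```python
import sqlite3

# Hypothetical schema mirroring the four parts: drug information,
# chemical components, molecular structures, and bio-activity data.
SCHEMA = """
CREATE TABLE drug (
    drug_id      INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    properties   TEXT,   -- physical and chemical properties
    pharmacology TEXT,
    clinical_use TEXT
);
CREATE TABLE component (
    component_id INTEGER PRIMARY KEY,
    drug_id      INTEGER NOT NULL REFERENCES drug(drug_id),
    name         TEXT NOT NULL
);
CREATE TABLE structure (
    component_id INTEGER PRIMARY KEY REFERENCES component(component_id),
    molfile      TEXT,   -- connection table
    vrml         TEXT    -- pre-generated 3D display file
);
CREATE TABLE bioactivity (
    component_id INTEGER NOT NULL REFERENCES component(component_id),
    assay        TEXT NOT NULL,
    value        REAL
);
"""


def components_with_activity(conn, drug_name, assay):
    """Components of a drug that have data in a given assay, highest value first."""
    query = """
        SELECT c.name, b.value
        FROM drug d
        JOIN component c ON c.drug_id = d.drug_id
        JOIN bioactivity b ON b.component_id = c.component_id
        WHERE d.name = ? AND b.assay = ?
        ORDER BY b.value DESC
    """
    return conn.execute(query, (drug_name, assay)).fetchall()


# Placeholder records only; no real drug data is implied.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO drug (drug_id, name) VALUES (1, 'drug-001')")
conn.executemany("INSERT INTO component VALUES (?, ?, ?)",
                 [(1, 1, "component-A"), (2, 1, "component-B")])
conn.executemany("INSERT INTO bioactivity VALUES (?, ?, ?)",
                 [(1, "assay-X", 0.8), (2, "assay-X", 2.5)])
print(components_with_activity(conn, "drug-001", "assay-X"))
```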
Recent advancements in the development of SENECA, a computer program for Computer Assisted Structure Elucidation based on a stochastic algorithm.
Christoph Steinbeck, Computational Chemistry Group, Max-Planck-Institute of Chemical Ecology, Tatzendpromenade 1a, Jena 07745 Germany. Email: email@example.com
Recent advancements in the development of SENECA, a new program package for Computer Assisted Structure Elucidation (CASE) currently being developed in our group, are outlined. SENECA is an object-oriented, platform-independent system written in the Java programming language. It features a client program for the input or import of spectral data and for the setup of the structure elucidation process, as well as a structure elucidation server that is distributed over a network of multiple commodity machines. Results are presented that demonstrate the promising performance of the stochastic algorithm implemented in SENECA. This algorithm optimizes a multi-parametric target function towards maximum similarity between the real and the back-calculated set of spectra.
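The abstract names only "a stochastic algorithm". The sketch below uses simulated annealing with the Metropolis acceptance rule as a representative stochastic optimizer, assumed here rather than taken from SENECA. It maximizes the similarity between an observed 13C peak list and a back-calculated one. To keep the example small, a "candidate structure" is reduced to a list of carbon environments, and the shift table is a crude hypothetical predictor.

```python
import math
import random

# Hypothetical toy predictor: carbon environment -> 13C shift (ppm).
SHIFT_TABLE = {"CH3": 14.0, "CH2": 25.0, "CH": 40.0, "CH-O": 68.0, "C=O": 175.0}


def back_calculate(candidate: list[str]) -> list[float]:
    """Toy 'spectrum' of a candidate: its predicted shifts, sorted."""
    return sorted(SHIFT_TABLE[env] for env in candidate)


def score(candidate: list[str], observed: list[float]) -> float:
    """Similarity to the observed spectrum: higher is better, 0 is a perfect match."""
    return -sum(abs(p - o) for p, o in zip(back_calculate(candidate), sorted(observed)))


def anneal(observed, steps=5000, t_start=50.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    environments = list(SHIFT_TABLE)
    current = [rng.choice(environments) for _ in observed]
    current_score = score(current, observed)
    best, best_score = list(current), current_score
    for step in range(steps):
        # Geometric cooling schedule from t_start down towards t_end
        temperature = t_start * (t_end / t_start) ** (step / steps)
        trial = list(current)
        trial[rng.randrange(len(trial))] = rng.choice(environments)  # random single-site move
        trial_score = score(trial, observed)
        delta = trial_score - current_score
        # Metropolis criterion: always accept improvements, sometimes accept worse moves
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = trial, trial_score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return best, best_score


observed = [14.2, 24.6, 25.3, 67.9]  # invented peak list
best, best_score = anneal(observed)
print(sorted(best), round(best_score, 2))
```

Accepting occasional worse moves at high temperature lets the search escape local optima, which matters far more in real structure space than in this toy one.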
Generation of VRML for use in 3D chemical structure display on the Internet.
Min He and Jiaju Zhou. Laboratory of Computer Chemistry, Institute of Chemical Metallurgy, Chinese Academy of Sciences, Beijing, 100080, China. Email: firstname.lastname@example.org
The Internet has been growing at an exponential rate, and chemistry benefits from its development. On the one hand, HTML plays an important role in the rapid expansion of web technology on both the Internet and the Intranet. On the other hand, HTML is limited to a two-dimensional (2D) world. In this work, a program, VRMLMaker, has been developed for three-dimensional (3D) chemical structure display on the Internet in our Chinese Drug Database Searching System (CDDBSS). VRMLMaker can convert a molecular MOL2 or ML2 format file to VRML format files in four different styles: wireframe, capped sticks, ball-and-stick, and CPK space-filling. Unlike graphic files such as GIF or JPEG, a VRML file is plain (standard ASCII) text, so the VRML file of a 3D chemical structure generated by VRMLMaker can be transferred in compressed form and decompressed automatically by a viewer. This reduces the time and cost of transmission over the Internet. The images generated by VRMLMaker are "live", in that they can be magnified and rotated, which makes VRMLMaker well suited to 3D structure display for chemical databases on the Internet. The VRML molecular models generated by VRMLMaker are used in our CDDBSS.
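As an illustration of the idea, not VRMLMaker's code, the sketch below writes a VRML97 scene in a simple space-filling style, with one `Sphere` per atom placed by a `Transform`. It then gzip-compresses the text, since VRML viewers commonly accept gzipped files. The colour and radius table is an illustrative assumption, and the other three styles (which need cylinders for bonds) are not shown.

```python
import gzip

# Illustrative display table: element -> (RGB diffuse colour, sphere radius in angstroms)
STYLE = {
    "C": ((0.3, 0.3, 0.3), 0.77),
    "O": ((1.0, 0.0, 0.0), 0.73),
    "H": ((1.0, 1.0, 1.0), 0.37),
}


def atoms_to_vrml(atoms, scale=1.0):
    """Return a VRML97 scene with one coloured sphere per (element, x, y, z) atom."""
    lines = ["#VRML V2.0 utf8"]
    for element, x, y, z in atoms:
        (r, g, b), radius = STYLE[element]
        lines.append(
            f"Transform {{ translation {x:.3f} {y:.3f} {z:.3f} children [ "
            f"Shape {{ appearance Appearance {{ material Material "
            f"{{ diffuseColor {r} {g} {b} }} }} "
            f"geometry Sphere {{ radius {radius * scale:.2f} }} }} ] }}"
        )
    return "\n".join(lines) + "\n"


water = [("O", 0.0, 0.0, 0.0), ("H", 0.757, 0.586, 0.0), ("H", -0.757, 0.586, 0.0)]
text = atoms_to_vrml(water)
compressed = gzip.compress(text.encode("utf-8"))  # plain text compresses well for larger molecules
print(text)
print(len(text.encode("utf-8")), "bytes raw,", len(compressed), "bytes gzipped")
```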
A water-quality information system for the Lower Mississippi River.
Boumediene Belkhouche (1), James E. Bollinger (2), and William J. George (2). (1) Computer Sciences Department, Tulane University, New Orleans, LA 70118, (2) Division of Toxicology/Pharmacology Department, Tulane University. Email: email@example.com
A major issue in monitoring and managing ecosystems is the lack of an integrated model. We therefore developed a water-quality information system for the Lower Mississippi River that provides a uniform conceptual model of the ecosystem, integrates large amounts of heterogeneous data collected by various sources, and facilitates the analysis and interpretation of existing ambient water-quality data. We conceptualize a river as an object-oriented model consisting of classes and the relationships among them. The automated analysis process supports exploratory questions about the availability of data and their geographic distribution, the concentration levels and distribution of parameters, river hydrology, and the relationships among the individual variables. In addition to these design features, a strict quality control protocol has been implemented to document the flow of data from the point at which they are obtained from their source, through a comprehensive validation process, to their upload into the database system.
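A minimal sketch of the object-oriented view of a river is shown below, with stations and samples as classes and two of the exploratory questions (data availability and concentration levels) as methods. The class names, attributes, and sample values are hypothetical. Restricting queries to validated samples reflects the quality-control protocol described above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Station:
    station_id: str
    river_mile: float  # position along the river (illustrative)


@dataclass
class Sample:
    station: Station
    parameter: str     # water-quality parameter name
    value: float       # concentration, e.g. mg/L (illustrative)
    validated: bool = False  # set once the sample passes validation


@dataclass
class River:
    name: str
    stations: list[Station] = field(default_factory=list)
    samples: list[Sample] = field(default_factory=list)

    def data_availability(self) -> dict:
        """Number of validated samples per (station, parameter)."""
        counts = defaultdict(int)
        for s in self.samples:
            if s.validated:
                counts[(s.station.station_id, s.parameter)] += 1
        return dict(counts)

    def mean_concentration(self, parameter: str):
        """Mean of validated values for one parameter, or None if there are none."""
        values = [s.value for s in self.samples if s.validated and s.parameter == parameter]
        return mean(values) if values else None


# Invented stations and values for illustration only.
s1, s2 = Station("S1", 100.0), Station("S2", 200.0)
river = River("Lower Mississippi", [s1, s2], [
    Sample(s1, "nitrate", 1.2, True),
    Sample(s1, "nitrate", 1.6, True),
    Sample(s2, "nitrate", 9.9, False),  # not yet validated, so excluded from analysis
    Sample(s2, "atrazine", 0.3, True),
])
print(river.data_availability())
print(river.mean_concentration("nitrate"))
```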
TUESDAY AM / PM
Convention Center Room 220
|Skolnik Award Symposium: The Changing Chemical Information Scene: Keeping and Nurturing the Baby as the Bathwater Rushes By|
|S. Kaback, Organizer, Presiding|
Award Address. A 40-Year Countdown to the Millennium.
Stuart M. Kaback, Information Research & Analysis Group, Research Services Division, Exxon Research & Engineering Co., Clinton Township, Route 22 East, Annandale, NJ 08801. Email: firstname.lastname@example.org
Forty years have passed since this chemist elected to attempt to become an information chemist, thus pursuing a career path he had not heard of during his undergraduate and graduate education. Much has changed during that period. Punched cards, microfilm and microfiche, coordinated term indexes and more have come and gone. The US Patent and Trademark Office has issued three million patents, matching its total output in all the years that came before. Online database searching replaced prior reliance on printed indexes and classified card files, and now seeks to redefine itself to stand up against the juggernaut of the Internet. In the face of all that change the traditional abstracting and indexing function is still with us, though not without considerable reshaping. The author surveys this landscape of change and suggests that if we are wise, we will nurture this intellectual activity far into the future.
Chemical Registries -- in the fourth Decade of Service.
Markush Structure Searching Over the Years
Edlyn S. Simmons, Patent Department, Hoechst Marion Roussel, Inc., 2210 E. Galbraith Rd., Cincinnati, OH 45215-6300. Email: email@example.com
The indexing and retrieval of Markush structures has always been among the most problematic aspects of patent information and the most expensive. Indexing advanced from the simple classification systems of the 1950s to proprietary fragmentation systems, which were followed in the 1980s by topological systems. The cost of access to the latest indexing systems has varied widely over the years. In spite of improvements in indexing and less restrictive access conditions, comprehensive Markush structure searches remain the sole province of well-financed organizations.
A history of cross-file and multi-file searching of online patent databases.
Nancy E. Lambert, Business Products and Services, Chevron, P. O. Box 1627, Bldg. 50-1214a, Richmond, CA 94802. Email: firstname.lastname@example.org
Patent searchers have long known that they must search, not just one database, but all relevant databases if they need to ensure as complete a search as possible. The challenge has always been to eliminate duplicate references found in the various databases, and to combine as much as possible the different indexing systems available on the different databases. The ideal situation, as envisioned by Stuart Kaback in 1982, is "super records" that will combine all the indexing from all patent databases. We haven't reached this yet, but we've made progress. This talk will trace the history of multi-file and cross-file patent searching and discuss how online search capabilities have evolved to permit some ingenious combinations of different databases.
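The core of a cross-file search, removing duplicates and pooling indexing, can be sketched as grouping hits by a normalized patent number and merging their index terms into one "super record" per patent. The normalization below is crude and hypothetical: it strips punctuation and a trailing kind code. Real deduplication across equivalents, such as a US patent and its EP family member, requires patent family data, which this sketch does not handle. Database labels and terms are illustrative.

```python
import re
from collections import defaultdict


def normalise_patent_number(raw: str) -> str:
    """Crude normalisation: uppercase, drop non-alphanumerics and a trailing kind code."""
    compact = re.sub(r"[^A-Z0-9]", "", raw.upper())
    return re.sub(r"[A-Z]\d?$", "", compact)  # e.g. US5123456A -> US5123456


def build_super_records(records: list[dict]) -> dict:
    """Merge hits from several databases into one record per patent number."""
    merged = defaultdict(lambda: {"sources": set(), "index_terms": set()})
    for rec in records:
        key = normalise_patent_number(rec["patent"])
        merged[key]["sources"].add(rec["database"])
        merged[key]["index_terms"].update(rec["terms"])
    return dict(merged)


# Illustrative hits for the same US patent retrieved from two files, plus an EP patent.
HITS = [
    {"database": "CAplus", "patent": "US 5,123,456 A", "terms": ["catalyst", "zeolite"]},
    {"database": "WPI", "patent": "US5123456-A", "terms": ["zeolite", "cracking"]},
    {"database": "IFI", "patent": "EP 0123456 A2", "terms": ["catalyst"]},
]
for number, record in build_super_records(HITS).items():
    print(number, sorted(record["sources"]), sorted(record["index_terms"]))
```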
End-user searching - the roads we've travelled and where we're headed now.
Patricia L. Dedert, Corporate Research/Information Research & Analysis Unit, Exxon Research and Engineering Co., Route 22 East, Clinton Township, Annandale, NJ 08801. Email: email@example.com
Seekers of chemical information were early beneficiaries of the online searching revolution, but bench chemists usually found that they had to relinquish control of the search process to professional searchers. Since the early days of online searching, many attempts have been made to re-empower chemists in the task of chemical information retrieval. The training methods and empowerment tools have evolved significantly over time, as have the attitudes of both chemical information professionals and their clientele. This paper will examine the history of end-user searching as practiced at Exxon Research & Engineering, a company interested in many types of chemical and technical information. The lessons learned over twenty years of end-user programs and experiments will be explored, and I will attempt to define the current needs and wishes of our population of end-user searchers.
Exxon's Database for Organizing and Analyzing Patent Records
Sandra S. Unger, Information Research and Analysis Unit, Exxon Research and Engineering Company, Route 22 East, Annandale, NJ 08801. Email: firstname.lastname@example.org
This presentation gives an overview of Exxon's proprietary database system for electronically displaying and analyzing patent data, organized by technical content. Using this system, several databases have been constructed, each focusing on one broad technical topic and containing the Derwent abstracts, the corresponding US and EP claims, and technical reviews. A customized hierarchy of subject categories may be populated by technical searches of the commercial databases or by intellectually reviewing each abstract and/or the corresponding full document. This methodology provides a custom database of thousands of categorized and evaluated patent records that scientists and legal staff can use at their desktops. Sophisticated reports based on the proprietary categorization, combined with commercially available data, provide unique capabilities for patent mapping across technologies and companies. These features are specifically claimed in a granted US patent.
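The kind of patent-mapping report described above can be sketched generically as a cross-tabulation of assignee against category, with counts rolled up a category hierarchy. This is not Exxon's patented system: the hierarchy, the assignees, and the records are hypothetical. Each patent is counted once per category, even when several of its subcategories roll up to the same parent.

```python
from collections import Counter

# Hypothetical category hierarchy: child -> parent
HIERARCHY = {
    "catalysts/zeolite": "catalysts",
    "catalysts/metallocene": "catalysts",
    "processes/cracking": "processes",
}


def roll_up(category: str) -> list[str]:
    """The category followed by all of its ancestors."""
    chain = [category]
    while chain[-1] in HIERARCHY:
        chain.append(HIERARCHY[chain[-1]])
    return chain


def patent_map(records: list[dict]) -> Counter:
    """Count patents per (assignee, category), including parent categories."""
    counts = Counter()
    for rec in records:
        categories = set()
        for cat in rec["categories"]:
            categories.update(roll_up(cat))  # a set avoids double-counting a parent
        for cat in categories:
            counts[(rec["assignee"], cat)] += 1
    return counts


RECORDS = [
    {"assignee": "Company A", "categories": ["catalysts/zeolite", "catalysts/metallocene"]},
    {"assignee": "Company A", "categories": ["processes/cracking"]},
    {"assignee": "Company B", "categories": ["catalysts/zeolite"]},
]
for (assignee, category), n in sorted(patent_map(RECORDS).items()):
    print(f"{assignee:10} {category:22} {n}")
```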
The CAS database: Growing with the chemical sciences and electronic information technology.
Matthew J. Toussant and David W. Weisgerber. Chemical Abstracts Service, P.O. Box 3012, Columbus, OH 43210. Email: email@example.com
In striving to be a leader in meeting the chemical science information needs of scientists worldwide, Chemical Abstracts Service (CAS) has provided a family of diversified information services that have grown in size and utility along with the chemical sciences and electronic information technology. This paper will survey the growth in the chemical literature and recent enhancements in CAS database content and describe how CAS has responded to the need for more efficient and timely database creation and delivery. CAS production system approaches will be described. Key among these is the use of electronic workflow technology and electronic input from primary publishers and patent offices which have enabled CAS to create its databases with much greater timeliness, comprehensiveness, and increased quality. And on the delivery side, linking of the CAS databases to the full-text of the primary sources via the Internet has culminated in a much more timely and highly linked environment of chemical information resources.
Re-inventing the Derwent Abstract
Tim Miller, R&D, Derwent Information Limited, 14 Great Queen Street, London WC2B 5DF United Kingdom. Email: firstname.lastname@example.org
The Derwent patents abstract has been developed and refined, with considerable input from our customers, over many years. Why did Derwent decide to change it? Patent documents have changed over the years and the use to which patent information is put has changed and continues to change, especially as new methods for disseminating information become available. This paper will describe the thinking behind the new abstract format, the benefits which Derwent is trying to obtain, for its customers and for its internal processing requirements, and the lessons learned during its implementation.
Teaching computers to index
Darlene K. Slaughter and Harry M. Allcock. IFI/Plenum Data Corporation, 3202 Kirkwood Highway, Wilmington, DE 19808. Email: email@example.com
Computers have become indispensable tools for the indexing of chemical patents by IFI. Rather than replacing human indexers, however, they improve efficiency by generating descriptors that can be accepted, rejected or modified by the indexers. By automatically performing routine indexing tasks, the computer gives the information chemists more time to analyze and interpret the new technology described in the patents. IFI is combining the strengths of machine indexing with the power of human comprehension to increase productivity while maintaining quality.
An economic analysis of trends in the chemical information sector
Robert J. Massie, Chemical Abstracts Service, P.O. Box 3012, Columbus, OH 43210. Email: firstname.lastname@example.org
The Chemical Information sector has undergone not only a technical but also a structural evolution in the past forty years, as commercial forces and realities have increasingly influenced its course. This evolution has been marked by a shift to consolidation and funding through world capital markets, and by the increasing dominance of corporate interests where entrepreneurial, family-business, not-for-profit, and government entities once ruled. The impact of the Internet has accelerated this evolution. This presentation discusses the major developments in the Chemical Information sector from a business and economic standpoint, noting among other trends:
- the emergence of for-profit acquisition strategies aimed at vertical integration and market dominance;
- the increasing importance of investment and scale economies in technical infrastructures for data collection, storage, and manipulation;
- the role of government entities, especially patent offices, in providing taxpayer-subsidized free information;
- the stresses on and evolution of the academic sector;
- the dramatic potential of electronic journals and other primary information available online, and potentially interlinked in the Web environment.
An updated version of this talk was given at the International Chemical Information Conference in Annecy, France.
Convention Center Room 220
|Recent Developments in Markush and Patent Searching|
|A. Trippe, Organizer, Presiding|
Searching Markush Structures in the MARPAT Database
G. Kenneth Ostrum, Marketing, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43210. Email: email@example.com
This presentation will describe the techniques and benefits of searching for Markush structures in MARPAT, a CAS database that complements the CAplus and Registry files on STN. The emphasis will be on evaluating answers in MARPAT and discussing their value as an enhancement to the chemical literature and patent information available in CAplus and Registry.
MMS, the Markush structure file for the chemical patents community
Philippe Borne, DDI, Institut National de la Propriété Industrielle (French Patent and TradeMark Office), 26bis rue de Saint-Pétersbourg, PARIS cedex 08, 75800, France; Michael P. O'Hara, Millennium Information Services (INPI North American Representative), 215 12th Street, SE, Washington, DC 20003-1427. Email: firstname.lastname@example.org
INPI (the French National Institute of Industrial Property) and Derwent Information Ltd decided to merge their Markush structure databases to create a new structural database, MMS (Merged Markush Service), which became available in June 1998. MMS covers all chemical patents from January 1987 to the present, with coverage of pharmaceutical patents going back to January 1984. MMS currently contains over 700,000 structure records, representing approximately 250 million individual prophetic structures. The file is being indexed both forwards and backwards in time. This paper will concentrate on the current status of MMS and on development plans, with special emphasis on the backfile indexing.
Markush patents at the start of the 21st Century - doing it the Derwent way.
G Cross, P Sayer, and T J Miller. Derwent Information, 14 Great Queen Street, London, WC2B 5DF, United Kingdom. Email: email@example.com
Markush structures have been included in patents for many years and owe their name to a US patent applicant. More recently, combinatorial libraries have been patented, requiring similar handling techniques. Derwent has provided indexing and searching services for Markush patents since the 1960s, through abstracts, punch cards, and manual and fragmentation codes. Since 1987, they have also been searchable as structures on the Markush DARC system. Traditional searchers have learned the complex systems that enable them to retrieve this vital information. In this Internet era, however, the demand is for user-friendly systems that provide high-quality information very rapidly. Such systems also need to be available outside the traditional online hosts, so that companies can manage their own combinatorial collections. In this paper, we will look at how Derwent is approaching the task of making Markush and combinatorial patent information more accessible to traditional and newer users.
Color coding system for simplifying IFI chemical fragmentation code searching.
Anthony J. Trippe, Procter & Gamble Co., 8700 Mason-Montgomery Rd., Mason, OH 45040. Email: firstname.lastname@example.org
The IFI comprehensive file is one of the few electronic sources that allow a form of chemical structure searching back to the 1950s. This searching takes the form of a chemical fragmentation system that allows generic or prophetic chemical substances within granted US patents to be searched. While powerful, the system is perhaps underutilized, since chemical fragmentation coding systems are difficult to learn and use. This presentation will focus on a method for creating IFI fragmentation code queries that takes advantage of the IFIREF file and a color coding scheme that makes these strategies easier to generate and keep track of.
Finding Markush structures using IFI fragmentation
Darlene K. Slaughter and Harry M. Allcock. IFI/Plenum Data Corporation, 3202 Kirkwood Highway, Wilmington, DE 19808. Email: email@example.com
IFI's fragmentation coding system is applied to all claimed Markush structures in U.S. patents, and provides a fast and comprehensive method of retrieving all structures (including prophetic substances) with specified characteristics. Searchers using IFI's system can retrieve references to patents issued as early as 1950. Both the CLAIMS Uniterm and CLAIMS Comprehensive databases offer access to fragmentation coding, but IFI subscribers to the Comprehensive database benefit from greater precision in retrieval. For searchers who do not use IFI fragmentation codes frequently, the recently enhanced CLAIMS PC Reference software simplifies the process of building Markush search strategies.
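To make the idea concrete, a fragmentation-code search behaves like Boolean set logic over the codes assigned to each patent. The sketch below is illustrative only: the in-memory index, the patent numbers and the codes (`C001`, `C014`, and so on) are hypothetical placeholders, not real IFI codes or the behavior of the CLAIMS software.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical index: patent number -> fragmentation codes assigned to its claims.
# Patent numbers and codes are placeholders, not real IFI data.
INDEX: Dict[str, Set[str]] = {
    "US-0000001": {"C001", "C014", "C120"},
    "US-0000002": {"C001", "C015"},
    "US-0000003": {"C014", "C120", "C300"},
}

def search(index: Dict[str, Set[str]],
           all_of: Iterable[str] = (),
           any_of: Iterable[str] = (),
           none_of: Iterable[str] = ()) -> List[str]:
    """Boolean code search: every code in all_of (AND), at least one code
    in any_of if it is given (OR), and no code in none_of (NOT)."""
    required, alternatives, excluded = set(all_of), set(any_of), set(none_of)
    hits = []
    for patent, codes in sorted(index.items()):
        if not required <= codes:
            continue                      # a required fragment is missing
        if alternatives and not (alternatives & codes):
            continue                      # none of the alternative fragments
        if excluded & codes:
            continue                      # an excluded fragment is present
        hits.append(patent)
    return hits

print(search(INDEX, all_of=["C014"], none_of=["C300"]))  # ['US-0000001']
```

Because the codes describe the generic (Markush) structure rather than individual examples, a query of this kind can also retrieve prophetic substances.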
Patents in Combinatorial Chemistry.
Andrew H. Berks, Merck & Co., 126 E. Lincoln Ave, Rahway, NJ 07065-0900 Email: firstname.lastname@example.org
Patenting activity in combinatorial chemistry will be discussed, including bibliometric parameters such as leading companies and growth in patenting activity. Also discussed will be patents claiming various technologies used in combinatorial chemistry, such as lead generation, synthetic methodologies, and claims to libraries. Online search strategies for locating and monitoring combinatorial chemistry technology will be presented.
Convention Center Room 220
|Management of Reaction Information for the Synthetic Chemist|
|G. Grethe, Organizer, Presiding|
Reaction information for the practicing synthetic chemist: data, problems and solutions.
Guenter Grethe, Product Development / Scientific Applications, MDL Information Systems, Inc., 14600 Catalina Street, San Leandro, CA 94577-7409. Email: email@example.com
Synthetic chemists in today's competitive research environment require fast and easy access to information ranging from new methodologies for the synthesis of new compounds or compound libraries, in solution or on solid phase, to the availability of starting materials or new reagents. Fortunately, the amount of information available electronically, in-house or online from large databases combined with data from smaller specialty databases, has increased dramatically. On the other hand, this information is becoming increasingly difficult for the end-user chemist to manage. Effective post-search management of search results and a user-friendly environment are essential to entice infrequent users to make effective use of the wealth of available data. Using examples, we will discuss some of these problems and their solutions, including reaction classification, clustering of data, and linkage to the primary literature.
Tracking reaction pathways in the published chemical literature.
Alexander J. Lawson, Director of R&D, Beilstein Information Systems, Theodor-Heuss-Allee 108, Frankfurt a/M D-60486 Germany. Email: firstname.lastname@example.org
Synthesis is arguably the highest art in organic chemistry. Historically, the many efforts of computational chemists to vie with human ingenuity in this area by providing "expert systems" and "artificial intelligence" to aid in synthesis planning have often met with only a lukewarm response from the researcher active at the bench. Paradoxically, it has been the relatively "dumb" systems based on large collections of single-step reaction reports taken from the primary literature (i.e. reaction databases) that have enjoyed more favor with the working chemist. The largest and most widely used of these is the Beilstein File under CrossFire, which currently operates principally on a "Point & Click" basis. This talk will give an overview of progress in extending this paradigm to reaction pathways, thus cutting across the boundaries of individual publications while retaining the natural simplicity of the navigation method.
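One way to picture the move from single-step reports to pathways is as a graph search: compounds are nodes and each reported reaction is an edge from reactant to product, so a multistep route is a path that may cross several publications. The sketch below assumes hypothetical one-reactant, one-product records and uses breadth-first search to find the route with the fewest steps; it is not a description of how CrossFire implements pathway navigation.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical single-step reaction records: (reaction_id, reactant, product).
# Real records list several reactants and products; one of each keeps this small.
REACTIONS: List[Tuple[str, str, str]] = [
    ("R1", "A", "B"),
    ("R2", "B", "C"),
    ("R3", "A", "D"),
    ("R4", "D", "C"),
    ("R5", "C", "E"),
]

def shortest_pathway(reactions: List[Tuple[str, str, str]],
                     start: str, target: str) -> Optional[List[str]]:
    """Breadth-first search chaining single-step reactions into the
    multistep pathway with the fewest steps from start to target."""
    graph: Dict[str, List[Tuple[str, str]]] = {}   # compound -> [(reaction, product)]
    for rid, reactant, product in reactions:
        graph.setdefault(reactant, []).append((rid, product))
    came_from: Dict[str, Tuple[str, str]] = {}     # product -> (reaction, reactant)
    seen = {start}
    queue = deque([start])
    while queue:
        compound = queue.popleft()
        if compound == target:
            steps = []                             # walk back to recover the route
            while compound != start:
                rid, previous = came_from[compound]
                steps.append(rid)
                compound = previous
            return steps[::-1]
        for rid, product in graph.get(compound, []):
            if product not in seen:
                seen.add(product)
                came_from[product] = (rid, compound)
                queue.append(product)
    return None                                    # no connecting pathway

print(shortest_pathway(REACTIONS, "A", "E"))  # ['R1', 'R2', 'R5']
```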
Insight, access and content - schemes for making the most of reaction-based chemical information.
Julian Hayward and Keith A Harrington. Synopsys Scientific Systems Ltd., 5 North Hill Road, Leeds, LS6 2EN, United Kingdom. Email: email@example.com
Over the past 20 years or so, corporate databases have focused on the storage of individual compounds, along with their molecular properties and biological test data. However, the ability of a reaction to convey so much more information to an organic chemist (selectivity, reagents, conditions etc.), as well as requirements related to combinatorial library synthesis, has led to a fundamental reappraisal of the way in which corporate data is registered and stored. With reference to a number of commercial reaction databases produced by Synopsys, along with some new reaction retrieval tools in Accord, the authors will discuss features that increase the value of reaction databases to the chemist. The rationale behind the concepts and design of new reaction databases will also be highlighted, focusing in particular on retrieval mechanisms for Metabolism data and on the content of the new Synthons and 'Failed Reactions' databases.
Integrated Protocol Management in Combinatorial Synthesis
J. Christopher Phelan, Director, Product Management, Afferent Systems, Inc., 1550 Bryant Street, Suite 760, San Francisco, CA 94103. Email: firstname.lastname@example.org
The increasing use of combinatorial chemistry and parallel synthesis has put a new burden on chemists. Suddenly so much data can be generated in one experiment that the bench chemist's job involves learning and using many different kinds of information handling software. We will present reaction management software that represents synthetic methods in a versatile instrument-independent way, enabling its use for manual parallel chemistry in glassware as well as high-throughput automated synthesis. We will also discuss our combinatorial enumeration module based on the chemically intuitive "virtual chemistry" paradigm, on-line access to analytical data (e.g. MS, LC, and LC/MS), and new complex search capabilities for exploitation of these data structures by the synthetic chemist. All of these functionalities are combined in an integrated package with a user interface that is intuitive to the bench chemist, minimizing the tasks of software training and data handling and putting the chemist back in the laboratory.
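Combinatorial enumeration, in its simplest form, is the Cartesian product of the reagent lists for each variable position. The Python sketch below shows only that combinatorial step, assuming a hypothetical two-component amide library; it does not reproduce Afferent's "virtual chemistry" module, which would also apply a reaction transform to build each product structure.

```python
from itertools import product
from typing import Dict, List, Tuple

def enumerate_library(reagent_sets: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    """Enumerate every combination of one reagent per variable position.
    Each library member is represented only by the tuple of reagents used;
    building the actual product structure is left out of this sketch."""
    positions = list(reagent_sets)                 # e.g. ["acid", "amine"]
    return list(product(*(reagent_sets[p] for p in positions)))

# Hypothetical two-component amide library (reagent names are illustrative).
library = enumerate_library({
    "acid": ["acetic acid", "benzoic acid"],
    "amine": ["methylamine", "aniline", "piperidine"],
})
print(len(library))   # 6
print(library[0])     # ('acetic acid', 'methylamine')
```

Library size grows multiplicatively (here 2 x 3 = 6), which is one reason enumeration tools may generate members lazily for large libraries; `itertools.product` already yields combinations one at a time if the `list()` call is dropped.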
Finding the winning reactions in reaction databases
Robert L. Swann, Director of Research, Information Systems, and New Product Development, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202-1505. Email: email@example.com
With the increasing availability of electronic information, chemists can seek reaction information and access relevant literature articles more rapidly and efficiently than ever before. As the size of reaction databases grows, reaction database providers must work to ensure that these end-user chemists are able to obtain germane answers to their reaction questions. This talk will discuss some of the approaches being taken to deliver precise reaction information to chemists.
The distribution of synthetic techniques in the chemistry literature.
Synthetic information in patents - an underused resource
D G Penn, P Sayer, G Cross, and T J Miller. Derwent Information, 14 Gt Queen Street, London, WC2B 5DF, United Kingdom. Email: firstname.lastname@example.org
The recent growth in the number of reaction databases has meant that synthetic chemists have more electronic information resources at their disposal than ever before. Coverage of chemical journals is comprehensive; however, the patent literature is less well covered. Patents are an important source of synthetic information. Because patent law requires the technical details of an invention to be fully disclosed, in enough detail to replicate it, before a patent is granted, patent specifications can contain considerably more information than a corresponding journal article. We shall compare and contrast these two sources of synthetic data and give examples of retrieval strategies. We shall discuss the systems currently used for indexing reactions in the Derwent World Patents Index and the enhancements being developed.
Convention Center Room 216
|Numeric Chemical Information|
|S. Heller, Organizer, Presiding|
Uniformity and the New Protein Data Bank.
Gary L. Gilliland1, Phoebe Fagan1, John Westbrook2, Helen Berman2, Phil Bourne3, and Peter Arzberger3. (1) National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899, (2) Rutgers University, Rutgers, NJ, (3) UC - San Diego, San Diego, CA. Email: email@example.com
The Protein Data Bank (PDB) is an international repository for macromolecular structure data, generated experimentally by X-ray crystallographic and NMR methods, or from theoretical modeling. On October 1, 1998, the Research Collaboratory for Structural Bioinformatics (RCSB) became responsible for the management of the PDB. The RCSB has three member institutions: the Biotechnology Division of the National Institute of Standards and Technology (NIST), the Department of Chemistry at Rutgers, the State University of New Jersey, and the San Diego Supercomputer Center at the University of California. The new resource is committed to providing efficient deposition and processing of data, versatile query and reporting capabilities, and reprocessing of legacy data to create a uniform archive. A one-year transition period was allotted for the transfer of the PDB to the RCSB. The new systems are in place (http://www.rcsb.org). Since January 27, 1999, the RCSB has processed all new depositions. The new query system, Searchlite, has been released to the public. A clean-up process for the legacy data has begun. The RCSB is using an mmCIF-based dictionary and tools to transform the flat-file format of the PDB structure files into a relational database that will provide controlled access to all the data in the PDB files. Collaborations are underway to clarify nomenclature and clearly define fields such as the classification, name and source of the molecule to enable more reliable searches. The uniformity (clean-up) process will be addressed in some detail. The original PDB files and format, as well as mmCIF-based files, will be preserved. The RCSB will ensure that the community has extensive input into the PDB archival activities. The RCSB vision is to enable new science by providing accurate, consistent, well-annotated structural data delivered in a timely manner to a wide audience. The first step in this process is to provide a channel of open communication.
The PDB is funded by the National Science Foundation, the National Institutes of Health (NIGMS & NLM), and the Department of Energy.
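As a rough illustration of turning mmCIF into relational tables, each mmCIF category (such as `_entity`) can become one table whose rows come from either single key-value items or a `loop_` block. The reader below is a deliberately simplified sketch, not the RCSB tools: it assumes no quoted strings, no semicolon-delimited text fields, and one value per whitespace-separated token, all of which real mmCIF files can violate.

```python
from typing import Dict, List

def parse_mmcif(text: str) -> Dict[str, List[Dict[str, str]]]:
    """Very simplified mmCIF reader returning category -> list of rows,
    i.e. one relational table per mmCIF category."""
    tables: Dict[str, List[Dict[str, str]]] = {}
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    i = 0
    while i < len(lines):
        line = lines[i]
        if line == "loop_":
            i += 1
            names = []                              # item names in the loop header
            while i < len(lines) and lines[i].startswith("_"):
                names.append(lines[i])
                i += 1
            category = names[0][1:].split(".")[0]
            items = [n.split(".", 1)[1] for n in names]
            values: List[str] = []                  # loop body, read as flat tokens
            while i < len(lines) and not lines[i].startswith(("_", "loop_", "data_")):
                values.extend(lines[i].split())
                i += 1
            for start in range(0, len(values), len(items)):
                row = dict(zip(items, values[start:start + len(items)]))
                tables.setdefault(category, []).append(row)
        elif line.startswith("_"):
            name, value = line.split(None, 1)       # single key-value item
            category, item = name[1:].split(".", 1)
            rows = tables.setdefault(category, [])
            if not rows:
                rows.append({})
            rows[0][item] = value
            i += 1
        else:
            i += 1                                  # e.g. the data_ block header
    return tables

sample = """data_demo
_cell.length_a 50.000
_cell.length_b 60.000
loop_
_entity.id
_entity.type
1 polymer
2 water
"""
print(parse_mmcif(sample)["entity"])
```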
Evaluation of the NIST/EPA/NIH Mass Spectral Library
S E Stein1, P Ausloos1, C L Clifton1, J K Klassen1, S G Lias1, A I Mikaya1, O D Sparkman1, D V Tchekhovskoi1, V Zaikin2, and Damo Zhu3. (1) Physical and Chemical Properties Division, NIST, 100 Bureau Dr Stop 8380, Gaithersburg, MD 20899-8380, (2) Topchiev Institute of Petrochemical Synthesis, Moscow, Russia, (3) Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, China. Email: firstname.lastname@example.org
The NIST/EPA/NIH Mass Spectral Library contains mass spectral information on over 100,000 compounds and is used for fingerprint mass spectral matching. The confidence in correctly identifying a compound by matching its spectrum with a reference library spectrum depends directly on the quality of the library. Since it has become clear that automated quality control algorithms are not reliable, a spectrum by spectrum evaluation of the NIST/EPA/NIH Mass Spectral Library has been undertaken. The archive has been exhaustively examined by individuals well trained in mass spectrometry. Because of unavoidable uncertainties in judging the quality of a spectrum, an important requirement has been the agreement on both the analysis and the remedy for each spectrum by at least two individuals. An exact record of any modifications to the spectra has been maintained. Several factors pertaining to the evaluation of the data will be discussed along with examples of difficult evaluations.
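Library searching of this kind ranks reference spectra by a similarity score. The sketch below uses a plain, unweighted cosine (normalized dot product) over nominal m/z values; production search software commonly applies additional m/z and intensity weighting, which is omitted here, and the toy peak lists and compound names are invented for illustration. Because the score is computed against the stored reference, an erroneous library spectrum directly lowers the score of a correct match.

```python
import math
from typing import Dict

Spectrum = Dict[int, float]   # nominal m/z -> relative intensity

def cosine_match(query: Spectrum, reference: Spectrum) -> float:
    """Unweighted cosine similarity between two spectra (0 = disjoint, 1 = identical shape)."""
    mzs = set(query) | set(reference)
    dot = sum(query.get(mz, 0.0) * reference.get(mz, 0.0) for mz in mzs)
    norm_q = math.sqrt(sum(v * v for v in query.values()))
    norm_r = math.sqrt(sum(v * v for v in reference.values()))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0
    return dot / (norm_q * norm_r)

def best_hit(query: Spectrum, library: Dict[str, Spectrum]) -> str:
    """Name of the library entry with the highest similarity to the query."""
    return max(library.items(), key=lambda item: cosine_match(query, item[1]))[0]

# Toy library and unknown spectrum (invented peak lists).
LIBRARY = {
    "compound X": {15: 20.0, 29: 45.0, 31: 100.0},
    "compound Y": {27: 30.0, 29: 100.0, 46: 60.0},
}
unknown = {15: 18.0, 29: 50.0, 31: 100.0}
print(best_hit(unknown, LIBRARY))  # compound X
```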
Web-Based Access to Structure Based Prediction and Databases for Spectroscopy and Physical Properties.
Anthony J. Williams and Valery Kulkov. Advanced Chemistry Development, 133 Richmond Street West, Suite 605, Toronto, ON M5H 2L5, Canada. Email: email@example.com
The Interactive Laboratory, ACD/ILab, offers a universal Web-based gateway to various chemical information resources, property prediction programs and chemical databases. ACD/ILab uses Java-based structure drawing and spectral display applets to submit structures for prediction and to display predicted spectra. The database searches and property predictions currently available at ILab include 1H NMR spectrum prediction and searching of 82,000 assigned chemical structures, 13C NMR spectrum prediction and searching of 67,000 assigned chemical structures, pKa prediction and pKa database search (over 9000 structures), and LogP prediction and LogP database search (over 3500 structures). Other structure-based databases are also available and will be discussed.
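Searching assigned NMR data by chemical shift can be sketched as counting how many observed shifts fall within a tolerance of a reference record's shifts. The function names, record IDs, shift values and the 1.0 ppm default tolerance below are all hypothetical, and the greedy one-to-one matching is a simplification; it is not ACD/ILab's search algorithm.

```python
from typing import Dict, List

def shift_matches(query: List[float], reference: List[float], tol: float) -> int:
    """Count query shifts that have an unused reference shift within +/- tol ppm.
    Greedy matching: each reference shift is consumed at most once."""
    unused = sorted(reference)
    count = 0
    for q in sorted(query):
        for i, r in enumerate(unused):
            if abs(q - r) <= tol:
                del unused[i]
                count += 1
                break
    return count

def search_by_shifts(db: Dict[str, List[float]], query: List[float],
                     tol: float = 1.0) -> List[str]:
    """Record IDs with at least one matched shift, best-matching first."""
    scored = [(shift_matches(query, shifts, tol), rid) for rid, shifts in db.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [rid for score, rid in scored if score > 0]

# Hypothetical 13C shift records (ppm); values are illustrative only.
DB = {"rec1": [14.1, 22.7, 31.9], "rec2": [128.5, 137.8], "rec3": [14.0, 60.2]}
print(search_by_shifts(DB, [14.2, 31.5]))  # ['rec1', 'rec3']
```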
Generating numeric chemical information from chemical structures during chemical registration
Christopher S. McKenna and Phil McHale. Product Marketing, MDL Information Systems, Inc., 14 Walsh Drive, Parsippany, NJ 07054. Email: firstname.lastname@example.org
Chemists and biologists are increasingly looking for more descriptive properties that can be used in decision-making and analysis. This trend is due in large part to the increasing numbers of compounds and screening data that need to be analyzed for lead finding, optimization, and candidate selection. In this session we will discuss and demonstrate a new chemical scripting language from MDL for producing numeric information from chemical structures. That numeric information can be registered along with chemical structures through chemical registration processes, and then analyzed in spreadsheets or interactive charting tools to aid in decision-making.
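MDL's scripting language is not reproduced here. As a generic stand-in, the Python sketch below derives one simple numeric property, molecular weight, from a molecular formula and attaches it to a registration record. The record layout and ID are hypothetical, formulas with parentheses, hydrates or charges are not handled, and only a handful of conventional atomic weights are included.

```python
import re

# Conventional standard atomic weights for a few common elements.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                  "S": 32.06, "Cl": 35.45}

def molecular_weight(formula: str) -> float:
    """Molecular weight from a simple formula such as 'C2H6O'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"no atomic weight for {symbol}")
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total

# Hypothetical registration record; the computed value is stored with it.
record = {"id": "CMPD-0001", "formula": "C2H6O"}
record["mw"] = round(molecular_weight(record["formula"]), 3)
print(record)  # {'id': 'CMPD-0001', 'formula': 'C2H6O', 'mw': 46.069}
```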
Numerical Data In the Beilstein File under CrossFire
Gabriele Ilchmann, Alexander J. Lawson, and Huyen Nguyen. Beilstein Information Systems, Theodor-Heuss-Allee 108, Frankfurt, D-60486, Germany. Email: email@example.com
As well as being one of the world's major abstracting and indexing services for the primary chemical literature, the Beilstein File is also the world's largest collection of experimentally measured property data on organic chemicals. Many of these data are numerical, such as melting points, boiling points/pressures, refractive indices, optical rotations, and thermodynamic values, and this aspect of Beilstein has been widely used by chemists for many generations: the Beilstein Handbook has always been highly valued as a source of characterising data in its own right. A less well-known aspect of the Beilstein File under CrossFire is the use of numerical data in another context: as search filters, for instance to restrict reaction conditions to a particular temperature range. The release of the EcoPharm database under CrossFire now greatly increases the numerical data content of the Beilstein File in the key areas of ecological and pharmacological data (see Figure), and this is accompanied by the ability to use new numerical filters (such as physiological activity) to arrive quickly at highly specific and relevant data. This talk will discuss the variety of numerical data in Beilstein under CrossFire, including the EcoPharm database, and will illustrate techniques for range-searching numerical values in this important new data collection.
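Range searching over reported values raises one design question: a literature value is often itself a range (a melting point of 120-122 °C, say), so a query range should probably match any overlapping interval rather than only values it fully contains. The sketch below takes the overlap approach; the records are invented, and this is not how CrossFire stores or filters its data.

```python
from typing import List, Tuple

# Hypothetical records: (record_id, property, low, high). A single reported
# value is stored with low == high; a reported range with low < high.
RECORDS: List[Tuple[str, str, float, float]] = [
    ("r1", "melting point", 120.0, 122.0),
    ("r2", "melting point", 95.0, 95.0),
    ("r3", "reaction temperature", -78.0, -78.0),
    ("r4", "reaction temperature", 0.0, 25.0),
]

def range_search(records: List[Tuple[str, str, float, float]],
                 prop: str, q_low: float, q_high: float) -> List[str]:
    """IDs whose reported interval for `prop` overlaps [q_low, q_high]."""
    if q_low > q_high:
        raise ValueError("query low bound exceeds high bound")
    return [rid for rid, p, low, high in records
            if p == prop and low <= q_high and high >= q_low]

print(range_search(RECORDS, "reaction temperature", -10, 10))  # ['r4']
print(range_search(RECORDS, "melting point", 121, 200))        # ['r1']
```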
Dimensionality and classification considerations in pattern recognition: demonstration of a novel efficient procedure
Norman J. Santora, Chemical Forecasting And Searching Technology, 1323 Partridge Road, Roslyn, PA 19001-2807. Email: firstname.lastname@example.org
Pattern recognition consists of two distinct steps: (a) preprocessing, in which a data matrix is operated upon in order to reduce its dimensionality; and (b) classification, in which the data elements are placed into discrete property classes. An application of the procedure developed will illustrate the classification of therapeutic agents using novel, effective preprocessing and classification procedures on a data matrix comprising organic structural information.
Convention Center Room 220
|Web-Based Deployment of Info Management Tools|
|O. Guner, Organizer, Presiding|
Cheminformatics and the Internet
Web-based technology for cheminformatics
The intranet at the interface between computational and synthetic chemistry
Searching NMR databases and predicting NMR spectra over the Web.
Valeri Kulkov and Antony Williams. Advanced Chemistry Development, Inc., 133 Richmond Street West, Suite 605, Toronto, ON M5H 2L3, Canada. Email: email@example.com
Searching and sharing spectral information in a networked environment has traditionally been a challenging task. The lack of cross-platform, embeddable software tools for visualizing and manipulating complex objects such as spectra and chemical structures still hampers effective information interchange. Using the example of ACD/ILab, http://www.acdlabs.com/ilab/, a Web-based gateway to chemical information resources, we will describe our approach to handling spectral information on the Web. The 1H, 13C, 19F, and 31P NMR databases on ILab are searchable by chemical shift, structure, substructure, formula and molecular weight. Simulated 1H, 13C, 19F, and 31P NMR spectra for a known or unknown structure can be obtained by accessing the corresponding server-based prediction engine. We developed a set of Java applets for manipulating and visualizing spectra and chemical structures and for viewing the assignment of peaks to the corresponding atoms in a molecule. We will present an XML-based approach to sharing spectral and structural information on the Web.
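A minimal sketch of exchanging a spectrum and its peak-to-atom assignments as XML is shown below, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`spectrum`, `peak`, `assignment`) and the shift values are hypothetical; they do not represent ACD/Labs' actual XML format.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Peak = Tuple[float, List[int]]   # (chemical shift in ppm, assigned atom numbers)

def spectrum_to_xml(nucleus: str, peaks: List[Peak]) -> str:
    """Serialize a peak list and its peak-to-atom assignments as XML text."""
    root = ET.Element("spectrum", nucleus=nucleus)
    for shift, atoms in peaks:
        peak = ET.SubElement(root, "peak", shift=f"{shift:.2f}")
        for atom in atoms:
            ET.SubElement(peak, "assignment", atom=str(atom))
    return ET.tostring(root, encoding="unicode")

def xml_to_spectrum(text: str) -> Tuple[str, List[Peak]]:
    """Parse XML produced by spectrum_to_xml back into (nucleus, peaks)."""
    root = ET.fromstring(text)
    peaks = [(float(p.get("shift")),
              [int(a.get("atom")) for a in p.findall("assignment")])
             for p in root.findall("peak")]
    return root.get("nucleus"), peaks

xml_text = spectrum_to_xml("13C", [(14.1, [1]), (22.7, [2, 3])])
print(xml_text)
print(xml_to_spectrum(xml_text))
```

Because the assignments travel with the peaks, a receiving applet could highlight the corresponding atoms without re-deriving them.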
A Web-Based Engineering Chemistry Database
Xue-Liang Fang, Wei Zhang, Hao Wen, and Zhi-Hong Xu. Institute of Chemical Metallurgy, Laboratory of Computer Chemistry, Chinese Academy of Sciences, P.O. Box 353, Zhongguancun, Beijing 100080, China. Email: firstname.lastname@example.org
Many databases and computer programs have been developed over the last 20 years to meet the data requirements of chemical process development. However, some of these databases and programs run only on individual computers, which makes it difficult for people working on other computers to obtain the data online. The Internet makes it possible to retrieve data from databases, perform calculations, and draw figures on client computers. We have studied a methodology for accessing databases and calculations via the Internet. Data retrieved from a database or calculated by a program can be converted into HTML documents, using HTML extension files (*.htx) as templates, and sent back to the web browser. As an example, a database and a program package for the retrieval and calculation of thermodynamic and equilibrium property data in engineering chemistry have been developed on the web server (http://mole.icm.ac.cn).
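The template approach, in which query results are merged into an HTML template and returned to the browser, can be sketched with Python's `string.Template` standing in for the `.htx` files (whose own syntax differs). The templates, field names and sample record below are hypothetical; values are HTML-escaped before substitution so that data cannot break the page markup.

```python
from html import escape
from string import Template
from typing import Dict, List

# Hypothetical page and row templates; $name placeholders are filled per query.
PAGE = Template("<html><body><h1>$title</h1><table>$rows</table></body></html>")
ROW = Template("<tr><td>$name</td><td>$value</td><td>$unit</td></tr>")

def render(title: str, records: List[Dict[str, object]]) -> str:
    """Merge retrieved or calculated records into the HTML templates."""
    rows = "".join(
        ROW.substitute(name=escape(str(r["name"])),
                       value=escape(str(r["value"])),
                       unit=escape(str(r["unit"])))
        for r in records)
    return PAGE.substitute(title=escape(title), rows=rows)

print(render("Query result", [{"name": "example property", "value": 1.23, "unit": "K"}]))
```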
The versatility of a web-based spent nuclear fuel database
Luis R. Canas, Savannah River Site, Westinghouse Savannah River Company, Aiken, SC 29808. Email: email@example.com
The Spent Fuel Storage Division of the Westinghouse Savannah River Company, principal operations contractor for the Department of Energy's Savannah River Site chemical-nuclear complex, has developed a prototype Spent Nuclear Fuel Database (SNFD) with a web front end for smart retrieval of a wide variety of technical data from any personal computer on the corporate intranet. The SNFD resides in a Microsoft Access file on a dedicated Windows 95 platform. The web interface is managed by the WebSite commercial web server in concert with a custom Visual Basic script and a library of HTML and graphics files. The server transmits static pages (directly from HTML files) or dynamic pages (custom HTML composed by the VB script with embedded data from queries on the Access file) to a remote user's web browser in response to requests for particular information.
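The static/dynamic split can be sketched as a small request dispatcher: a path either maps to a stored page or triggers a database query whose result is composed into HTML. In this hypothetical Python sketch, an in-memory SQLite database stands in for the Access file and a plain function stands in for the web server and VB script; the table, columns, paths and records are invented.

```python
import sqlite3
from html import escape
from typing import Dict, Optional

# Hypothetical static pages keyed by request path (stand-ins for HTML files).
STATIC_PAGES: Dict[str, str] = {
    "/": "<html><body>Spent fuel database home</body></html>",
}

def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the Access file; schema and rows are invented."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fuel (id TEXT PRIMARY KEY, type TEXT)")
    conn.executemany("INSERT INTO fuel VALUES (?, ?)",
                     [("F-001", "type A"), ("F-002", "type B")])
    return conn

def handle(path: str, conn: sqlite3.Connection) -> Optional[str]:
    """Return a static page if one exists; otherwise compose a dynamic page
    from a parameterized query. None means 'not found' (an HTTP 404)."""
    if path in STATIC_PAGES:
        return STATIC_PAGES[path]
    if path.startswith("/fuel/"):
        fuel_id = path[len("/fuel/"):]
        row = conn.execute("SELECT id, type FROM fuel WHERE id = ?",
                           (fuel_id,)).fetchone()
        if row is None:
            return None
        return f"<html><body><p>{escape(row[0])}: {escape(row[1])}</p></body></html>"
    return None

conn = make_demo_db()
print(handle("/fuel/F-002", conn))
```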
Convention Center Room 220
|A. H. Berks, Organizer, Presiding|
Collaborative electronic notebook systems: A technical knowledge management paradigm beyond LIMS, Groupware, and the Web
R. Lysakowski, The Collaborative Electronic Notebook Systems Association, Woburn, MA 01801. Email: firstname.lastname@example.org
Collaborative Electronic Notebook Systems (CENS) are sophisticated systems for technical knowledge management that integrate electronic recordkeeping and records management systems, LIMS, groupware, document management, the Web, databases, instrument systems, and many desktop and server applications that scientists and engineers routinely use. They are also the first major technical software applications to take advantage of handheld, wireless computer hardware. The Collaborative Electronic Notebook Systems Association (CENSA) is now leading the paradigm shift from paper-based to fully electronic recordkeeping systems. This paper will provide: 1) an overview of CENS technologies and systems; 2) a discussion of the legal, regulatory, technical, and business imperatives that must be addressed to implement successful systems in regulated industries where patents are generated; 3) an overview of CENSA and its projects with industrial companies and regulatory agencies worldwide to drive the creation and acceptance of electronic recordkeeping systems.
Electronic Laboratory Notebook Systems for R&D and Testing Labs: Status of Creation and Acceptance in Industry.
R. Lysakowski, The Collaborative Electronic Notebook Systems Association, Woburn, MA 01801. Email: email@example.com
Collaborative Electronic Notebook Systems (CENS) will eventually replace traditional paper-and-pen-based recordkeeping systems with fully electronic, legally defensible, multimedia, multiuser systems that offer many advantages over paper. How soon will they be on the market? What's being done now to make them available to scientists and engineers? How will these recordkeeping systems integrate with existing data management systems, such as LIMS, instrument data management, combinatorial chemistry, and high-throughput screening applications? What about the various wireless handheld notebooks, PDAs, and other portable devices: how will their ephemeral datasets be transported and secured in an emergent recordkeeping infrastructure? The Collaborative Electronic Notebook Systems Association (CENSA) is an international professional and trade association formed in late 1996 to answer these questions and many more. This presentation will cover the mission, objectives and progress of CENSA in its research and product development programs for industry and government.
A water-quality information system for the Lower Mississippi River.
Boumediene Belkhouche1, James E. Bollinger2, and William J. George2. (1) Computer Sciences Department, Tulane University, New Orleans, LA 70118, (2) Division of Toxicology/Pharmacology Department, Tulane University, 1430 Tulane Ave., New Orleans, LA 70112. Email: firstname.lastname@example.org
A major issue in monitoring and managing ecosystems is the lack of an integrated model. Consequently, we developed a water-quality information system for the Lower Mississippi River that provides a uniform conceptual model of the ecosystem, integrates large amounts of heterogeneous data collected by various sources, and facilitates the analysis and interpretation of existing ambient water-quality data. We conceptualize a river as an object-oriented model consisting of classes and relationships among them. The automated analysis process supports exploratory questions about the availability of data and their geographic distribution, the concentration levels and distribution of parameters, river hydrology, and the relationships among the individual variables. In addition to these design features, a strict quality control protocol has been implemented to document the flow of data from the point at which data are obtained from their source, through a comprehensive validation process, until their upload into the database system.
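An object-oriented river model of this kind might be sketched with classes for the river, its monitoring stations and their measurements, plus a validation step that decides which measurements may enter the analysis. The class names, fields, units, station IDs, values and the validation rule below are hypothetical illustrations, not the Tulane system's actual design.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List

@dataclass
class Measurement:
    parameter: str            # e.g. "nitrate"
    value: float
    unit: str
    date: str                 # ISO date string
    validated: bool = False

@dataclass
class Station:
    station_id: str
    river_mile: float
    measurements: List[Measurement] = field(default_factory=list)

@dataclass
class River:
    name: str
    stations: List[Station] = field(default_factory=list)

    def mean_by_station(self, parameter: str) -> Dict[str, float]:
        """Mean validated value of one parameter at each station, ordered by
        river mile: a simple view of concentration level and distribution."""
        result: Dict[str, float] = {}
        for st in sorted(self.stations, key=lambda s: s.river_mile):
            vals = [m.value for m in st.measurements
                    if m.parameter == parameter and m.validated]
            if vals:
                result[st.station_id] = mean(vals)
        return result

def validate(m: Measurement, expected_units: Dict[str, str]) -> bool:
    """Hypothetical rule: non-negative value reported in the expected unit."""
    m.validated = m.value >= 0 and expected_units.get(m.parameter) == m.unit
    return m.validated

river = River("demo river")
s1 = Station("S1", 100.0, [Measurement("nitrate", 1.0, "mg/L", "2000-01-01"),
                           Measurement("nitrate", 3.0, "mg/L", "2000-02-01")])
s2 = Station("S2", 50.0, [Measurement("nitrate", 2.0, "mg/L", "2000-01-01"),
                          Measurement("nitrate", -1.0, "mg/L", "2000-02-01"),
                          Measurement("nitrate", 5.0, "ug/L", "2000-03-01")])
river.stations += [s1, s2]
for st in river.stations:
    for m in st.measurements:
        validate(m, {"nitrate": "mg/L"})
print(river.mean_by_station("nitrate"))  # {'S2': 2.0, 'S1': 2.0}
```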
MolBank: preservation and publication of chemical reaction data
Shu-Kun Lin, Molecular Diversity Preservation International, Sangergasse 25, Basel CH-4054 Switzerland. Email: email@example.com
Molecules (http://mdpi.org/molecules/, ISSN 1420-3049) publishes, in its MolBank section (http://mdpi.org/molbank), very short notes reporting experimental data for individual molecules. Scattered, unassembled experimental data for individual compounds, which would conventionally not be publishable, are particularly welcome; each note is published as a one-page paper for one structure and given a special page number (M1, M2, etc.). The notes have been published in HTML format, with at least a formula of the target molecule, and an MDL MOL file is included with every MolBank short note. All papers submitted for consideration in the MolBank column have been refereed, and the accepted papers edited (English corrected and format unified). The related chemical samples are in most cases available, and the availability information is also published. All papers published in the MolBank section have been indexed and abstracted by several leading indexing and abstracting services, including Chemical Abstracts; CAPLUS; Science Citation Index Expanded; SciSearch; Research Alert; Chemistry Citation Index; and Current Contents/Physical, Chemical & Earth Sciences. This is the first online publication of experimental chemistry. I will report on our experience and on the planned improvements to the MolBank section and the journal Molecules.
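Because each MolBank note carries an MDL MOL file, its structure can be read programmatically. The sketch below reads only the counts line, the atom symbols and the bond table of a V2000 molfile using fixed-column slicing; it ignores charges, stereo flags and the properties block, and the two-atom example is a minimal illustration rather than a real MolBank file.

```python
from typing import Dict, List, Tuple

def parse_molfile(text: str) -> Dict[str, list]:
    """Minimal reader for the atom and bond blocks of an MDL V2000 molfile."""
    lines = text.splitlines()
    counts = lines[3]                          # line 4 is the counts line
    if "V2000" not in counts:
        raise ValueError("only V2000 molfiles are handled in this sketch")
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms: List[str] = []
    for line in lines[4:4 + n_atoms]:
        atoms.append(line.split()[3])          # x, y, z, then element symbol
    bonds: List[Tuple[int, int, int]] = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # first atom, second atom, bond order (each a 3-character field)
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return {"atoms": atoms, "bonds": bonds}

METHANOL = """methanol
  sketch

  2  1  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4300    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  END
"""
print(parse_molfile(METHANOL))  # {'atoms': ['C', 'O'], 'bonds': [(1, 2, 1)]}
```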
Handling stereoisomerism and adding alternative, CAS based, ring system nomenclature into organic compound names generated algorithmically directly from a connection table: AutoNom(TM) approach
Janusz L. Wisniewski, Beilstein Information, Theodor-Heuss-Allee 108, D-60486 Frankfurt/Main Germany Email: firstname.lastname@example.org
The design and practical implementation of algorithms and routines for generating stereochemical descriptors for organic stereoisomers directly from their connection tables are discussed. Techniques and methods for the unambiguous and efficient calculation of the spatial distribution of atoms with reference to a double bond (E,Z) and with reference to chiral centers (R,S) are described and demonstrated within the organic nomenclature generated automatically by Beilstein's newly upgraded (Version 4.0) AutoNom system. The inclusion of IUPAC-sanctioned CAS ring system nomenclature in the AutoNom naming procedure, as an alternative (or addition) to the "native" Beilstein ring system nomenclature, is discussed, evaluated, and illustrated by various names generated for sample organic compounds.
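The R/S part of this problem can be illustrated geometrically: once the four ligands of a stereocenter are ranked by CIP priority, the sign of a triple product tells whether priorities 1, 2, 3 run clockwise when the lowest-priority ligand points away from the viewer. The sketch below assumes 3D coordinates and a precomputed CIP ranking (the hard part, which it omits); it is not AutoNom's algorithm, which works from connection tables.

```python
from typing import List, Sequence

Vec = Sequence[float]

def _sub(a: Vec, b: Vec) -> List[float]:
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]

def _cross(u: Vec, v: Vec) -> List[float]:
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def rs_label(p1: Vec, p2: Vec, p3: Vec, p4: Vec) -> str:
    """R/S from the 3D positions of the four ligand atoms, given in CIP
    priority order (p1 highest, p4 lowest). CIP ranking is assumed done."""
    a, b, c = _sub(p1, p4), _sub(p2, p4), _sub(p3, p4)
    n = _cross(b, c)
    volume = a[0] * n[0] + a[1] * n[1] + a[2] * n[2]   # signed volume
    if abs(volume) < 1e-9:
        raise ValueError("ligands are coplanar; configuration undefined")
    # Negative volume: 1 -> 2 -> 3 is clockwise with ligand 4 pointing away.
    return "R" if volume < 0 else "S"

# Ligand 4 on -z (away from a viewer on +z); 1 -> 2 -> 3 clockwise as seen from +z.
p1, p2, p3, p4 = (0.0, 1.0, 0.33), (0.866, -0.5, 0.33), (-0.866, -0.5, 0.33), (0.0, 0.0, -1.0)
print(rs_label(p1, p2, p3, p4), rs_label(p1, p3, p2, p4))  # R S
```

Swapping any two ligands flips the sign of the volume and hence the descriptor, which is why the priority order must be fixed before this test is applied.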
Data Management for High Performance Computing Users
Kerstin Kleese, High Performance Computing Initiative Centre, CLRC - Daresbury Laboratory, Keckwick Lane, Warrington, WA4 4AD, United Kingdom and Lois Steenman-Clark, Reading University. Email: email@example.com
The demand for data storage has exploded in the last few years. Whereas ten years ago we still measured storage space in megabytes, today's well-established national facilities offer greatly increased disk and tape storage capacities; yet the existing storage space is already filling up quite rapidly, and this trend is expected to accelerate. This trend has naturally raised many questions: What are the reasons behind this development? Is it really necessary to keep all this data? For how long does stored data remain valuable? Who are the main producers? Are we making the best possible use of this data? This paper will concentrate on the data management issues facing users of national High Performance Computing facilities and address some of the strategic questions posed.
Information services on the intranet: where we are and where we want to go.
Kerryn A. Brandt and Joanne L. Witiak. Information Services Dept., Rohm and Haas Company, P.O. Box 718, Bristol, PA 19007 Email: firstname.lastname@example.org, Email: email@example.com
Searching the web has become an additional aspect of many chemical information searches. However, web and intranet technology itself can be exploited by information professionals to deliver search results more effectively. The web can also be a valuable marketing tool. For example, by publishing profiles of key competitors on our intranet, we showcased the value we add by collecting, organizing and summarizing information. This information was rapidly and simultaneously available to our global customer base. We will discuss examples of how we have used our intranet to interact with remote customers, integrate external and internal information, provide enhanced context, navigation, and management of online search results, offer customized views of the same data to different clientele, close the gap between secondary sources and primary information, and generate continually updated searches personalized to customer needs. We will explain where we would like to go with these approaches in the future and raise some issues that challenge our progress.