#226 - Abstracts
ACS National Meeting
September 7-11, 2003
New York, NY
Final Program, 226th ACS National Meeting, New York, NY, September 7-11, 2003
Titles link to slides when available. Please note: Presentations given at CINF symposia have been posted to the CINF website with express permission granted by the authors, who retain the original copyright. These presentations are for information purposes only and cannot be further disseminated without the authors' prior written permission.
Technical intelligence from patent information
Donald Walter, Customer Training, Thomson Derwent, 1725 Duke Street Suite 250, Alexandria, VA 22314, Don.Walter@DerwentUS.com - SLIDES
If the purpose of Technical Intelligence is to discern patterns in the marketplace and R&D programs, then what better place to look for it than in patents? Patents are where companies and inventors disclose what they are thinking about doing. To gain meaningful intelligence, one ideally should have reliable and organized data to analyze and tools to analyze them. The purpose of this talk is to review some of Derwent's data and tools, and see how they may be applied to real marketplace cases.
Paper Withdrawn - Technology intelligence for decisions
Merrill S. Brenner, Air Products and Chemicals, Inc, 7201 Hamilton Blvd., Allentown, PA 18195-1501, Fax: 610-481-2727, firstname.lastname@example.org - SLIDES
Researchers and technology managers are inundated with an exponentially growing amount of data. Most of us learn to filter and organize these data into coherent information, but we still feel overwhelmed by the facts--who, what, when, and where--while we lack what we really need to make a decision. For resolving issues, choosing alternatives, and making better decisions, we need to analyze that vast amount of information to gain the insights that will give us advantages, to truly understand the how, the why, and the direction of events.
Technology intelligence provides early warning systems to anticipate new developments and the analytical tools to make sense of the incoming information, to determine trends, and to evaluate alternatives. Good technology intelligence is not rocket science--it is about early identification of opportunities and threats; it is about capturing knowledge that is often already available within the organization and leveraging those insights; it is about getting people to talk about the right things in a systematic learning approach that drives to agreement on the conclusions, implications to the organization, and future actions. Good intelligence makes implicit assumptions explicit and balances the biases we all bring with external information and perspectives.
This presentation will describe the technology intelligence you should be demanding to help you succeed and cost effective early warning and analytical systems.
Managing the pharmaceutical technical intelligence puzzle
Sara Furie, BioData, 25 Jakov Cohen St., Ramat HaSharon 47213, Israel, Fax: 972-3-5472193, email@example.com - SLIDES
In this era of globalization, rapid technological change, and a significant rise in knowledge creation, effective technical intelligence management is crucial for cutting-edge business decisions. The pharmaceutical and biotech industries are information intensive. Heavily dependent on research for the creation of new technologies and products, and with a long interval between discovery and commercialization, they constantly generate new scientific, regulatory, patent, and business data. To stay ahead, companies need to install ongoing technical surveillance and impact analysis programs. This process is key to improved market insight and the evaluation of business strategies. The presentation will highlight an information-gathering formula for effective technical profiling in the pharmaceutical industry. A technology/knowledge push of regulatory, patent, and scientific data supports the establishment of an in-house database that functions as a central resource for business decision support and product portfolio management. An analytical model for weighting database information to achieve strategic excellence will also be discussed.
Competitive intelligence from internal data sources
Michael P. Bigwood, International Technology Information, P. O. Box 58, Oreland, PA 19075, Fax: 215-884-9373, firstname.lastname@example.org - SLIDES
If members of your organization monitor technology developments, read the trade literature, attend conferences and trade shows, or meet with customers, you already have a lot of information about your competitors. This is particularly true for technology-driven organizations, because so much technology-related information is available. The challenge is to raise people's awareness of the CI value of that information, and to gather it all in one place where that value can be realized. During this presentation, we will review sources of internally generated competitive data, discuss ways of capturing that highly fragmented data and converting it into useful knowledge, and finally share a few thoughts on how to get a CI program started without creating a full-blown CI department.
Competitive technology profiling
John C. Blackburn, TECHFISH, LLC, 109 Smith Street, Charleston, SC 29403, email@example.com - SLIDES
A key component to effective competitive intelligence is a sound understanding of the technologies that are positioned against you in the marketplace. Examples are provided from a range of industries where developing a profile of the competitive technology provided a market advantage.
QTIP: Quick technology intelligence process
Alan L. Porter, R&D, Search Technology, Inc, 4960 Peachtree Industrial Blvd, Norcross, GA 30071-1580, Fax: 770-263-0802, firstname.lastname@example.org, and Merrill S. Brenner, Air Products and Chemicals, Inc - SLIDES
This National Science Foundation sponsored SBIR project sought to devise a way to provide useful technology intelligence on a 24-hour turnaround. Georgia Tech had developed such a "HotTech" process to compile information from selected databases and to respond to 15 questions concerning a selected emerging technology. Initially serving users who are satisfied with reports, we now additionally provide dynamic capabilities for those who want to interact with the information.
We share results of this experimental effort to mine publication (EI Compendex) and patent (MicroPatent) abstract record searches with the aid of VantagePoint software. Results suggest value in two different approaches – 1) provision of a technology profiling service providing passive information or 2) easy-to-use user software to facilitate active technology information analysis to create intelligence.
Anticipating competitors' product launches
Estelle Metayer, Competia, 1250 René-Lévesque West, Suite 20022, Montreal, QC H3B 4W8, Canada, Fax: (514) 270 5223, email@example.com - SLIDES
Abstract text not available.
Hard and soft acids and bases: Analogy in relationships of a science librarian and an academic system
Svetlana Korolev, UWM Libraries, University of Wisconsin-Milwaukee, 2311 E. Hartford Ave, Milwaukee, WI 53211, firstname.lastname@example.org - SLIDES
Changes may happen by choice or by force, and individuals alter their career paths in response. After receiving a chemistry degree and realizing a strong interest in the scientific literature, someone may choose a career as a science librarian in academe. And yet there is no definite rule for career placement, but a unique combination of choices and forces. The one title “science librarian” covers a wide range of specific jobs. For the purpose of this talk, as a qualitative prediction, relationships between a science librarian and an academic library will be viewed in terms of the Hard and Soft Acids and Bases (HSAB) principle. Specialized academic libraries (small radius, high positive charge, not easily distorted) are like hard acids accepting librarians (hard bases) with a high level of expertise (low polarizability), while large central libraries are like soft acids preferring science librarians (soft bases) who can apply their skills (donate electrons) to cover a broad range of subjects (high polarizability). This talk discusses the job activities of a science librarian in two academic libraries. The opportunities and obstacles will be outlined, with special remarks on areas of concern to foreign-born individuals. And as the HSAB principle indicates, the adjectives hard and soft don’t mean the same as strong and weak.
Chemical information careers in industry
Pamela J. Scott, Groton Laboratories, Pfizer Inc, Groton, CT 06340, Fax: 860-715-7353, Pamela_J_Scott@groton.pfizer.com - SLIDES
Chemical information was not my first career path; I did not explore it until later in my chemistry career. I will retrace my steps in arriving at chemical/patent information, and explore the values and benefits derived from the course I have taken. I will also show some retrosynthetic analysis of my path and look for process improvements. My interest in professional development and my obsession as a continuous learner will also be explored.
The role of non-profit medical society information centers in facilitating access to consumer health information
Claudia Lascar, Science/Engineering Library, City College of New York (CUNY), Convent Avenue at 138th Street, New York, NY 10031, Fax: 212-650-7626, email@example.com - SLIDES
The demand for consumer health information has surged during the past decade. Consumer health information includes information to help consumers stay well, prevent and manage diseases, and make decisions related to health and health care. For many decades the non-profit medical societies, such as The National Multiple Sclerosis Society, American Cancer Society and American Medical Association, have promoted research into their specialties and organized a wide range of programs for their membership, including education and advocacy. To continue to remain successful, these societies have long recognized the value of the Internet in the dissemination of health information to both health professionals and consumers. The Medical Library Association's Consumer and Patient Health Information Section (CAPHIS) has taken a keen interest in supporting the activities of health sciences librarians working in this specialized field of librarianship. The presentation will examine: 1) the information specialists' and librarians' role in providing consumer health information within a non-profit medical society setting, 2) some of the characteristics of quality health information, 3) the full range of services and activities available on web sites developed by the non-profit medical societies and 4) the CAPHIS's guidelines and resources.
Careers in chemical and patent information
Andrew H. Berks, none, 74 Ramapo Ave, Suffern, NY 10901, firstname.lastname@example.org - SLIDES
Chemical information, patent searching, and patent information management are critical but little known job functions in research based organizations. This talk will present an overview of this field, including required and desirable skill sets, common responsibilities of patent information professionals, typical work assignments, training opportunities, and migration paths.
Supplying brainpower to the braintrust
Mary Ellen Teasdale, James G. Gee Library - Science Reference, Texas A&M University - Commerce, 2600 South Neal, Commerce, TX 75429, Fax: 903-886-5723, email@example.com - SLIDES
Traditionally, the term 'chemist' conjures up images of laboratories, lab coats, wild hair, and explosions. Additionally, chemists are seen as highly intelligent persons who are unable to converse with the neighbors on their block. Often the gap between chemist and neighborhood can be bridged by an editor or a writer whose job it is to make technical knowledge accessible to the public. Sometimes seen as esoteric, nitpicky, witty, and devoid of real technical expertise, editors are a group of advisors who mold and shape an author's manuscript for market. They are the braintrust behind the author. The writer creates and/or revises a manuscript based on input received from the scientist or by examining the product itself. A writer is not an author, but a true technical writer possesses both the technical knowledge of a specific field and a talent for writing. So if you are a chemist by training and looking for a career outside the laboratory, why not consider a career in publishing? This talk explores the bookmaking process and employment as an editor/writer for textbooks and technical publications.
A career in computational chemistry
Barbara Charton, Bobst Library, New York University, 70 Washington Square South, New York, NY 10012, Fax: 718-722-7706, firstname.lastname@example.org - SLIDES
Computational chemistry is an umbrella term used to categorize a number of approaches to understanding some chemical phenomenon. A number of different computational methods and a growing number of software packages are available. These are used to examine varying aspects of chemical phenomena such as physical properties, chemical and biological reactivities and their relations as a function of molecular structure. The calculations are useful in the prediction of chemical properties and the design of new materials. The varying approaches, software available and applications to a number of areas will be discussed.
Patinformatics: Tasks to tools
Anthony Trippe, Vertex Pharmaceuticals Inc, 130 Waverly St., Cambridge, MA 02139 - SLIDES
With the increasing proliferation and variety of software tools for performing patinformatics (patent analysis), a discussion aligning common patinformatics tasks with these tools seems appropriate. This presentation will focus on examining a number of common patent analysis tasks and describing which software tools are most likely to be useful in performing them. This paper will focus on nine tasks that are commonly conducted by practitioners of patinformatics, including: List Clean-up and Grouping, List Generation, Co-Occurrence Matrices and Circle Graphs, Clustering of Structured (Fielded) Data, Clustering of Unstructured (Text) Data, Mapping Document Clusters, Adding Temporal Components to Cluster Maps, Citation Analysis, and Subject/Action/Object (SAO) Functions. Hopefully this examination will help practitioners make smarter selections when it comes time to invest in patinformatics tools.
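One of the tasks named above, the co-occurrence matrix, can be sketched in a few lines. The snippet below counts how often pairs of classification codes appear together on the same document; the records, assignee names, and IPC codes are invented for illustration and do not come from the talk.

```python
from collections import Counter
from itertools import combinations

# Toy patent records: each maps an assignee to its IPC codes (invented data).
records = [
    {"assignee": "Acme", "ipc": ["C07D", "A61K"]},
    {"assignee": "Acme", "ipc": ["A61K", "A61P"]},
    {"assignee": "Beta", "ipc": ["C07D", "C07C"]},
]

# Count how often each pair of IPC codes appears together on one document.
cooc = Counter()
for rec in records:
    for a, b in combinations(sorted(set(rec["ipc"])), 2):
        cooc[(a, b)] += 1

for (a, b), n in sorted(cooc.items()):
    print(a, b, n)
```

In practice the same tally, run over thousands of records and rendered as a matrix or circle graph, is what the analysis tools discussed in the talk automate.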
Technical intelligence in a dot.com milieu using Dialog/DataStar
James J Heinis, Dialog Corporation, 11000 Regency Parkway #10, Cary, NC 27511, email@example.com - SLIDES
Technical intelligence in an Internet-oriented environment requires that the user be aware of the role of information in an R&D- or technically-driven organization. This paper will describe integrated methods of scientific and intellectual property analysis using the Dialog/DataStar family of products. These Boolean-based systems permit iterative searching through a standardized search language, access to finder files derived from the databases mounted on the system, and centralized chemical name indexes from which synonyms and Chemical Abstracts Registry Numbers (CAS RNs) can be isolated. Data can be extracted from designated fields in one database through a MAPping procedure to create a temporary search save, which is then executed in another database. These techniques add to the power of Boolean searching. Scientific and intellectual property search techniques will be explained for both systems through flow diagrams. The merits of Dialog versus DataStar will also be discussed.
New methodology for analysis of patent white space
Mark A. Calcagno, Business Intelligence Services, Procter and Gamble, Health Care Research Center, 8700 Mason-Montgomery Road, Mason, OH 45040, firstname.lastname@example.org - SLIDES
"Patent White Space" may be described as the area where little or no patent activity has taken place. New methodology will be presented enabling one to "discover" potential patent white space using Derwent Manual Codes and BCE Chemical Fragmentation codes. Both pharmaceutical and chemical white space examples will be discussed. Scope and limitations of the method will also be discussed.
Real-world successes with patent analysis technologies
Cindy Poulos, Director Product Management, Thomson Delphion, 3333 Warrenville Rd., Suite 600, Lisle, IL 60532, Fax: 630-799-0688, email@example.com - SLIDES
This presentation offers a series of real-world examples that demonstrate how patent researchers are leveraging innovative analysis technologies to get remarkable results from their work. Among the examples: Partners at a law firm are identifying new business opportunities by using a technology that summarizes their work for current clients by IPC, and then leveraging that data to target prospects active in the same IPC. Analysts at a company are supporting M&A decisions by using linguistic analysis technology to perform due diligence on an acquisition target's patent portfolio. Patent brokers are finding new properties ripe for licensing by searching for technologies in their specialty area, and using a summarization tool to identify those patents without assignees. The presentation will show step-by-step how researchers are performing these tasks to gain insights that are helping drive key business initiatives.
The face of the patent is not the “whole story”: Optimal use of technical intelligence in determining the effective life of a patent
Anne Marie Clark, and Heidi M. Berven, Information Management, Pfizer Global Research and Development, 2800 Plymouth Road, Ann Arbor, MI 48105, Fax: 734-622-7008 - SLIDES
A number of legal and regulatory factors may influence the date on which a patent expires. These factors include whether: (a) required maintenance fees have been paid; (b) a certificate of correction has issued altering the patent priority date; (c) the patent term has been terminally disclaimed to overcome a non-statutory-type double patenting rejection; (d) the patent term has been extended under the Patent Term Adjustment Act; (e) an adverse outcome has occurred in an interference or litigation proceeding; or (f) the term of a patent has been effectively extended administratively, through the action of a government agency other than the Patent and Trademark Office (for example, the Food and Drug Administration). Information regarding whether any of these factors are operative in a particular situation is not typically provided on the face of a patent and, further, is not readily available from a single source. As a result, the term of a patent cannot be reached algorithmically in some instances, but must be the result of a multi-source analysis. We provide an overview of the factors that may contribute to patent term modification, and strategies for ensuring that the term of a patent has been calculated accurately.
A new patent analysis technique for uncovering technological hot-spots
Anthony F. Breitzman Sr., CHI Research, Inc, 10 White Horse Pike, Haddon Heights, NJ 08035, Fax: 856-546-9633, firstname.lastname@example.org
Patent citation analysis is used in competitive technical intelligence because it provides a means for identifying high impact technology among companies. However, a common complaint about patent citation analysis is that it looks into the past rather than at the present or towards the future. Hot-spot clustering is a new technique for identifying technologies that are currently “hot,” or in other words, for identifying patented technologies that are having their biggest impact right now. Next generation patents are current patents that build upon hot-spot clusters. Analyzing hot-spots and next generation patents provides a new CTI measure that can be used to identify trends, competitors, and industry shifts. Examples will be taken from a study funded by the NIST Advanced Technology Program.
IP strategies for business advancement
Jason Resnick, Computer Patent Annuities Limited Partnership, 225 Reinekers Lane, Suite 400, Alexandria, VA 22314, Fax: 703-625-1406, email@example.com - SLIDES
The discussion in this paper will focus on how IP strategies can be re-engineered to meet the ever-changing requirements of business managers. By utilizing data, tools, and statistical analysis, as well as other factors present within the realm of IPAM, businesses can change both the way they think about and the way they react to market, competitive, and internal forces. New ways not only to protect intellectual property but also to commercialize and expand it will be identified. Further, case studies on how these strategies can offer significant benefits will provide evidence that using patents to answer business-critical questions is imperative for the modern-day executive.
An application of text mining in chemistry?
Mani Shabrang1, Robert J. Gulotty2, Bryan Warner1, Angela Buske1, and Tim Waugh3. (1) Corporate R&D, Dow Chemical, 2020 Building, Midland, MI 48674, firstname.lastname@example.org, (2) Analytical Sciences, Dow Chemical, (3) Abbott Labs - SLIDES
In today's environment, the researcher is frequently saturated with information while starved for knowledge. The aim of text mining is to discover knowledge and patterns that are either non-retrievable or inefficiently retrieved by search tools or by database management tools. Text mining allows one to explore complex relationships among hundreds or even thousands of documents in a textual database by providing a visual interface to the documentation.
In this example, patent and literature work on a heterogeneous catalytic process is processed and analyzed. Visualization techniques, available in two commercial text mining products, are used to demonstrate how research leaders in the chemical industry can keep track of the technology and gain a bird's-eye view of the technical landscape.
Introduction to PDAs (Personal Digital Assistants)
Nicole Hennig, Libraries, Massachusetts Institute of Technology, Building 10-500, 77 Massachusetts Ave, Cambridge, MA 02139-4307, email@example.com
This talk will consist of a brief introduction to PDAs and how they are used, covering questions such as: What is a PDA? What are some different types of PDAs (hardware and software)? What are some basic uses, i.e., note-taking, address book, datebook, wireless connections, and syncing with your computer. How widely are PDAs used? Are they here to stay or just a passing fad? Included will be a brief demo of a Sony Clié PDA (Palm OS) and how it is used in day to day life.
From the palm or pocket to the point of care or need
Helen-Ann Brown, Weill Cornell Medical Library, Weill Medical College of Cornell University, 1300 York Avenue, New York, NY 10021, Fax: 212-746-8364, firstname.lastname@example.org
Use of personal digital assistants (PDAs) is on the rise by all types of health professionals. A PDA held by a member of a patient care team allows valuable information resources to be portable. Patient profiles, medical textbooks, drug formularies, clinical-decision support tools, medical calculators, images of human anatomy, plus so many other resources via wireless internet access can travel as a peripheral brain to the point of care. Information to share, like lecture notes, practice guidelines, on-call schedules and lists of important phone numbers can be beamed from one PDA to another. This reasonably priced hardware and software cuts down on medical errors to improve health.
Making online chemical news portable
Melody M. Voith, Chemical & Engineering News, American Chemical Society, 1155 Sixteenth Street, NW, Washington, DC 20036, Fax: 202-776-8187, email@example.com
Chemical & Engineering News has been available online since August 1998. In the fall of 2001, C&EN Online introduced daily news updates. In July 2002, C&EN To Go was launched, allowing readers to download daily news stories to their PDAs. This feature is becoming more prevalent with niche online publishers and there are tools that make content distribution to PDAs relatively easy. The existence of C&EN To Go leads to some interesting questions for people who deal with technology and chemistry. Why would readers want to get C&EN on their PDA? What do publishers gain from offering this service? How does portable news from C&EN differ from that of more general news sources? How important is timely and flexible news delivery? What other information could or should be delivered to chemists in this way? How will publishers greet the next revolution in portable technology?
Mobile chemistry: Structure databases in your palm and your pocket
Antony John Williams, and Valery Tkachenko, Scientific Development, Advanced Chemistry Development, 90 Adelaide Street West, Suite 600, Toronto, ON M5H 3V9, Canada, Fax: 416-368-5596, firstname.lastname@example.org
PDAs entertain many scientists nowadays, and they are either in their palm or their pocket. The utility of the PDA has generally been focused on providing access to personal information, yet it is an ideal platform for accessing chemical information. Periodic tables have been available for a while, yet access to more general tables of chemical data has been lacking. In this presentation I will provide an overview of the various types of chemical information that have been made available on both Palm OS and Pocket PC, including the ability to display chemical structure databases. At present it is possible to carry a searchable database of >20,000 structures on an 8-Mbyte Palm computer. I will also provide an overview of the ability to scan 2D barcodes using either PDA and convert these barcodes directly into chemical structures for viewing.
Chemistry's first periodic table digital database calculator
Bert. Ramsay, Department of Chemistry, Eastern Michigan University, Ypsilanti, MI 48197, email@example.com
Chemistry databases are generally used for the information required to carry out some sort of calculation. Imagine Mendeleev running his fingers over his periodic table to extract the data needed to complete the calculation of the predicted atomic weight of eka-silicon. The task had two rate-limiting steps: 1) the transfer of data, and 2) the manual completion of the calculation. The invention of the chemical slide rule in the early 19th century reduced the time needed for data transfer, but was limited by the number of chemical formulas that could be placed on the slide rule. Calculation times were sped up with the introduction of computers and calculators, but again the “writing” of chemical formulas was limited by the “unnatural” QWERTY keyboard entry. Only recently has the periodic table been restored to its rightful place as the chemist’s digital database and calculation tool, with the invention of the first handheld chemical calculator (U.S. patent 5,265,029), now available running under Palm OS. Let me show you!
Capturing and harnessing chemical knowledge: accelerating the rate of scientific discovery
Richard D. Hull, Axontologic, Inc, 2646 Windsorgate Lane, Suite 200, Orlando, FL 32828, Fax: 407-208-0367, firstname.lastname@example.org
Much recent discussion has been given to representing biological and chemical concepts and relationships as ontologies. These ontologies have been used as controlled vocabularies, to improve the search and retrieval of biomedical data, to integrate information across domains and organisms, and in some cases to model the scientific discovery process. We present an approach that extends previous ontological models to include new concepts and relationships of interest to the scientific researcher, namely, the researcher's own concepts and relationships. By expressing these elements more formally, we can automate many time-consuming tasks and recognize new relationships between the chemical compounds the researcher is investigating and the biological context(s) in which they reside.
Clustering ambiguity and binary descriptors
John D. MacCuish, and Norah E. MacCuish, Mesa Analytics & Computing, LLC, 212 Corona St., Santa Fe, NM 87501, Fax: 509-472-8131, email@example.com
Clustering algorithms that operate on discrete descriptors, such as molecular descriptors in the form of binary fingerprints, are often confronted with decisions whose outcome is ambiguous. The accumulation of such decisions over the course of a specific run of the algorithm can impact the results, where the resulting clustering may be only one of a large number of distinct clusterings. These clusterings may also differ widely. However, there is no way to tell how much inherent ambiguity is produced by the algorithm by inspecting the specific clustering results. We show this behavior using fingerprints (e.g., Daylight, MDL MACCS keys) of varying lengths, with several clustering algorithms common in the field (e.g., Ward's, complete link, Taylor's). We then show how to quantify the ambiguity, and how such ambiguity measures can be used in level selection or thresholding techniques to lessen the ambiguity and produce more easily understood results.
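The source of the ambiguity described above is easy to reproduce: with short binary fingerprints, distinct pairs of molecules often tie in Tanimoto similarity, so a hierarchical algorithm's first merge is arbitrary. The toy fingerprints below are invented for illustration and are not real structural keys.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two binary fingerprints (bit lists)."""
    on_a = {i for i, x in enumerate(a) if x}
    on_b = {i for i, x in enumerate(b) if x}
    inter = len(on_a & on_b)
    return inter / (len(on_a) + len(on_b) - inter)

# Three toy 4-bit fingerprints.
fps = {
    "m1": [1, 1, 0, 0],
    "m2": [1, 0, 1, 0],
    "m3": [0, 1, 0, 1],
}

# Score every candidate merge; a tie for the maximum means the first
# merge decision (and hence the final clustering) is ambiguous.
pairs = [(tanimoto(fps[a], fps[b]), a, b)
         for a, b in [("m1", "m2"), ("m1", "m3"), ("m2", "m3")]]
best = max(s for s, _, _ in pairs)
tied = [(a, b) for s, a, b in pairs if s == best]
print("tied best merges:", tied)
```

Here both (m1, m2) and (m1, m3) score 1/3, so two different dendrograms are equally justified; counting such ties over a full run is one simple way to quantify the ambiguity the abstract discusses.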
Statistical analyses of peptide fragmentation pattern in a tandem mass spectral database
Yingying Huang1, Joseph M. Triscari2, Vicki H. Wysocki1, Ljiljana Pasa-Tolic3, Gordon A. Anderson3, Mary S. Lipton3, and Richard D. Smith4. (1) Department of Chemistry, University of Arizona, 1306 E. University Blvd., Tucson, AZ 85721, Fax: 520-621-8407, firstname.lastname@example.org, (2) Science Application International Corporation, (3) Pacific Northwest National Laboratory, (4) Biological Science Division, Pacific Northwest National Laboratory
A large peptide tandem mass spectral database of approximately 30,000 ion-trap peptide spectra, for which the correct sequences are known, has been built for statistical analysis of general fragmentation trends and preferential cleavage patterns. Statistical and clustering analyses are used to extract chemically meaningful knowledge about peptide gas-phase dissociation mechanisms. The first of its kind, this database enables us to sort the sequences into various categories based on structural features (e.g., doubly charged peptides terminating in arginine and containing no internal basic residues) and to interrogate the data to elucidate how these structural features lead to specific fragmentation behaviors. The results of the analysis, i.e., the relative abundances of bond cleavages, will be used to determine how fragmentation statistics change as peptide structure changes in gas-phase collision-induced dissociation (CID) experiments. Chemical interactions or residue combinations involved in promoting specific cleavage pathways are determined. The ultimate goal is to improve general peptide fragmentation models.
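The core tally behind such an analysis, relative cleavage abundances grouped by structural category, can be sketched as follows. The spectra, categories, and abundances below are invented for illustration; the actual study used roughly 30,000 ion-trap spectra.

```python
from collections import defaultdict

# Toy spectra: each has a structural category and observed fragment-ion
# abundances (invented values).
spectra = [
    {"category": "2+, C-term Arg", "cleavages": {"b2": 10.0, "y3": 40.0}},
    {"category": "2+, C-term Arg", "cleavages": {"y3": 60.0}},
    {"category": "1+, no basic residue", "cleavages": {"b2": 30.0}},
]

# Sum abundances per fragment ion within each category.
totals = defaultdict(lambda: defaultdict(float))
for s in spectra:
    for ion, abundance in s["cleavages"].items():
        totals[s["category"]][ion] += abundance

# Normalize within each category so fragmentation preferences
# can be compared across categories.
for ions in totals.values():
    grand = sum(ions.values())
    for ion in ions:
        ions[ion] /= grand
```

Comparing the normalized profiles across categories is what reveals, for example, whether C-terminal arginine suppresses or promotes particular backbone cleavages.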
Automating rule discovery from data
Susan I. Bassett, President, Bioreason, Inc, 150 Washington Avenue, Suite 220, Santa Fe, NM 87501, Fax: (505) 995-8186, email@example.com
Reasoning systems built to use rules or relationships among chemical structure and biological outcomes are a powerful tool for decision support in lead discovery and development, but the real challenge is in building the rule base or finding the SAR. In this talk, we will illustrate a method for automatically deriving SAR hypotheses or rules of association directly from screening data. Communication of these complex data-driven results to the chemist or biologist in an easily understood form is another key factor in their successful deployment. These concepts will be illustrated on publicly available screening datasets.
Prediction of peptide binding using Bayesian learning
Ton Van Daelen1, David Rogers2, and Robert D Brown2. (1) SciTegic Inc, 9665 Chesapeake Drive, Suite 401, Fax: 858-279-8804, firstname.lastname@example.org, (2) SciTegic, Inc, 9665 Chesapeake Dr, Suite 401, Fax: 858-279-8804, email@example.com, firstname.lastname@example.org
We have built predictive models for the binding activity of peptides from a statistical analysis of the peptide sequence. We used the MHCPEP library of thirteen thousand peptides with reported immunogenic potency (‘activity’) and binding to MHC class I or II molecules in human or mouse. Each peptide was encoded using a text ‘fingerprint’ composed of consecutive amino-acid subsequences of various lengths. We created additional fingerprints by translating the sequence according to the polarity and chemical characteristics of each amino acid. Bayesian models were then built from these fingerprints. Models for general activity and for individual classes of MHC molecules show significant enrichments in test sets held back from model building (ROC = 0.80 to 0.88). Fingerprints encoding fragments up to length 4 were found to be most effective. Analysis of the models suggests that certain subsets of 3-4 consecutive amino acids correlate with binding activity. These patterns may be used to design new classes of peptides having high probabilities of exhibiting the same activity.
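The sequence "fingerprint" idea above can be sketched directly: encode a peptide as the set of its consecutive amino-acid n-grams up to length 4, the features a Bayesian model would then consume. This is an illustrative encoding, not the authors' implementation; the example peptide is arbitrary.

```python
def peptide_fingerprint(seq, max_n=4):
    """Set of all consecutive subsequences of seq with length 1..max_n."""
    grams = set()
    for n in range(1, max_n + 1):
        for i in range(len(seq) - n + 1):
            grams.add(seq[i:i + n])
    return grams

fp = peptide_fingerprint("SIINFEKL")
print("KL" in fp, "INF" in fp, "XYZ" in fp)
```

A Bayesian classifier would then estimate, from the training library, how much each n-gram shifts the odds of binding, and sum those contributions over a query peptide's fingerprint.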
New stochastic algorithm to determine drug-likeness
Anwar Rayan, Dinorah Barasch, Gadi Brinker, Ayelet Cycowitz, Inbal Geva-Dotan, Andrea Scaiewicz, and Amiram Goldblum, Department of Medicinal Chemistry, Hebrew University of Jerusalem, School of Pharmacy, Jerusalem 91120, Israel, Fax: 972-2-675-8925, email@example.com, firstname.lastname@example.org
A novel stochastic algorithm for detecting the global minimum of highly complex potential surfaces, as well as any number of low-energy structures above that minimum, has been devised in our laboratory (Proc Natl Acad Sci USA 2002; 99:703-8). It is based on randomly choosing variable values to construct a full system configuration, evaluating its cost function, repeating this random choice and evaluation many times, and eliminating variable values that contribute consistently to "expensive" configurations. We proceed iteratively through many cycles of eviction, to a point from which an "exhaustive" calculation of all remaining possibilities is feasible. This algorithm was recently applied to test its ability to predict the difference between drugs and non-drugs, based on databases of the two types of molecules. We employed the Matthews correlation coefficient (1) as our cost function and found that Lipinski's "rule of five", which was originally devised for bioavailability, may be useful for differentiating between drugs and non-drugs, but with different ranges of the descriptors. In order to apply these rules, the upper and lower limits of Lipinski's variables (lipophilicity, H-bonding potential and molecular weight) were transformed into "combinatorial" variables and studied by our algorithm. The result is a set of options for "successive filtering" of a compound to support decision making.
(1) TM Frimurer et al., J. Chem. Inf. Comp. Sci. 40, 1315-24 (2000)
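The sample-evaluate-evict loop described in the abstract can be sketched generically. This is a minimal illustration of the idea on an invented toy cost surface, not the authors' algorithm or parameters: sample random full configurations, average the cost seen by each (variable, value) pair, evict values that consistently appear in expensive configurations, and enumerate exhaustively once the remaining space is small.

```python
import itertools
import random

def stochastic_eliminate(domains, cost, n_samples=2000, evict_frac=0.25,
                         exhaustive_limit=64, seed=0):
    """Iterative value-elimination sketch: shrink each variable's domain by
    evicting values with the worst average sampled cost, then finish with an
    exhaustive search over the surviving combinations."""
    rng = random.Random(seed)
    domains = [list(d) for d in domains]

    def space(ds):
        n = 1
        for d in ds:
            n *= len(d)
        return n

    while space(domains) > exhaustive_limit:
        # Sample random full configurations and record each value's costs.
        stats = [{v: [] for v in d} for d in domains]
        for _ in range(n_samples):
            config = [rng.choice(d) for d in domains]
            c = cost(config)
            for i, v in enumerate(config):
                stats[i][v].append(c)
        # Evict the consistently "expensive" values from each domain.
        for i, d in enumerate(domains):
            if len(d) <= 1:
                continue
            avg = {v: sum(cs) / len(cs) for v, cs in stats[i].items() if cs}
            n_evict = max(1, int(len(d) * evict_frac))
            worst = sorted(avg, key=avg.get, reverse=True)[:n_evict]
            domains[i] = [v for v in d if v not in worst] or d[:1]

    # Exhaustive calculation over all remaining possibilities.
    return min(itertools.product(*domains), key=cost)

# Toy separable cost surface with its minimum at (3, 3, 3, 3).
best = stochastic_eliminate([range(8)] * 4,
                            lambda cfg: sum((x - 3) ** 2 for x in cfg))
print(best)
```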
Systematic analysis of large screening sets
Paul Blower1, Kevin P. Cross1, Glenn Myatt1, Chihae Yang1, Michael Fligner2, and Joseph Verducci2. (1) LeadScope, Inc, 1245 Kinnear Rd, Columbus, OH 43212, email@example.com, (2) Ohio State University
We have developed a novel, systematic process for analyzing the structure-activity relationships (SAR) of large, heterogeneous data sets. It leverages existing techniques as components in a common, overall analysis process that can be applied differently in individual cases. We first group active compounds into structurally meaningful categories, then perform an outlier analysis of the classification results. For each active class, we identify key macrostructural features by reassembling common structural building blocks in the class. The algorithm can be parameterized to meet differing objectives: (1) features that discriminate for biological activity, (2) scaffolds for R-group analysis, and (3) features that discriminate for membership in the class. The macrostructures that discriminate for activity are useful descriptors for building local prediction models; others provide the basis for R-group analysis to further refine the SAR within the class. The local SAR models can then be used to efficiently select new compounds for follow-up screening.
Get dynamic! - E-education tools and e-services to reach the users
Martin Braendle, and Engelbert Zass, Chemistry & Biology Information Center, ETH Zuerich, ETH Hoenggerberg - HCI, CH-8093 Zuerich, Switzerland, firstname.lastname@example.org - SLIDES
Since the impact of electronic information such as online journals, databases and e-books is still growing, the use of information sources has largely shifted from the library to the scientist's workbench. As a consequence, users may require less of a librarian's support to access sources. We feel, however, that users' searching skills must be continuously improved. To investigate scientists' information needs and how they adopt electronic information, we recently performed a survey among PhD students, graduate scientists and staff members of the Chemistry Department of ETH Zurich. The results led us to envisage two main strategic directions. First, focus is directed to chemical information courses and general library instruction accompanied by e-educational content; to this end, virtual learning units for chemical information (general library instruction, electronic journals, database searches) are presently being created within the project "Networks for Chemistry Education". Second, electronic services and electronic content with enhanced value to users through subject control and personalization are provided. Our in-house library information system CLICAPS is at the core of these services, and examples are shown elaborating the steps to be taken toward a web portal.
Using a chemistry subject web page as an information marketing tool
April M. Love, Science Library Reference, University of California, P. O. Box 19557, Irvine, CA 92623-9557, Fax: 949-824-3114, email@example.com - SLIDES
Experience has shown that students and faculty at academic institutions are not aware of the efforts of librarians and information subject specialists to create web pages that offer "one-stop shopping" for research support. The Chemistry Subject Page of the University of California, Irvine Libraries (http://www.lib.uci.edu/online/subject/chem.html) is an entryway for classroom instruction and an introduction to the chemical literature, designed to support all levels of students and faculty. This gateway concept also supports instruction in chemical engineering, materials science and physics. It is a handy resource for library staff who provide chemistry assistance at reference service points, including virtual reference, many of whom have little or no chemistry subject expertise. This presentation will show how to use and promote subject pages to market library electronic resources in instructional settings.
Confusion or convenience: How can the librarians help the library users to access electronic journals?
Song Yu, Libraries, Purdue University, Purdue University Libraries CHEM, 504 West State Street, West Lafayette, IN 47907, firstname.lastname@example.org - SLIDES
While librarians strive to allocate their limited budgets to provide more online electronic content to their users, they face another problem: how to teach the users to locate these resources.
Full-text electronic journals, for example, can be found in various places: publishers' web sites, such as ACS Publications; full-text links in a database, such as SciFinder Scholar; electronic journal portals, such as HighWire Press; and so on. It is difficult even for a veteran librarian to keep track of every possible way to find a full-text article available from the library's collection, let alone inexperienced users. This presentation will show several techniques that librarians have used in library instruction classes to train users how to access e-journals and how to utilize e-journal services beyond retrieving full-text articles.
Federated searching and academic libraries: One size fits all?
Sarah Chandler, and Nancy C. Hyland, Catherwood Library, Cornell University, 239D Ives Hall, Ithaca, NY 14853, Fax: 607-255-9641, email@example.com, firstname.lastname@example.org - SLIDES
Advances in database storage and retrieval have made it possible to access and retrieve data and information from varied remote resources. Cornell University is introducing federated searching as part of its existing e-Reference Collection (http://www.library.cornell.edu) using the ENCompass database system in May 2003. Following the launch of the new system, "Find Databases/Find Articles," analyses of the data logs will be run. The central goal is to analyze behavior of users interacting with the new system through the ENCompass interface. Project investigators will perform this analysis by extracting session log data from the Oracle tables and the Apache web log, storing the data in a MySQL database and running queries. From this analysis, we hope to gain a realistic picture of how users are using the e-Reference system. In addition, we envision using this analysis to help develop a methodology for evaluating user behavior. The authors will present preliminary findings of these analyses.
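The log-analysis pipeline sketched in the abstract (extract session data from server logs, store it in a relational database, run queries) can be illustrated in miniature. This sketch uses SQLite in place of the MySQL/Oracle stack, and the log lines, regular expression, and table layout are invented stand-ins, not the ENCompass project's actual formats.

```python
import re
import sqlite3

# Illustrative Apache combined-log-style pattern (simplified).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" (?P<status>\d+)')

lines = [
    '157.55.1.2 - - [12/May/2003:10:01:12 -0400] "GET /find/databases HTTP/1.1" 200',
    '157.55.1.2 - - [12/May/2003:10:02:40 -0400] "GET /find/articles HTTP/1.1" 200',
    '128.84.9.7 - - [12/May/2003:10:03:05 -0400] "GET /find/databases HTTP/1.1" 200',
]

# Load parsed hits into a relational table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE hits (ip TEXT, ts TEXT, request TEXT, status INTEGER)")
for line in lines:
    m = LOG_RE.match(line)
    if m:
        db.execute("INSERT INTO hits VALUES (?, ?, ?, ?)",
                   (m["ip"], m["ts"], m["req"], int(m["status"])))

# Example analysis query: which pages do users hit most?
rows = db.execute("""SELECT request, COUNT(*) AS n FROM hits
                     GROUP BY request ORDER BY n DESC""").fetchall()
print(rows)
```

The same pattern scales to the described setup: a periodic job parses the web log, appends rows to the database, and standing queries summarize user behavior per session.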
John P. Ochs, Director of New Product Development, Publications Division, American Chemical Society, 1155 Sixteenth Street, Washington, DC 20036, Fax: 202-452-8913, email@example.com - SLIDES
As online materials make up a rising share of library collections, common standards for the collection and distribution of vendor-based online usage statistics have become increasingly important both to publishers and to their library customers. Publishers need consistent usage information for a variety of reasons, such as marketing their services to authors and informing their editorial and product development policies, while libraries need this information for an equally important set of reasons, including collection development and institutional budget requests. Although there have been a number of independent projects in this area, most notably the pioneering work of ICOLC and the ARL's E-Metrics project, COUNTER (Counting Online Usage of Networked Electronic Resources) was the first international initiative organized to harmonize these various efforts into a single, internationally accepted, extendible Code of Practice that allows publishers to provide usage statistics in a way all parties can trust to be consistent, credible and compatible. An overview of COUNTER, along with some insights into current work in progress and planned future developments, will be presented.
NIST's first data oriented eBook: Handbook of basic atomic spectroscopic data
Shari L. Young, William C. Martin, and Jean E. Sansonetti, Technology Services / Measurement Services Division, National Institute of Standards and Technology, 100 Bureau Drive Stop 2310, Gaithersburg, MD 20899-2310, Fax: 301-926-0416, firstname.lastname@example.org - SLIDES
The National Institute of Standards and Technology (NIST) has released its first data-oriented eBook, Handbook of Basic Atomic Spectroscopic Data. This eBook contains a selection of the most commonly used spectroscopic data for neutral and singly ionized atoms of 99 elements (H through Es). The process of creating the eBook required several steps: evaluating data, designing a database, importing the evaluated data into the database, adding HTML markup, exporting the data from the database into files formatted to meet the eBook requirements and finally loading the files to the eBook. This process was modified slightly to create a Website, http://physics.nist.gov/Handbook, containing the same atomic data. The data are now available in three formats: an eBook version, a formatted Web version and an ASCII Web version. Having these different formats provides the user community the flexibility of selecting the desired format that best meets its needs. However, each of the different formats had its unique constraints that had to be dealt with. This talk will describe the process of creating these different file formats as well as the problems that were encountered and how they were solved.
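The core transformation in the workflow above — exporting evaluated records from a database into differently formatted files — can be sketched in a few lines. The element records, field names, and layouts here are invented for illustration; the Handbook's actual schema and markup are not reproduced.

```python
# Toy stand-ins for evaluated spectroscopic records pulled from a database.
rows = [
    {"symbol": "H",  "z": 1, "wavelength_nm": 121.567},
    {"symbol": "He", "z": 2, "wavelength_nm": 58.4334},
]

def to_html_row(row):
    """Render one record as an HTML table row (formatted Web/eBook version)."""
    return ("<tr><td>{z}</td><td>{symbol}</td>"
            "<td>{wavelength_nm:.4f}</td></tr>").format(**row)

html_table = ("<table><tr><th>Z</th><th>Element</th><th>&lambda; (nm)</th></tr>"
              + "".join(to_html_row(r) for r in rows) + "</table>")

# The same records re-exported as a plain ASCII version.
ascii_lines = ["{z:>3} {symbol:<2} {wavelength_nm:>10.4f}".format(**r)
               for r in rows]

print(html_table)
print("\n".join(ascii_lines))
```

Keeping a single database as the source of truth and generating each format from it, as the abstract describes, means a data correction propagates to the eBook, formatted Web, and ASCII versions in one step.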
Towards a universal physical property data index
Peter J. Linstrom, Physical and Chemical Properties Division, NIST, Building 221, Room A111, 100 Bureau Drive, Stop 8380, Gaithersburg, MD 20899-0830, Fax: 301-896-4020 - SLIDES
Physical property data can be found in many sources: journal articles, databases and books. It is a challenge for researchers to screen all these resources to find data of interest. This process is time consuming and potentially quite expensive. The difficulties in locating such data affect both consumers and vendors of data products. Consumers suffer because they must search many sources to find what they are looking for and have no idea whether the data they need exists in the first place. Vendors suffer because potential consumers are not aware that their products contain the data they need.
This talk will discuss a potential solution to this problem: an index of physical property data sources organized by chemical species and data type. This index would give publishers and database vendors an outlet to indicate to potential consumers what data they have available, and would provide consumers with a directory of sources for the data they need.
Emetrics: Lessons learned from the ARL Emetrics project, challenges and opportunities
Martha Kyrillidou, Senior Program Officer for Statistics and Measurement, Association of Research Libraries, 21 Dupont Circle, Washington, DC 20036, Fax: 202-872-0884, email@example.com - SLIDES
ARL libraries are engaged in developing measures of the extensiveness, cost and use of electronic resources across products and libraries in a consistent and systematic fashion. Libraries have been spending increasing amounts of money on electronic resources – for a typical ARL library this amount averaged 19% of the library materials budget in 2001-02. It is becoming increasingly critical to have good measures of what libraries and users get from these e-resources. What they 'get' needs to be translated in terms of the actual resources they receive, what they spend for these resources, what they use, and what value they derive from them. This presentation will present the lessons learned so far from the ARL Emetrics New Measures Initiative and ARL's decision to support collaborative efforts like Project COUNTER. It will also present some of the challenges and opportunities of trying to develop performance indicators for a highly volatile and changing environment.
Integrating content for an improved customer experience
Martin Tanke, Chemistry and Chemical Engineering, Elsevier, Amsterdam, Netherlands, M.Tanke@elsevier.nl
On-line Chemistry Information resources are available on numerous different platforms and accessible via at least as many different business models. Integration and linking between these resources are often anything but seamless. Elsevier offers a large collection of Chemistry resources varying from peer-reviewed full-text journal content on Science Direct to factual/bibliographic databases like Crossfire Beilstein and community services such as ChemWeb. In this presentation the relationship between these resources is explained and insight is given into Elsevier’s strategy towards enhancing the interoperability between its various platforms with the ultimate goal of improving the customer experience.
Leveraging information from enzyme superfamily studies
Scott C-H. Pegg, and Patricia Babbitt, Department of Biopharmaceutical Sciences, University of California, San Francisco, Genentech Hall, N476, 600 16th Street, San Francisco, CA 94107, Fax: 415-502-1411, firstname.lastname@example.org
Our ability to engineer new function in enzymes requires a better understanding of how protein function is determined by sequence and structure. Nature, having often evolved several different functions from a single protein ancestor, provides us with a set of example solutions to enzyme engineering problems. We have examined very distantly related protein sequences and structures for clues to understanding how structural scaffolds associated with some specific protein superfamilies evolved to deliver common elements of function as well as specificity. Results from analysis of several such superfamilies suggest that chemistry, rather than ability to bind a specific substrate type, is the critical determinant in the evolution of new enzyme functions within each superfamily. These results also suggest that new types of functional description, tuned specifically to the explicit mappings between conserved elements of structure and function, will be very useful in enzyme engineering, as well as important for the inference of function from sequence information in non-trivial cases. In an effort to leverage the information collected in our studies of enzyme superfamilies, we have created the Structure-Function Linkage Database (SFLD). This resource allows users to investigate enzyme function in many useful ways, including searching for structural scaffolds for common partial reactions and identifying the functionally important residues of new sequences.
Mining the MDDR database using TIMI as a way to find relationships between activity records
Robert P. Sheridan, Molecular Systems, Merck Research Laboratories, Rahway, NJ 07065, email@example.com
The MDDR (MDL Drug Data Report) contains a large number of diverse compounds compiled from the patent literature. Each structure has one or more biological activity labels associated with it, and there are approximately 700 unique activity labels over the entire database. Since the activity labels are curated by hand, they can be incomplete and/or inconsistent. TIMI is a method of relating words and chemical structures in a set of documents. Here we use TIMI to calculate similarities between the biological activity records based on correlated chemical descriptors and/or words. Some of these relationships are not obvious, in the sense that some of the most similar activities share very few compounds in common. Given the similarities, we can cluster the activity labels. In many cases the clusters associate broad therapeutic areas with specific mechanisms of action in a way generally consistent with pharmaceutical lore. One application where the similarity of activity labels is important is the mining of multi-activity substructures. The goal of that exercise is to find substructures that occur very frequently in the MDDR and that are associated with many different biological activities. Having similarities between activity labels allows us to determine the number of truly different activities associated with each substructure. We are able to identify many multi-activity substructures in the MDDR. Some, like the steroid nucleus and the tricyclic basic amines, are expected. Others, like adenosine and arylpiperazines, are not.
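The similarity step described above — relating activity labels through their correlated descriptors rather than through shared compounds — can be illustrated with a toy calculation. The labels and descriptor profiles below are invented, and plain cosine similarity over descriptor counts is used as a simple stand-in; TIMI itself is not reproduced here.

```python
import math
from collections import Counter

# Each activity label is represented by the descriptors of its compounds
# (toy data; real profiles would come from the MDDR structures and text).
label_descriptors = {
    "antihypertensive": ["aryl", "piperazine", "amide", "piperazine"],
    "5HT1A agonist":    ["aryl", "piperazine", "piperazine", "ether"],
    "steroid":          ["cyclopentane", "fused_ring", "ketone"],
}

def cosine(a, b):
    """Cosine similarity between two descriptor multisets."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb)

# All unordered label pairs, scored by descriptor-profile similarity.
sim = {(x, y): cosine(dx, dy)
       for x, dx in label_descriptors.items()
       for y, dy in label_descriptors.items() if x < y}
closest = max(sim, key=sim.get)
print(closest)
```

In this toy example the two labels sharing an arylpiperazine-like profile come out most similar even though no compound list is compared directly, which is the sense in which non-obvious activity relationships can emerge.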
Knowledge mining in formulation databases
Elizabeth A Colbourn1, Raymond C Rowe2, and Stephen J Roskilly1. (1) Intelligensys Ltd, Belasis Business Centre, Belasis Hall Technology Park, Billingham TS23 4EA, United Kingdom, Fax: 011-44-1642-714305, firstname.lastname@example.org, (2) PROFITS Group, University of Bradford
Product formulation poses a number of challenges, since end-use properties are affected both by ingredients and by processing conditions. Frequently, formulations evolve over time, so extraneous ingredients may add to the cost, but not the performance, of the product. Cause-and-effect relationships connecting changes in the formulation to changes in product properties are often known only anecdotally, and this problem is generally exacerbated because experimental data can be relatively scarce. In the work reported here, we have investigated the use of techniques borrowed from artificial intelligence and evolutionary computing to mine pharmaceutical and chemical databases for useful models and actionable rules. The pros and cons of the various methods have been investigated. Despite the conventional wisdom that large amounts of data are required, we demonstrate that useful information can be extracted even from small databases.
Detecting novel therapeutic targets with in-silico homologs
Susan McClatchy, Alex Elbrecht, Bruce Bush, Payan Canaran, Jeffrey Yuan, and Richard Blevins, Bioinformatics, Merck Research Labs, 126 E. Lincoln Ave, Rahway, NJ 07065, Fax: (732) 594-2929, email@example.com
Genomic analyses estimate that the number of therapeutic targets ranges from 600 to 10,000. These numbers necessitate fast, automatic methods to scan genomic data for novel targets and to select the most promising for drug development.
We designed an automated method to create in-silico homologs of human therapeutic target proteins by searching a six-frame human genome translation with 377 known target sequences. In-silico proteins were assembled from hits to the genome according to chromosome number and orientation. Homology to the generating sequence was checked by querying the RefSeq protein database. Of 55,465 in-silico sequences, 1,344 identified their generating target as nearest homolog. After querying sequences against a human EST database, sequences having similarity to a human EST were submitted to the BLIS genomic viewer for placement within the genome. In-silico sequences distinct from their generating target and >45% identical to a human EST were further analyzed with EST and mRNA alignments.
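Six-frame translation, the first step of the pipeline above, is a standard operation and can be sketched compactly. This is a minimal illustration with a deliberately truncated codon table; a real implementation would use the full standard genetic code.

```python
# Truncated codon table for illustration only (real code needs all 64 codons).
CODONS = {"ATG": "M", "AAA": "K", "TTT": "F", "GGG": "G", "TAA": "*",
          "CAT": "H", "TTA": "L", "CCC": "P"}

def revcomp(dna):
    """Reverse complement of a DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate one reading frame; unknown codons become 'X'."""
    return "".join(CODONS.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Three forward frames followed by three reverse-complement frames."""
    rc = revcomp(dna)
    return [translate(s[f:]) for s in (dna, rc) for f in range(3)]

frames = six_frames("ATGAAATTTGGG")
print(frames)
```

Each known target sequence is then searched against all six translated frames, so coding regions are found regardless of strand or frame.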
Drug Rings Database with web interface: A tool to aid in ring replacement strategies
Xiao Q Lewell1, Andrew C Jones1, Craig L Bruce1, Gavin Harper1, Matthew M Jones1, Iain M Mclay1, and John Bradshaw2. (1) Computational and Structural Sciences, GlaxoSmithKline Research and Development, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, United Kingdom, Fax: 44-1438-764918, Xiao.Q.Lewell@GSK.COM, (2) Daylight Chemical Information Systems Inc
The effort to replace chemical rings forms a large part of the medicinal chemistry practice. This paper describes our effort in developing a comprehensive database of drug rings supported by a web-enabled searching system. Analysis of the rings found in several major chemical databases will be described. The use of the database will be illustrated through application to lead discovery programs for which bioisosteres and geometric isosteres were sought.
Analysis of DNA simulation trajectories using relational database and web based tools
Surjit B. Dixit, and David L. Beveridge, Department of Chemistry and Molecular Biophysics Program, Wesleyan University, Lawn Avenue, Middletown, CT 06459, Fax: 860-685-2211, firstname.lastname@example.org
Simulation techniques such as molecular dynamics provide useful insight into the complex dynamical properties of biomolecules. With the increasing availability of computational resources, their use in major research initiatives is becoming a viable option, making the distribution and handling of the generated data a challenging informatics task. One such initiative is the collaborative effort of the "Ascona B-DNA Consortium", aimed at understanding the sequence-specific structural aspects of DNA based on realistic simulation of B-DNA oligomers containing all unique tetranucleotide base sequences. The data generated during these simulations amount to several hundred gigabytes. We are developing a SQL-based relational database system that aims to simplify and speed up complex queries into the repository for information on various subsets of nucleic acid segments. Queries can also be performed online through a dynamic web interface. This presentation deals with the informatics aspects of such an initiative.
Automated pharmacophore extraction: FlexX goes SQL
Holger Claußen1, Volker Apelt1, Marcus Gastreich1, Sally Ann Hindle1, Christian Lemmen1, and Jonathan Greene2. (1) Chemoinformatics, BioSolveIT GmbH, An der Ziegelei 75, 53757 St. Augustin, Germany, Fax: +49 2241 973 66 88, Holger.Claussen@biosolveit.de, (2) cambios computing LCC
Scoring is still an open problem for docking and an all-purpose scoring function is unlikely to be found at all. Thus, target specific scoring functions are quite attractive, especially as the available data for particular targets grows. In order to create such target specific scoring functions repeated detailed analysis of docking solutions is necessary.
We present an integrated docking workflow that allows for interactive analysis of docking results and rapid prototyping of scoring functions. All docking information is stored in a database. The data can be analyzed by interactive spread sheets, from which 2D and 3D viewers can easily be launched. New scoring functions as well as filters can be defined and tested in this environment.
As a proof of concept for this approach, we performed a docking study on a set of known CDK2 inhibitors with FlexX and derived pharmacophore constraints by statistical analysis of the results. Using these constraints during docking with FlexX-Pharm led to significantly improved enrichments.
Hindle et al., JCAMD, 16, 129–149, 2002
How scientific information supports the research process, a scientist's perspective
John J. Talley, Microbia Inc, 320 Bent Street, Cambridge, MA 02141, Fax: 617-494-0908, email@example.com
Scientists in discovery research today are confronted by unprecedented challenges in the search for new drugs. Never before has so much information been available or so accessible. Getting the right information at the right time is paramount to a successful R&D project. Appropriate content and tools are required to enhance the productivity and creativity of scientists, truly enabling them in their quest for that "aha" moment. This talk will center on the important role chemical information played in the discovery and development of COX-2 inhibitors.
Using voice of the customer to guide the development of the strategies, services, resources, and tools of a corporate information services organization: 3M's approach
Barbara J. Peterson, Library & Information Services, 3M, 201-1S-09 3M Center, St. Paul, MN 55144, Fax: 651-736-0902, firstname.lastname@example.org
Library & Information Services (LIS) has two major components – a Global Service Delivery Team (GSDT) aligned with 3M's seven Businesses and a Global Resource Development Team (GRDT) that uses "voice of the customer" data to develop, deploy and market information tools and resources that enable 3Mers worldwide to more effectively create and share knowledge. The GSDT research teams are using state-of-the-art data visualization and information management tools to meet client requirements. A globally deployed virtual library, supported with just-in-time education, provides desktop access to a vast array of electronic resources and enables 3Mers to carry out their own research. The deployment of Six Sigma at 3M has provided LIS with a variety of approaches for gathering the "voice of the customer" and using it in the development and improvement of information strategies, services, resources and tools.
3 Steps to Better Medline Searches
Soaring Bear, MeSH, NLM/NIH, 8600 Rockville Pike B2E17, Bethesda, MD 20894, Fax: 301-402-2002, email@example.com
Three easy steps to better Medline searches from an expert at the National Library of Medicine. The information explosion requires a prudent strategy for timely retrieval of the information gems you are seeking in the growing haystack (Medline now contains 12 million citations). A balance of widening (with OR terms) and narrowing (with NOT terms) can be facilitated with three tools provided by PubMed: Details, Display Citation and the MeSH Browser (http://www.nlm.nih.gov/mesh/2003/MBrowser.html).
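The widening/narrowing balance described above translates directly into a PubMed query string. The helper function and the specific MeSH terms below are illustrative examples, not part of the NLM tutorial itself.

```python
def build_query(widen, narrow):
    """OR together synonyms to widen recall, then NOT out noise terms
    to narrow the result set."""
    q = "(" + " OR ".join(widen) + ")"
    if narrow:
        q += " NOT (" + " OR ".join(narrow) + ")"
    return q

# Widen with a MeSH term plus a free-text synonym; narrow out animal studies.
q = build_query(
    widen=['"myocardial infarction"[MeSH]', '"heart attack"'],
    narrow=["rat", "mouse"],
)
print(q)
```

Pasting the resulting string into PubMed and then opening the Details tab shows how the engine actually expanded it, which is the feedback loop the three-step strategy relies on.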
Managing a customized information resource: an introduction to MyRensSearch at RPI
Evelyn C. Powell, Physical and Chemical Sciences Librarian, Rensselaer Polytechnic Institute, Folsom Library, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180-3590, Fax: 518-276-2004, firstname.lastname@example.org
MyRensSearch fits into the session topic "Fishing for the right scientific information". The MyRensSearch software is adapted at RPI from the original MyLibrary software created at North Carolina State University. A user survey at North Carolina State University had indicated that users liked doing their own electronic searching but sometimes needed the help of a librarian. The MyRensSearch software allows users to customize their information page with the databases, electronic journals, reference resources and local and distant web sites that interest them. At the same time, the services of a librarian are available at any time – a nice combination for many students here at RPI. My talk will begin with an overview of MyRensSearch and conclude with a live demonstration of the system's capabilities.
Synergy of old and new information: Together in the CA/CAplus databases
Jan Williams, Chemical Abstracts Service, 2540 Olentangy River Rd, Columbus, OH 43202-1505, Fax: 614-447-5470, email@example.com
Since Volume 1 of Chemical Abstracts in 1907, CAS has been committed to partnership in the research process by providing unique and valuable information for scientists. Recently, CA/CAplus went "back to the future" by making the bibliographic and abstract information from records in the 1st-7th Collective Index periods (1907-66) fully searchable. Furthermore, with other database enhancements, the overall subject and substance content in the CAS files now includes a sophisticated collection of new and enhanced data. While any of these enhancements will contribute to improved retrieval, they have special power when used in combination. For example, the addition of indexing for pre-1967 records is still in progress, but the CA Lexicon on STN, along with other thesaurus capabilities, can strengthen Basic Index searching for documents from these early years. Another example is the bridging of older information to the present, not only with search terms but also with citations. However, utilization of content depends on effective delivery platforms to maximize search strategies and to support unusual or unique exploration of the data. Access to information with both powerful content and tools can be illustrated as a synergistic circle in which searchers make old and new data categories work together for more comprehensive results.
Comparing protein-bound ligand structures with in-silico generated conformations
Omoshile O. Clement, Swati Puri, F. Gliubich, Shikha Varma, Clive Freeman, Marvin Waldman, and Jiabo Li, 9685 Scranton Rd, Accelrys Inc, San Diego, CA 92121-3752, firstname.lastname@example.org
Catalyst, a pharmacophore modeling and 3D database mining program, uses conformations to explore the pharmacophore space occupied by any given ligand. For this study, we have investigated the ability of Catalyst's conformation generation methods (FAST and BEST) to reproduce the bound structures of 134 compounds for which crystal structures of the protein-ligand complexes have been published. The structures range in complexity, with 0–33 rotatable bonds. In general, RMS deviations between the X-ray structure of the protein-bound ligand and those generated by Catalyst (heavy atoms only) range from 0.2–4 Å. Using electron density maps, we show that structures with large RMSD (>2 Å) between the X-ray data and the conformation generated by Catalyst can be overlaid with good fits, implying that RMSD alone is not a sufficient metric for evaluating in silico conformation generation software. The diversity and coverage of the pharmacophore space were also assessed with hypothesis generation and validated with standard metrics that measure selectivity (%Yield), coverage (%Actives), enrichment (E), and Goodness_of_Hit (GH) against databases enriched with known actives. The results show that Catalyst conformations and pharmacophore models derived from these conformations closely approximate the binding feature space for ligand-protein interactions.
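For context on the metric the abstract questions, heavy-atom RMSD between a generated conformation and the crystallographic pose is a single aggregate number. The sketch below shows exactly what it measures; the coordinates are invented, and no superposition fitting is performed (atoms are assumed pre-matched and aligned), which is one reason a single RMSD value can hide a perfectly acceptable overlay.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate sets
    (assumes atoms are already paired and superimposed; no fitting here)."""
    assert len(coords_a) == len(coords_b)
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

# Toy three-atom 'ligand': the generated conformer drifts off the x-axis.
xray = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
conf = [(0.0, 0.0, 0.0), (1.5, 0.5, 0.0), (3.0, 1.0, 0.0)]
print(round(rmsd(xray, conf), 3))
```

Because every deviation is squared and averaged into one number, a conformer whose pharmacophoric features all overlay well can still report a large RMSD from a flexible tail, which is the abstract's point.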
An investigation into analyzing patents by chemical structure
Mark A. Calcagno, Business Intelligence Services, Procter and Gamble, Health Care Research Center, 8700 Mason-Montgomery Road, Mason, OH 45040, email@example.com
Patent mapping usually involves an analysis of standard bibliometric measures such as inventors, patent assignees, patent countries, patent years or dates, and, of course, activities or uses found in the patents. In addition to these, of great importance to pharmaceutical chemists are the chemical structures patented and how they relate to the above measures. This poster will present our attempts to analyze pharmaceutical patents utilizing the chemical structure information provided in Derwent’s World Patent Index.
Application of novel molecular alignment method using Hopfield neural network to 3D-QSAR
Kimito Funatsu, and Masamoto Arakawa, Department of Knowledge-based Information Engineering, Toyohashi University of Technology, Tempaku, Toyohashi 441-8580, Japan, Fax: +81-532-47-9315, firstname.lastname@example.org
Comparative Molecular Field Analysis (CoMFA) is frequently used as a standard QSAR technique, but some problems remain. Molecular alignment is one of the key problems in QSAR studies: in CoMFA and most other 3D-QSAR techniques, a proper alignment between molecules is necessary. Recently, we developed and proposed a novel molecular alignment method based on a Hopfield Neural Network (HNN). This alignment method builds on the pattern-matching methodology developed by Doucet et al. The molecules are represented by four kinds of chemical properties (hydrophobic group, hydrogen-bond acceptor, hydrogen-bond donor, and hydrogen-bond donor/acceptor), and corresponding properties in two molecules are then matched to each other using the HNN. Twelve enzyme-inhibitor pairs were used for validation, and our method successfully reproduced the real molecular alignments obtained from X-ray crystallography. In this study, we apply the molecular alignment method to three-dimensional quantitative structure-activity relationship (3D-QSAR) analysis. Two data sets (human epidermal growth factor receptor-2 inhibitors and cyclooxygenase-2 inhibitors) were investigated to validate our method. Robust and predictive CoMFA models were successfully obtained for both data sets.
Design of protein-ligand interactions using free energy analysis of conformational ensembles
Richard A. Bryce, and Pascal Bonnet, School of Pharmacy and Pharmaceutical Sciences, University of Manchester, Oxford Road, Manchester M13 9PL, United Kingdom, Fax: 0161-275-2481, email@example.com, firstname.lastname@example.org
Quantitative prediction of the noncovalent binding affinity between a ligand and receptor remains a challenge in computational biophysics and drug design. Recent developments in methodology, employing a hybrid molecular mechanical/continuum solvent potential for the analysis of conformational ensembles, have demonstrated predictive power for a diverse range of molecular interactions. We have explored an elaboration of this approach, calculating binding free energies for multiple ligands based on a single reference trajectory and, in particular, accommodating large structural variation via a combined conformational search/minimization strategy. We examine the efficacy of the method for describing protein-ligand interactions and consider the dependence on perturbation protocol for a series of influenza neuraminidase inhibitors.
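In single-reference-trajectory schemes of this kind, the binding free energy is estimated by averaging the hybrid MM/continuum energy of the complex, receptor, and ligand over ensemble snapshots and taking the difference. A toy sketch of that bookkeeping, with invented per-snapshot energies purely for illustration (not the authors' data or protocol):

```python
import numpy as np

def binding_free_energy(g_complex, g_receptor, g_ligand):
    """Single-trajectory estimate:
    dG_bind ~ <G_complex> - <G_receptor> - <G_ligand>,
    each term averaged over conformational snapshots."""
    return np.mean(g_complex) - np.mean(g_receptor) - np.mean(g_ligand)

# hypothetical per-snapshot energies (kcal/mol), illustrative only
g_cpx = np.array([-120.4, -119.8, -121.1])
g_rec = np.array([-80.2, -79.9, -80.5])
g_lig = np.array([-30.1, -29.8, -30.3])
dG_bind = binding_free_energy(g_cpx, g_rec, g_lig)   # negative = favourable
```

Using a single complex trajectory for all three terms (rather than separate simulations) is what makes perturbing to multiple ligands from one reference affordable.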
Performance of a conformational space search method by Grid technology: Development of 3D-structure database for drug discovery platform
Goto Hitoshi1, Mitsuhisa Sato2, Taisuke Boku2, Umpei Nagashima3, and Hiroshi Chuman4. (1) Toyohashi University of Technology, Toyohashi, Japan, email@example.com, (2) University of Tsukuba, (3) National Institute of Advanced Industrial Science and Technology, (4) Faculty of Pharmaceutical Sciences, University of Tokushima
For rational drug discovery by high-performance computing, we have been developing several programs, such as an MM program for exhaustive conformational searching of drug molecules, a replica-exchange MD (REMD) program, and an ab initio fragment MO (FMO) program for huge biomolecular systems. In this meeting, we report on the performance of an extensive conformational search method applied to a series of molecules using high-performance techniques, especially Grid technology. The huge number of conformers found are registered in a 3D-structure database, from which candidates can then be selected as initial structures for REMD and FMO calculations and as bioactive conformers for automatic docking simulations.
Investigation of the aromaticity of cyclic conjugated systems by global hardness obtained through novel general ABEEM model on the basis of maximum hardness principle
Yao Cong1, Zhongzhi Yang2, and Willy Wriggers1. (1) School of Health Information Sciences, University of Texas Health Science Center at Houston, 7000 Fannin, Suite 600, Houston, TX 77030, (2) Department of Chemistry, Liaoning Normal University
A scheme for efficient calculation of global hardness is proposed within the novel general atom-bond electronegativity equalization method (ABEEM). Since the double-bond structure is explicitly considered in this method, it is especially suitable for organic and biological molecular systems. According to the maximum hardness principle (MHP), hardness is a good indicator of the stability of a system, and it is also a theoretical index of the aromaticity of cyclic conjugated molecules. Hardness values obtained through the ABEEM method are reported for a number of conjugated hydrocarbons. Analysis shows that the global hardness within this method is a good indicator of aromaticity, i.e., a higher hardness value corresponds to higher aromaticity, and vice versa. Our results are in good agreement with experimental findings and other theoretical aromaticity indices, such as Hess's Resonance Energy Per Electron (REPE) and Dewar's REPE. However, using our scheme to obtain the aromaticity index is more efficient and avoids reference-structure dependence because, unlike these two theoretical methods, we do not need to choose a reference structure or carry out ab initio calculations.
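For reference, the global hardness in question is the standard conceptual-DFT quantity (one common convention; these definitions are general, not specific to the ABEEM scheme):

```latex
\eta \;=\; \frac{1}{2}\left(\frac{\partial^{2}E}{\partial N^{2}}\right)_{v(\mathbf{r})}
\;\approx\; \frac{I - A}{2}
\;\approx\; \frac{\varepsilon_{\mathrm{LUMO}} - \varepsilon_{\mathrm{HOMO}}}{2}
```

where E is the electronic energy, N the number of electrons, v(r) the external potential, I the vertical ionization potential, and A the electron affinity. The maximum hardness principle states that, other factors being equal, systems evolve toward configurations of maximum eta, which is why eta tracks stability and aromaticity.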
Personal Electronic Chemical Reference Archive (PECRA) built on Microsoft Windows® Explorer
Y. Larry Zhang, Agricultural Products Group, FMC Corporation, Research and Development Center, P.O. Box 8, Princeton, NJ 08543, Fax: 609-951-3603, firstname.lastname@example.org
With the rapid growth of personal computing and information technology, almost all major scientific journals and periodicals are now published online, with articles available for fast downloading. It has become practicable to build a PC-based reference archive that not only stores all electronic documents (research articles, reviews, patents, presentations, etc.) but also allows the user to query with selected criteria, search by topic or keyword, and retrieve documents using desktop tools available in PC operating systems. To this end, we have created a PC-based Personal Electronic Chemical Reference Archive (PECRA) system that uses Microsoft Windows® Explorer for both archiving and searching. We will discuss the details of this handy and useful system, from its creation to routine management, covering related aspects such as documentation strategy, name selection for files and folders, archive backup, and search capability. PECRA has served routinely as a significant supplement to the traditional hard-copy/paper-folder/metal-cabinet archive system in our lab. The new system offers attractive features such as portability (it may be loaded on a desktop or laptop PC or on a compact disc); high-speed file saving, retrieval, and duplication; flexibility in organizing and transferring files; multiple options for querying and searching; and consistent quality in printing and on-screen reading. The value of PECRA will depend on, and grow along with, the availability of web editions of previously published scientific journal issues.
Microsoft Windows is a trademark of Microsoft Corporation ©FMC Corporation 2003
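Because PECRA relies on file and folder names as its query surface, the kind of keyword search performed through Windows Explorer can be sketched generically. The archive layout and file names below are hypothetical, for illustration only:

```python
import os
import tempfile

def search_archive(root, keyword):
    """Name-based retrieval of the kind PECRA relies on: return paths under
    `root` whose file names contain `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        hits += [os.path.join(dirpath, n) for n in sorted(filenames)
                 if keyword in n.lower()]
    return hits

# hypothetical archive layout, created in a temporary directory
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "patents"))
for name in ("2003_herbicide_review.pdf", "notes.txt",
             os.path.join("patents", "US1234567.pdf")):
    open(os.path.join(root, name), "w").close()

hits = search_archive(root, "pdf")
```

This illustrates why the abstract stresses naming strategy: with name-based searching, a consistent file- and folder-naming convention is what makes queries reliable.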
Platform for drug discovery by grid technology: Large-scale molecular calculations and utilization of 3D descriptors
Hiroshi Chuman1, Umpei Nagashima2, Takeshi Nishikawa2, Masakatsu Ito2, Hitoshi Goto3, Naofumi Nakayama3, Taisuke Boku4, Mitsuhisa Sato4, Cheng Feng5, and Yuichiro Inagaki5. (1) Faculty of Pharmaceutical Sciences, University of Tokushima, 1-78, Shomachi, Tokushima 770-8505, Japan, Fax: 81-88-633-9508, email@example.com, (2) National Institute of Advanced Industrial Science and Technology, (3) Toyohashi University of Technology, (4) University of Tsukuba, (5) Fuji Research Institute Corporation
For rational drug discovery by high-performance computing, we have been developing several programs, such as an MM program for exhaustive conformational searching of drug molecules, a replica-exchange MD (REMD) program, and an ab initio fragment MO (FMO) program for huge biomolecular systems. These programs are highly parallelized on a PC cluster, and we are adding Grid facilities for further efficiency. As the first step of screening, an extensive conformational search is carried out for a series of molecules, and candidate bioactive conformers are then selected from the huge number of resulting conformers. We demonstrated that classification and Self-Organizing Maps based on alignment-independent 3D descriptors are useful in defining the bioactive conformation of HIV-protease inhibitors. After this screening, REMD and FMO calculations for the complex of the selected conformer with HIV protease are executed to obtain detailed energetic and structural profiles.
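The Self-Organizing Map step above clusters conformers by their alignment-independent 3D descriptors, so that a huge conformer set reduces to a few representative candidates. A minimal SOM sketch (random toy descriptors and grid size chosen for illustration; not the authors' descriptor set or training parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # toy conformer descriptors (200 x 5)

# 4x4 map: unit positions on the grid and their weight vectors
grid = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
W = rng.normal(size=(16, 5))

n_steps = 2000
for t in range(n_steps):
    x = X[rng.integers(len(X))]
    bmu = np.argmin(((W - x) ** 2).sum(axis=1))   # best-matching unit
    frac = 1.0 - t / n_steps
    lr = 0.5 * frac                                # decaying learning rate
    sigma = 0.5 + 2.0 * frac                       # shrinking neighbourhood
    d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
    h = np.exp(-d2 / (2 * sigma ** 2))             # neighbourhood kernel
    W += lr * h[:, None] * (x - W)                 # pull units toward x

# assign each conformer to its best-matching unit (a cluster label)
labels = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
```

Conformers mapping to the same unit are treated as one family, from which representatives can be passed on to the REMD and FMO stages.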
Report generator and analysis tool for Reciprocal Net crystallographers
Leah Sandvoss, Department of Chemistry, Indiana University, 1200 Rolling Ridge Way #1311, Bloomington, IN 47403, firstname.lastname@example.org, and Dennis Groth, School of Informatics, Indiana University
With increased demand for storage of scientific data comes a corresponding demand for the efficient retrieval mechanisms necessary for analytical and reporting purposes. This is certainly true for the Reciprocal Net, a database of crystal structure data from several participating universities. The current design of the database is focused on accurately capturing information as it is generated in the course of crystallography experiments; the system lacks the ability to present a high-level view of search results, which is essential for a thorough analysis of multiple samples with given attributes. The major features and functions of ITCLA, an Informatics Tool for Crystallography Laboratory Administrators, are diagrammed and discussed in this presentation. ITCLA will save time for administrative users, who would otherwise perform these tasks manually, allowing them to focus on more complicated statistical calculations. The output from the system is processed via XSLT for presentation in a number of forms.
The Cambridge Structural Database (CSD) and its research applications in structural chemistry
Frank H. Allen, Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, United Kingdom, Fax: 44-1223-336033, email@example.com
Compilation of the CSD for small-molecule crystal structures began in 1965, with 4,000 published structures and 700 new structures per year. Its recording of primary crystallographic results made the CSD one of the first numerical databases. Today, the CSD records numerical, chemical, and bibliographic data for 300,000 structures and will add ~23,000 new structures in 2003. The CSD System (the database plus software for structure searching, information retrieval, data analysis, and structure visualisation) is used by academics in 56 countries and by over 120 commercial companies. Knowledge discovery using CSD data for molecules and their interactions has generated 1,000 research papers, including geometry tabulations, conformational analyses, reaction pathway studies, and information on hydrogen bonds and other non-bonded interactions. Important knowledge-mining methodologies have also been developed. CSD data now underpin applications software in the life sciences and crystallography, and structural knowledge bases are augmenting the distributed CSD System.
Data mining of crystallographic databases as an aid to drug design
Robin Taylor, Cambridge Crystallographic Data Centre, 12, Union Road, Cambridge CB2 1EZ, United Kingdom, Fax: 44 1223 336033, firstname.lastname@example.org
Crystallographic databases have long been used to assist drug design by providing information about conformational preferences and nonbonded interactions. Comparatively recently, value-added "knowledge bases" have been produced that provide highly automated access to specific types of molecular information mined from the primary crystallographic databases. A challenge now is to use these knowledge bases to drive applications that address problems of direct relevance to rational drug design, such as the location of binding points in active sites or the identification of the energetically-accessible conformations in which a ligand might bind. Another opportunity for exploitation of crystallographic data comes from the presentation of the data as objects in an OO scripting language such as Python. An example is Reliscript, a Python module which enables protein-ligand complexes to be analysed easily and flexibly. Developments like these promise to keep crystallographic data at the forefront of rational drug design.
The evolution of the Protein Data Bank
Helen M. Berman1, John D. Westbrook1, Philip E. Bourne2, Gary L. Gilliland3, Judith L. Flippen-Anderson1, and PDB Team4. (1) Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 610 Taylor Road, Piscataway, NJ 08854, email@example.com, (2) San Diego Supercomputer Center, University of California, San Diego, (3) National Institute of Standards and Technology, Center for Advanced Research in Biotechnology, (4) Rutgers, SDSC/UCSD, CARB/NIST
The Protein Data Bank is committed to providing well-annotated data about the three dimensional structures of biological macromolecules in a timely and efficient manner. There are many challenges that face the PDB. The growth rate and complexity of the data continues to increase. At the same time, the community has grown and changed as have their expectations of this data resource. The PDB systems are built on a framework that has anticipated both the growth and the changes. The ways in which systems have evolved for collecting, annotating, querying and distributing these data will be described.
The Protein Data Bank (PDB) as a research tool
Philip E. Bourne1, John D. Westbrook2, Helen M. Berman2, Gary L. Gilliland3, Judith L. Flippen-Anderson2, and PDB Team4. (1) San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, firstname.lastname@example.org, (2) Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, (3) National Institute of Standards and Technology, Center for Advanced Research in Biotechnology, (4) Rutgers, SDSC/UCSD, CARB/NIST
The PDB has been designed to enable scientific discovery by providing data, tools, and more generally a gateway (portal) to structural biology and structural bioinformatics. We will describe the capabilities of the PDB in this regard by way of several scenarios aimed at educators, structural biologists, and computational biologists engaged in activities ranging from a better understanding of structure-function relationships to drug design.
When can fractional crystallization be expected to fail? Information from the Cambridge Structural Database
Carolyn P. Brock, Department of Chemistry, University of Kentucky, Lexington, KY 40506-0055, Fax: 859-323-1069, email@example.com
Since fractional crystallization is the method of choice for purification of chemicals produced on a commercial scale, the possibility that impurities might be included in the recrystallized material is a serious concern. If fractional crystallization fails then at least some of the crystals in the batch may be disordered mixed crystals (or, solid solutions) or ordered stoichiometric compounds (or, cocrystals). Searches of the Cambridge Structural Database show that ordered compounds (other than solvates and racemic compounds) are rare unless there is complete or partial transfer of a proton or electron between the two components. Many of the known compounds that are formed from isomers and near isomers are quasiracemates in which the two components are related by an approximate inversion center. Another set of compounds is formed from relatively rigid molecules that have substituents (like hydroxyl groups) that are expected to form good hydrogen bonds.
Applications of the Cambridge Structural Database to molecular inorganic chemistry
A. G. Orpen, School of Chemistry, University of Bristol, Bristol BS8 1TS, United Kingdom, Fax: +44-117-929-0376, firstname.lastname@example.org
Applications of the data in the Cambridge Structural Database (CSD) to molecular inorganic chemistry are described. Various classes of application are identified, including the derivation of typical molecular dimensions and their variability and transferability; the development of methods allowing construction of knowledge bases of molecular geometry for transition metal complexes; the derivation and testing of theories of molecular structure and bonding; the identification of reaction paths and related conformational analysis based on the structure correlation hypothesis; the identification of common and presumably energetically favourable intermolecular interactions and their applications in supramolecular chemistry and synthetic crystallography. In many of these areas the availability of plentiful structural data from the CSD is set against the emergence of high quality computational data on geometry and energy of inorganic complexes.
Materials informatics: Knowledge acquisition for materials design
John R. Rodgers, Toth Information Systems, Inc, 2045 Quincy Avenue, Ottawa, ON K1J6B2, Canada, John.Rodgers@TothCanada.com
Combinatorial materials science is producing enormous amounts of experimental data that require storage and analysis. Given the vast existing resources of structure and property data, it is possible to extract trends in the structures of materials and their properties, and to use these results to guide combinatorial experiments by pointing the experimentalist to regions of chemical space where there is a high probability of finding the material of interest. This informatics approach, coupled with ab initio quantum mechanical software, provides many of the tools needed to guide combinatorial materials science experiments. This talk will provide an overview of the CRYSTMET database for intermetallic compounds, give examples of various computational and informatics methods for physical property calculations, and illustrate the use of correlation methods to populate property space.
First principles calculated databases for the prediction of intermetallic structure
Gerbrand Ceder1, Stefano Curtarolo1, Dane Morgan1, and John R. Rodgers2. (1) Department of Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave Rm 13-5056, Cambridge, MA 02139, Fax: 617 258 6534, email@example.com, (2) Toth Information Systems, Inc
The prediction of structure is a key problem in computational materials science that forms the platform on which rational materials design can be performed. Traditionally, empirical rules have been extracted by observing trends in large amounts of experimental data. On the other hand, computational quantum mechanics is highly accurate in reproducing structural energy differences, but suffers from the difficulty that a global optimization cannot practically be performed in the physical space of atomic coordinates. In a departure from previous computational approaches, we have merged the ideas of empirical structure prediction methods, whereby historical knowledge is used to extract rules that can then be applied to new systems, with the predictive power of high-throughput quantum mechanical calculations. By data mining more than 10,000 first-principles energy calculations in more than 60 alloys, we show that the energies of different crystal structures are strongly correlated between different chemical systems, and demonstrate how this correlation can be used to boost phase-stability investigations of new systems. This approach leads to a better and more quantifiable extraction of information from ab initio calculations, and ultimately to a more efficient microscopic description.
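The underlying idea is that the energy ordering of candidate crystal structures in one chemical system is informative about the ordering in another. A toy illustration of such a cross-system correlation, with invented formation energies (not the paper's data):

```python
import numpy as np

# hypothetical formation energies (eV/atom) for the same five candidate
# crystal structures evaluated in two different alloy systems
e_sys_a = np.array([-0.12, 0.05, -0.30, 0.18, -0.07])
e_sys_b = np.array([-0.10, 0.08, -0.25, 0.15, -0.05])

r = np.corrcoef(e_sys_a, e_sys_b)[0, 1]    # Pearson correlation
# a strong correlation means system A's ranking helps screen system B;
# here the predicted ground state is simply B's lowest-energy structure
predicted_ground_state = int(np.argmin(e_sys_b))
```

In the data-mining setting, correlations like this let a few calculations in a new alloy be combined with the database of known systems to rank the remaining candidate structures cheaply.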
Chemical information integration: A changing perspective
Mitchell A. Miller, Manish Sud, and Darryl Leon, LION Bioscience Inc, 9880 Campus Point Drive, San Diego, CA 92121, firstname.lastname@example.org - SLIDES
The idea of integrating chemical information for pharmaceutical research has been talked about for more than a decade and the nature of the problem keeps changing. At one point, integration was simply putting chemical structures and biological assay data on the same screen. Nowadays, the problem is more complex. In order to design compounds with favorable potency, selectivity and specificity characteristics, one must take into account the chemical structure and both its predictive and experimental property information, plus information about biological assays, plus information about the targets used in the assays, plus available information related to protein target and compound family relationships. As the real-world situation grows more complex, the information systems that support it must also increase in complexity – but not at the sacrifice of usability. This talk examines trends in chemical information integration in terms of the applications built to support drug discovery scientists.
A coherent view of disparate data
Paul J. Kowalczyk, Computational Chemistry, Pfizer Global Research & Development, MS 8200-36, Eastern Point Road, Groton, CT 06340, Fax: 860-715-3149, email@example.com
Any number of decisions influence advancing a particular chemical series in a drug discovery setting, e.g., potency, selectivity, calculated and measured molecular properties. Rarely is the decision unequivocal - the most potent compound may not be the most selective, nor might it have ideal molecular properties. Decisions are a matter of compromise. We present a method of data visualization that allows one to view disparate data from multiple sources in one unified view. One is able to compare and contrast the profiles of series of compounds interactively. This method of series based data visualization is demonstrated with data available from the Genomics and Bioinformatics Group at the National Cancer Institute.
Integration in the 21st Century Enterprise
Thomas Blackadar1, Keith T. Taylor1, Timothy Shay2, and Phil McHale3. (1) Marketing, MDL Information Systems, Inc, 14600 Catalina Street, San Leandro, CA 94577, Fax: 510 895 4738, T.Blackadar@mdl.com, (2) Eastern Sales, MDL Information Systems, Inc, (3) Corporate Communications and Scientific Affairs, MDL Information Systems, Inc - SLIDES
Discovery organizations are challenging modern enterprise computing systems to provide better integration of disparate discovery data. The successful discovery informatics system must take advantage of the breadth of horizontal integration platforms and the specialized scientific services of vertical-market systems. No one software platform can deliver the entire solution. The challenge is therefore to adopt a number of strategic platforms based on open standards, and then to create linkages that deliver the scientists' needs. Depending on the level of integration required, three different strategies can be adopted: compatibility, system linkage, or technology adoption. Case studies will be presented to illustrate situations where each of these techniques can be used to best advantage.
Integrated high throughput workflows: Value and build vs. buy analyses
Peter E. Cohan, Discovery Tools, Symyx Technologies, Inc, 2163 East Arques, Sunnyvale, CA 94083, Fax: 408-773-4067, PCohan@Symyx.com - SLIDES
High throughput methods have now been successfully applied to the exploration of the physical and chemical properties of materials across a range of fields and applications, including drugs, polymers, chemicals, and electronic materials. We will examine several such workflows and discuss the economics associated with implementing and running high throughput programs: what investment is required, and what return can be expected? The pharmaceutical industry offers an excellent basis for a model for materials science. We will examine the critical factors for measuring and managing expectations, resources, and cultural change, and for achieving successful outcomes, based on our experiences at Symyx.
Informatics integration: The range of challenges within a global pharmaceutical company illustrated with specific project examples
Richard Lawson1, Bryan Takasaki1, and Bryn Roberts2. (1) Lead Informatics, Enabling Science and Technology, AstraZeneca R & D Boston, Inc, 35 Gatehouse Drive, Waltham, MA 02451, Fax: (781) 839-4590, firstname.lastname@example.org, (2) AstraZeneca - SLIDES
Integrating information and systems within a global pharmaceutical company is essential, but also challenging. Some of the technical challenges receive a great deal of attention: software protocols and standards (XML, SOAP, UDDI, etc.), database approaches (warehouses vs. federations), etc. In addition to the technology challenges, though, there are significant challenges having to do with differences across scientific domains (chemistry, pharmacology, etc.) and across organizational boundaries (IS, Informatics, Computational Chemistry, etc.). These challenges will be described both in general and in the context of specific projects.
Integration of chemical and biological data in discovery informatics
David S. Hartsough, Informatics and Modeling, ArQule, Inc, 19 Presidential Way, Woburn, MA 08101, email@example.com, and Daniel A Gschwend, Research Informatics, ArQule Inc
ArQule is making the transition from a chemistry-services company to a drug discovery organization. Although the information needs of these two businesses are similar in many respects, the drug discovery environment places heightened demands on access to additional information, including analytical data, biological data, and discovery-program-related information. In addition to these increased data storage requirements, intense demands are placed on the ability to integrate and access this information in context. This presentation will describe the efforts we have undertaken to enable this transition from an informatics perspective and the integration required to give project teams access to discovery information.
The need for scientific data annotation
Herschel J.R. Weintraub, IBM Life Sciences, Peoria, AZ 85383, Fax: (928) 438-4295, firstname.lastname@example.org - SLIDES
In the pharmaceutical and biotech industries today, the research and design of new drugs are carried out by multi-disciplinary teams that must analyze and interpret complex information from sources that include high-throughput screening experiments, clinical trials, patent information, and the scientific literature. Tools to aid in data integration are therefore highly important, but the need extends beyond the ability to view all the data from a single source (be it federated or warehoused). Another challenge in this environment is that, since the data span multiple domains, there is likely to be some level of misinterpretation. We have looked at the need to enhance data integration capability by capturing knowledge about the origins and history of the data, what analyses were performed, what correlations were identified, how decisions were made, and what the outcomes were. In this context, we will describe the use of a prototype annotation system that is self-describing, extensible, and general, and that supports both structured and unstructured annotation content. In summary, while explicit knowledge is currently well managed through relational databases, file systems, and document management systems, our prototype demonstrates a unique tacit knowledge management framework for insights.
ANIML: Analytical information markup language for spectroscopy and chromatography data
Gary W. Kramer, Analytical Chemistry Division, National Institute of Standards and Technology, 100 Bureau Dr, MS8394, Gaithersburg, MD 20899-8394, email@example.com
SpectroML, a markup language for uv-vis spectroscopy data, has been developed as a "web-aware" mechanism for instrument-to-instrument, instrument-to-application, and application-to-application data interchange and archiving. As we were creating SpectroML, Thermo Galactic produced the Generalized Analytical Markup Language (GAML) as a general, but spartan, mechanism for representing many types of analytical instrument data. Given the complexity and breadth of today's analytical instrument data, neither SpectroML nor GAML offers a complete solution to the data markup needs of the analytical community. Accordingly, ASTM Committee E13 on Molecular Spectroscopy and Chromatography has established a subcommittee E13.15 on Analytical Data Management to develop the Analytical Information Markup Language (ANIML) that is based on notions from both SpectroML and GAML and borrows heavily from older interchange standards such as IUPAC's JCAMP-DX and ASTM's ANDI, from existing data dictionaries, and from other relevant markup language efforts.
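Markup languages such as SpectroML, GAML, and ANIML make instrument data machine-readable with generic XML tooling, which is what enables instrument-to-application interchange. The fragment below is a hypothetical, much-simplified illustration (element and attribute names are invented, not the actual SpectroML or ANIML schema):

```python
import xml.etree.ElementTree as ET

# hypothetical SpectroML-like fragment, for illustration only
doc = """<spectrum technique="uv-vis">
  <instrument model="ExampleSpec 100"/>
  <data units="nm,absorbance">
    <point x="400" y="0.12"/>
    <point x="410" y="0.15"/>
  </data>
</spectrum>"""

root = ET.fromstring(doc)
# any XML-aware application can recover the (wavelength, absorbance) pairs
points = [(float(p.get("x")), float(p.get("y")))
          for p in root.find("data").findall("point")]
```

The point of a standardized schema is precisely that the consuming application needs no knowledge of the producing instrument, only of the agreed element names and units.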
CatML: A catalyst markup language
François Gilardoni, Predictive Technologies, Avantium Technologies B.V, Zekeringstraat 29, 1014 BV, Amsterdam, Netherlands, Fax: +31-20-586-8085, Francois.Gilardoni@avantium.nl, and Alexei Yakovlev, Laboratory of Inorganic Chemistry and Catalysis, Eindhoven University of Technology
CatML (Catalyst Markup Language) is an approach for managing catalyst information using recently developed tools for XML and Java. CatML relies on STMML and aims to ease system integration services to permit the rapid deployment of flexible and cost-effective solutions within an organization. The central means of adding semantic information is through dictionaries well suited to describing synthesis recipes, characterization, and the performance of materials in catalysis. The structured semantics of CatML provide a dependable format to control applications and enable automated processing. Seamless portability of data between heterogeneous environments also rationalizes the development of data mining techniques by combining data standardization with new computational techniques. Collaborations with international bodies such as IUPAC and TopCombi, with catalyst suppliers, and with information technology companies are intended to ensure the development of a mature, comprehensive, and usable XML standard for catalysts. CatML will be open-sourced to make it widely accessible.
Open standards for chemical information - The IUPAC chemical identifier and data dictionary projects
Stephen E. Stein, Stephen R. Heller, and Dmitrii V. Tchekhovskoi, Physical and Chemical Properties Division, NIST, Gaithersburg, MD 20899, firstname.lastname@example.org
IUPAC has long been involved in the development of systematic and standard procedures for naming chemical substances on the basis of their structure. The resulting rules of nomenclature, while covering almost all compounds, were designed for text-based media. IUPAC is now developing a means of representing chemical substances in a format more suitable for digital processing, involving the computer processing of chemical structural information (connection tables). This is being implemented in the IUPAC Chemical Identifier project, details of which will be discussed in this presentation. Progress in a related area, the translation of text-based IUPAC data and definitions to a well-structured data dictionary (extensible markup language, XML), will also be discussed.
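Canonical structure-based identifiers of this kind rest on assigning atoms to equivalence classes from the connection table. A minimal Morgan-style refinement sketch (connectivity only; the actual IUPAC identifier also normalizes elements, charges, isotopes, tautomers, and stereochemistry, so this is an assumption-laden toy, not the real algorithm):

```python
from itertools import count

def atom_classes(adj):
    """Iteratively refine atom invariants: start from atom degree, then
    re-label each atom by (own invariant, sorted neighbour invariants)
    until the number of equivalence classes stops growing."""
    inv = [len(nbrs) for nbrs in adj]      # initial invariant: degree
    n_classes = 0
    while True:
        keys = [(inv[i], tuple(sorted(inv[j] for j in adj[i])))
                for i in range(len(adj))]
        relabel = {k: c for c, k in zip(count(), sorted(set(keys)))}
        inv = [relabel[k] for k in keys]
        if len(set(inv)) == n_classes:     # refinement has stabilised
            return inv
        n_classes = len(set(inv))

# adjacency lists for two carbon skeletons (atoms indexed 0..n-1)
benzene = [[1, 5], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0]]
propane = [[1], [0, 2], [1]]
```

Benzene's six carbons land in a single class (all symmetry-equivalent), while propane's two terminal carbons share a class distinct from the central one; a canonical identifier is then built by numbering atoms from these classes.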
Paper Withdrawn - Setting standards in a changing industry
Kirk C. Schwall, Manager, Authority Database Operations, Editorial Operations, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202, email@example.com, Randall L. Cain, Manager, Information Technology, Chemical Abstracts Service, Jeffrey M. Wilson, Editorial Operations, Chemical Abstracts Service, and Steven W. Layten, Information Technology, Chemical Abstracts Service
Since its inception, CAS has been involved with setting industry standards. In particular, in order to identify all of the chemical substances reported in the literature, CAS had to develop a set of rules for depicting chemical structures identified in the wealth of journal and patent literature available. These rules have changed and continue to evolve, in keeping with the growth of chemical sciences. Using these rules, CAS scientists can ensure a unique substance representation in the CAS Registry File. This set of rules helped to shape the CAS chemical structure exchange format known as CXF. Today, CXF is used in CAS products and services and allows users of CAS information to query the vast wealth of the CAS Registry File via a chemical structure. The development of CXF, its use, and future use will be discussed.
ThermoML: A new approach for thermodynamic data communications
Michael Frenkel, Robert D. Chirico, Vladimir V. Diky, and Qian Dong, Thermodynamics Research Center, National Institute of Standards and Technology, 325 Broadway, Boulder, CO 80305-3328, Fax: 303-497-5044, firstname.lastname@example.org
ThermoML is an XML-based approach for storage and exchange of experimental and critically evaluated thermophysical and thermochemical property data. Basic principles, scope, and description of all structural elements of ThermoML will be provided. ThermoML covers essentially all experimentally determined thermodynamic and transport property data (more than 120 properties) for pure compounds, multicomponent mixtures, and chemical reactions (including change-of-state and equilibrium). The role of ThermoML in global data submission and dissemination will be discussed with particular emphasis on the new cooperation in data processing between the Journal of Chemical and Engineering Data and the Thermodynamics Research Center (TRC) at the National Institute of Standards and Technology.
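The idea of an XML-based property record as described above can be sketched in a few lines. This is a toy illustration only; the element and attribute names below are invented and do not reflect the actual ThermoML schema.

```python
# Minimal sketch of an XML property record in the spirit of ThermoML.
# Element names here are illustrative assumptions, not the real schema.
import xml.etree.ElementTree as ET

def build_record(compound, prop_name, value, unit, uncertainty):
    root = ET.Element("PropertyRecord")
    ET.SubElement(root, "Compound").text = compound
    prop = ET.SubElement(root, "Property", name=prop_name, unit=unit)
    ET.SubElement(prop, "Value").text = str(value)
    ET.SubElement(prop, "Uncertainty").text = str(uncertainty)
    return ET.tostring(root, encoding="unicode")

def read_record(xml_text):
    root = ET.fromstring(xml_text)
    prop = root.find("Property")
    return {
        "compound": root.findtext("Compound"),
        "property": prop.get("name"),
        "unit": prop.get("unit"),
        "value": float(prop.findtext("Value")),
        "uncertainty": float(prop.findtext("Uncertainty")),
    }

record = build_record("benzene", "NormalBoilingTemperature", 353.23, "K", 0.05)
data = read_record(record)
```

The round trip shows why a structured format matters: the value, its unit, and its uncertainty travel together and can be recovered unambiguously by any consumer.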
Transitioning to a structure based identification system
Peter J. Linstrom, and Dmitrii V. Tchekhovskoi, Physical and Chemical Properties Division, NIST, Building 221, Room A111, 100 Bureau Drive, Stop 8380, Gaithersburg, MD 20899-0830, Fax: 301-896-4020
The IUPAC Chemical Identifier (IChI) will provide a standardized way to identify chemical species based on their structure. Use of IChIs to index an existing database that is also indexed by non-structural identifiers may pose some problems. For example, a one-to-one mapping of structural identifiers to existing identifiers may not exist. Use of structural identifiers may also expose weaknesses in existing chemical identifiers, including inadequate description of stereochemistry and multiple identifiers for the same compound.
This talk will discuss initial results from efforts at NIST to transition a chemical identification database from numeric to structural identifiers. This database was originally indexed based on numbers supplied by the database developers and third parties. The goal of this effort was to add support for structure-based identifiers while preserving compatibility with legacy identifiers and software systems. Problems encountered in this effort and their solutions will be discussed.
VERDI: An extensible cheminformatics system
W. Patrick Walters, Vertex Pharmaceuticals Incorporated, 130 Waverly Street, Cambridge, MA 02139-4242, Fax: 671-444-6688, email@example.com
As part of an informatics infrastructure which unites chemical, biological and intellectual property information, we have created VERDI - The Vertex Research Database Interface. VERDI is a cheminformatics system which provides an intuitive, user-friendly means of retrieving and analyzing chemical and biological data. The software employs a multi-tier client-server architecture which dramatically simplifies the integration of multiple databases with in-house and third party analysis components.
Structure alerts via Pipeline Pilot
Paul J. Kowalczyk, Computational Chemistry, Pfizer Global Research & Development, MS 8200-36, Eastern Point Road, Groton, CT 06340, Fax: 860-715-3149, firstname.lastname@example.org
A Pipeline Pilot workflow is presented to study relationships between structure and biological activity. Substructures present in a set of active compounds are compared and contrasted to substructures present in a set of inactive compounds. Substructures found more frequently in the set of actives may be used to define a set of target-specific 'keys,' useful for developing topological QSAR models. This analysis follows in the spirit of Klopman's biophores and biophobes, Blankley's Stigmata program and Cosgrove's SLASH program. When the target is a deleterious endpoint (e.g., P450 inhibition), substructures found more frequently in the set of actives (i.e., inhibitors) may be used to define structure alerts. Using this Pipeline Pilot workflow, we demonstrate how one might identify P450 structure alerts, based on data available in the Genetest database.
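The actives-versus-inactives substructure comparison described above can be sketched with simple frequency counting. The compounds, substructure keys, and the 3x enrichment threshold below are invented for illustration; the actual workflow runs inside Pipeline Pilot.

```python
# Sketch of deriving structure alerts by comparing substructure frequencies
# in actives vs. inactives. Compounds are modelled as sets of precomputed
# substructure keys; keys and the enrichment ratio are illustrative only.
from collections import Counter

def structure_alerts(actives, inactives, ratio=3.0):
    """Return keys at least `ratio` times more frequent among actives."""
    act = Counter(k for cpd in actives for k in cpd)
    inact = Counter(k for cpd in inactives for k in cpd)
    alerts = []
    for key, count in act.items():
        f_act = count / len(actives)
        f_inact = inact.get(key, 0) / len(inactives)
        # keys absent from the inactives are automatically flagged
        if f_inact == 0 or f_act / f_inact >= ratio:
            alerts.append(key)
    return sorted(alerts)

actives = [{"nitro", "aniline"}, {"nitro", "phenol"}, {"nitro"}]
inactives = [{"phenol"}, {"phenol", "aniline"}, {"ester"}]
alerts = structure_alerts(actives, inactives)
```

Here only the key enriched in the active set survives the ratio test, mirroring how target-specific 'keys' or alerts would be selected.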
Global integration of pre-clinical chemistry and biology data: Challenges and benefits
Jayne Cartmell, and Randal Chen, Abbott Laboratories, 100 Abbott Park Road, Building AP6B-1, Abbott Park, IL 60064, Jayne.Cartmell@abbott.com - SLIDES
A major challenge for Discovery scientists in the pharmaceutical industry is to rapidly access current, accurate compound information. Abbott has integrated pre-clinical biology and chemistry data into the Therapeutic Area Database (TDB), from which scientists can readily search and retrieve compound and biological assay data using ISIS/Base and ISIS for Excel. Rollout of TDB to therapeutic projects at Abbott has consisted of customization of the Assay Explorer application to meet the scientists’ requirements for biology data entry, analysis, manipulation and viewing. Abbott’s Discovery scientists can now readily generate automatic SARs based on in vitro, in vivo and ADME data generated at three global sites. The deployment process and value for our scientists will be discussed.
Data and application integration through data pipelining
Mathew Hahn, Robert D Brown, and J R Tozer, SciTegic, Inc, 9665 Chesapeake Dr. #401, San Diego, CA 92123, Fax: 858-279-8804, email@example.com, firstname.lastname@example.org
Drug discovery is generating a vast amount of disparate data that must be captured and organized before it can be successfully exploited. At the same time the software industry is producing a large number of disparate applications to manage and mine the data. Data pipelining provides a new paradigm for integrating both the data and the various applications that act on it. Data pipelining provides a mechanism to federate data that can be easily modified as new or changed data sources become available. The federated data can then be manipulated on the fly or uploaded into a data warehouse (with pipelining providing the ETL capability). The method inherently captures best practice workflows making data and application integration solutions easy to maintain, share and document. This paper will discuss strategies for applying data pipelining to data and application integration projects. Data pipelining also enables workflows to be implemented that make novel joins between data from different disciplines. We will show examples that generate knowledge based on joining data flows from genomic and small molecule sources.
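The pipelining idea above—components that consume and emit record streams and can be chained freely—can be sketched with generators. The toy records and component names are invented for illustration and are not SciTegic's implementation.

```python
# Minimal sketch of data pipelining: each component is a generator that
# consumes a stream of records and yields a (possibly annotated or
# filtered) stream, so sources, calculators, and filters chain freely.
def source(records):
    for r in records:
        yield dict(r)  # hand each downstream component its own copy

def add_mw_flag(stream, cutoff=500):
    """Toy 'calculator' component: annotate each record in passing."""
    for r in stream:
        r["mw_ok"] = r["mw"] <= cutoff
        yield r

def keep(stream, key):
    """Toy filter component: pass only records where `key` is true."""
    for r in stream:
        if r[key]:
            yield r

compounds = [{"id": "C1", "mw": 320}, {"id": "C2", "mw": 612}, {"id": "C3", "mw": 455}]
pipeline = keep(add_mw_flag(source(compounds)), "mw_ok")
passed = [r["id"] for r in pipeline]  # records flow lazily through the chain
```

Because each stage only sees a stream, swapping a data source or inserting a new calculator does not disturb the rest of the pipeline, which is the maintainability point made above.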
Case study in data integration: The whole is greater than the sum of the parts
Kirk Schwall, Jan Williams, and Kurt Zielenbach, Manager, Authority Database Operations, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202, Fax: (614) 447-5471, email@example.com, firstname.lastname@example.org - SLIDES
Delivering important scientific information to support the diverse needs of discovery requires a unique fusion of intellectually synthesized content with a powerful delivery technology. Vendors in the information industry are all working hard to aggregate and integrate various content and technology under the wing of a single interface. CAS exemplifies a unique approach, beginning with an understanding of complex and large content aggregations from many different scientific disciplines and culminating in sophisticated information retrieval tools that are remarkably easy to use. CAS has thus established an integrated digital research environment, freeing the user to focus on the information retrieved rather than on the dictates of the software or databases.
The red pill
Matthew Stahl1, Geoff Skillman1, Roger Sayle2, and Robert Tolbert2. (1) OpenEye Software, 3600 Cerrillos Road, Suite 1107, Santa Fe, NM 87507, email@example.com, (2) OpenEye Scientific Software - SLIDES
Representing, storing, and interconverting chemical information in an accurate manner continually presents interesting challenges. A cheminformatics toolkit, OEChem, was designed to provide lossless data handling, facile data association, and seamless interoperability with other chemical software. Provided as an extension to an interpreted language, OEChem is extremely powerful and easy to use. A number of challenging cheminformatics issues and their solutions using OEChem will be presented.
Collaboratory for multi-scale chemical science
Theresa L. Windus, Molecular Science Software Group, Pacific Northwest National Laboratory, 902 Battelle Boulevard, P.O. Box 999, MSIN: K1-96, Richland, WA 99352, Fax: 509-375-6631, firstname.lastname@example.org
The Collaboratory for Multi-scale Chemical Science (CMCS) is a DOE sponsored environment for enabling chemical information to be communicated, translated and annotated across several chemical scales. Enabling a dynamic environment in which to perform new informatics based manipulations is the ultimate goal of this project. The initial scales are the molecular (computational, ab initio data), thermochemical, kinetic, kinetic mechanism, and the numerical simulation scales (including computational and experimental data). This talk will present the data involved, the formats used to describe this data, the pedigree information associated with the data, and the collaboratory infrastructure and portal that enable researchers to access, annotate and manipulate the data. The chemistry communities piloting use of the CMCS will also be discussed.
ChemIDplus: A free, web-based portal to a variety of compound-based information
Mitchell A. Miller1, George F. Hazard Jr.2, Vera W. Hudson2, Christopher Hilt3, Jenny Fang4, David Mayer3, and Larry Callahan5. (1) LION bioscience, Inc, 955 Ridge Hill Lane, Suite 30, Midvale, UT 84047, Fax: 801 365 3949, email@example.com, (2) Division of Specialized Information Services, National Library of Medicine, (3) Altum, (4) Specialized Information Services, National Library of Medicine, (5) Cygnus Corporation - SLIDES
The ChemIDplus database was developed by the Specialized Information Services Division of the National Library of Medicine. It is the web successor to the previous Chemline and ChemID chemical dictionaries. ChemIDplus has been on-line for about 5 years at http://chem.sis.nlm.nih.gov/chemidplus/ providing researchers with access to information about pharmaceutical, industrial and environmental compounds. It covers over 360,000 substances, including more than 160,000 structures. Records may be retrieved by name, structure (including substructure and similarity), CAS Registry Number, molecular formula, and usage category. A new chemical spelling feature checks all failed name searches for possible correct matches in the database. Once a compound of interest is found, a searcher can browse not only the information in NLM biomedical literature and data resources but also related information at other on-line sources, such as EPA, NIOSH, ATSDR, NIST, and WHO/IARC. The system has recently been redesigned to: 1) take advantage of up-to-date technologies such as Java Server Pages and Oracle data cartridges, 2) present a more streamlined user interface, and 3) include additional compound data.
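The spelling fallback described above—checking failed name searches against the dictionary—can be sketched with approximate string matching. The names, cutoff, and use of difflib are illustrative assumptions; ChemIDplus's actual matching method is not disclosed in the abstract.

```python
# Sketch of a chemical-name spelling fallback: when an exact name search
# fails, suggest the closest dictionary entries by string similarity.
# difflib stands in for whatever matching ChemIDplus actually uses.
import difflib

NAMES = ["acetaminophen", "acetazolamide", "ibuprofen", "naproxen"]

def suggest(query, names=NAMES, n=2):
    """Return up to n close matches for a misspelled compound name."""
    return difflib.get_close_matches(query.lower(), names, n=n, cutoff=0.6)

suggestions = suggest("acetaminophin")
```

A real system would run this only after the exact-match lookup fails, and against a far larger synonym dictionary.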
WebReactions for fast reaction searching
James B. Hendrickson, Department of Chemistry, Brandeis University, Waltham, MA 02454-9110, Fax: 781-736-2516, Hendrickson@Brandeis.edu, and Thomas Sander, Innovation Center, Actelion, Ltd
WebReactions is an easy, very intuitive program for searching reactions. It is based on a generalization of chemistry to codify reactions by the net bond changes at the reaction center. The program groups reaction entries together by their actual reaction change so as not to search through starting or product structures. This format delivers query responses instantly, much faster than other reaction searching programs. Currently available with 400,000 reaction entries (1975-1991), it can also be used to reformat other reaction databases. The program is available on the internet at WebReactions.net and will soon be accessible to all browsers.
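The net-bond-change codification at the heart of the approach above can be sketched with set arithmetic over bonds. The atom labels and the substitution example are invented for illustration; WebReactions' actual generalization scheme is richer than this.

```python
# Toy illustration of classifying a reaction by its net bond changes at
# the reaction center, the idea WebReactions generalizes. Bonds are
# modelled as frozensets of (invented) atom labels.
def net_bond_changes(reactant_bonds, product_bonds):
    made = product_bonds - reactant_bonds
    broken = reactant_bonds - product_bonds
    return made, broken

# Toy substitution: C1-Br1 and O1-H1 break; C1-O1 and Br1-H1 form.
reactants = {frozenset({"C1", "Br1"}), frozenset({"O1", "H1"})}
products = {frozenset({"C1", "O1"}), frozenset({"Br1", "H1"})}
made, broken = net_bond_changes(reactants, products)
```

Grouping database entries by this (made, broken) signature is what lets queries match on the reaction change itself rather than on starting-material or product structures.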
Topology-based reaction classification: An important tool for the effective management of reaction information
Guenter Grethe1, Peter Loew2, Hans Kraut2, Heinz Saller2, and Heinz Matuszczyk2. (1) Scientific Affairs, MDL Information Systems, 14600 Catalina St., San Leandro, CA 94577, Fax: (510)614-3616, firstname.lastname@example.org, (2) InfoChem GmbH
Over the last few years, the amount of reaction information available electronically, in-house or online from large databases, has increased dramatically. This large amount of information is increasingly difficult for end-user chemists to manage. To overcome this difficulty, database contents must be better organized and indexed to reduce the effort users must expend to obtain relevant information and to minimize the amount of redundant information. Based on InfoChem’s mapping algorithm for organic reactions, a classification program was developed that increases the efficiency of reaction information retrieval, facilitates query formulation for the end user, and serves as a link between structure-oriented reaction information originating from different sources.
The program CLASSIFY evaluates the topology of the immediate reaction center and its environment and assigns three hash-coded numbers to each reaction, where the numbers represent a different level of expansion from the reaction center. These data have been used effectively in post-search management for clustering large hitlists according to their reaction type, as queries to retrieve reactions of the same type, and to link similar reactions from different sources, such as databases and major reference works. Furthermore, the codes generated for individual transformations can be used to assign keywords familiar to chemists to these reactions and generate a hierarchical thesaurus to provide an alternative for searching reaction databases.
In the presentation we will discuss and illustrate the underlying principles of the mapping algorithm and the generation and distribution of the classification codes.
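The three-level coding idea described above—hashing the reaction centre and then progressively larger spheres around it—can be sketched on a toy molecular graph. The graph model, atom types, and hashing details below are invented; InfoChem's actual CLASSIFY algorithm is proprietary and far more sophisticated.

```python
# Toy sketch of multi-level hash codes around a reaction centre: level 0
# covers the centre itself, levels 1 and 2 expand one and two bonds out.
import hashlib

def sphere_atoms(adjacency, centre, radius):
    """Atoms within `radius` bonds of the reaction centre (simple BFS)."""
    seen = set(centre)
    frontier = set(centre)
    for _ in range(radius):
        frontier = {n for a in frontier for n in adjacency.get(a, [])} - seen
        seen |= frontier
    return seen

def level_codes(adjacency, atom_types, centre):
    """One short hash per expansion level, built from sorted atom types."""
    codes = []
    for radius in (0, 1, 2):  # three levels of expansion
        atoms = sphere_atoms(adjacency, centre, radius)
        key = "|".join(sorted(atom_types[a] for a in atoms))
        codes.append(hashlib.md5(key.encode()).hexdigest()[:8])
    return codes

# Two toy reactions: identical centre and first sphere, different periphery.
adj = {"c": ["a", "b"], "a": ["c", "x"], "b": ["c"], "x": ["a"]}
types1 = {"c": "C.sp2", "a": "O", "b": "N", "x": "Cl"}
types2 = {"c": "C.sp2", "a": "O", "b": "N", "x": "Br"}
codes1 = level_codes(adj, types1, ["c"])
codes2 = level_codes(adj, types2, ["c"])
```

Matching at the coarse level but diverging at the broad level is exactly what makes such codes useful for clustering hitlists by reaction type at adjustable specificity.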
A new generation of reaction indexing and searching methodologies
Lingran Chen1, James G Nourse2, Bradley D. Christie3, Burton A Leland2, David L. Grier4, and Keith T. Taylor5. (1) R&D, MDL Information Systems, Inc, 14600 Catalina Street, San Leandro, CA 94577, L.Chen@mdl.com, (2) R&D, MDL Information Systems, (3) R&D, MDL Information Systems Inc, (4) R&D, MDL, (5) Product Marketing, MDL Information Systems Inc
Chemical reaction databases are essential resources for modern drug discovery. With the tremendous increase in reaction data in recent years, more accurate and faster retrieval of desired reactions has become a critical requirement for the modernization of reaction retrieval systems. Most reaction database search systems rely on the combination of a Reaction Substructure Search (RSS) algorithm and molecule/reaction keys for performing routine RSS tasks. The authors recently reported an example of such an RSS algorithm.
In this presentation, we will review the various RSS algorithms and reaction indexing methods developed at MDL. Then, a new generation of reaction indexing and searching methods based on a reaction hyperstructure concept will be described. This new technology shows a significant improvement of searching performance.
Uses of empirical reaction data in library planning and development
David S. Hartsough, Informatics and Modeling, ArQule, Inc, 19 Presidential Way, Woburn, MA 08101, email@example.com, and Andrew Smellie, Informatics and Modeling, ArQule Inc
Library design and synthetic planning are often carried out prior to the gathering of any reagents or initial experiments. If this planning is to be useful it must be based upon real experience under available laboratory conditions. Although numerous reference systems and compendia are available to predict synthetic outcomes, such systems offer little insight into the available reactive possibilities and probability of reaction success in the chemist's own environment. This paper will describe efforts we have undertaken to address these issues based upon the available knowledgebase of chemical reaction outcomes we have accumulated. Specific areas to be discussed will include the reactivity-based similarity of chemical reagents and potential schemes based on these reagent similarities that can be used to cluster reagents.
Searching and registration of multi-step reaction schemes
Keith T. Taylor, Product Marketing, MDL Information Systems Inc, 14600 Catalina Street, San Leandro, CA 94577, Fax: 510-614-3651, firstname.lastname@example.org, and Barry Peacock, Consulting, MDL Information Systems Inc
Reaction transformation databases normally contain single-step reactions, for example: A -> B, and they are queried using single-step transformation queries. In many cases, however, the transformation may not be present as a single step, for example, A -> X -> Y -> B. In this case, there is no entry corresponding to the one-step transformation A -> B, and even though the database contains information about how the transformation can be achieved, no hits will be produced. Early systems partially addressed this problem by registering the overall transformation, provided the information was derived from a single article. Intermediate steps were not registered. Furthermore, multi-step transformations whose individual steps derive from different articles were not identified.
An approach to the registration and retrieval of multi-step reaction schemes based on MDL® Relational Chemistry Server will be presented. The approach covers both explicit and implicit reaction schemes.
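The retrieval problem posed above—finding A -> X -> Y -> B when only single steps are registered—amounts to path search over the database. The breadth-first search and toy database below are an illustrative sketch, not MDL's implementation.

```python
# Minimal sketch of finding a multi-step route A -> ... -> B in a database
# of single-step transformations; compounds are opaque labels here.
from collections import deque

def find_route(single_steps, start, goal):
    """single_steps: iterable of (reactant, product) pairs."""
    graph = {}
    for r, p in single_steps:
        graph.setdefault(r, []).append(p)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path  # shortest route, since BFS explores by length
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route registered

steps = [("A", "X"), ("X", "Y"), ("Y", "B"), ("A", "Z")]
route = find_route(steps, "A", "B")
```

Note that the individual steps here could come from different articles, which is precisely the case the abstract says earlier systems failed to identify.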
Classification and computer representation of enzyme reactions. Progress towards the development of MACiE
Gemma L Holliday1, Gail J Bartlett2, Peter Murray-Rust1, Janet M Thornton2, and John BO Mitchell1. (1) Unilever Centre for Molecular Informatics, University of Cambridge, University Chemical Laboratory, Lensfield Road, CB2 1EW Cambridge, United Kingdom, Fax: +44-1223-763076, email@example.com, (2) EMBL-EBI
We have developed an in-house database of well-characterised enzymatic reactions where the enzymes have crystal structures in the PDB. We demonstrate progress towards a fully searchable database for enzyme reaction mechanisms and other relevant enzymatic data. It has become clear, in this time of increasing knowledge about the structure and mechanism of enzymes, that the current classification system has some limitations. Whilst it does everything it was ever intended to do, we feel that there is a need for an alternative and complementary system which does more. Our database holds the reaction mechanism for individual enzymes; this includes the overall reaction and the multiple steps that may be involved. Data are currently being held in ISIS/Base. We are also developing the database in CMLReact, an XML application, and we aim to maintain the database in both environments.
Taking reaction searches beyond substructure queries: Integration with enhanced data sources
Matthew A. Kellett, Chemistry Editorial, Thomson Scientific, 3501 Market Street, Philadelphia, PA 19104, Fax: 215-386-6362, firstname.lastname@example.org
Chemistry reaction databases have provided valuable information for the research community for many years. Many of the queries employ simple reaction substructure searching to retrieve summary lists of candidate reactions with reference to or direct links to a primary literature source. While this method provides effective results, the availability of better integration to related information allows much more comprehensive searching and retrieval to support the chemistry researcher’s needs. Direct access to citation information allows rapid navigation to the prior art, as well as new directions for a particular methodology. Enhanced keywording and condition information related to catalyst types and experimental techniques provides the searcher the opportunity to include relevant procedure details in addition to structural data. With the increased integration and more robust data sources, the value of reaction databases is affirmed and their use will continue to be an important part of the overall chemical literature retrieval process.
Reaction information discovery using CAS' SciFinder and SciFinder Scholar
Kathryn L. Brannon, Roger J. Schenck, and Linda S. Toler, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202
Chemical reactions are an integral component of scientific information necessary for scientists dealing with substances in many different fields, such as drug discovery, materials research, and process chemistry. Today’s synthetic chemists work in highly competitive research environments and industries with staggering increases in information available. Efficient and effective access to reaction and substance information is imperative for the research scientist to function. With SciFinder, scientists have over 6 million reactions from almost a century of synthetic information found in journals and patents. Using powerful new tools found in CAS’ SciFinder and SciFinder Scholar the scientist can easily discover potential new pathways. This presentation will focus on these tools and techniques for solving synthetic problems.
Bias in blocking publications and ways to expose it
Lev Zlatkevich, Lume, Inc, 9200 Bustleton Avenue, Suite 2405, Philadelphia, PA 19115, Fax: 215-464-3584, email@example.com - SLIDES
Bias by reviewers and editors is more frequent than is usually assumed, but until recently the means by which it could be exposed were limited. The situation has changed with the creation of preprint servers devoted to various sciences. In particular, the site dedicated to chemistry, ChemWeb.com, accepts any document about chemistry. Once the document is placed on the server, readers are able to comment on it, a process that can be described as a nontraditional form of peer review. The author of a rejected paper, who thinks his work was turned down unfairly, may present it along with the reviewers' comments and the rebuttal and let others be the judge. The case in point is the rejection of a series of my papers dedicated to various aspects of oxidation in organic materials. The papers were sent to reviewers who were well aware of what was expected from them. Most of the reviewers' remarks were nothing else but personal attacks. In the few cases when some specific questions were raised, I was not given the opportunity to respond. Even without scientifically sound objections, but knowing that their suggestion would fall on a receptive ear, reviewers had no hesitation in recommending rejection. The vicious circle was complete: the authors of phony arguments were protected by anonymity and, at the same time, the editor could claim he had no other choice but to follow (regretfully, of course) the reviewers' recommendations. The papers in question can be found as chemistry preprints on ChemWeb.com.
Model selection strategy and uncertainty analysis for thermodynamic properties of organic compounds
Xinjian Yan, Qian Dong, and Michael Frenkel, Thermodynamics Research Center, National Institute of Standards and Technology, 325 Broadway, Boulder, CO 80305, firstname.lastname@example.org
Thermodynamic property data are essential for science and industrial development. While the number of structurally elucidated compounds is well over 10 million, compounds with experimentally determined thermodynamic data remain comparatively few. Models for predicting thermodynamic properties are therefore imperative for scientists and engineers to obtain necessary data, and many such models have been developed. However, due to the high complexity and diversity of organic compounds, for a given property no single model is fully superior to the others. It is always a challenge for scientists and engineers to select the best model, or to acquire the most accurate predicted value and understand its uncertainty with confidence. In this project, we attempt to develop and evaluate schemes for acquiring the most reliable estimates of thermodynamic properties and their uncertainties, based on information about the similarity and complexity of organic compounds, as well as the analysis of weight factors. As examples, schemes for acquiring critical properties of organic compounds are examined and discussed.
XML for chemical information: Educational needs and examples from a student response analysis system
Daniel C. Tofan, Department of Chemistry, Stony Brook University, New York, NY 11794-3400, Fax: 631-632-7960, email@example.com, Troy A. Wolfskill, Center for Excellence in Learning and Teaching, University at Stony Brook, and David Hanson, Department of Chemistry, University at Stony Brook
With the development of National Digital Science Libraries, it is essential that standards be developed for the exchange of scientific information. Such standards should be intuitive, independent of both hardware and software, and support both professional and educational use. As part of the LUCID Project, which is developing an innovative web-based learning and assessment system along with materials for introductory college chemistry, we have developed XML formats for encoding a variety of chemical information. Examples include isotopic symbols, molecular formulas, Lewis structures, chemical reactions, and equations with units. Student responses to assessment questions are converted to these formats for storage, and standard XML parsing utilities are used to convert from XML to Java objects for analysis. These formats will be presented, compared to existing formats, and discussed in the light of their adaptability for both professional and educational use.
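One benefit of XML encodings like those described above is that standard parsers make the encoded chemistry analyzable. The sketch below encodes a chemical equation and checks its element balance; the element and attribute names are invented, not the LUCID Project's actual formats.

```python
# Sketch of encoding a chemical equation in XML and parsing it back for
# analysis. Schema names are illustrative assumptions only.
import re
import xml.etree.ElementTree as ET

EQUATION = """
<reaction>
  <reactant coefficient="2" formula="H2"/>
  <reactant coefficient="1" formula="O2"/>
  <product coefficient="2" formula="H2O"/>
</reaction>
"""

def atom_counts(formula, coefficient):
    """Count atoms in a simple formula like 'H2O' (no parentheses)."""
    counts = {}
    for element, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + coefficient * int(num or 1)
    return counts

def is_balanced(xml_text):
    root = ET.fromstring(xml_text)
    totals = {}
    for species in root:
        sign = 1 if species.tag == "reactant" else -1
        coeff = int(species.get("coefficient"))
        for el, n in atom_counts(species.get("formula"), coeff).items():
            totals[el] = totals.get(el, 0) + sign * n
    return all(v == 0 for v in totals.values())
```

A grading system could use exactly this kind of parse-and-check step on a student-submitted equation before deeper analysis.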
Paper Withdrawn - Hit-directed nearest neighbor searching
Veerabahu Shanmugasundaram, and Gerald M. Maggiora, Structural & Computational Chemistry, Pharmacia Corporation, 301 Henrietta Street, Kalamazoo, MI 49007, Fax: (269)833-9183, Veerabahu.Shanmugasundaram@pharmacia.com
Follow-up of initial hits resulting from HTS is crucial if the hits are ultimately to give rise to useful lead compounds. Several approaches may be employed to select compounds from the Research Compound Collection or from commercially available collections for follow-up screening. Similarity searching based upon the similarity of the molecular fragments possessed by the molecules, yields compounds that are similar in structure to the hits. Nearest-neighbor searching of BCUT Chemistry Space identifies compounds that have similar BCUT values and hence similar electrostatic, hydrophobic and hydrogen bonding properties. In contrast to molecular fingerprint based similarity searching that looks for similar scaffolds in molecules, nearest neighbor searching identifies isobiological molecular structures with significantly different molecular scaffolds. Several examples illustrating the application of this methodology will be presented.
Virtual screening using active set dependent optimization of dissimilarity metrics
Miklos Vargyas, Zsuzsanna Szabo, Gyorgy Pirok, and Ferenc Csizmadia, ChemAxon Ltd, Maramaros koz 3/a, 1037 Budapest, Hungary, Fax: 361-453-2659, firstname.lastname@example.org
The efficiency of virtual screening in drug discovery greatly depends on three factors: (1) pharmacophore point perception, (2) representation of molecular structures with a descriptor, and (3) a dissimilarity metric to capture matching patterns in the descriptors. In this presentation, methods tackling all three key factors will be discussed.
Pharmacophore point perception, relying on the calculation of the protonation state of atoms and the partial charges at a user-defined pH, assigns generalized types to atoms. Topological cross-correlation of these generalized atom types provides a compact representation of pharmacophores; however, the flexibility and shape of molecular structures are poorly represented. To overcome this problem, fuzzy smoothing of descriptors is introduced.
Virtual screening calculates the dissimilarity between a pair of descriptors using various metrics. The use of metrics comprising numerous tunable parameters set by an optimization procedure can lead to 250-fold enrichment over random.
Examples and further possible applications will be discussed.
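The idea of a metric with tunable parameters set by optimization against a known active set can be sketched as follows. The weighted distance, the candidate weights, and all descriptor values are invented for illustration; ChemAxon's actual metrics and optimization are more elaborate.

```python
# Sketch of a tunable dissimilarity metric: a weighted Manhattan distance
# between descriptor vectors, with the weights chosen to best separate a
# known active set from decoys. All numbers are illustrative.
def dissimilarity(a, b, weights):
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

def tune_weight(query, actives, decoys, candidates):
    """Pick the weight vector with the largest margin between the mean
    dissimilarity of decoys and that of actives (bigger is better)."""
    best, best_margin = None, float("-inf")
    for w in candidates:
        d_act = sum(dissimilarity(query, a, w) for a in actives) / len(actives)
        d_dec = sum(dissimilarity(query, d, w) for d in decoys) / len(decoys)
        if d_dec - d_act > best_margin:
            best, best_margin = w, d_dec - d_act
    return best

query = (0.0, 0.0)
actives = [(5.0, 0.0), (4.0, 0.0)]  # differ from query only in a noisy dimension
decoys = [(0.0, 5.0), (0.0, 4.0)]   # differ in the discriminating dimension
candidates = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
best = tune_weight(query, actives, decoys, candidates)
```

The optimizer learns to down-weight the uninformative dimension, which is the mechanism by which tuned metrics can yield large enrichments over random selection.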
How good is GOLD? An update on validation results, new features and current developments
J. Willem. M. Nissink, Jason C. Cole, Simon J. Bowden, and Robin Taylor, Cambridge Crystallographic Data Centre, 12, Union Road, Cambridge CB5 8QD, United Kingdom, Fax: +44 1223 336033, email@example.com
GOLD is an efficient docking program based on a genetic algorithm. The program has recently been extended to allow the user to write his own scoring functions through a dedicated programmer interface. Currently, GOLD features the GOLD [1] and Chemscore [2,4] scoring functions, both of which have been validated extensively using our new validation set comprising 305 test complexes [3,4]. For a representative set of 224 complexes, optimised GOLD parameters now lead to success rates of around 70% for medium-speed settings.
GOLD offers a range of positional constraints that can be applied during docking: covalent constraints (for covalently linked ligands), distance constraints (limiting the distance between non-bonded protein and ligand atoms), substructure-based constraints (limiting distances between a protein atom and a common ligand substructure for screening purposes), template-based constraints (that bias a ligand's donor and/or acceptor positions, or the general shape of a ligand, to a given template in a binding site), and hydrogen-bond constraints (that require a given protein group to form a hydrogen bond with a screening candidate if possible).
CCDC is currently developing a suite of molecular descriptors to be used in conjunction with the docking program GOLD to facilitate a posteriori analysis of virtual high-throughput screening (vHTS) results. Descriptors will be calculated during a vHTS run, and focus on unfavourable properties in solutions, such as, among others, occlusion of hydrogen bonding groups in the protein and the ligand, bad protein-ligand contacts, amount of exposed and buried hydrophobic surface, and voids at the protein-ligand interface. The user will be able to specify descriptor calculations flexibly through an XML-type script language.
[1] G. Jones, P. Willett, R.C. Glen, A.R. Leach, R. Taylor. Development and Validation of a Genetic Algorithm for Flexible Docking. J. Mol. Biol. 1997, 267, 727-748.
[2] C.A. Baxter, C.W. Murray, B. Waszkowycz, J. Li, R.A. Sykes, R.G.A. Bone, T.D.J. Perkins, W. Wylie. A New Approach to Molecular Docking and Its Application to Virtual Screening of Chemical Databases. J. Chem. Inf. Comput. Sci. 2000, 40, 254-262.
[3] J.W.M. Nissink, C. Murray, M. Hartshorn, M.L. Verdonk, J.C. Cole, R. Taylor. A New Test Set for Validating Predictions of Protein-Ligand Interaction. Proteins 2002, 49, 457-471.
[4] M.L. Verdonk et al., Proteins, accepted for publication.
Elucidating pharmacophore patterns of drugs that bind to P-glycoprotein
Zheng Hou1, Shikha Varma2, Adrea T Mehl2, and Omoshile O. Clement2. (1) Chugai Biopharmaceuticals, San Diego, CA 92121, Fax: 858-799-5100, firstname.lastname@example.org, (2) 9685 Scranton Rd, Accelrys Inc, San Diego, CA 92121-3752, email@example.com
We report the use of 3D pharmacophore fingerprints and a common chemical-feature approach to elucidate binding requirements for drugs that bind P-glycoprotein (P-gp). A "generalized" model for the binding of substrates to P-gp, as well as the requirements for binding at the Verapamil binding site of P-gp, are described in the study. The two site models are shown to be mutually exclusive. The 'generalized' model contains three chemical features: a hydrophobic group (H), a hydrogen bond acceptor (A), and a hydrogen bond donor projection point (D-PP). The distance bins between these features are dH-A = 2 Å, dH-D(pp) = 12.6 Å, and dA-D(pp) = 13.4 Å. The model for binding at the Verapamil site contains six chemical features: two hydrogen bond acceptors (A), two hydrophobic groups (H), one ring aromatic (R), and one positive ionizable (P) feature. Our 'generalized' three-point pharmacophore model correctly differentiates substrates from non-substrates at a 67% prediction rate in a test set containing 108 substrates and 76 non-substrates. The Verapamil binding model uniquely matches ca. 70% of drugs with high affinity for the Verapamil binding site of P-gp but does not match any of the 76 non-substrates in the test set. These results are compared with previous reports of pharmacophore models for P-gp substrate identification and for the Verapamil binding site.
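The three-point model above is fully specified by its inter-feature distance bins, so a candidate conformation can be screened with simple geometry. The sketch below is a hypothetical illustration (not the authors' software); the ±1 Å tolerance and the example coordinates are assumptions.

```python
import math

# Hypothetical check of the 'generalized' three-point model: do the
# coordinates of a hydrophobe (H), an acceptor (A), and a donor
# projection point (D-PP) fall within the reported distance bins?
# Bin centers from the abstract: H-A = 2.0 A, H-D(pp) = 12.6 A,
# A-D(pp) = 13.4 A. The +/- 1 A tolerance is an assumption.

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def matches_generalized_model(h, a, dpp, tol=1.0):
    """True if all three inter-feature distances lie within tol
    Angstrom of the reported bin centers."""
    checks = [(dist(h, a), 2.0), (dist(h, dpp), 12.6), (dist(a, dpp), 13.4)]
    return all(abs(d - d0) <= tol for d, d0 in checks)

# Example geometry constructed to satisfy the bins (solved in the xy-plane)
h = (0.0, 0.0, 0.0)
a = (2.0, 0.0, 0.0)
dpp = (-4.2, 11.879, 0.0)
```

A real mapping step would additionally enumerate which ligand atoms can play each feature role; here the feature positions are taken as given.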
Analysis of uncertainty assessments in experimental critical property data
Qian Dong1, Xinjian Yan1, Randolph C. Wilhoit2, Xiangrong Hong1, and Michael Frenkel1. (1) Thermodynamics Research Center, National Institute of Standards and Technology, 325 Broadway, Boulder, CO 80305-3328, firstname.lastname@example.org, (2) Texas Experimental Engineering Station, Texas A&M University System
The performance of simulators and predictive methods is limited by uncertainty in the experimental data; therefore, the data must be sifted and critically evaluated, and the uncertainty information must be clearly communicated among experimentalists, database professionals, model developers, and process engineers. Furthermore, a prime challenge for database professionals is not only to provide the best data but also to be capable of ascribing uncertainties and the pertinent knowledge, explaining "how good is the best data".
A preliminary study was carried out on critical constants collected in the NIST/TRC Source data system, aiming to review the emphases and preferences of experimentalists in estimating measurement uncertainties; to clarify the interrelationship between individually claimed uncertainty estimates and the overall uncertainty of published data; to assess the implementation of uncertainty assignments in the data system; and to elaborate key issues in assigning uncertainty to scientific data. The objective of this presentation is to delineate the pressing need for new frameworks for communicating uncertainty information among academic and industrial communities.
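One standard piece of the bookkeeping the abstract discusses is combining independent uncertainty components into a single combined and expanded uncertainty, as in the ISO GUM. The sketch below is a generic illustration of that convention, not the TRC system's actual procedure; the component values are invented.

```python
import math

# Combine independent standard-uncertainty components in quadrature
# (root-sum-of-squares), per the ISO GUM convention. Illustrative only;
# the component magnitudes below are invented.

def combined_standard_uncertainty(components):
    """u_c = sqrt(sum of squared independent components)."""
    return math.sqrt(sum(u * u for u in components))

# e.g. repeatability, calibration, and sample-purity contributions, in K
u_c = combined_standard_uncertainty([0.3, 0.4, 1.2])   # -> 1.3 K

# expanded uncertainty with the usual coverage factor k = 2
U = 2 * u_c                                            # -> 2.6 K
```

Much of the difficulty described in the abstract lies upstream of this arithmetic: deciding which components an experimentalist actually accounted for, and how a claimed precision relates to the true overall uncertainty.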
Development and applications of a Hansch substituent constant predictor
Ting-Lan Chiu, and Sung-Sau So, Discovery Chemistry, Hoffmann-La Roche, Inc, 340 Kingsland Street, Nutley, NJ 07110, email@example.com
In an attempt to develop predictive models of Hansch substituent constants for novel compounds, neural network QSPR (Quantitative Structure-Property Relationship) studies were conducted to correlate Hansch substituent constants with two different molecular descriptor sets for hundreds of chemically diverse functional groups. The Hansch substituent constants under study were π, MR, F, and R, describing the hydrophobic, steric, and electronic (field and resonance) characteristics of the substituents, respectively. For π and MR, E-state descriptors were used for correlation, while for F and R, the molecular descriptor set based upon the approach of Kvasnicka, Sklenak, and Pospichal (J. Am. Chem. Soc. 1993, 115, 1495-1500) was adopted. Both QSPR models demonstrated good predictivity on the test set. We demonstrate the application of our Hansch substituent constant predictor in QSAR studies of E. coli dihydrofolate reductase (DHFR) inhibitors, the 2,4-diamino-5-(substituted-benzyl)pyrimidines, as well as HIV-1 reverse transcriptase (RT) inhibitors, the 1-[(2-hydroxyethoxy)methyl]-6-(phenylthio)thymine (HEPT) derivatives. Both data sets contain substituents whose Hansch substituent constants (π, MR, F, and R) could not be found in constant tables. We show that our predictor allowed us to obtain predicted π, MR, F, and R values for all substituents in both data sets, thus enabling the generation of easily interpretable QSAR models of comparable or better predictivity than previous QSAR models. We expect the predictor to play an important role in the evaluation of diverse functional groups in pharmaceutical research and development.
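The core QSPR step is a regression from descriptor values to a substituent constant. The toy sketch below uses ordinary least squares on a single invented descriptor as a stand-in for the authors' neural-network models over E-state and Kvasnicka-type descriptor sets; all data points are fabricated for illustration.

```python
# Toy QSPR: fit a substituent constant (e.g. Hansch pi) against one
# molecular descriptor by ordinary least squares. The real work uses
# neural networks and full descriptor sets; this linear stand-in and
# the training pairs below are invented.

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y ~ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical (descriptor value, pi constant) training pairs
train = [(0.0, -0.1), (1.0, 0.4), (2.0, 0.9), (3.0, 1.4)]
a, b = fit_line([d for d, _ in train], [p for _, p in train])

def predict_pi(descriptor):
    """Predict pi for a substituent missing from the constant tables."""
    return a * descriptor + b
```

The practical payoff described in the abstract is exactly this last step: once fitted, the model supplies constants for substituents absent from the published tables, so downstream QSAR models need not drop those compounds.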
Understanding quantitative structure-property relationships (QSPR) through chemical stoichiometry
Ilie Fishtik, and Ravindra Datta, Department of Chemical Engineering, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609-2280, firstname.lastname@example.org
An unusual analogy between quantitative structure-property relationships (QSPR), stoichiometry, chemical thermodynamics, and kinetics is presented. Namely, conventional ordinary least squares (OLS) QSPR analysis is modified so as to explicitly minimize the residuals of the species subject to a set of linear relations among the residuals. The way these linear relations among the residuals are visualized and defined closely resembles the formalism of chemical stoichiometry, and they are therefore called isostructural reactions. It is further proved that the residuals may be uniquely partitioned into a sum of contributions associated with a set of isostructural reactions that have the same properties as the response reactions (RERs) previously deduced by us from chemical thermodynamics and kinetics. This finding is shown to be a useful tool for a deeper understanding of QSPR. In particular, the isostructural-RER approach may be used effectively to detect outliers.
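The stoichiometric analogy can be made concrete with a small group-additivity example. In a group-contribution QSPR, any "reaction" among the species that conserves every group has a predicted property change of zero, so the stoichiometric combination of the species residuals is a purely experimental quantity. The numbers below are invented; this is a sketch of the idea, not the authors' formalism.

```python
# Hypothetical group-additivity example. The isostructural reaction
#   butane + ethane -> 2 propane
# conserves both CH3 and CH2 groups (LHS and RHS each have 4 CH3 and
# 2 CH2), so its predicted property change is exactly zero and its
# "reaction residual" is just the signed sum of species residuals.
# Residual values are invented for illustration.

residuals = {"butane": 0.30, "ethane": -0.10, "propane": 0.05}
stoich = {"butane": -1, "ethane": -1, "propane": 2}   # reactants negative

reaction_residual = sum(stoich[s] * residuals[s] for s in stoich)
# = -0.30 + 0.10 + 0.10 = -0.10
```

A species appearing in many isostructural reactions with unusually large reaction residuals is then a natural outlier candidate, which is the diagnostic use the abstract highlights.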
Enabling technologies in the real world
Ramesh Durvasula, Scientist, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144, Fax: 314 647 9241
The integration of discovery data has moved from a purely technological challenge to a business process challenge. With the implementation of multi-tiered informatics systems and the adoption of technologies such as Web Services, J2EE, and data warehousing, discovery organizations are now becoming capable of delivering all available experimental and computational data for a compound to a bench scientist. However, the scientist is now faced with the new problem of wading through the available data to identify correlations and insights that will lead to faster discovery of lead compounds. In this presentation, we will share lessons learned from our experiences in working with our clients to deliver integrated, global decision support systems. Business rules, technologies, change management, and other related topics will be discussed within the context of real-world examples of data integration projects, including the informatics system deployed within our own Discovery Research laboratories.
OpenMolGRID, a GRID based system for solving large-scale drug design problems
László Ürge, Ákos Papp, István Bágyi, Géza Ambrus-Aikelin, and Ferenc Darvas, ComGenex Inc, 62 Pf.73, Bem rkp. 33-34, H-1027 Budapest, Hungary, email@example.com
The industrial challenge of drug design is to obtain 'novel structures' with 'favorable targeted properties'. Different strategies are available to achieve these goals. One way to find 'novel structures' is to use parallel chemistry. However, the resulting explosion in the number of structures requires large-scale computing, so some design strategies were simply not applicable in the past. Providing 'favorable targeted properties' can be supported by prediction models, but many of these also require high-speed computing resources and improved data integration. GRID technology opens a new chapter in drug design, and OpenMolGRID (Open Computing GRID for Molecular Science and Engineering) will be one of its first realizations. The system will combine a structure-generator engine and a property/activity predictor. Compared with presently available techniques, the generator engine will enumerate structures in an improved manner: it will be able to build molecules through multistep combination of generic fragments and reagents. The implemented two-level filtering lets the user decrease the number of candidate structures based on fragment descriptors as well as on predicted properties of the product structures (calculated by QSAR/QSPR models). The resulting candidate library will be fine-tuned toward the desired activity and/or property (e.g., toxicity). The OpenMolGRID system will also be able to build predictive models, providing the most popular building methods for both linear and non-linear models. A data warehouse module will collect information from geographically distributed resources, and thousands of molecular descriptors will be calculated and analyzed for each structure to obtain the best model. The development of OpenMolGRID is funded by the European Commission under the 5th Framework Programme (IST-2001-37238).
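The two-level filtering can be pictured as a cheap pre-enumeration filter on the building blocks followed by a model-based filter on the enumerated products. The sketch below is a minimal stand-in for that pipeline; every name, threshold, and scoring rule in it is invented, and the "structure generator" is reduced to pairing fragments.

```python
# Minimal two-level filtering sketch (all names, cutoffs, and the toy
# logP model are invented for illustration; not OpenMolGRID code).

def fragment_ok(fragment):
    """Level 1: cheap descriptor test applied before enumeration."""
    return fragment["n_heavy_atoms"] <= 20          # hypothetical cutoff

def product_ok(product):
    """Level 2: predicted-property test on an enumerated product."""
    return product["predicted_logp"] < 5.0          # hypothetical cutoff

def enumerate_products(fragments):
    """Stand-in for the generator engine: combine fragment pairs."""
    for i, f in enumerate(fragments):
        for g in fragments[i + 1:]:
            yield {"parts": (f["name"], g["name"]),
                   "predicted_logp": f["logp"] + g["logp"]}  # toy model

fragments = [
    {"name": "A", "n_heavy_atoms": 8,  "logp": 1.2},
    {"name": "B", "n_heavy_atoms": 25, "logp": 0.5},   # fails level 1
    {"name": "C", "n_heavy_atoms": 12, "logp": 2.5},
]
kept = [f for f in fragments if fragment_ok(f)]            # level 1
library = [p for p in enumerate_products(kept) if product_ok(p)]  # level 2
```

The point of the two levels is cost: the fragment filter prunes the combinatorial space before enumeration, so the expensive QSAR/QSPR predictions run only on the surviving products.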