#231 - Abstracts
ACS National Meeting
March 26-30, 2006
Social software: What, why, and how?
Beth Thomsett-Scott, Reference and Information Services, University of North Texas Libraries, P.O. Box 305190, Denton, TX 76226
PDF - PPT - MP3
The term “social software” has gained popularity in the last several years, although the idea of providing computer-based social interactions can be traced back to the 1940s. Today, social software refers to any software that provides for or promotes “social” interaction through the Internet. Group interactions through an electronic medium have been around since the beginnings of e-mail, bulletin board services, and chatrooms, and today there are many more options for online group interaction. Some of these tools are being used for educational and research purposes, including RSS, IM, and wikis. This talk will provide a brief history of social software and offer an overview of the major tools used in education and research. Information provided on each tool will include how it is being used and, where possible, some assessment of its functionality.
Weaving the Web 2.0: RSS and the future of chemical/science information
Teri M. Vogel, Science & Engineering Library, University of California, San Diego, 9500 Gilman Drive, #0175E, La Jolla, CA 92093
PDF - PPT - MP3
Though it is still on the “bleeding edge” for most web users, RSS has become one of the Web 2.0 technologies that are impacting how information is delivered, shared and received.
This presentation will: 1) explore the role that RSS is playing in an increasingly complex web landscape of blogs, wikis, search engines, tagging folksonomies, and more recently in the delivery of audio/visual content; 2) examine how libraries, publishers, A&I database producers, and other information providers are utilizing RSS to deliver content to students and researchers in chemistry and other sciences, and whether those end users are taking advantage of these opportunities; and 3) speculate on the effects that RSS-based technologies will have on library and research services, particularly in response to our users' evolving information needs and the new and changing devices they will be using to manage their information.
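As a rough illustration of the delivery mechanism discussed here, the sketch below reads a small RSS 2.0 document with Python's standard library and lists its items. The feed, journal name, article titles, and URLs are made-up placeholders, and `read_items` is a hypothetical helper name; a real table-of-contents feed from a publisher or database may carry additional elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed: the journal name, titles, and URLs are invented
# for illustration and do not correspond to any real publisher feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Chemistry Journal: Latest Articles</title>
    <link>https://example.org/journal</link>
    <description>Hypothetical table-of-contents feed</description>
    <item>
      <title>First example article</title>
      <link>https://example.org/journal/1</link>
      <pubDate>Mon, 27 Mar 2006 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second example article</title>
      <link>https://example.org/journal/2</link>
      <pubDate>Tue, 28 Mar 2006 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    # In RSS 2.0 the <item> elements sit inside the single <channel> element.
    for item in root.findall("./channel/item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


if __name__ == "__main__":
    for title, link in read_items(SAMPLE_FEED):
        print(title, link)
```

A feed reader does essentially this on a schedule, comparing the items it finds against those it has already shown to the user.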
Innovative methods of course delivery in Chemical Informatics and Chemistry
Brian Maurice Lynch and Lai Im Lancaster. Department of Chemistry, St. Francis Xavier University, Physical Sciences Complex, 1 West Street, Antigonish, NS B2G 2W5, Canada
PDF - PPT - MP3
We will describe and illustrate changes in presentation for our junior-level Chemical Informatics course, for the senior honours seminar presentations required of all students in conjunction with their graduation theses, and for some aspects of first-year chemistry. The changes include: 1. Provision of feedback through iPod-based audio recording of lectures ("podrecording"), synchronized to slide sequences. Recordings are accessible to course registrants by downloading to student computers or storage devices. 2. Conversion of PowerPoint files describing search protocols by printing to "exact image" PDFs to generate speakable files, conveniently retrievable and playable through Apple iPods. We will also give examples of generic audio/video recording of presentations by visiting speakers and/or presentations at conferences. We hope to supply examples of recordings from this CINF session, subject to speaker (and ACS) consent.
Open access and blogging: How academic research is transforming
Barbara A Greenman, Science Library, University of Colorado at Boulder, 184 UCB, Norlin Library, Boulder, CO 80309-0184
PDF - PPT - MP3
Although not widely embraced in the U.S., open access has become the mode of publishing for many academic authors worldwide, thereby providing free online access to their scientific research. Traditional venues for scholarly communication are undergoing fundamental change driven by two forces in particular: online publishing and blogging. These forces are transforming not only the academic publishing structure but also the configuration and the format of the research article itself. This presentation explores how the new culture of open access, coupled with the increase in blogging by students, faculty, and the general public, is impacting scholarly research.
On the go with CHM 125, ECON 210, PHYS 218, and BIOL 205: Coursecasting at a large research university
Jeremy R Garritano, Mellon Library of Chemistry, Purdue University, 504 W. State St., West Lafayette, IN 47907 and David B. Eisert, Teaching and Learning Technologies, Purdue University, 504 W. State St., West Lafayette, IN 47907
PDF - PPT - MP3
Considered one of the larger and broader coursecasting programs currently in the United States, Purdue University's BoilerCast system offers many challenges and opportunities for faculty, staff, and students on and off campus. Easy online access to audio course lectures and their integration with RSS feeds allow students to review lectures before exams, supplement in-class talks, and even let faculty critique their own lectures. However, a podcasting or coursecasting service is not without its tribulations. For those exploring the possibilities of coursecasting, this paper will discuss the ongoing costs and benefits of a large-scale coursecasting system, lessons learned, and future directions. Reactions from both faculty and students will also be presented, focusing on the Chemistry courses involved. The implementation of an audio tour of the Undergraduate Library to be used with circulating Apple iPods will also be discussed.
Blog applications in the classroom and beyond
Randy Reichardt, Science & Technology Library, University of Alberta, 1-26 Cameron, Edmonton, AB T6G 2J8, Canada
PDF - PPT - MP3
Weblogs, or blogs, are websites that are regularly and frequently updated with new entries, links, documents, multimedia, graphics, and pictures. Blogs first appeared as online diaries or journals in the late 1990s; other applications evolved as the usability and robustness of blog software improved, making it easier for anyone to create and use a blog. In science and technology, discipline- and subject-specific blogs began to appear, attracting the interest of students, academics, and practitioners working in areas such as chemistry and engineering.
In 2003, the engineering librarian introduced blogs into chemical, material and mechanical design engineering classes. Students working in groups of four on capstone design projects in these subjects were given the option of using blogs as a project management tool. Instructions on blog creation and utility were written and distributed to interested student groups, who worked with the engineering librarian to create, upload and maintain a blog for each project. Certain subject-specific databases such as Compendex (Engineering Index) now provide blogging and RSS functionality, features which were available to the students as well.
The use and application of blogs will be discussed. Included will be a brief review of examples of blogs in chemistry as well as the author's professional blog, covering issues of interest to science and technology librarians. Use of blogs in chemical engineering design classes will be covered, as well as newer applications, such as the “Blog This” and RSS features now available in the engineering database, Compendex.
Wikipedia: Social revolution or information disaster?
Martin A. Walker, Department of Chemistry, SUNY Potsdam, 44 Pierrepont Ave, Potsdam, NY 13676
PDF - PPT - MP3
Wikipedia is an open, collaborative encyclopedia based on the World Wide Web with articles written by enthusiastic volunteers. It is currently the 38th most popular website on the internet (and growing), with over 1% of all web users accessing the site on any given day. A Google search on many topics often gives a Wikipedia article (perhaps from a mirror site) as the main reference source. But is it reliable? Is it destroying our students' interest in using "authentic" peer-reviewed chemical information? Or is it a revolution, delivering a high level of information to the masses? This presentation will give an insider's description of the software and the community that is Wikipedia, and describe the associated strengths and weaknesses. It will show how chemistry-related pages are organized, and also provide some insights into likely future developments in Wikipedia.
A case study: ACS BIOT web seminars
Jonathan L. Coffman, Drug Substance Development, Wyeth BioPharma, One Burtt Road, Andover, MA 01810
PDF - PPT - MP3
The Biochemical Technology Division of ACS has successfully programmed a series of web seminars. Our goal was to expand the number of people able to attend BIOT symposia, to provide educational experience for students in the biochemical sciences, and to provide a forum for new topics to be discussed. Initial programming featured the top presentations from our most recent annual meeting. These web symposia were well received, and indicate the important role web symposia will play in the future of BIOT.
Each web symposium typically had three speakers, each speaking for 20 minutes, allowing for 10 minutes of questions. Our audience has averaged 250 people, which was larger than the largest audience at our BIOT annual meeting. On-line surveys showed that 100% of respondents would return for another web conference. Industrial audience members paid fees to join, allowing academic members to attend with a full scholarship. The fee charged to each industrial site could have funded up to five scholarships. This fee structure will allow nearly unlimited programming in the future.
We will discuss how BIOT set up web symposia: choosing a technology provider, choosing programming material, extending the programming to include new topics, and how we set pricing. Since the cost of doing Web Seminars is low, we anticipate that many for-profit symposia organizations will begin doing web seminars soon. ACS, its divisions, and its publications must use web seminars as a key strategy in supporting the chemical profession, advancing the chemical sciences, and communicating the value of chemistry and chemical engineering to the public.
Chemist-librarian: The best of both worlds
F. Bartow Culp, Mellon Library of Chemistry, Purdue University, 504 West State Street, West Lafayette, IN 47907-2058
PDF - PPT
In the Internet age, isn't the concept of a librarian outmoded? If easy and almost unlimited information access is available to anyone at the click of a mouse button, why should a chemist consider academic librarianship as a career? There are many reasons, including excellent job prospects, a high degree of career satisfaction, plus the chance to be a central player in the current redefinition of how science is being done. In this age of high-entropy information, the unique combination of abilities that we chemist/librarians bring to our jobs gives us not only the power to organize and access chemical information; it can also enhance the value of that information and improve the entire communication process itself. We will present examples of how chemist/librarians are integral participants in the advancement of both of their professions.
Carcinogen, mutagen, teratogen, oh my: How I started a career in chemical information
Mary Talmadge-Grebenar, Information & Knowledge Integration, Bristol-Myers Squibb, Rt. 206 & Province Line Rd., PO Box 4000 J12-01, Princeton, NJ 08543
PDF - PPT
When every reagent bottle has the words carcinogen, mutagen, or teratogen on the label, it makes you rethink your career options. Moving from the medicinal chemistry lab to the world of chemical information was an easy choice. Chemistry and Library Science have many things in common, and the skills gained in the laboratory can be translated to the information profession. This talk will cover why I chose this new career and the path I have followed since that decision.
Designing a postgraduate course on Cheminformatics
Patrick Joseph O'Malley, School of Chemistry, The University of Manchester, North Campus, Sackville Street, Manchester, M60 1QD, United Kingdom
To meet the demands of training people in cheminformatics skills, we developed a postgraduate master's degree course in Cheminformatics at The University of Manchester. This was established to fill a perceived need for scientists equipped with the necessary skills in chemical information. Traditional chemistry undergraduate courses did not teach such skills, and the course provides an opportunity for fresh undergraduates to learn them as well as an opportunity for more experienced personnel to retrain. This talk will give an outline of our experience in designing such a course and address the problems we encountered and the solutions we found. Career destinations of graduates will be examined, and current trends in the training of and need for chemical information specialists in the UK will be presented.
Continuing education for Biology and Life Science librarians in the post-genomic era: You can teach an old dog new tricks
Frederick W Stoss, Science and Engineering Library, University at Buffalo, 228-B Capen Hall, Buffalo, NY 14260-1672
PDF - PPT
The post-Genomic Era began with the completion of the human genome sequence in 2003. This achievement was made under the auspices of the Human Genome Project (HGP), a 13-year project coordinated by the U.S. Department of Energy and the National Institutes of Health. The goals of the HGP included identification of the 20,000 to 25,000 genes encoded in human DNA, determination of the sequences of the chemical base pairs (~3 billion) making up human DNA, storing this genomic data in specialized databases, and developing and enhancing the databases and other tools for accessing and analyzing this data. In more recent years we have witnessed the emergence and ongoing evolution of intertwining disciplines in the biological, life, chemical, and computational sciences, forming a New Biology of genomics, bioinformatics, proteomics, chemical biology, systems biology and other subdisciplines spinning off the branches of molecular and structural biology and genetics. Keeping abreast of the science and technology behind the New Biology is a daunting task. Simultaneously keeping abreast of the new and ever-changing developments in the data and information storage and delivery systems for the New Biology is an example of “information synergism,” raising the question, “How can science librarians and information specialists provide reference services and library instruction for the new and rapidly emerging fields of research and inquiry of the New Biology?” This presentation will discuss a variety of continuing education initiatives, ranging from the full suite of education resources available from the National Center for Biotechnology Information (a program within the National Library of Medicine), including its continuing education services and tutorials for librarians, information specialists, and researchers, to library school bioinformatics, current awareness services, special journal issues, selected reference and book titles, and science education journals and periodicals.
Recruiting science students into library, information, and data careers will be briefly discussed.
Career choices in intellectual property
Pamela J. Scott, Legal Division, Pfizer, Inc, Eastern Point Road, MS 8260-1611, Groton, CT 06340
PDF - PPT
Patents and patent information provide many career opportunities in today's marketplace. These include government positions as patent agents, while the academic and private sectors afford careers in patent law, education, and patent research. Finally, careers as independent consultants, which touch any and all market sectors, will be discussed.
The role of chemists in the FDA drug approval process
M. Scott Furness, Office of Generic Drugs, Food and Drug Administration, 7500 Standish Place, Rockville, MD 20855
PDF - PPT
Working as a chemist at the FDA is probably one of the least understood career paths available to chemists today. This talk will provide a general overview of the FDA drug approval process with an emphasis on the chemist's role in the scientific reviewing divisions as well as in the evaluation of current Good Manufacturing Practices (cGMPs). As Regulatory Review Scientists, chemists evaluate the chemical sections of drug applications. This evaluation includes an assessment of the adequacy of the methods, facilities, and controls used for the manufacture of drugs. As Consumer Safety Officers (commonly referred to as investigators within FDA), chemists audit, review, and evaluate the manufacturing processes of products that the FDA regulates by inspecting manufacturing facilities within the United States and abroad, and work as members of multi-disciplinary teams to assure efficient enforcement of the Food, Drug, and Cosmetic (FD&C) Act.
Eight hundred words by noon today, plus photos: Science writing for fun and profit
Nancy McGuire, Public Affairs, Office of Naval Research, 875 N. Randolph St., Arlington, VA 22203
PDF - PPT
Your list of journal publications is as long as your arm, but you've always wondered if you could write science stories for a magazine or newspaper. What skills do you already have? What will you need to learn? Will you write as a sideline or make it a full-time career? A lab scientist turned full-time science communicator tells of her mid-life career transition and shares a little of what she learned along the way, with a brief drive-by tour of other career options available to scientists who work with words.
Computational chemistry career opportunities
J. Phillip Bowen, Center for Drug Design, Department of Chemistry and Biochemistry, University of North Carolina at Greensboro, 400 New Science Building, PO Box 26170, Greensboro, NC 27402-6170
Computational chemistry is accepted today as a specialized field of chemical or biochemical research. Computational chemistry methods are used in both industrial and academic environments worldwide to gain detailed insights into chemical and biochemical problems at the molecular level. They are applied in many different areas of chemistry, ranging from polymer research to pharmaceutical design. This presentation will focus on the many different career pathways available in computational chemistry, and on the background and communication skills necessary to be successful.
One view of writing a scientific paper
George M. Whitesides, Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138
This talk will discuss one (probably of many) style(s) useful in writing a scientific paper: that is, the one we use in our research group. It has the advantage that it integrates doing research, writing about research, and managing research; it has the disadvantage that it is very labor intensive.
Review process for the scientific paper: The journal editor's viewpoint
Willis B. Wheeler, Associate Editor, Journal of Agricultural and Food Chemistry, 4938 Hampden Lane, Box 298, Bethesda, MD 20814 and Heijia L Wheeler, Journal of Agricultural and Food Chemistry.
The purpose of the review process is to evaluate a paper's scientific merit, originality, clarity of presentation and importance to the field. The goals of the review are to ensure that the journal publishes only noteworthy papers and to provide authors with guidance so they can improve their manuscripts.
Reviewers are selected in a number of ways. Some journals request that authors suggest potential reviewers. Journals have databases of scientists that can be searched by areas of scientific interest. In addition, editors know potential reviewers from their own experience in the research field.
In terms of ethical considerations, scientists are obligated to review papers. If a scientist does not feel qualified to review, he/she should make that known to the editor and destroy the manuscript. Reviewers should be objective judges of the paper and should be sensitive to potential conflicts of interest. Papers sent to reviewers must be treated as confidential. Reviewers should explain and support their evaluations.
Reviewers are expected to evaluate the quality of the work, its appropriateness for the journal, the technical quality, the clarity of presentation, and any ethical issues. Reviews should give specific and substantive evaluation of the strengths and weaknesses of the manuscript.
An editor's perspective on scholarly publishing: What to do, and not to do, as an author
Leonard V. Interrante, Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, NY 12180
As Editor of a large ACS journal (Chemistry of Materials) and a long-time author/reviewer of scientific papers, the speaker will give his view on scientific publishing in 2006 and beyond. From this perspective, he will attempt to answer some of the questions raised in the outline for this symposium, such as "what do editors want (and not want) in an article?", "how are reviewers selected?", etc. In addition, the enormous growth in submitted papers that Chem. Mater. has experienced in recent years has brought with it a number of problems, including reviewer overload and an increased frequency of violations of the "Ethical Guidelines" established by the editors of the ACS journals [see L.V. Interrante and E. Reichmanis, C&EN, Vol 83(6), p. 4 (2005)]. A major portion of this talk will be devoted to a discussion of these problems, and what we, and other journal editors, are doing to confront them.
Post-peer-review journal production: Transforming a manuscript for publication
Joseph E. Yurvati1, Terri K. Lewandowski2, and Anne C. O'Melia2. (1) Journal Publishing Operations, American Chemical Society, 2540 Olentangy River Rd, Columbus, OH 43210, (2) Journal Production and Manufacturing, American Chemical Society, 2540 Olentangy River Road, Columbus, OH 43054
This paper examines the steps an author's manuscript undergoes after it has been accepted, en route to becoming a published article both on the Web and in print. While technological changes have introduced considerable automation into this phase of the journal publishing process, the basic purpose remains: to transform a scientist's research findings into a medium that ensures long-term accessibility to the interested scientific community. This examination will focus on the critical activities of a publishing operation keyed to ensuring fast and efficient production of high-quality journal products.
The Web of publishing
Evelyn Jabri and Sarah Tegen. ACS Chemical Biology, American Chemical Society, 1155 16th St NW, Washington, DC 20036
The Web has revolutionized the way we retrieve information and use scientific journals. Formerly, we had shelves of journals with tables of contents to help us select papers, and indices helped us find information later. Today, electronic TOCs and RSS feeds are pushed to the reader; we use online search engines and databases to collect information on topics; we carry information with us on our PDAs and iPods. A sometimes overwhelming amount of content can come our way, often getting lost in the mess on our desktops. So, how can we effectively organize and manage all of it? And what can publishers do to help? Publishers are learning tricks from places like Amazon, Google, Yahoo, and Apple. This talk will detail some of the innovations ACS Chemical Biology is using to help you organize and digest the scientific literature.
On-line mentoring with the WCC
Jacqueline A. Erickson, Sr. Analytical Scientist, GlaxoSmithKline, 1500 Littleton Rd., Parsippany, NJ 07054
No abstract available.
Mentoring in academia
Song Yu, Columbia University Libraries, Columbia University, 454 Chandler, 3010 Broadway, New York, NY 10027
How do academic librarians seek mentorship in their organizations, especially those specializing in a field of science, like chemistry? Some libraries and professional organizations have formal or informal mentoring systems. However, one has to be creative and take the initiative to learn and develop in a way that suits his/her own situation.
Enhancing the association through mentorship
Leah Solla, Physical Sciences Library, Cornell University, 293 Clark Hall, Ithaca, NY 14850
Professional associations such as the American Chemical Society provide a wide variety of important career benefits and opportunities. In the Division of Chemical Information, members can avail themselves of state-of-the-art technical programs, resources for day-to-day work, and professional networking and educational opportunities. The best way to make the most of these opportunities is to get involved and work directly with other members. Active membership is encouraged and further enhanced through mentoring within the division's organizational structure. The CINF Education Committee incorporates several approaches to mentoring into committee procedures to encourage participation of both experienced and new members and to build on the knowledge and enthusiasm of all involved.
The role of a mentoring program in the patent information science profession
Valerie A. Vaillancourt and Donna Kaye Wilson. Legal Division, Pfizer, Inc, Kalamazoo, MI 49007
There are many factors which may lead to a change in career path, and for many scientists a new career in information science is a very attractive alternative. But how does one effectively acquire the skills necessary for this job? External vendors offer in-depth training courses, but this is just not enough. The leadership team of GLIST (Global Legal Information Science Team) at Pfizer believes that mentoring is one way to provide support to our members entering a new job. This presentation will discuss the role of a mentoring program in the patent information science field during job transition, including the advantages and challenges of such a program. Both the mentor and trainee will share their perspectives and highlight the characteristics that contributed to their successful mentoring partnership.
Structure and reaction based evaluation of synthetic accessibility
Johann Gasteiger1, Thomas Seidel1, Krisztina Boda1, Achim Herwig2, and Oliver Sacher2. (1) Computer-Chemie-Centrum, University of Erlangen-Nuremberg, Erlangen, 91052, Germany, (2) Molecular Networks GmbH, Erlangen, 91052, Germany
De novo design systems usually generate large numbers of novel structures. Then, it becomes of crucial importance to develop methods that allow one to select those structures that are easily synthesizable. Various criteria can be invoked to estimate the structural complexity of a compound and its synthetic proximity to available starting materials. Furthermore, data mining in reaction databases can point out strategic bonds where a molecule should be cut to obtain simpler fragments whereby the cuts simultaneously correspond to reactions with a broad scope and high yields.
Algorithms and cancer drugs: In silico design of S100B ligands to block p53 binding
John L. Whitlow, Department of Electrical and Computer Engineering, NC State University, 2300 Avent Ferry Road, O2, Raleigh, NC 27606 and Yumin Li, Department of Chemistry, East Carolina University, 300 Science and Technology Building, Greenville, NC 27858.
Cancer is the leading cause of death for persons under the age of 85. Elevated levels of S100B are associated with cancer. This research focused on interactions between S100B and the tumor suppressor protein, p53. S100B disrupts p53's protective function by inhibiting p53's C-terminal regulatory domain phosphorylation. This study designed compounds to block the effects of S100B on p53. Compounds that enhance p53's cellular function may provide potent anticancer therapies.
Accelrys's Cerius2 software was used for de novo drug design. The three dimensional structure of S100B was analyzed to resolve its main interaction sites. Fragment molecules were screened against targets of interaction in the S100B active site. Top fragment molecules were used as scaffolds to design complete ligand molecules. Additionally, public and private molecular libraries were run through docking algorithms to locate existing molecules with high affinities for the S100B active site. ADME and toxicity properties were also investigated.
Closing the loop: From high-throughput screening to synthesis of novel protein displacers
N. Sukumar1, Curt M Breneman1, Steven M. Cramer2, James A. Moore3, Kristin P. Bennett4, Mark J. Embrechts5, Min Li1, Jia Liu2, and Long Han6. (1) Department of Chemistry and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, 110 8th St., Troy, NY 12180-3590, (2) Department of Chemical and Biological Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180, (3) Department of Chemistry, Rensselaer Polytechnic Institute, 110 8th St., Troy, NY 12180, (4) Department of Mathematics, Rensselaer Polytechnic Institute, Amos Eaton Building, 110 8th St, Troy, NY 12180, (5) Decision Sciences and Engineering Systems, Rensselaer Polytechnic Institute, 110 8th St, Troy, NY 12180, (6) Decision Science and Engineering Systems, RPI, 110 8th St, Troy, NY 12180
Low-molecular-weight displacers employed in ion-exchange displacement chromatography have shown great potential for the purification of proteins from complex mixtures. One of their advantages is the ability to carry out selective displacement chromatography, in which target proteins can be eluted separately. Identifying efficient displacers, however, is a major challenge for protein displacement chromatography, as it depends not only on the protein mixtures, but also on the chemistry of the stationary phase and the conditions of the mobile phase. The choice of displacers is still mostly driven by trial and error and is largely dependent on the domain knowledge of an expert. In this work we investigate an efficient procedure to quickly predict novel selective displacers: a small set of known selective displacers is used to train machine learning models (SVM and decision trees) that are then used to identify novel selective displacers from available commercial chemical catalogs and to progressively enrich the models.
DeNovo design tools for the generation of synthetically accessible ligands
A. Peter Johnson, Krisztina Boda, Shane Weaver, Aniko Valko, and Vilmos Valko. School of Chemistry, University of Leeds, Leeds, LS2 9JT, United Kingdom
An efficient de novo design system can generate large numbers of hypothetical structures which have been tailored to bind to a specific receptor. An advantage of the de novo process is that many of these structures will have novel structural motifs. However, a possible disadvantage is that some of the structures might be relatively difficult to synthesise. A number of different solutions to the synthetic accessibility problem have been developed for use with the SPROUT system for de novo design: a) CAESA, a separate system for assessment of synthetic feasibility; b) SynSPROUT, a de novo system which incorporates synthetic feasibility into the de novo construction process; c) complexity analysis, which matches the designed structures against substitution patterns of known drug-like molecules; d) SPROUT LeadOpt, which optimises structures to improve binding affinity by application of known chemistry using available starting materials. The relative merits of these different approaches will be discussed.
FlexNovo: Structure-based searching in large fragment spaces
Jörg Degen and Matthias Rarey. Center for Bioinformatics (ZBH), University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
We present a new molecular design program, called FlexNovo, for structure-based searching in large fragment spaces following a sequential growth strategy. The fragment spaces used consist of several thousands of chemical fragments and a corresponding set of rules, which primarily specifies how the fragments can be connected with each other. FlexNovo is based on the FlexX molecular docking software and therefore uses the same chemical models, scoring functions, docking algorithms and pharmacophore models. In addition, several placement geometry, chemical property (drug-likeness) and diversity filter criteria are directly integrated in the build-up process. FlexNovo has been used to design potential inhibitors for four targets of pharmaceutical interest (DHFR, CDK2, COX-2 and Estrogen receptor). The compounds obtained show that FlexNovo is able to generate a diverse set of reasonable molecules with drug-like properties. By comparing these to known inhibitors, similarities with respect to their structures and binding modes are frequently observed.
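To make the idea of a fragment space with connection rules and sequential growth more concrete, here is a toy sketch. It is not FlexNovo's algorithm or data model: the fragments, link types, rules, "weight" values, and the function names `compatible`, `grow`, and `build` are all hypothetical, and there is no docking or scoring. It only shows rule-checked, step-by-step attachment of fragments with a simple property filter applied during build-up.

```python
# Hypothetical fragment space: each fragment has a toy "weight" and a list of
# open link types. These values are illustrative, not real chemistry data.
FRAGMENTS = {
    "benzene": {"weight": 78, "links": ["aryl", "aryl"]},
    "amide": {"weight": 43, "links": ["acyl", "amine"]},
    "methyl": {"weight": 15, "links": ["alkyl"]},
}

# Hypothetical connection rules: unordered pairs of link types that may be joined.
RULES = {
    frozenset(["aryl", "acyl"]),
    frozenset(["aryl", "amine"]),
    frozenset(["amine", "alkyl"]),
}


def compatible(link_a, link_b):
    """True if the rule set allows joining these two link types."""
    return frozenset([link_a, link_b]) in RULES


def grow(state):
    """Yield every state reachable by attaching one fragment to one open link."""
    names, links, weight = state
    for i, open_link in enumerate(links):
        for frag_name, frag in FRAGMENTS.items():
            for j, frag_link in enumerate(frag["links"]):
                if compatible(open_link, frag_link):
                    # Both joined links are consumed; the rest stay open.
                    leftover = frag["links"][:j] + frag["links"][j + 1:]
                    remaining = links[:i] + links[i + 1:] + tuple(leftover)
                    yield (names + (frag_name,), remaining, weight + frag["weight"])


def build(seed_name, steps, max_weight):
    """Grow from a seed fragment for a fixed number of steps.

    States heavier than max_weight are filtered out during build-up.
    Duplicates are detected by composition and open links only, which is a
    crude simplification that ignores connectivity.
    """
    seed = FRAGMENTS[seed_name]
    frontier = [((seed_name,), tuple(seed["links"]), seed["weight"])]
    results = list(frontier)
    seen = set()
    for _ in range(steps):
        next_frontier = []
        for state in frontier:
            for new_state in grow(state):
                key = (tuple(sorted(new_state[0])), tuple(sorted(new_state[1])))
                if new_state[2] <= max_weight and key not in seen:
                    seen.add(key)
                    next_frontier.append(new_state)
        results.extend(next_frontier)
        frontier = next_frontier
    return results


if __name__ == "__main__":
    for names, links, weight in build("benzene", 2, 200):
        print(names, links, weight)
```

Applying the filter inside the growth loop, rather than after enumeration, keeps the frontier small, which matters when a fragment space holds thousands of fragments.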
ThermoML: New IUPAC standard for thermodynamic data storage and exchange
Robert D. Chirico1, Michael Frenkel1, Vladimir V. Diky1, Qian Dong1, Kenneth N Marsh2, John H. Dymond3, William A. Wakeham4, Stephen E. Stein5, Erich Koenigsberger6, and Anthony R. H. Goodwin7. (1) Physical and Chemical Properties Division, National Institute of Standards and Technology, 325 Broadway, Boulder, CO 80305-3328, (2) Department of Chemical and Process Engineering, University of Canterbury, Private Bag 4800, Christchurch, New Zealand, (3) Chemistry Department, University of Glasgow, Glasgow, G12 8QQ, United Kingdom, (4) School of Engineering Sciences, University of Southampton, Southampton, SO17 1BJ, United Kingdom, (5) Physical and Chemical Properties Division, NIST, Gaithersburg, MD 20899, (6) Division of Science and Engineering, School of Mathematical and Physical Sciences, Murdoch University, Murdoch, WA 6150, Australia, (7) Schlumberger Technology Corporation, 125 Industrial Blvd., Sugar Land, TX 77478
ThermoML is an XML-based emerging IUPAC standard for storage and exchange of experimental, predicted, and critically-evaluated thermophysical and thermochemical property data. The basic principles, scope, and description of the structural elements of ThermoML will be discussed. ThermoML covers essentially all thermodynamic and transport property data for pure compounds, mixtures, and chemical reactions. Representations of uncertainties in ThermoML conform to the Guide to the Expression of Uncertainty in Measurement (GUM). Representation of fitted equations with ThermoML will also be described. The role of ThermoML in global data communication processes will be discussed with emphasis on a collaborative project with major journals (the Journal of Chemical and Engineering Data, The Journal of Chemical Thermodynamics, Fluid Phase Equilibria, Thermochimica Acta, and the International Journal of Thermophysics) for distribution of property data with benefit to authors, journal publishers, and data users. The project model described is readily applicable to other disciplines and data types.
Software infrastructure for ThermoML-based data exchange process: Guided data capture
Vladimir Diky, Thermodynamics Research Center (TRC), National Institute of Standards and Technology (NIST), Mailstop 838.01, 325 Broadway, Boulder, CO 80305
As ThermoML is becoming a standard for thermophysical and thermochemical data exchange, the need for publicly available tools for creating and interpreting ThermoML files is becoming obvious. Guided Data Capture (GDC) is the natural choice as a generating tool. GDC was developed as a data entry tool for public use by a wide range of users. GDC does not require any specialized database knowledge, is compact, and is compatible with common PC systems. The program provides a sequence of screens guiding a user through the entire data entry process. The form design is based on the major thermodynamic principles, which assures a complete and unambiguous definition of each system and property and allows preliminary validation of the information. GDC maintains internal data formats for essentially all ThermoML data, so the only feature needed to make it a ThermoML-generating tool was data export in an XML format. XML export has been implemented at the text level because writing XML output is easy at a low level when the data structures already exist in the program, and this solution eliminated the dependence on any XML parsing tool and the need to include one in the redistribution kit. A ThermoML-generating version of GDC has been successfully used at the Thermodynamics Research Center (NIST) and is planned to be freely available.
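The text-level export approach described above can be sketched as follows. This is a minimal illustration, not GDC's code: the function names are invented, and the element and attribute names (`DataReport`, `Compound`, `Property`, `Point`) are hypothetical placeholders rather than names from the ThermoML schema. The point is only that existing in-memory data can be written out as XML by string assembly, with a small escaping helper and no XML library.

```python
def xml_escape(text):
    """Escape the characters that are special in XML text and attribute values."""
    return (str(text).replace("&", "&amp;")
                     .replace("<", "&lt;")
                     .replace(">", "&gt;")
                     .replace('"', "&quot;"))


def export_report(compound, prop_name, unit, points):
    """Serialize in-memory data to XML by plain string assembly.

    All element and attribute names are hypothetical placeholders,
    not names taken from the ThermoML schema.
    """
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<DataReport>"]
    lines.append(f"  <Compound>{xml_escape(compound)}</Compound>")
    lines.append(f'  <Property name="{xml_escape(prop_name)}" unit="{xml_escape(unit)}">')
    for temperature_k, value in points:
        lines.append(f'    <Point temperature="{temperature_k}">{value}</Point>')
    lines.append("  </Property>")
    lines.append("</DataReport>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only.
    print(export_report("water", "density", "kg/m3", [(293.15, 998.2), (298.15, 997.0)]))
```

The trade-off of this design is that well-formedness is the writer's responsibility, so escaping user-entered strings is the one step that cannot be skipped.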
The conundrum of the scientific endeavor: ThermoML - a start
Kenneth N Marsh, Department of Chemical and Process Engineering, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
The progress of science and technology is based primarily on measured numeric data, with the scientific journals publishing the values as numbers within the text and as tables and/or figures. Those journals are now published electronically, but researchers and others needing the data have to retype them into their required format for further analysis. Until recently, only a few systems had been devised to capture numerical data at its source (the author), e.g. genome, protein and crystallographic data. Why? Because there was no financial gain to the publisher, there was no standard format, and there was no central distribution site. A successful thermophysical property data gathering system requires an agreed-upon standard format, a mechanism and an incentive for the author to submit the data, and a body to accept, verify and distribute the data freely to the thermodynamic community. ThermoML with TRC/NIST backing provides a solution for thermophysical property data. The role of the Journal of Chemical and Engineering Data in the development of the ThermoML standard will be outlined.
Elsevier and NIST-TRC: How scientific journals are enhanced through a ThermoML linking agreement
Michiel S. Thijssen, Chemistry & Earth and Environmental Sciences, Elsevier B.V, Radarweg 29, Amsterdam, NL-1043 NX, Netherlands
A linking agreement between the Thermodynamics Research Center at NIST and Elsevier since early 2004 provides readers of The Journal of Chemical Thermodynamics (JCT) with a ThermoML link next to articles on ScienceDirect.com. This link connects to the respective ThermoML database record at TRC, where thermophysical and -chemical data related to the article are stored for direct and free downloading into laboratory environments. A reciprocal link connects the ThermoML record to the journal content. From 2005 onwards, authors of Fluid Phase Equilibria and Thermochimica Acta also deposit their data. The NIST-TRC & Elsevier collaboration will be discussed in detail. Statistics show its popularity, as presently 75% of JCT authors submit data. The data is enhanced in 10-20% of the cases by the data-capture process, and subsequently updated before publication in the journal. The collaboration contributes positively to our joint efforts to serve the authors, readers and users of thermal data.
Building ThermoML based bridges between thermophysical property packages and engineering applications: ThermoData Engine
Chris D. Muzny, Physical and Chemical Properties Division, National Institute of Standards and Technology, 325 Broadway, Boulder, CO 80305-3328
ThermoData Engine (TDE) is a recently released database and software product produced by the Thermodynamics Research Center at the National Institute of Standards and Technology in Boulder, Colorado. TDE is a dynamic data evaluation tool for thermodynamic properties that relies on SOURCE, a comprehensive experimental archival data system that includes rigorous quality evaluation. TDE is useful for any application that requires thermodynamic property information, but it is especially well suited to chemical engineering applications and process simulations. Because of the need to communicate results of data evaluations to other chemical engineering software applications, TDE implements ThermoML as a standardized data communication method. The use of ThermoML for both output and input of data in TDE will be described and examples of the usefulness of this method will be given.
Process informatics model (PrIMe): A customer for ThermoML
Michael Frenklach1, Andrew Packard1, Zoran M. Djurisic1, David M. Golden2, Craig T. Bowman2, William H. Green Jr.3, Gregory J. McRae3, Thomas C. Allison4, Gregory J. Rosasco5, and Michael J. Pilling6. (1) Department of Mechanical Engineering, University of California at Berkeley, Berkeley, CA 94720-1740, (2) Department of Mechanical Engineering, Stanford University, Stanford, CA 94305, (3) Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Bldg. 66, Room 270, Cambridge, MA 02139, (4) Computational Chemistry Group, National Institute of Standards and Technology, 100 Bureau Drive, Stop 8381, Gaithersburg, MD 20899-8381, (5) Physical and Chemical Properties Division, National Institute of Standards and Technology, 100 Bureau Drive, Mail Stop 8380, Physics Building (221) Rm. A107, Gaithersburg, MD 20899-8380, (6) School of Chemistry, University of Leeds, Woodhouse Lane, LS2 9JT Leeds, United Kingdom
Process Informatics is a data-centric approach to developing predictive models for complex chemical reaction systems (http://primekinetics.org). It deals with all aspects of integration of pertinent data of complex systems (industrial processes and natural phenomena) whose complexity originates from chemical reaction networks. The primary goal of process informatics is information gathering, validation, and transformation into a usable form. The latter includes development of predictive (numerical/computer) models with quantified degrees of reliability. The Process Informatics infrastructure has two principal components: a Data Depository and a collection of Tools. The Depository is designed to represent the most complete set of knowledge currently available in a given field. The Tools built so far are of two general kinds: those enabling the collection, transfer, organization, display, curation, and mining of the data, and those enabling processing and analysis of the data along with assembly of the data into models. The handling of thermodynamics will utilize ThermoML.
Concepts in receptor optimization: Targeting the peptide RGD
Wei Chen1, Chia en Chang2, and Michael K. Gilson1. (1) Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, 9600 Gudelsky Drive, Rockville, MD 20850, (2) Department of Chemistry, University of Maryland, College Park, MD 20742
The present study uses an accurate and theoretically well-founded method of computing binding affinities as the basis for the design of novel receptors targeting the biologically important peptide RGD. This method is found to yield excellent agreement with experimental affinities for a synthetic RGD receptor; and four new receptors constructed in silico by a fragment-based approach are analyzed here. One of the new receptors is predicted to bind as tightly as the existing receptor, despite its lower molecular weight. One is found to provide affinity in the same range as expected for proteins for ligands with the size of RGD. Further analysis of these systems yields insights into the maximization of affinity in the face of losses in configurational entropy and solvation. The present study indicates that more efficient and tighter-binding receptors for RGD can be made, and represents a significant step toward the broader goal of targeted receptor design.
Generating and searching > 10E20 synthetically accessible structures
Richard D. Cramer1, Farhad Soltanshahi2, Robert Jilek3, and Brian Campbell3. (1) Chief Scientific Officer, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144, (2) Research, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144, (3) Tripos Inc, 1699 South Hanley Road, St. Louis, MO 63144
The limited novelty and feature content of commercially offered reactants and the very rapid similarity/QSAR-based ligand searching capability provided by topomers have prompted the creation of "allchem", a database of 10E7 mutually reactive synthons synthesizable in a few simple steps from commercially available reagents. Its contents are proving especially useful as novel scaffolds within lead discovery libraries and as candidate side chains within lead optimization programs. Virtual library construction via topomer-based searching of such databases seems very attractive to working medicinal chemists.
ROBIA: Computational assessment of synthetic procedures
Jonathan M Goodman and Ingrid M Socorro. Unilever Centre for Molecular Science Informatics, Cambridge University, Department of Chemistry, Lensfield Road, Cambridge, CB2 1EW, United Kingdom
Good drug candidates must be accessible through reasonable synthetic routes, and must not be too susceptible to degradation reactions that would alter or remove their biological activity. The ROBIA (Reaction Outcome By Informatics Analysis) program analyses organic transformations using detailed conformation analysis and molecular modeling approaches in order to generate and to evaluate likely reaction pathways. This can be used both to assess the likely stability of candidate structures and also to examine synthetic pathways towards these molecules.
The design and implementation of IUPAC ionic liquids database
Qian Dong, Physical and Chemical Properties Division, National Institute of Standards and Technology, 325 Broadway, Boulder, CO 80305-3328
The IUPAC Ionic Liquids Database, ILThermo, was released to the public via the Internet in December 2005 to meet the urgent need for critical data in academia and industry. ILThermo was constructed on the basis of NIST SOURCE, an extensive repository system of over 100 thermodynamic, thermochemical, and transport properties for pure compounds and mixtures extracted from the world's scientific literature. ILThermo is a prototype for generating special-retrieval-purpose databases from SOURCE for different applications. First, ionic liquids data are captured and stored through the ThermoML-based data capture mechanism (GDC) for SOURCE on a daily basis; second, this ionic liquids subset is extracted, reorganized, and populated into ILThermo periodically; and third, an updated ILThermo is exported from an internal server and imported to a NIST external server. ILThermo presents information via a high-density screen, which enables users to easily retrieve comprehensive ionic liquids data by navigating through a series of tables on one web page.
Infotherm: A thermophysical XML-database of mixtures and pure compounds in ThermoML-format
Martin Schmidt, Software development, FIZ Chemie Berlin, Franklinstr. 11, Berlin, 10587, Germany
The database Infotherm, currently available at www.chemistry.de/infotherm/, comprises more than 170,000 tables of PVT properties, phase equilibria, transport and surface properties, caloric properties, and acoustic and optical properties of 26,000 mixtures and 7,000 pure compounds taken from journals, data collections, manuals and measurement reports, some of which are exclusive to Infotherm.
This database contains search functions in order to combine about 150 properties, conditions and types of equilibria with definable value ranges, substance names, formulas and CAS registry numbers by Boolean operators.
Infotherm was relaunched in November 2005 with a download option in the ThermoML-format, an XML-based IUPAC standard for experimental thermodynamic property data storage and exchange.
Infotherm is a native XML-database, which is excellently adapted for the representation of a ThermoML-scheme and fast shared access by multiple users. The internal concept of the database is introduced and the import/export options will be illustrated. An insight into those parts of ThermoML, which are mainly used for the Infotherm application will also be provided and the quality of the data will be discussed. Finally, the roadmap for the further developments will be presented.
Thermo ML and thermodynamic calculations using VMGSim and VMGThermo
Marco Satyro, Virtual Materials Group, Inc, 657 Hawkside Mews NW, Calgary, AB T3G 3S1, Canada
Process simulators are used by engineers and scientists for the solution of material and energy balance equations that represent equipment found in processing plants. The most fundamental step for the creation of quality thermodynamic models used in the solution of balance equations is the proper characterization of pure component and mixture data. Therefore, the existence of a standard communication interface between physical property data providers and physical property consumers like simulators is a significant step towards rational use of resources, minimizing translation errors and maximizing the speed at which new data can be entered into process simulators. In this presentation we will show how ThermoML is used to facilitate the work process when integrated with the VMGSim process simulator and the VMGThermo physical property calculation kernel.
ThermoML and the PPDS thermophysical properties calculation software suite
Andrew I. Johns, Oil, Gas & Chemicals Group, TUV NEL Ltd, Scottish Enterprise Technology Park, East Kilbride, Glasgow, G75 0QU, United Kingdom and Alan C. Scott, Oil, Gas & Chemicals Group, TUV NEL Ltd, Scottish Enterprise Technology Park, East Kilbride, Glasgow, G75 0QU, United Kingdom.
This paper deals with the use of the ThermoML standard by the Physical Property Data Service software suite as a tool for the import and export of thermophysical property data.
An outline of the approach taken to implement the standard will be given together with some examples of its use.
An integrated Alzheimer's Disease information system
Huijun Wang and David Wild. School of Informatics, Indiana University, 1105 N. Union St., #112, Bloomington, IN 47408
Alzheimer's disease is a progressive, irreversible brain disorder with no known cause or cure. More than 4.5 million Americans are believed to have Alzheimer's disease and by 2050, the number could increase to 13.2 million. Brain imaging based on functional MRI (fMRI) is one of the powerful tools for characterizing age-related changes in functional anatomy. Completing such explorations may yield insights into the origins of age-associated cognitive change and perhaps even provide functional–anatomic markers that predict cognitive decline associated with Alzheimer's disease. Our integrated Alzheimer's Disease information system permits data mining across a wide variety of chemical, biological, genomic and other databases using the IO-informatics Sentient package, which is designed to create applications by “pointing to” related but distributed data and securely and efficiently integrating relevant meta-data and in some cases image subsets into an object-oriented analysis and query environment. The system has been developed in conjunction with several other institutions, and is of particular use in identifying biomarkers that cross traditional discipline boundaries. We outline several ways the system can be used to enhance Alzheimer's disease research, and discuss the implications of the system for future development of chemical and bioinformatics systems.
An intelligent system for mining and integrating diverse chemical information and chemoinformatics tools
Xiao Dong and David Wild. School of Informatics, Indiana University, Bloomington, IN 47408
We are developing a system of managing and mining chemoinformatics tools and data that uses web services and intelligent agents. Using this system, scientists are able to make high level requests to intelligent agents, which then use other agents and web services to carry out the request, employing a variety of computational tools and databases. In this poster we describe how use-cases can be implemented as workflows of web services wrapped around chemoinformatics tools and databases, thus enabling previously complex queries and requests to be carried out simply. The potential impact of systems such as this on the use of early stage drug discovery information will also be addressed.
Classification of enzyme reaction mechanisms
Noel M. O'Boyle1, Gemma L. Holliday2, Daniel E. Almonacid1, Peter Murray-Rust1, John B. O. Mitchell1, and Janet M Thornton2. (1) Department of Chemistry, Unilever Centre for Molecular Science Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, United Kingdom, (2) EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
There is a clear need to develop informative enzyme classification schemes complementary to the EC system, which uses a hierarchical classification to describe enzymes by their overall reactions. For example, in the EC system all beta-lactamases are classified as 3.5.2.6. However, although the overall reactions are the same, the four different types of beta-lactamase use quite different mechanisms. Conversely, enzymes with very similar mechanisms may be widely separated in the EC system, as exemplified by the eukaryotic and prokaryotic phosphoinositide-specific phospholipases C. We have developed MACiE (Mechanism, Annotation and Classification in Enzymes), a representative database of enzyme reaction mechanisms. Each reaction step is fully described, both graphically and using annotation. MACiE will aid the development of a new enzyme classification system, based upon reaction mechanisms. Here we present progress in the development of a key component, a method to measure similarity between enzyme reaction mechanisms.
No one size fits all: Different pocket sizes for different mutants of HIV-PI: QSAR as a cheminformatics approach
Barun Bhhatarai and Rajni Garg. Department of Chemistry, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699-5812
QSAR is an important tool for ‘chemical information retrieval’. It helps guide structural modification of a ligand to yield a potent inhibitor. However, the successful outcome of future drug therapy is determined by drug-combination therapy and retained susceptibility to mutant variants. Different ligands have different affinities for wild-type and mutant proteins, and a clear understanding of the mutation pattern is required to explain these differences quantitatively. We use QSAR as a cheminformatics tool to understand the difference between the wild-type and mutant varieties of HIV protease. Comparing the important parameters observed in the QSAR models helps in finding ligand-receptor binding patterns and provides information about the different types of receptor. QSAR models based on structural modification of the Indinavir molecule, analyzing different mutant variants such as K60C, V18C, NL4-3, 4X and Q60C, were developed. A quantitative assessment of the similarities and differences between the wild-type and mutant receptor pockets in conformation and in affinity will be presented.
Novel similarity measure for comparison of spectra
Lorant Bodis1, Alfred Ross2, and Ernö Pretsch1. (1) Department of Chemistry and Applied Biosciences, ETH Zurich, ETH Hönggerberg, HCI E 312, Zurich, CH-8093, Switzerland, (2) Pharmaceuticals Division, F. Hoffmann-La Roche Ltd, Grenzacherstr, Basel, CH-4070, Switzerland
Most available vector comparison methods, such as the correlation coefficient and the Tanimoto coefficient, are only able to find point-wise similarity. Similarity criteria for spectra comparison should include information about the neighborhood of the corresponding items in order to identify shifted signals as well. So far, only a few such methods have been described. A recent method is based on a locally weighted cross-correlation function normalized with the geometric mean of the individual autocorrelation functions. A much better performance has been achieved with a novel similarity criterion. The two vectors to be compared are divided into i bins (i = 1, N) and for each division the integrals in each bin are calculated. Similarity indices are derived from the comparison of the corresponding integrals. The mean of the normalized similarity indices serves as the similarity criterion. The presented similarity criteria are characterized with contingency tables and histograms obtained from tests made on simple artificial 1H NMR spectra having different degrees of similarity. Furthermore, they are applied for comparing measured and estimated spectra of a complex real-life database. Although it has so far been tested only with one-dimensional 1H NMR spectra, the generality of the approach makes the application of the novel procedure to spectra of two or more dimensions, including image analysis, straightforward.
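The binning scheme described above can be sketched in a few lines. The abstract does not state how the per-division similarity index is computed, so the formula used here (one minus the normalized absolute difference of the bin integrals) is an assumption chosen for illustration, as are the function names and the use of simple sums as integrals. The sketch is not the authors' implementation; it only shows why averaging over divisions of increasing resolution rewards slightly shifted signals more than distant ones.

```python
def bin_integrals(values, n_bins):
    """Sum the values inside n_bins contiguous, nearly equal-width bins."""
    n = len(values)
    edges = [k * n // n_bins for k in range(n_bins + 1)]
    return [sum(values[edges[k]:edges[k + 1]]) for k in range(n_bins)]


def division_index(a, b, n_bins):
    """Assumed similarity index for one division: 1 - normalized integral difference."""
    ia = bin_integrals(a, n_bins)
    ib = bin_integrals(b, n_bins)
    total = sum(abs(x) for x in ia) + sum(abs(y) for y in ib)
    if total == 0:
        return 1.0  # two empty spectra are treated as identical
    return 1.0 - sum(abs(x - y) for x, y in zip(ia, ib)) / total


def binned_similarity(a, b, max_bins):
    """Mean of the per-division indices for divisions into 1..max_bins bins."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of points")
    indices = [division_index(a, b, n) for n in range(1, max_bins + 1)]
    return sum(indices) / len(indices)


def single_peak(position, length=64):
    """Artificial spectrum with one unit-intensity signal."""
    spectrum = [0.0] * length
    spectrum[position] = 1.0
    return spectrum


if __name__ == "__main__":
    reference = single_peak(10)
    print(binned_similarity(reference, single_peak(12), 16))  # slightly shifted signal
    print(binned_similarity(reference, single_peak(50), 16))  # distant signal
```

A point-wise measure would score both comparisons in the usage example as completely dissimilar, whereas the binned criterion ranks the slightly shifted spectrum as closer to the reference.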
Pharmacophore-based molecular docking: A validation study
David C. Thompson, Iain J. McFadyen, Natasja Brooijmans, and Diane Joseph-McCarthy. Department of Structural Biology & Computational Chemistry, Wyeth Research, Chemical & Screening Sciences, 200 Cambridge Park Drive, Cambridge, MA 02140
In the present work, our pharmacophore-based molecular docking approach, PhDock, is further validated against two well-known test sets: the CCDC/Astex set and the published Vertex set. Each element within our virtual screening protocol will be critically assessed as we examine potential correlations between the generation of site points through MCSS2SPTS and the position of true “hot spots” within the receptor, relationships between pharmacophores of the best scoring hits, and the importance of re-scoring hits with a physically realistic scoring function. The concepts discussed and tested here are of importance to the development of accurate and efficient approaches to structure-based drug design and are generally applicable to any docking scheme.
QCLDB II: Quantum Chemistry Literature Data Base II
Nobuaki Koga1, Masahiko Hada2, Kenro Hashimoto2, Haruo Hosoya3, Toshio Matsushita4, Hidenori Matsuzawa5, Umpei Nagashima6, Shinkoh Nanbu7, Keiko Takano3, and Shinichi Yamabe8. (1) School of Informatics and Sciences, Nagoya University, Furo-cho,Chikusa-ku, Nagoya, Japan, (2) Department of Chemistry, Tokyo Metropolitan University, 1-1 Minami-Ohsawa, Hachioji, Tokyo, Japan, (3) Department of Chemistry, Ochanomizu University, 1-1-1 Otsuka, Bunkyo-ku, Tokyo, Japan, (4) Department of Chemistry, Osaka City University, 3-3-138 Sugimoto, Sumiyoshi-ku, Osaka, Japan, (5) Department of Chemistry, Chiba Institute of Technology, 2-17-1 Tsudanuma, Narashino, Chiba, Japan, (6) Research Institute for Computational Sciences, National Institute of Advanced Industrial Science and Technology, and CREST-JST, 1-1-1 Umezono, Tsukuba, Ibaraki, Japan, (7) Computing and Communications Center, Kyusyu University, Hakozaki 6-10-1, Higashi-ku, Fukuoka, Japan, (8) Department of Chemistry, Nara University of Education, Takabatake, Nara, Japan
Quantum Chemistry Literature Data Base (QCLDB) is a database of papers published after 1978 that treat only ab initio calculations of atomic and molecular electronic structure. The papers are collected from about thirty core journals, surveyed, and given tags indicating their content and essence by a group of young Japanese quantum chemists. Theoretical works that report no computational results are also collected if they are judged to have significant relevance to ab initio calculations, while semi-empirical calculations are excluded. QCLDB is edited and copyrighted by the Quantum Chemistry Data Base Group (QCDBG).
We announce the opening of our new web version of QCLDB II (http://qcldb2.ims.ac.jp/) from April 1, 2004, which offers registered users free use of the updated database, including all previous data. The new QCLDB II will support your research activities more efficiently than before.
Salt-Bridges are important for the HLA recognition with the KIR2DL receptors revealed by molecular modeling studies
Sivanesan Dakshanamurthy, Oncology, Lombardi Cancer Center, Georgetown University, Reservoir Road, E401, NRB, Washington, DC 20057
Natural killer (NK) cells constitute an important part of the innate immune system. Human killer-cell immunoglobulin-like receptors (KIR) are expressed on the surface of NK cells and modulate NK cell mediated cytotoxicity of tumor cells. These receptors deliver activating or inhibitory signals that depend, in part, on binding to HLA ligands. There are many different polymorphic KIR2DL receptors, and they exhibit several unique features. Previous studies indicated that HLA recognition by KIR depends on charge complementarity between them. Usually, KIR provides acidic residues and HLA contributes basic residues to the interface in addition to the hydrogen bond interactions. In the present work, several different mutations of the KIR2DL and HLA interface residues were modeled, and the stability and energetics of the resulting KIR2DL/HLA complexes were evaluated by molecular mechanics and dynamics simulations. It was found that the salt-bridge interactions between complementary residues are important for recognition between the KIR2DL receptor and HLA.
Similarity calculation for anti-HIV drugs based on spanning tree matching algorithm
Zhong Li and Kayvan Najarian. Department of Computer Science, The University of North Carolina at Charlotte, 9201 University Blvd., Charlotte, NC 28223
Molecular similarity calculations are important for drug design. This paper presents a novel molecular similarity calculation method based on a spanning tree matching algorithm and the physicochemical parameters of atoms and bonds. The similarities between 15 FDA-approved anti-HIV drugs were calculated, and clusters were formed according to these similarities.
Computational Chemistry in XML
Peter Murray-Rust, Department of Chemistry, Unilever Centre for Molecular Science Informatics, University of Cambridge, Lensfield Road, CB2 1EW Cambridge, United Kingdom, Henry S. Rzepa, Department of Chemistry, Imperial College of Science, Technology and Medicine, Exhibition Road, South Kensington, London SW7 2AY, United Kingdom, Joe A Townsend, Department of Chemistry, Unilever Centre for Molecular Science Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, United Kingdom, and Dan Wilson, Mineralogisches Institut, Johann Wolfgang Goethe-Universität, Senckenberganlage 30, Frankfurt am Main, Germany.
High-throughput computation of the structures and properties of molecules and materials is now supported by a generic infrastructure based on Chemical Markup Language (CML). By converting the input to and output from a code (such as CASTEP, GAMESS, DL-POLY, SIESTA, etc.) into CML, it is possible to chain together several operations which can process jobs automatically. This is supported by flexible dictionaries (XML) and ontologies (RDF) to represent computational processes, physical properties, strategies, parameters and algorithms. This can support coarse-grained parallelism, data mining and analysis. XMLisation is achieved either through the addition of CML libraries to the code or through transduction of legacy data (stylesheets and parsers). An important benefit is the increased detection of program errors and control of input and output quality.
AnIML: A new XML-based standard format for analytical data
Maren Fiege, Waters GmbH, Europaallee 27-29, 50226 Frechen, Germany
Analytical instruments today produce data in a multitude of different formats. This makes the interchange of data between systems difficult. To deal with this problem, standard formats like ANDI and JCAMP have been created in the past. Based on the experience gained with these, ASTM has started an effort to create a highly flexible yet validatable standard format based on XML that can accommodate any kind of analytical data. This presentation will give an introduction to the concepts behind AnIML, and will show how AnIML can be customized to suit special needs without breaking the standard.
Chemistry publications in CML
Peter T. Corbett, Unilever Centre for Molecular Sciences Informatics, Department of Chemistry, Lensfield Road, Cambridge, United Kingdom, Peter Murray-Rust, Department of Chemistry, Unilever Centre for Molecular Science Informatics, University of Cambridge, Lensfield Road, CB2 1EW Cambridge, United Kingdom, Nick E Day, Department of Chemistry, Unilever Centre for Molecular Sciences Informatics, Lensfield Road, CB2 1EW Cambridge, United Kingdom, Joe A Townsend, Department of Chemistry, Unilever Centre for Molecular Science Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, United Kingdom, and Henry S. Rzepa, Department of Chemistry, Imperial College of Science, Technology and Medicine, Exhibition Road, South Kensington, London SW7 2AY, United Kingdom.
Much of the semantics in a chemistry article are now supported by Chemical Markup Language (CML) describable by an XML Schema (XSD). CML can support molecules, structures, reactions and reaction schemes, spectra (including annotations) and physicochemical data. These are supported by dictionaries and lexicons (also in XML) that provide linguistic and semantic support for the markup. Manuscript components can be created either with a range of authoring tools or through linguistic processing of conventional text. The semantics in such papers can now be processed by machine leading to high-throughput information extraction. A major feature is that chemical documents will be quicker to author and have a higher quality of embedded data and structure through machine validation.
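As a rough illustration of the machine processing and validation mentioned above, the following sketch reads a small CML molecule with Python's standard library, counts elements, and checks that every bond refers to a declared atom. The fragment itself and the helper names (`element_counts`, `dangling_bond_refs`) are hypothetical and chosen for illustration only; they are not taken from the authors' tools, and real CML documents are validated against the full schema rather than by checks like these.

```python
import xml.etree.ElementTree as ET

# Namespace commonly used for CML documents.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hypothetical fragment of the kind that could be embedded in a manuscript.
CML_FRAGMENT = """
<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>
"""


def element_counts(cml_text):
    """Count atoms per element symbol in a single CML molecule."""
    root = ET.fromstring(cml_text)
    counts = {}
    for atom in root.findall("cml:atomArray/cml:atom", CML_NS):
        symbol = atom.get("elementType")
        counts[symbol] = counts.get(symbol, 0) + 1
    return counts


def dangling_bond_refs(cml_text):
    """Return atom ids used in bonds but never declared (a simple consistency check)."""
    root = ET.fromstring(cml_text)
    atom_ids = {a.get("id") for a in root.findall("cml:atomArray/cml:atom", CML_NS)}
    missing = []
    for bond in root.findall("cml:bondArray/cml:bond", CML_NS):
        for ref in bond.get("atomRefs2", "").split():
            if ref not in atom_ids:
                missing.append(ref)
    return missing


if __name__ == "__main__":
    print(element_counts(CML_FRAGMENT))
    print(dangling_bond_refs(CML_FRAGMENT))
```

A check of this kind is only a small part of what schema- and dictionary-based validation can offer, but it shows why structured markup makes embedded data easier to verify than free text.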
Ensuring the interoperability of the Analytical Information Markup Language (AnIML)
Alexander Roth1, Ronny Jopp1, Peter J. Linstrom2, and Gary W. Kramer1. (1) Biochemical Science Division, NIST, 100 Bureau Drive, Bldg. 227; Rm. A-157, Gaithersburg, MD 20899-8312, (2) Physical and Chemical Properties Division, NIST, Building 221, Room A357, 100 Bureau Drive, Stop 8380, Gaithersburg, MD 20899-0830
AnIML (Analytical Information Markup Language) is being created by ASTM Subcommittee E13.15 to describe chromatography and spectroscopy data and metadata based on XML (eXtensible Markup Language) and its associated technologies. Once in AnIML format, analytical data can be interchanged over the web, converted to other formats, validated, or visualized in multiple formats using existing XML-based tools.
AnIML is built around a core schema that defines ways for describing almost any data. Technique Definition files are used to constrain the myriad data description mechanisms available for a given analytical technique to only those commonly accepted, to delineate the metadata items ordinarily associated with such domain data, and to permit content extension by vendors and users without changing the core schema. This presentation will describe the naming and design rules (NDRs) and other techniques being employed to ensure that AnIML is as interoperable as possible with other markup languages.
Incorporating Units Markup Language (UnitsML) into AnIML (Analytical Information Markup Language)
Ronny Jopp1, Alexander Roth1, Peter J. Linstrom2, and Gary W. Kramer1. (1) Biochemical Science Division, NIST, 100 Bureau Drive, Building 227; Rm. A-159, Gaithersburg, MD 20899-8312, (2) Physical and Chemical Properties Division, NIST, Building 221, Room A357, 100 Bureau Drive, Stop 8380, Gaithersburg, MD 20899-0830
Units Markup Language (UnitsML) is being developed to encode scientific units of measure using XML (eXtensible Markup Language). The development and deployment of a markup language specifically for units will allow for the unambiguous storage, exchange, and processing of numeric data, thus facilitating collaboration and the sharing of information, especially over the Internet. Incorporating UnitsML into other markup languages prevents duplication of effort and improves interoperability.
ASTM Subcommittee E13.15 is creating AnIML (Analytical Information Markup Language) to describe chromatography and spectroscopy data and metadata based on XML and its associated technologies. AnIML facilitates access to analytical data by building in descriptions of the data and metadata with delimited tags. UnitsML is being employed to handle the markup of the units information in AnIML. This presentation will describe how UnitsML is being used and how it is being incorporated into AnIML.
Integration of the Chemical XML standard in Laboratory Content Management Systems
Michael Burke, Agilent Technologies, 6612 Owens Drive, Pleasanton, CA 94588
Abstract text not available.
Feature-map vectors: A new family of informative and interpretable descriptors for drug discovery
Gregory A. Landrum, Julie E. Penzotti, and Santosh Putta. Rational Discovery LLC, 555 Bryant St. #467, Palo Alto, CA 94301
In order to develop robust machine-learning or statistical models for predicting biological activity, descriptors that capture the essence of the protein-ligand interaction are required. In the absence of structural information from x-ray or NMR experiments, deriving informative descriptors can be difficult. We have developed feature-map vectors (FMVs) to address this challenge. FMVs are problem-specific – derived from the conformational models of a few actives – and highly interpretable. By using shape-based alignments and scoring with chemical features, FMVs combine information about a molecule's shape and the pharmacophores it can match. We will present the details of the algorithm and the results of validation studies that establish the utility and interpretability of FMVs. After describing the performance of models built to predict biological activity for several biological targets (CDK2, thrombin, DHFR, and ACE), we will examine what can be learned about the protein-ligand interactions from the descriptors themselves.
Molecular fields point the way to a new paradigm in molecular modeling
Mark D. Mackey, Cresset BioMolecular Discovery, Spirella Building, Bridge Rd, SG 6 4ET, Letchworth, United Kingdom
Proteins recognise ligands through their surface properties (or fields), not their particular arrangement of atoms and bonds. Describing molecules in terms of molecular fields leads to powerful new techniques for ligand- and structure-based drug design. In particular, we detail a powerful field-based virtual screening method with real-world successes. We also present a new technique of field-based molecular alignments and its success in determining the bound conformation of active molecules purely from ligand data in the absence of any protein information. Case studies will be reported.
Scalable partitioning and exploration of chemical spaces using geometric hashing
Rajarshi Guha1, Debojyoti Dutta2, Peter C. Jurs1, and Ting Chen2. (1) Department of Chemistry, Pennsylvania State University, 104 Chemistry Building, University Park, State College, PA 16802, (2) Department of Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089
We introduce a data mining framework built on top of an approximate nearest neighbor algorithm termed Locality Sensitive Hashing (LSH). The core LSH algorithm hashes molecular descriptors so that points close to each other in descriptor space are also close to each other in the hashed space, resulting in sublinear search times. We validate the accuracy and performance of our framework on three real datasets of sizes ranging from 4,337 molecules to 249,071 molecules. Our results indicate that the identification of nearest neighbors using the LSH algorithm is two orders of magnitude faster than the ordinary kNN method and is over 94% accurate. We also use this framework to determine extremely rapidly whether a compound is located in a sparse region of chemical space. The algorithm is quite accurate compared to results obtained using PCA-based heuristics.
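The abstract does not say which hash family the framework uses. As a minimal sketch of the general idea, the following assumes one common LSH scheme for Euclidean distance: random projections quantized into buckets of fixed width. The class name, parameters, and toy descriptor vectors are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict


def _sq_dist(u, v):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


class RandomProjectionLSH:
    """Approximate nearest-neighbor index using quantized random projections."""

    def __init__(self, dim, n_tables=8, n_projections=4, bucket_width=1.0, seed=0):
        rng = random.Random(seed)
        self.bucket_width = bucket_width
        self.points = []
        self.tables = []
        for _ in range(n_tables):
            # Each table uses its own set of random directions and offsets.
            projections = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)],
                 rng.uniform(0.0, bucket_width))
                for _ in range(n_projections)
            ]
            self.tables.append((projections, defaultdict(list)))

    def _key(self, projections, vec):
        # Nearby points tend to fall into the same bucket along each direction.
        return tuple(
            int((sum(a * x for a, x in zip(direction, vec)) + offset)
                // self.bucket_width)
            for direction, offset in projections
        )

    def add(self, vec):
        index = len(self.points)
        self.points.append(vec)
        for projections, buckets in self.tables:
            buckets[self._key(projections, vec)].append(index)
        return index

    def query(self, vec, k=1):
        """Return up to k indices of stored points, ranked among bucket candidates only."""
        candidates = set()
        for projections, buckets in self.tables:
            candidates.update(buckets.get(self._key(projections, vec), []))
        ranked = sorted(candidates, key=lambda i: _sq_dist(self.points[i], vec))
        return ranked[:k]


if __name__ == "__main__":
    index = RandomProjectionLSH(dim=2)
    for descriptor in [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0)]:
        index.add(descriptor)
    print(index.query((0.0, 0.0), k=2))
```

Because only points sharing a bucket with the query are ranked exactly, the search avoids scanning the whole dataset, at the cost of occasionally missing a true neighbor. An empty candidate set for a query is also a cheap hint that the query may lie in a sparsely populated region, although the abstract does not state how the authors detect sparse regions.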
Generation of multiple pharmacophore hypotheses using a multiobjective optimization algorithm
Valerie J. Gillet1, Simon Cottrell1, and Robin Taylor2. (1) Information Studies, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield, S1 4DP, United Kingdom, (2) Cambridge Crystallographic Data Centre, 12, Union Road, Cambridge CB2 1EZ, United Kingdom
A pharmacophore is defined as the set of chemical features and the spatial relationships between the features that together form a necessary requirement for biological activity. The two major issues in pharmacophore identification are the correct representation of the chemical features, so that bioequivalent features are mapped together, and the appropriate sampling of conformation space so that the bioactive conformation of each compound is found. Often, there are several plausible hypotheses that could explain the same set of ligands and in such cases, it is important that the chemist is presented with alternatives that can be tested with different synthetic compounds. We have applied a multiobjective genetic algorithm to the pharmacophore elucidation problem to generate a range of chemically diverse solutions that represent equally plausible hypotheses. The hypotheses are evaluated over a number of objectives which are considered independently, according to the principles of Pareto dominance. Recent developments of the method will be described which allow the identification of pharmacophore features that are common to some, but not all of the ligands.
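To make the Pareto dominance principle concrete, here is a minimal sketch that assumes every objective is to be minimized and that each hypothesis is summarized by a tuple of objective scores. The function names and the example scores are hypothetical; the abstract does not specify the objectives or how the genetic algorithm applies the ranking.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(scores):
    """Indices of the solutions that no other solution dominates."""
    return [
        i for i, candidate in enumerate(scores)
        if not any(dominates(other, candidate)
                   for j, other in enumerate(scores) if j != i)
    ]


if __name__ == "__main__":
    # Hypothetical (objective_1, objective_2) scores for four hypotheses.
    hypotheses = [(1, 5), (2, 2), (3, 3), (5, 1)]
    print(pareto_front(hypotheses))
```

Treating the objectives independently in this way returns a set of trade-off solutions rather than a single winner, which matches the stated aim of presenting the chemist with several plausible alternatives.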
Efficient overlay of molecular 3-D pharmacophores
Gerhard Wolber and Alois A. Dornhofer. Inte:Ligand GmbH, Mariahilferstrasse 74B/11, 1070 Wien, Austria
Aligning and overlaying two or more bio-active molecules is one of the most important tasks in computational drug discovery and cheminformatics. Molecule characteristics from the viewpoint of a macromolecular target - represented as a 3D pharmacophore - are of special interest when regarding macromolecule-ligand interaction. We present a novel approach for aligning rigid three-dimensional molecules according to their chemical-functional and steric pharmacophoric features. Optimal chemical feature pairs are identified using distance and density characteristics and obtained by correlating pharmacophoric geometries. The presented approach proves to be faster than existing combinatorial alignments and creates more reasonable alignments than earlier methods. Correlations between two similar pharmacophore features can even be identified if they show different constraints. Examples will be provided to demonstrate the feasibility and speed of this method.
Fig. 1. Three CDK2 inhibitors from the PDB (1ke5, 1ke6, 1ke7) in their bio-active conformation all aligned with their 3D pharmacophores describing the ligand-macromolecule interaction. Graphics were created with LigandScout 1.0, available from http://www.inteligand.com
Use of XML for analytical instrument control
Alex Mutin, Shimadzu Scientific Instruments, Inc, 7102 Riverwood Drive, Columbia, MD
There is a growing interest among analytical instrument users for multi-vendor support of their equipment in terms of instrument control, data acquisition and data processing capabilities.
Different vendors provide different software interfaces to control their instruments. Many users prefer to standardize on software to minimize validation and training costs, while keeping their hardware diverse. Because most laboratory software has limited multi-vendor support, users shopping for a new instrument are often forced to stay with one type of software.
An XML-based web service embedded in an analytical instrument is a new technology that can potentially overcome the multi-vendor support limitations of current software. An HPLC equipped with a web server is connected directly to a computer network. Such a system can be controlled from any PC without any additional software other than a web browser such as Internet Explorer. If laboratory software is linked with such a web service, one can easily assemble systems out of multi-vendor hardware components while controlling them from the same application. In addition, the data can be interchanged between instruments, applications and databases using the Analytical Information Markup Language (AnIML) format.
XML for comprehensive 2-D gas chromatography
Arvind Visvanathan1, Qingping Tao2, Stephen E. Reichenbach3, Mengke Li2, Shilpa Deshpande3, and Xue Tian3. (1) University of Nebraska-Lincoln, Lincoln, NE 68588-0115, (2) GC Image, LLC, Lincoln, NE 68503, (3) University of Nebraska, Lincoln, NE 68588-0115
Comprehensive two-dimensional gas chromatography (GCxGC) is an emerging technology for chemical separations that provides an order-of-magnitude improvement in separation capacity, significantly greater signal-to-noise ratio, and higher-dimensional chemical ordering compared to traditional gas chromatography. Information systems are being developed to visualize, process, and analyze the complex data produced by GCxGC. The eXtensible Markup Language (XML) is a powerful and flexible technology for structuring and describing data and so is especially well-suited for expressing the rich relationships that are only beginning to be discovered in GCxGC data. This paper describes the use of XML for GCxGC data, metadata, and information, including raw and processed data, peak tables, templates for chemical identification, journals and scripts with processing sequences, and formal reports. Ongoing work is evaluating XML-based technologies, such as the ANalytical Information Markup Language (AnIML), for GCxGC methods.
Integrative analytics and data harmonization in TOPCOMBI
Francois Gilardoni, Industrial Applications, InforSense Ltd, 459A Fulham Road, London, SW10 9UZ, United Kingdom and David Farrusseng, Groupe de Catalyse, Institut de Recherches sur la Catalyse IRC–CNRS, 2, Av. Albert Einstein, F-69626 Villeurbanne, France.
Best practice data mining techniques are ineffective without high-quality data, fast and reliable access to the information and a consistent capture of data and processes. The experimental issue is addressed with an apposite methodology by the experimentalist. The second topic is more challenging because it has to cope with the disparate data structures and data exchange protocols, and usually requires a plethora of data mining and analytical tools. This heterogeneous information is overwhelming to maintain and requires tailored tools to be utilized. This drastically impacts the total cost of ownership of the informatics infrastructure, precludes a proper dissemination of knowledge and hinders scientific breakthroughs. TOPCOMBI, a project for Nanotechnologies and Nanosciences funded by the European Commission, collectively dedicates substantial resources to harmonizing and integrating this incongruent information issued from high-throughput platforms, instruments, and data mining. TOPCOMBI is investigating how XML schemas, both existing and in development, and web services suit the stringent requirements for data standardization, accessibility, portability and modularity with new computational techniques. Also, the consortium is exploring how the semantics and the underlying ontology defined in the XML schema can facilitate the transformation of data into tangible knowledge. We will present the work in progress and how the integrative analytics paradigm and data harmonization operate on both software and data.
XML for quantum chemistry program input
Gary S. Kedziora, User Productivity Enhancement and Technology Transfer, High Performance Technologies, Inc, ASC/HP Bldg. 676, 2435 5th St., Wright Patterson Air Force Base, OH 45433-7802, Scott R. Brozell, Department of Chemistry, The Ohio State University, 100 W. 18th Avenue, Columbus, OH 43210, and Eric A Stahlberg, Ohio Supercomputer Center, 1224 Kinnear Road, Columbus, OH 43212.
A new XML input format for the COLUMBUS suite of Multi-Reference Configuration Interaction (MRCI) programs will be described. This XML language, called COLUMBUS Input Meta Language (CIML), is designed to be easy for a human to prepare with a text editor as well as by the back end of a GUI. It specifies a clear and complete description of the computation that is suitable for archival. Since MRCI is generally not used as a model chemistry, CIML provides the flexibility for tailoring an MRCI calculation to a specific molecule, which often requires careful planning and exploratory runs. A corresponding program has been written that parses the CIML, produces the legacy input for the COLUMBUS programs, and provides the user with useful feedback about the calculation. The ontology of more general quantum chemistry calculations will be discussed in relation to CIML.
A marriage made in torsional space: Using GALAHAD models to drive pharmacophore multiplet searches
Robert D. Clark, Jennifer Shepphird, and Essam Metwally. Tripos, Inc, 1699 S. Hanley Rd., St. Louis, MO 63144
GALAHAD is a pharmacophore alignment tool that generates hypermolecular models composed of a 3D search query plus a set of aligned ligands as discrete substructures. A pharmacophore multiplet hypothesis generated from such a model is naturally fuzzy, in that the features in each molecule can "see" the features in all the others. Doing so effectively incorporates the variation in each feature's position across ligands into the hypothesis as well as its average position. This approach allows constraints considerably more complex than the spherical spatial constraints generally used in 3D searching to be included. Fast pharmacophore multiplet (Tuplet) searches carried out using such hypotheses can then augment or replace flexible 3D database searches.
Feature-based pharmacophores as a tool for activity profiling: Application examples
Thierry Langer, Institute of Pharmacy, University of Innsbruck, Innrain 52, 6020 Innsbruck, Austria
The chemical feature-based pharmacophore modelling approach has proven to be highly useful in virtual screening experiments. Thus, large molecular structure databases may be searched rapidly in order to retrieve biologically active compounds. In the presentation, successful application examples will be discussed, covering targets from different pathologically important biochemical pathways. Details will be provided on high-throughput structure-based pharmacophore generation methods as well as on compound selection issues. Moreover, the combined usage of pharmacophore screening and molecular docking will be demonstrated. In this context we show how scoring functions may be assessed for their prediction capacity by using enriched virtual combinatorial libraries.
Bridging the gap between two pockets by virtual screening
Holger Claußen, Markus Lilienthal, and Christian Lemmen. BioSolveIT GmbH, An der Ziegelei 75, 53757 St. Augustin, Germany
One strategy to enhance the binding affinity of active site-directed inhibitors is to identify additional sub-pockets or cavities on the protein surface to which additional tether groups may bind. The latter can be further linked to a scaffold by a more or less unspecific spacer group. In fact, the different scaffold variations, suitable tether groups and linkers can be considered a combinatorial library.
Our approach combines two add-on modules of the docking program FlexX to assess the power of this strategy. We constrain the binding mode of the scaffold by receptor based pharmacophore constraints and guide the docking to more accessible potential binding pocket(s) by additional FlexX-Pharm constraints. The combinatorial library of scaffold variations, tether, and spacer groups is docked with the combinatorial algorithms of FlexX-C, which can drastically reduce the average runtime by a factor of up to 30 compared to sequentially docking the corresponding enumerated library. FlexX-C has been extended to deal with pharmacophore constraints on the fly.
We demonstrate how we defined pharmacophore constraints for two specific sub-pockets for a given target, and how we docked a combinatorial library based on a) a set of suitable fragments for the respective sub-pockets and b) a number of spacer groups. Suitable linker groups could efficiently be detected with this approach.
References: Krier et al., J Med Chem. 2005 Jun 2;48(11):3816-22. Rarey et al., J Mol Biol. 1996 Aug 23;261(3):470-89 (see also http://www.biosolveit.de/flexx). Hindle et al., J Comput Aided Mol Des. 2002 Feb;16(2):129-49.
Identification of novel ACE2 inhibitors by structure-based pharmacophore modeling and virtual screening
Monika Rella, University of Leeds, Institute of Molecular and Cellular Biology, Leeds, LS2 9JT, United Kingdom and Richard M. Jackson, Department of Biochemistry and Microbiology, University of Leeds, Garstang Building, Leeds LS2 9JT, United Kingdom.
The metalloprotease Angiotensin Converting Enzyme (ACE) is an important drug target for the treatment of hypertension and heart disease. Recently, a close and unique human ACE homologue, termed ACE2, has been identified and is currently being validated as a new cardio-renal disease target. We have undertaken a structure-based approach to identify novel small molecule inhibitors employing the resolved inhibitor-bound ACE2 crystal structure. Computational approaches focus on virtual screening of large compound databases using various structure-based pharmacophore models. Model selectivity was assessed by hit reduction of an internal ACE inhibitor database and the Derwent World Drug Index. A subset of 25 compounds, selected on the basis of high geometric fit values, visual inspection and structural diversity, was proposed for bioactivity evaluation. Seventeen compounds were purchased and tested in a bioassay. We show that all compounds displayed some inhibitory effect on ACE2 activity, the six most promising candidates exhibiting IC50 values in the range of 79-178 µM. Their binding mode and interactions were further analysed via docking, and selectivity issues arising from biological counterscreens on ACE and NEP will also be discussed.
Reference: Rella, M., Rushworth, C.A., Guy, J.L., Turner, A.J., Langer, T. and Jackson, R.M. Structure-based Pharmacophore Design and Virtual Screening for Novel Angiotensin Converting Enzyme 2 Inhibitors (J. Chem. Inf. Mod., in press).
Applying computational pharmacophore models and in vitro approaches to rapidly identify novel P-glycoprotein ligands
Cheng Chang1, Praveen Bahadduri2, Peter Swaan2, and Sean Ekins3. (1) Biophysics Program, Ohio State University, 1614 Sparks Rd, Sparks, MD 21152, (2) Department of Pharmaceutics, University of Maryland at Baltimore, 20 Penn St., Baltimore, MD 21201, (3) GeneGo Inc, 500 Renaissance Drive, Suite 106, St. Joseph, MI 49085
Multidrug resistance has become a major obstacle in the treatment of cancer due to overexpression of MDR pumps including P-glycoprotein (P-gp). At the same time the transporter has a major role in determining the absorption of some drugs. Despite its overall significance, P-gp is poorly characterized at the atomic level due to difficulties related to membrane protein crystallization. Computational pharmacophores have been generated to predict the inhibition of P-gp from in vitro data for several cell systems. Pharmacophore and quantitative structure-activity relationship models derived from different subsets of substrates and inhibitors for P-gp have been evaluated to identify novel P-gp ligands. We have applied two distinct P-gp digoxin inhibition models and one P-gp substrate model to search three databases and assess their efficacy as database filters. One database (SCUT) consisted of 576 known and widely prescribed drugs. Inhibition pharmacophore 1 retrieved 40 drugs from the SCUT database of which 25 are known substrates of P-gp. The P-gp substrate pharmacophore returned 6 molecules of which 4 are known P-gp substrates. Inhibition pharmacophore 2 retrieved 68 drugs of which 33 drugs were identified as known P-gp substrates. Eight additional molecules (acitretin, cholecalciferol, miconazole, misoprostol, nafcillin, repaglinide, salmeterol, telmisartan) and two negative control molecules (phenelzine and zonisamide) with no published details for P-gp affinity were selected for testing after using these pharmacophores. The MDCK-MDR1 in vitro cell model was used to confirm their inhibitory effect on 3H-digoxin transport. The results indicate that the P-gp pharmacophore models identified seven new compounds with affinity for P-gp. The identification of these novel molecules demonstrates how pharmacophores for this and other transporters are of value for efficiently identifying molecules with potential affinity during drug discovery.
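The retrieval counts quoted for the SCUT database can be turned into simple proportions to compare the three models as filters. The short sketch below only restates the numbers given in the abstract; the dictionary layout and the function name `known_fraction` are illustrative choices, and the proportions are not a substitute for the authors' own assessment.

```python
# (retrieved from SCUT, of which known P-gp substrates), as quoted in the abstract.
RETRIEVALS = {
    "inhibition pharmacophore 1": (40, 25),
    "substrate pharmacophore": (6, 4),
    "inhibition pharmacophore 2": (68, 33),
}

DATABASE_SIZE = 576  # drugs in the SCUT database


def known_fraction(retrieved, known):
    """Fraction of retrieved drugs that are already known P-gp substrates."""
    return known / retrieved


if __name__ == "__main__":
    for model, (retrieved, known) in RETRIEVALS.items():
        share_of_db = retrieved / DATABASE_SIZE
        print(f"{model}: {known}/{retrieved} known substrates "
              f"({known_fraction(retrieved, known):.1%}), "
              f"{share_of_db:.1%} of the database retrieved")
```

On these counts, roughly half to two thirds of the drugs each model retrieves are already known substrates, while each model retrieves only a small share of the 576 drugs.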
Combined receptor-ligand pharmacophore method for screening ligands binding to G protein coupled receptors
Sandhya Kortagere, UMDNJ-Robert Wood Johnson Medical School, Dept. of Pharmacology, Piscataway, NJ 08854 and William J. Welsh, Department of Pharmacology, Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, 661 Hoes Lane, Piscataway, NJ 08854.
G-protein coupled receptors (GPCRs) are a large superfamily of proteins that are targets for nearly 60% of drugs in clinical use today. In the past, the use of structure based drug design strategies to develop better drug candidates has been severely hampered due to the non-availability of the receptor three-dimensional structure. However, with recent advances in modeling techniques and better computing power, atomic level details of these receptors can be derived from molecular models. Using information from these models coupled with various experimental evidence it is now feasible to build receptor pharmacophores. In this study, we demonstrate the use of a combined receptor-ligand pharmacophore that can be effectively used to screen ligands that bind to GPCRs. The successful candidates are then screened using our Shape Signatures tool and ranked based on a scoring function that can be easily customized to either a particular receptor subtype or a sub-family of receptors.
Comprehensive synthesis planning using multiple reaction search algorithms
Valentina Eigner-Pitto1, Josef Eiblmaier1, Hans Kraut1, Heinz Saller1, Peter Loew1, and Guenter Grethe2. (1) InfoChem GmbH, Landsberger Strasse 408, Munich, 81241, Germany, (2) Consultant, 352 Channing Way, Alameda, CA 94502-7409
InfoChem's approach to computer based synthesis design integrates various tools and algorithms to support the synthetic chemist in finding the optimal synthesis route to target molecules. Established search modes such as reaction substructure and role searching are enhanced by innovative algorithms such as reaction type and name reaction searching. A new retrosynthetic approach based on automatic transform library generation will be presented. A key innovation is that reaction classification allows a proposed retrosynthesis to be verified against databases of known chemical reactions. Our classification categorizes reactions according to the type of chemical transformation they represent. The resulting reaction "ClassCodes" are used to find and to interlink reactions having the same reaction type across large reaction databases. Synthetic routes can be further refined by name reaction filtering. The name reactions are organized in hierarchical order from main categories (addition, rearrangements, etc.) down to highly specific reaction variants (e.g., “Diels-Alder reaction”).
An integrated approach to synthesis planning and design: Linking in-house/commercial reaction and sourcing data, exploiting retrosynthetic scheme tools
Terry Wright, Elsevier MDL, 14600 Catalina Street, San Leandro, CA 94577 and Keith T. Taylor, Product Marketing, Elsevier MDL, 14600 Catalina Street, San Leandro, CA 94577.
Reaction databases: The contents and focus
Marudai Balasubramanian, Research Informatics, Pfizer, 2800 Plymouth Rd, Ann Arbor, MI 48105
In recent years, the electronic age has brought an enormous increase in the production of reaction databases, including a few large and comprehensive ones. The introduction of client-server-based reaction database access systems has attracted a greater number of end users. Differences in contents, coverage period, abstracting guidelines, and sources of data are among the reasons why chemists need multiple reaction databases for complete information. Reaction searching is incomplete if conducted with only one reaction database. The present work will provide insight into synthetic transformation and author searches conducted in specialized smaller and comprehensive reaction databases. The hits are compared, and the overall results indicate that these reaction databases are indeed complementary to each other.
Total synthesis sketches generated from Notebook entries
Willi Sieber, Novartis Institutes of Biomedical Research, WSJ-310.517, 4002 Basel, Switzerland
In our Template Assisted Notebook application it was a requirement to generate a total synthesis sketch for the registration of test compounds.
The sequence is gathered from ordinary notebook entries by looking at precursors until no further ones are found.
The XML file exported from the MS-Word document will be exploited for all necessary information. The program will estimate the space requirements for the synthesis sequence to be arranged on a single page and calculate feasible scaling factors.
In a next step it will virtually try to assemble the single steps in lines from left to right like words in a paragraph of text. If it is successful it will generate the layout as an ISIS Draw sketch. The sketch can be reorganized and improved if desired.
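The "words in a paragraph" layout step can be sketched as a greedy line-filling routine. The version below is an assumption about how such a step might work, not the Novartis implementation: it takes hypothetical step widths (already scaled), a page width and a gap between steps, and returns which steps go on which line.

```python
def wrap_steps(widths, page_width, gap=0.0):
    """Greedily place synthesis steps on lines from left to right.

    widths     -- drawing width of each step, in sequence order
    page_width -- usable width of the page, in the same units
    gap        -- horizontal space left between neighboring steps
    Returns a list of lines, each a list of step indices.
    """
    lines, current, used = [], [], 0.0
    for index, width in enumerate(widths):
        if width > page_width:
            # A single step wider than the page needs a smaller scaling factor.
            raise ValueError(f"step {index} does not fit on the page")
        needed = width if not current else used + gap + width
        if current and needed > page_width:
            # Start a new line, like a word that no longer fits in a text line.
            lines.append(current)
            current, used = [index], width
        else:
            current.append(index)
            used = needed
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    # Hypothetical widths for a five-step sequence on a page 8 units wide.
    print(wrap_steps([3, 4, 2, 5, 1], page_width=8, gap=1))
```

If the resulting number of lines is too tall for the page, a caller could retry with a smaller scaling factor applied to the widths, which is one plausible reading of the "feasible scaling factors" mentioned in the abstract.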
Genome-scale classification of metabolic reactions without assignment of reaction centers
Joao Aires-de-Sousa and Diogo A. R. S. Latino. REQUIMTE and Department of Chemistry, New University of Lisbon, campus FCTUNL, 2829-516 Caparica, Portugal
Enzymatic reactions are generally classified by EC numbers, which are chemically meaningful, but based on rules often ambiguous and heterogeneous. Their use for diversity analysis of metabolic reactions (the reactome) is limited. We report the mapping of a genome-scale set of 3468 enzymatic reactions by a self-organizing map (or Kohonen neural network), and their classification in terms of EC numbers. Computer assignment of EC numbers from the reaction equation is essential for the reconstruction of metabolic pathways. Furthermore, we show how a map of enzymatic reactions can be used to identify similarities between reactions exhibiting strong differences in EC numbers. This work uses a method for reaction representation that avoids identification of the bonds and atoms involved in the reaction (reaction center). The approach shows a general compatibility with the well established EC numbers, and overcomes some of their limitations for diversity analysis of the reactome.
Ethics, media, and climate change
DH. Gottlieb, Environmental Philosophy, Canopy Publishing, 946 Stockton Street, #15F, San Francisco, CA 94108
At what point do news organizations or any popular media outlet need to consider the ethics of what they are doing? To publish any viewpoint through mechanisms paid for by the advertisers' dollars means, at least in 21st century America, that the outlet's responsibility is to advertisers. Without funding, there are no resources to pursue worthwhile causes. Does the greater good then dictate that media outlets must bow to the wishes of advertisers? 19th century populations had limited access to media. However, since the Internet and other systems opened up access for virtually everyone, the issue for media outlets is no longer: the greater good vs the revenue to support the greater good. The author will make the point that ethics, rather than being esoteric abstractions, are, in the framework of the climate change debate, a species survival mechanism and therefore more pertinent than revenue.
Genesis of a science story: From idea to the printed page
Amanda Yarnell, Chemical & Engineering News Boston, 27 Everett St #1, Cambridge, MA 02138
I hope to provide a window into how science reporters take an interesting scientific idea and turn it into a story directed to a wider audience. Stories that catch science reporters' attention are the same ones that catch readers' attention: They describe a first, present a new solution to a long-standing problem, give a fair picture of both sides of a controversial issue, or tell a person's story. I'll describe the different ways in which science reporters collect such story ideas. By considering case studies of how ideas from different sources (including scientific literature, scientific meetings, and current events) became stories in Chemical & Engineering News, I'll discuss how scientists can work with science reporters to gain a wider audience for their work.
Talking to reporters 101
Patricia Thomas, Grady College of Journalism & Mass Communication, University of Georgia, Baldwin at Sanford, Athens, GA 30602-2183
If chemists don't talk to reporters about new research and important policy issues, someone else will. Even shy scientists can learn to prepare for media interviews, explain their experimental results with anecdotes, examples, analogies and metaphors, and use their intellectual passion to excite public interest in research. Thomas interviewed thousands of scientists during 30 years as a science and medical writer, and she'll draw on this experience and preview a new study – launched since she joined UGA last fall – examining how stories are selected in the newsroom.
The deterioration of popular science reporting: Biases resulting from public, commercial and political pressure combined with our growing national science illiteracy
Lloyd A. Davidson, Seeley G. Mudd Library for Science & Engineering, Northwestern University, 2233 Tech Drive, Evanston, IL 60208
Recently a number of historically responsible and credible media outlets have produced highly biased articles and interviews on controversial science subjects, particularly those that contain political or religious components. These biases result from personal opinion-based views, timidity in taking unpopular stands, naïve credulity for anti-science arguments, fear of political reprisal, the threat of losing commercial sponsors and a lack of understanding of the scientific method. Examples of good and bad reportage will be dissected to show just what went right or wrong in each and how those with problems might have been (often easily) corrected. While the long term damage biased reporting does is primarily to the historical heritage of the media themselves, it nevertheless causes acute damage to science and science education. These are not minor issues as the basis of the integrity of American democracy becomes threatened by the erosion of science as a factor in decision making.
Dealing with acute hazardous chemical releases
James Holler, The NCEH/ATSDR Information Center, 61 Forsyth Street SW, Atlanta, GA 30303
To be added
Dependence of various skin diseases on environmental pollution
Shavkat T Khakimov, Department of Skin Venereology, Samarkand City Skin-Venereology Dispensary, Mervskaya-91, Samarkand, 703034, Uzbekistan
Rapid industrial development, the rise of nuclear power engineering, and critical situations at some nuclear power plants and nuclear facilities have given rise to noticeable pollution of the environment by technogenic radionuclides.
In the present work we investigated the elemental composition of soils and plants sampled near industrial sites in the city of Samarkand (Uzbekistan).
To determine the concentrations of technogenic elements and radionuclides, we applied various nuclear physics methods.
The experiments showed that an increase in various skin diseases is observed in places where the concentration of toxic elements (As, Pb) exceeds the maximum permissible level.
Useful tools for synthesis planning from the Syngen Program
James B. Hendrickson, Department of Chemistry (MS 015), Brandeis University, Waltham, MA 02454-9110
The SynGen program for retrosynthetic generation of synthesis routes has not seen the use it appears to deserve, but its key tools will be useful for anyone in synthesis planning. The program is very fast and will find all the shortest routes to any input target. It has three central features that assure its efficiency: a) digital characterization of structures and reactions; b) organization by basic C,N,O,S skeleton and a starting materials catalog organized by skeleton; and c) rigorously unique canonical numbering of skeletons. The characterization of structures generalizes skeleton and functionality digitally with only four kinds of bonds to skeletal atoms and so greatly abstracts the search space, enabling rapid search. It also generalizes all kinds of reactions by net structural change and so requires no reaction database. The canonical numbering of skeletons allows rapid and reliable comparison of generated structures with the starting material catalog. These are important tools for anyone to use, and our planned revision of SynGen will also provide for preliminary ranking of any large set of possible targets for their ease of synthesis.
Highly efficient process for chemistry development of parallel synthesis
Ying Zhang1, Jean E. Patterson1, Andrew Smellie2, and Libing Yu1. (1) Chemistry Department, ArQule, Inc, 19 Presidential Way, Woburn, MA 01801, (2) ArQule Inc, 19 Presidential Way, Woburn, MA 01801
Parallel synthesis has been increasingly integrated into the drug discovery process in order to move discovery programs forward quickly and cost effectively. However, developing a chemistry protocol and defining its scope for a given chemistry in a timely manner remains a challenge to medicinal chemists. At ArQule, we have integrated novel technologies and developed processes to address the bottlenecks in chemistry development. In this presentation, we detail the recent advances, including design of experiments, data analysis and data visualization in protocol development for parallel synthesis, and demonstrate the application of those tools in facilitating the process of shortening development cycle time.
Grid computing architecture: A roadmap
Henri B. Tuthill, Discovery Informatics, AstraZeneca, 200 West Street, Waltham, MA 02451
In an era of ever-increasing competition for resources, one must endeavor to make the best use of all available infrastructure to provide improved services to the end-user science community. Grid architecture is a service-oriented architecture that consolidates resources within research areas to gain efficiency and effectiveness. The presentation maps subject area data through their pre-defined affinity and research area program to their appropriate repositories. The roadmap is a precursor to the goal of establishing a web services organization. To paraphrase Lewis Carroll, without a good map, any road may appear to be the right road.
Synthesize this! Using SciFinder as an essential tool for mining the world's synthetic literature
Kurt Zielenbach1, Eva M. Hedrick2, Damon Ridley1, Roger Schenk2, and Rebecca A. Wolff3. (1) Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202-1505, (2) Database Quality Engineering, CAS, 2540 Olentangy River Rd., Columbus, OH 43210, (3) Product Marketing Management, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202-1505
With escalating research costs and time-to-market demands always mounting, the pressures on the synthetic chemist for increased efficiency and productivity have never been greater. Quick and easy access to timely and relevant synthetic information can certainly ease this burden. SciFinder™ is the leading tool for chemists to explore and mine the vast landscape of publicly available synthetic information including access to the CASREACT database, one of the largest collections of curated reactions in the world. This talk will focus on the advantages of the database and on how its powerful array of search, analysis and linking tools can be used effectively to solve research problems typically faced by chemists.
A road map for second generation tools for early ADMET prediction
Jacques R. Chretien, Nadege Piclin, and Marco Pintore. BioChemics Consulting SAS, Innovation Center, 16 L. de Vinci, 45074 Orleans cedex 2, France
Early ADMET prediction has long been a key challenge in drug discovery and can still seem a nightmare, even for large pharmaceutical companies. Based on recent innovative technologies using Fuzzy Logic, Genetic Algorithms and Radial Basis Function (RBF), the framework of Second Generation Tools (SGT) for early ADMET prediction has recently been demonstrated. It is supported by the Molecular Experimental Design (MED) concepts. Furthermore, simulated in silico benchmarks and actual in silico vs. in vitro comparisons have provided strong validation. A Road Map is defined by specifying the efforts underway to implement ADMET/SGT within a global strategy supported by the Bio-Rad KnowItAll and other platforms. Key examples will be presented to show the potential and user-friendly evolution of such SGT, aimed at no fewer than three categories of specialists involved in the optimization of pharmacokinetic properties: (i) bench chemists, to select the best pharmacomodulations; (ii) biologists, to help them delineate the driving factors of overall mechanisms; and (iii), more generally, top managers who need a firm handle on chemical molecular diversity to significantly reduce the time and cost of the drug discovery process.
Pintore M. et al., Eur. J. Med. Chem. 2003, 38, 427-431.
Lombardo F. et al., J. Med. Chem. 2004, 47, 1242-1250.
In silico prediction of mutagenicity using molecular maps of atom-level properties (MOLMAPs) and empirical physicochemical descriptors
Joao Aires-de-Sousa and Qing-You Zhang. REQUIMTE and Department of Chemistry, New University of Lisbon, campus FCTUNL, 2829-516 Caparica, Portugal
Fast-to-calculate empirical physicochemical descriptors were investigated for their ability to predict mutagenicity (Ames test) from the molecular structure. Global molecular descriptors obtained by PETRA, and molecular maps of atom-level properties (MOLMAP descriptors) were used to train Random Forests. Error percentages as low as 15% and 16% were achieved for an external test set with 472 compounds and for the training set with 4083 structures, respectively. High sensitivity and specificity were observed. Random Forests were able to associate meaningful probabilities to the predictions, and to explain the predictions in terms of similarities between query structures and compounds in the training set. The use of physicochemical descriptors gave the model some ability to make predictions for functional groups not used in the training. At the same time, MOLMAP descriptors enabled the association of structural features to mutagenicity, without explicit encoding of molecular fragments.
Refinement of metabolic logic for biodegradation pathway prediction
Yogesh R Kale, Biotechnology Institute, University of Minnesota, 1479 Gortner Avenue, Suite 140, St. Paul, MN 55108, Lynda BM. Ellis, Department of Laboratory Medicine and Pathology, University of Minnesota, Mayo Mail Code 609, 420 SE Delaware Street, Minneapolis, MN 55455, and Lawrence P. Wackett, Department of Biochemistry, Molecular Biology and Biophysics, Biological Process Institute, University of Minnesota, St. Paul, MN 55108
The Pathway Prediction System (PPS, http://umbbd.ahc.umn.edu/predict/), a feature of the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD), is a useful tool for estimating the fate of chemical compounds in the environment. Its metabolic logic is encoded in the form of about 230 biotransformation rules, which form the basis of pathway prediction. The system is being refined to prioritize the biotransformation rules by assigning each a relative likelihood. Expert ranking under standard aerobic conditions is used and is the basis for further refinement. Estimating the thermodynamic feasibility of a predicted pathway would provide additional information for this assignment, and the analysis of natural biodegradation pathways would form the basis for this approach. The system currently being developed for biodegradation pathway prediction can be extended to predict general biotransformations. The likely users of PPS are government regulators, environmental agencies, industrial units manufacturing new chemicals, and academics.
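The rule prioritization described above can be pictured with a small sketch. This is an illustration only and does not reproduce the PPS code or its rule set: the likelihood scale, the rule identifiers, and the use of functional-group labels as rule triggers are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical likelihood scale; a lower score means the rule is tried first.
LIKELIHOOD_ORDER = {"likely": 0, "neutral": 1, "unlikely": 2}

@dataclass(frozen=True)
class Rule:
    rule_id: str            # made-up identifier, not a real PPS rule number
    functional_group: str   # toy trigger: the group label the rule acts on
    likelihood: str         # expert-assigned category from LIKELIHOOD_ORDER

def prioritized_rules(compound_groups, rules):
    """Return the rules applicable to a compound, most likely first."""
    applicable = [r for r in rules if r.functional_group in compound_groups]
    # Ties within a likelihood category are broken by rule_id for stable output.
    return sorted(applicable, key=lambda r: (LIKELIHOOD_ORDER[r.likelihood], r.rule_id))

rules = [
    Rule("r1", "nitrile", "neutral"),
    Rule("r2", "primary alcohol", "likely"),
    Rule("r3", "ester", "unlikely"),
]
ranked = prioritized_rules({"nitrile", "primary alcohol"}, rules)
```

In this toy case only the rules for groups present in the compound are kept, and the "likely" rule is listed before the "neutral" one, so a prediction run would explore the more plausible transformation first.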
2-D Structure depiction
Alex M. Clark and Paul Labute. Research & Development, Chemical Computing Group, Inc, 1010 Sherbrooke St West, Suite 910, Montreal, QC H3A2R7, Canada
A new algorithm for the layout of 2D coordinates suitable for structure diagrams has been developed, which produces a very high incidence of presentation quality output for a large subset of organic chemistry. The methods will be discussed, with particular attention to resolving troublesome cases. Validation studies which demonstrate the overall efficacy will also be presented.
Estimating 1H NMR coupling constants with ANN models for chemical shifts: Spectra simulation in the SPINUS system
Joao Aires-de-Sousa and Yuri Binev. REQUIMTE and Department of Chemistry, New University of Lisbon, campus FCTUNL, 2829-516 Caparica, Portugal
Fast and accurate predictions of 1H NMR spectra of organic compounds play an important role in automatic structure elucidation and validation. The SPINUS program is a feed-forward neural network (FFNN) system developed over the last eight years for the prediction of 1H NMR properties from the molecular structure. It was trained using a series of empirical proton descriptors. The FFNNs were incorporated into Associative Neural Networks (ASNN), which correct a prediction obtained by the FFNNs with the observed errors for the k nearest neighbours in an additional memory. Here we show a procedure to estimate coupling constants with the ASNNs trained for chemical shifts. Now a memory of coupled protons and the experimental coupling constants is used. The ASNNs find the pairs of coupled protons most similar to a query, and these are used to estimate coupling constants. A web interface for 1H NMR spectra prediction is presented.
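The nearest-neighbour correction described above can be sketched as follows. This is a simplified illustration and not the SPINUS code: it assumes that similarity is measured by Euclidean distance between descriptor vectors and that the correction is the plain mean of the neighbours' residuals, neither of which is stated in the abstract. The function name and the memory layout are hypothetical.

```python
import math

def knn_corrected_prediction(query, base_prediction, memory, k=3):
    """Correct a network prediction with the mean error of the k nearest memory cases.

    query: descriptor vector of the query proton (hypothetical representation).
    base_prediction: value predicted by the underlying network for the query.
    memory: list of (descriptor_vector, predicted_value, observed_value) tuples.
    """
    # Rank memory cases by Euclidean distance to the query (an assumed metric).
    nearest = sorted(memory, key=lambda case: math.dist(query, case[0]))[:k]
    if not nearest:
        return base_prediction
    # Residual = observed minus predicted for each neighbour.
    mean_residual = sum(obs - pred for _, pred, obs in nearest) / len(nearest)
    return base_prediction + mean_residual

# Toy memory with made-up descriptors and chemical shifts (ppm).
memory = [
    ([0.0, 0.0], 1.0, 1.2),
    ([0.1, 0.0], 2.0, 2.2),
    ([5.0, 5.0], 3.0, 1.0),
]
corrected = knn_corrected_prediction([0.0, 0.1], 1.5, memory, k=2)
```

Here the two closest memory cases were both under-predicted by about 0.2, so the query prediction is shifted up by the same amount. The distant third case is ignored. For coupling constants, the same lookup would run over a memory of coupled proton pairs and return their experimental couplings.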
Future of electronic reference books in the chemical information industry
William A. Woishnis, Knovel Corporation, 13 Eaton Avenue, Norwich, NY 13815
Journals, A&I databases and patent databases are the historical drivers of the Chemical Information Industry. For cutting edge research this continues to be the case. However, for applied scientists and many researchers, chemical reference books and databases are an important staple and their migration to the electronic arena brings with it opportunities. Science publishers such as Springer, John Wiley, Elsevier and McGraw Hill have electronic book initiatives. Value added aggregators are also playing an important role. This presentation explores the evolving landscape of electronic chemical reference books and what the future has in store. Issues such as chemical structure searching, reference linking, new data types (video, flash presentations), the role of generic search (Google, Yahoo) and the impact of additional interactive features (tables, graphs, equations) will be addressed. The perspectives of publishers, value added resellers, librarians and end users will be covered.
SemDrug: Application of semantic relationship discovery to expedite lead identification
Vasudevan Chandrasekaran1, Karthik Gomadam2, Amit P Sheth2, and J. Phillip Bowen3. (1) Department of Pharmaceutical and Biomedical Sciences, University of Georgia, Athens, GA 30605, (2) LSDIS lab, Department of Computer Science, University of Georgia, Athens, GA 30605, (3) Center for Drug Design, Department of Chemistry and Biochemistry, University of North Carolina at Greensboro, 400 New Science Building, PO Box 26170, Greensboro, NC 27402-6170
The sheer volume of existing information and the anticipated explosion of data generated in the life sciences domain pose a major hurdle in drug discovery research. Although a significant proportion of these data is organized in a structured form, the relationships among the data and their interpretation have not been fully exploited. The application of semantic techniques to drug discovery will facilitate extracting and understanding relationships, for instance between genes and diseases or between compounds and side effects, which are fundamental for drug discovery. The focus of this semantic approach is to build multiple ontologies that can help represent the relationships between information from different domains. By understanding the complex relationships among these data and eliminating unwanted information, the process of lead identification in drug discovery can be expedited. This can have a significant impact on drug discovery productivity in pharmaceutical companies. We have developed a prototype system in which we have exploited the relationships between drug targets, bioactive compounds and their chemical information to answer questions critical to speeding the process of lead identification.