#233 - Abstracts
ACS National Meeting
March 25-29, 2007
Fanfare for an uncommon man: a tribute to Gary Wiggins' contributions to the chemical information profession
F. Bartow Culp, email@example.com, Mellon Library of Chemistry, Purdue University, 504 West State Street, West Lafayette, IN 47907-2058
For over 30 years, Gary Wiggins has ably served the information community as librarian, author, administrator, and mentor. He has been a trailblazer in many areas of our profession. Gary has been instrumental in the development of the Chemical Information specialty at the Indiana University School of Library and Information Science, authored a standard textbook on the subject of chemical information, and created the CHMINF listserv, among his many other accomplishments. In this symposium, we honor his contributions to our profession.
Changing nature of academic librarianship: Implementing a distributed institutional repository
Jeremy R Garritano, firstname.lastname@example.org, Mellon Library of Chemistry, Purdue University, 504 W. State St., West Lafayette, IN 47907
Librarians at Purdue University are adapting to the implementation of a distributed institutional repository (DIR). Traditionally, a librarian's role in the cycle of scholarly communication has been focused on the products of research: books, articles, dissertations, etc. The DIR is pushing the librarian's role further back in the cycle by assisting and advising researchers on how their data can best be collected, tagged, and stored. In some cases, librarians are even becoming embedded within research groups. Once in the DIR, these data sets and other non-traditional types of information are just as accessible as books and journal articles. The DIR is opening up new avenues of research for librarians and allowing for renewed interactions with faculty across disciplines. This paper will look at how the role of an academic librarian is changing in regard to the implementation of a DIR on campus, including successes and pitfalls.
Hands-on learning: developing a creativity collection
Ted Baldwin, Ted.Baldwin@uc.edu, College of Applied Science Library, University of Cincinnati, 2220 Victory Parkway, ML0103, Cincinnati, OH 45206-2839
Academic libraries are continuously expanding and redefining their roles, in order to retain their relevance and contribute to the learning objectives of their institutions. This presentation will describe how a library serving undergraduate students in the applied sciences has carved out a new role through the development of a creativity collection. The collection contains hands-on learning resources to spark learning and innovation, including chemical modeling kits, construction kits, scientific toys, and art supplies. These items feature prominently in the library, and have been utilized in both formal and informal learning settings. The speaker will detail sources of inspiration for the collection, the contents and collecting parameters, and the impact of the collection on undergraduate perceptions of libraries.
Challenges in Developing a Global Alerting System
Leah Sandvoss, Research Informatics, Pfizer, 10677 Granby Way, San Diego, CA 92126
In the competitive environment of the pharmaceutical industry, keeping scientists, clinicians, and marketing professionals informed of the latest public information on competing products and companies is key to understanding the position of internal drug projects. In order to track this information, many companies have information specialists whose sole responsibility is to deliver cutting-edge information to end-users. The information in question can come from a variety of sources and include content types such as news, literature, patents, and pipeline reports. The most up-to-date content is often called an "alert," and the delivery of such content is referred to as "selective dissemination of information" (SDI). Taken together, the alert sources often include overlapping references, yet they are delivered separately in a number of different formats. The differing formats can be overwhelming and confusing for the end-user. A plausible solution to this challenge is to create an overarching system that combines the delivery of the different information types from multiple sources. Ultimately, the results are presented in one unified interface, in a common format, with links to full-text articles where available. Although collating this information through such a system might seem a simple task, there are many challenges to overcome. This presentation will discuss some of the major issues that can arise in developing a global alerting system.
Corporate libraries: evolving as the electronic resources evolve
Marilynn J Dunker, email@example.com, Intellectual Property & Business Information Services, The Procter & Gamble Company, 6280 Center Hill Ave, BB3N286, Cincinnati, OH 45231
Faced with serving a global community of researchers in a large consumer goods company, the libraries at Procter & Gamble have been evolving. What was once a series of local physical libraries with a few databases on a LAN has turned into a global information research organization providing a virtual library and services to researchers regardless of location. The development and evolution of electronic resources has been one of the driving forces.
Towards a global chemical knowledgebase
Peter Murray-Rust, firstname.lastname@example.org, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, CB2 1EW Cambridge, United Kingdom
The world's approach to information is being transformed by pervasive resources such as Google, Flickr, YouTube, and many others. In our homes we presume instant access to knowledge - reference, news, services, etc. This is rapidly changing many aspects of scientific information, including semantic data, new methods of publication, and automated and customised readers. The academic world is reacting patchily, with some exciting innovations in Open repositories and instant publication, but is also tied to conventional methods of scholarship and dissemination. This presentation will show live demos of what the future for chemistry might look like - a global knowledgebase without a fixed centre where machines remove the drudgery of search, calculation and analysis. The primary challenge is not technology but the social adoption of a radically new world.
Integrating text and literature sources with traditional chemoinformatics tools
David J Wild, email@example.com, School of Informatics, Indiana University, Bloomington, IN 47408
This presentation will review developments in text searching, natural language processing and the semantic web that enable structures to be automatically extracted from documents and fed to chemoinformatics tools like docking, cluster analysis and similarity searching. Examples will be given of the use of web service workflows for performing scientifically interesting tasks on structures extracted from literature, and the potential future directions for the exploitation of text and literature information will be discussed.
When will the evolution of chemical information on the Internet turn into a revolution?
Stephen R. Heller, firstname.lastname@example.org, Physical and Chemical Properties Division, NIST, Gaithersburg, MD 20899-8380
Decades ago computers were devices known and familiar to only a few people around the world. Today computers and the Internet are known and familiar to virtually every person on earth. In the area of chemistry, first computers and then the Internet have led to an evolution of information and data available to all. The Internet has also led to a revolution in the type and amount of information available. This presentation will describe where chemical information was just a few decades ago, where it is now, and when this slow but mounting evolution of information and data is likely to become a revolution and change the world of information providers. Open Access journals (e.g., Beilstein Journal of Organic Chemistry and Chemistry Central), Open Source standards (e.g., InChI), and Open Data (e.g., NCI and PubChem) will provide examples of this changing world.
The promise and reality of turning chemical literature into information
Thompson N. Doman, DOMAN_THOMPSON_N@Lilly.com, Eli Lilly and Company, Lilly Corporate Center, Indianapolis, IN 46285
The fields of computational chemistry and especially chemical informatics have advanced greatly in recent years, and vastly more sophisticated analyses of chemical results are possible these days. However, due to technical, business, and political realities, there are still significant barriers to conducting these analyses in a timely fashion. Using a recent case study, I will highlight the challenges in turning traditional 2D chemical graphs found in many chemistry-related publications into 3D molecular models.
The present and future of informatics in chemistry
Trevor W. Heritage, T.Heritage@mdl.com, Chief Scientific Officer, Elsevier MDL, 2440 Camino Ramon, Suite 300, San Ramon, CA 94583, Phil McHale, email@example.com, Solutions Product Management, Elsevier MDL, 2440 Camino Ramon, Suite 300, San Ramon, CA 94583, and Tim Hoctor, firstname.lastname@example.org, Academic, Government, & International Markets, Elsevier MDL, 14600 Catalina Street, 60 Columbia Ave Bldg B, Morristown, NJ 07960.
Chemistry has evolved from hardcopy textbooks and journals to online access and directly to actionable data. This program will outline how changing technology drives informatics systems. In particular we will examine the state of the art in terms of electronic access to chemistry data, including integrated content systems from commercial data providers, as well as a discussion of future trends such as Web services.
Bibliometric analysis of chemoinformatics
Peter Willett, email@example.com, Department of Information Studies, University of Sheffield, Western Bank, Sheffield S10 2TN, United Kingdom
Chemoinformatics has come to the fore as a specialist discipline quite recently. This presentation will review the place of chemoinformatics in the chemical literature, highlighting the core journals in which chemoinformatics articles appear and the current main areas of research, and conducting citation analyses of the principal research groups and research workers.
Still searching for the perfect fingerprint
Robert D Brown, firstname.lastname@example.org, SciTegic, Inc, 9665 Chesapeake Dr. #401, San Diego, CA 92123
In the mid-1990s the author, together with Yvonne Martin, published research results that sought to identify the most appropriate structural descriptors and clustering methods for library design and lead optimization. The goal was to identify methods that were best able to group similarly active molecules together and separately from inactives. The results, which were somewhat surprising at the time, showed that fingerprints based on 2D structure outperformed those based on 3D, and that in particular a fingerprint from the MDL MACCS system was preferred. Brown and Martin then showed that the 2D descriptors better captured information relevant to ligand-receptor binding. The quest for better fingerprints has continued within many research groups since then, and most recently circular substructure fingerprints have shown good utility in activity prediction. In this paper, we review the original fingerprint work, discuss some of the latest approaches to fingerprint descriptors, including the development of a tautomer-independent circular substructure fingerprint, and compare the original results to those from the latest descriptors.
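Key-based fingerprints of the kind discussed in this abstract are conventionally compared with the Tanimoto coefficient. The following is a minimal illustrative sketch, assuming fingerprints represented as sets of on-bit indices; the key indices and molecules are invented, not actual MACCS keys:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits:
    shared keys divided by the total number of distinct keys."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

# Hypothetical key-based fingerprints: each set holds the indices of
# substructure keys present in a molecule.
mol_a = {1, 4, 7, 9}
mol_b = {1, 4, 7, 12}
mol_c = {2, 3, 5}

print(tanimoto(mol_a, mol_b))  # 3 shared of 5 distinct keys -> 0.6
print(tanimoto(mol_a, mol_c))  # no shared keys -> 0.0
```

Clustering actives together then amounts to grouping molecules whose pairwise Tanimoto values exceed some threshold.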
Virtual screening for new chemotypes using compound similarity measures
Ingo A. Muegge, email@example.com, Medicinal Chemistry, Boehringer Ingelheim Pharmaceuticals Inc, 900 Ridgebury Road, Ridgefield, CT 06877
Compound similarity-based virtual screening experiments have been conducted using a variety of different drug targets, 2D and 3D descriptors, and ranking approaches. Particular attention has been paid to assembling data sets such that each active compound represents its own unique chemotype. This condition guarantees that a similarity recognition event between active compounds constitutes a scaffold hopping event at the same time. In a series of virtual screening studies involving 7 drug targets, with the number of actives varying between 4 and 13 and with 9969 MDDR compounds as negative controls, it has been found that atom pair descriptors, SciTegic fingerprints, and 3D pharmacophore fingerprints combined with ranking, voting, and consensus scoring strategies perform well in finding new bioactive scaffolds. The performance of descriptors largely depends on the structure of the database of compounds subjected to a virtual screen. If topological biases exist between actives, as is often the case when literature data sets are used in recall experiments, 2D topological fingerprints often perform best. However, if such biases do not exist, as is often the case when independent compound collections are screened, pharmacophore descriptors perform well. A comparison of virtual screening performance achieved with structure-based and compound-similarity-based methods will also be presented.
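The consensus scoring strategies mentioned in this abstract can take several forms; one simple variant is sum-of-ranks fusion across descriptor types. This sketch is illustrative only (the descriptor names, scores, and compound IDs are invented), not the method used in the study:

```python
def consensus_rank(score_lists):
    """Fuse several score-based rankings by summing each compound's rank
    across methods; a lower summed rank means higher consensus priority."""
    totals = {}
    for scores in score_lists:
        # Rank compounds by descending similarity score within this method.
        ranked = sorted(scores, key=scores.get, reverse=True)
        for rank, cid in enumerate(ranked, start=1):
            totals[cid] = totals.get(cid, 0) + rank
    return sorted(totals, key=totals.get)

# Hypothetical similarity scores from three descriptor types for four compounds.
atom_pair     = {"c1": 0.9, "c2": 0.4, "c3": 0.7, "c4": 0.2}
circular_fp   = {"c1": 0.8, "c2": 0.5, "c3": 0.9, "c4": 0.1}
pharmacophore = {"c1": 0.6, "c2": 0.9, "c3": 0.7, "c4": 0.3}

print(consensus_rank([atom_pair, circular_fp, pharmacophore]))
```

A compound that no single descriptor ranks first can still top the consensus list if it scores consistently well everywhere, which is the motivation for fusing rankings.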
Lead-like, drug-like or “pub-like”: How different are they?
Tudor I. Oprea, firstname.lastname@example.org, Division of Biocomputing, University of New Mexico School of Medicine, MSC 084560, 1 University of New Mexico, Albuquerque, NM 87131-0001
Trends in probe, lead and drug discovery are evaluated using the following compound categories: 385 leads, and the 541 associated drugs; “active” (152) and “inactive” (1488) compounds from the Molecular Libraries Small Molecule Repository (MLSMR) tested by HTS; “active” (46) and “inactive” (72) compounds from Nature Chemical Biology (NCB) tested by HTS; MDDR drugs (phases I, II, III and launched); and medicinal chemistry compounds from WOMBAT, split into high-activity (5,787 compounds with nM activity) and low-activity (30,691 with µM activity). Molecular weight (MW), complexity, flexibility, the number of hydrogen bond donors and acceptors, the octanol/water partition coefficient estimated by CLogP and ALOGPS, the intrinsic water solubility estimated by ALOGPS, and Rule of five (Ro5) violations were considered. Using the 50% and 90% distribution moments, we noticed no difference between leads and MLSMR/NCB “actives”. “Inactives” from NCB and MLSMR exhibit similar properties. These combined sets (“Actives”, 569 compounds) are less complex, less flexible, and more soluble than drugs (1,688 drugs), and significantly smaller, less complex, less hydrophobic and more soluble than the 5,787 high-activity WOMBAT compounds. These trends indicate that chemical probes (“pub-like”) are similar to leads with respect to complexity, solubility, and hydrophobicity.
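The Rule-of-five violation counts used as one property filter above follow Lipinski's four published thresholds; a minimal sketch (the descriptor values passed in are illustrative, not compounds from the study):

```python
def ro5_violations(mw, clogp, hbd, hba):
    """Count Lipinski Rule-of-five violations:
    MW > 500, CLogP > 5, H-bond donors > 5, H-bond acceptors > 10."""
    return sum([mw > 500, clogp > 5, hbd > 5, hba > 10])

# Illustrative descriptor values for a drug-like and a non-drug-like compound.
print(ro5_violations(mw=320.4, clogp=2.1, hbd=2, hba=5))   # -> 0 violations
print(ro5_violations(mw=612.7, clogp=6.3, hbd=4, hba=11))  # -> 3 violations
```

In the original rule, a compound with more than one violation is flagged as likely to have poor absorption or permeation.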
Computer-Aided Drug Design: the next twenty-five years
John Van Drie, email@example.com, Novartis Institutes for Biomedical Research, 250 Mass Ave, Cambridge, MA 02139
In a somewhat-serious, somewhat-lighthearted way, prognostications will be made on what the future holds for our field. Drawing upon the past, and in particular on the many roles that Yvonne Martin has played in that past, some possible future evolutionary paths of this field will be sketched. The areas for greatest potential impact on drug discovery and design will be highlighted. Also, today's key outstanding issues will be framed, to draw attention to these issues for students and others new to this field, in the hope that they will be the biggest determinant of what the future holds for CADD.
What I learned from a career in computer-assisted molecular design
Yvonne Martin, firstname.lastname@example.org, private consultant, 2230 Chestnut St., Waukegan, IL 60087
As fads come and go in CAMD, certain truisms remain: 1.) It is better to use an old or inaccurate method to help solve the real problem of your experimental collaborators than to use a fancy method that solves a problem that they don't have. 2.) It is often more fruitful to use computational methods to help set priorities than to use them to suggest new, and potentially time-consuming, directions. 3.) Encourage your collaborators to come up with tests of your predictions. This way if the predictions are correct, everyone is happy, and if they are incorrect, you have more work to do but have gained scientific respect. 4.) Pay attention to the needs of your collaborators. If you don't have a method to solve a particular problem, let it rest and someday someone, maybe you, will come up with a solution.
ChemDB: A public database of small molecules and related chemoinformatics resources
Jonathan Chen, email@example.com, Erik Linstead, S. Joshua Swamidass, Dennis Wang, Yimeng Dou, and Pierre F. Baldi, firstname.lastname@example.org. Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, Irvine, CA 92697
ChemDB is a chemical database containing over 4M commercially available small molecules. The data is publicly available over the Web for download and for targeted search using a variety of powerful methods. The chemical data includes predicted 3D structure, ideal for docking and other studies, and physicochemical properties such as solubility. Recent developments include optimization of chemical structure (and substructure) similarity search algorithms enabling full database searches in less than a second. A text-based search engine allows efficient searching of compounds and over 65M vendor annotations, such as systematic and common names, and fuzzy text matching capabilities that yield productive results even when the correct spelling of a chemical name is unknown. Finally, built-in reaction models enable searches through virtual chemical space, consisting of hypothetical products readily synthesizable from the building blocks in ChemDB. ChemDB is available at http://cdb.ics.uci.edu.
Characterization of spectra and other analytical data via combination of two methods: multivariate processing and overlap density heatmap visualization
Gregory M. Banik, email@example.com, Michelle D'Souza, firstname.lastname@example.org, and Marie Scandone, email@example.com. (1) Bio-Rad Laboratories, Informatics Division, 3316 Spring Garden Street, Philadelphia, PA 19104, (2) Informatics Division, Bio-Rad Laboratories, Inc, 3316 Spring Garden Street, Philadelphia, PA 19104
The use of methods such as Principal Component Analysis (PCA) to perform multivariate analyses on spectral and chromatographic data has been established for years in the field of analytical chemistry. In this study, we introduce a new method that combines PCA with a second, patent-pending technology known as Overlap Density Heatmap (ODH). ODH allows the user to explore data similarities and dissimilarities in large databases by providing information about the most and/or least commonly occurring spectral or chromatographic features in a data set.
In this session, the combination of these two approaches for spectroscopic analysis will be explored through a series of successful applications and case studies.
Novel visualization techniques for the analysis of molecular properties
Joseph Corkery, firstname.lastname@example.org, Brian Kelley, email@example.com, Kevin Schmidt, firstname.lastname@example.org, Mark McGann, email@example.com, Robert Tolbert, and Anthony Nicholls. OpenEye Scientific Software, American Twine Office Park, 222 Third Street, Suite 3211, Cambridge, MA 02142
Visualization remains an important tool in the analysis of molecular properties. We have developed new methods to aid in the small and large-scale analysis of molecular properties and the subsequent communication of information to others. For example, we now have a unique visual style of mapping specific atom, bond or molecule properties back onto the source atom or bond. Such mappings can be applied to properties from a variety of sources including pKa, solubility, strain energy, QSAR/reverse-QSAR, goodness-of-fit to electron density, docking scores, and many others. A potential-field style of visualization has also been developed for Cartesian-dependent properties such as ligand-protein interactions. These visual tools coupled to integrated data analysis can provide an excellent platform for elucidating specific areas for ligand optimization. Examples will be presented from docking, 3D shape comparisons, and crystallography.
Spectral clustering of chemical datasets
Rajarshi Guha, firstname.lastname@example.org and David J Wild, email@example.com. School of Informatics, Indiana University, 1130 Eigenmann Hall, 1900 E 10th Street, Bloomington, IN 47406
Spectral clustering utilizes matrix decompositions to transform a dataset of n dimensions to a lower-dimensional subspace within which clustering can be performed. The most common decomposition used is the SVD, and it has been shown that the SVD of a data matrix represents a clustering. We investigate the use of this approach in the clustering of an Ames mutagenicity dataset and an aqueous solubility dataset. We also investigate the use of the fast SVD algorithm, which approximates the SVD of a matrix. Our results indicate that the approximation algorithm leads to an order of magnitude speedup. Furthermore, the clustering results are similar to those obtained using traditional partitional clustering algorithms.
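The decompose-then-cluster pipeline described in this abstract can be sketched as a truncated SVD projection followed by a plain k-means in the reduced space. This is an illustrative sketch on synthetic descriptor data, not the authors' implementation:

```python
import numpy as np

def svd_cluster(X, k, n_iter=50):
    """Project rows of X onto the top-k singular components (truncated SVD),
    then cluster in that subspace with a simple k-means.
    Deterministic farthest-point seeding keeps the sketch reproducible."""
    U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    Z = U[:, :k] * s[:k]  # low-dimensional embedding of each row
    idx = [0]
    for _ in range(k - 1):  # farthest-point seeding of the k centers
        d = np.min(((Z[:, None] - Z[idx]) ** 2).sum(-1), axis=1)
        idx.append(int(np.argmax(d)))
    centers = Z[idx].copy()
    for _ in range(n_iter):  # Lloyd iterations in the reduced space
        labels = np.argmin(((Z[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

# Two well-separated synthetic "descriptor" blobs, 10 dimensions each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 10)),
               rng.normal(5.0, 0.1, (20, 10))])
labels = svd_cluster(X, k=2)
print(labels)  # each blob ends up in its own cluster
```

The speedup reported in the abstract comes from replacing the exact `np.linalg.svd` step with an approximate (randomized) SVD; the downstream clustering is unchanged.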
Data analysis and visualization: Some case studies
Donald Walter, Don.Walter@Thomson.com, Customer Training, Thomson Scientific, 1725 Duke Street Suite 250, Alexandria, VA 22314
Collecting, sorting and analyzing information from the technical and patent literatures to give an unbiased understanding of the scientific and competitive landscapes can be difficult. Each step involves choosing techniques which may skew the answers. Often one approach leads to hypotheses which must be tested with other approaches.
In this presentation, we will compare and contrast different approaches to understanding the literature and competitive intelligence on a given question. For example, we will see whether the patent and technical literatures lead to the same picture of the state of the art; illustrate the use of perhaps little-known tools in common applications, as well as some specialized tools; and more.
Right now, approximating just right! Chemical information resources for small/all [Canadian] colleges and universities
Lai Im Lancaster, firstname.lastname@example.org and Brian Maurice Lynch, email@example.com. Department of Chemistry and Angus L. MacDonald Library, St. Francis Xavier University, 1 West Street, Antigonish, NS B2G 2W5, Canada
In earlier CINF presentations we illustrated low-cost recording/storage and possible integration of audio files with slide shows and movies. Now, we describe further applications providing background material relevant to our local chemical informatics resources ["cinf-R"] and to inter-university distribution of keynote research presentations at regional conferences. The cinf-R patterns in Canadian degree-granting institutions differ from those in most other countries; a national consortium [Canada Research Knowledge Network] provides free access for every college and university researcher. We will demonstrate local electronic access to the complete full-text and abstract sets of ACS and RSC journals, to the complete set of the Canadian Journal of Chemistry, and also to references accessed from the Web of Science, Wiley InterScience, Elsevier, ProQuest Research Library et al., extrapolating to ACS conferences. Webcasting and podcasting of key files of our undergraduate senior research seminars for the 2005-2006 and 2006-2007 academic years will be illustrated, including videoconferencing aspects.
Information literacy in the chemistry major: Stretching our money at Augustana College
Connie Ghinazzi (1), firstname.lastname@example.org, Dell Jensen Jr. (2), DellJensen@augustana.edu, and Richard Narske (2). (1) Tredway Library, Augustana College, 639 38th Street, Rock Island, IL 61201, (2) Department of Chemistry, Augustana College, 639 38th Street, Rock Island, IL 61201
Augustana College, a four year liberal arts institution with 2400 students, has an ACS accredited Chemistry department that routinely integrates literature research into their curriculum. In the past five years, Tredway Library at Augustana College has significantly changed its chemistry collection practices to meet the current and future researching needs of our students. This presentation provides a model for other small libraries to maximize their budgets by giving examples of effective faculty/librarian collaboration in assignment design, database selection, and reference collection purchases.
Delivering chemical information in the age of tight budgets: Faculty and librarian cooperation at Trinity University
Steven M. Bachrach, email@example.com, Department of Chemistry, Trinity University, 1 Trinity Place, San Antonio, TX 78212 and Barbara MacAlpine, Barbara.MacAlpine@trinity.edu, Coates Library, Trinity University, 1 Trinity Place, San Antonio, TX 78212.
Trinity University chemistry faculty and librarians have followed a model of strong cooperation to win the battle of the budget over information resources. Recognizing that shrinking financial resources require creative solutions, we have explored new ways to ensure that we deliver the appropriate combination of resources, principally databases and journals, to support information literacy for our undergraduate students and our active research program. The talk will discuss the difficult choices that were made and how the active participation of both faculty and library staff helped make implementation of these changes as seamless as possible.
SciFinder Scholar, Chemical Abstracts Student Edition or General Science Abstracts: which should you ask your library to purchase?
Patricia Kirkwood, firstname.lastname@example.org, University of Arkansas Libraries, University of Arkansas, 365 N. McIlroy Ave, Fayetteville, AR 72701-4002
Small schools have limited subscription dollars, and science databases are expensive. There are many possible options; with newer plans for small institutions that share seats and pricing, SciFinder Scholar has become one possibility for undergraduate needs. It has the advantage of serving researcher needs as well as instruction for students. Is it the better choice given the dollars and resources involved? Chemical Abstracts Student Edition is available through OCLC FirstSearch, with small-college discounts as well. It has some advantages, like 24/7 availability and off-campus access, and it has two things non-science librarians understand -- the FirstSearch interface and indexing for a selected number of journals that are more likely to be available to the student immediately. General Science Abstracts, from H.W. Wilson, indexes all ACS journals including the Journal of Chemical Education. It is an alternative that would allow the undergraduate chemistry student access to the most important literature of chemistry and be useful for other science students as well. Is it sufficient? This paper will discuss the advantages and disadvantages of these databases for lower-division undergraduate course work. Committee on Professional Training guidelines will be addressed.
Make the most of what you have: Use SciFinder Scholar as a collection development tool
Donna R. Resetar, Donna.Resetar@valpo.edu, Christopher Center for Library and Information Resources, Valparaiso University, 1410 Chapel Drive, Valparaiso, IN 46383
Money and materials are not the only resources in short supply at the smaller academic institution. Staff are also limited, and most librarians serve in multiple roles. Time for special projects is scarce. Under these circumstances, it is very difficult to do a citation analysis or journal-use study as detailed as those published in the library literature. However, SciFinder Scholar can be used to learn where some science faculty and students publish and what journals they cite, and thus provide concrete data to help support or challenge a subscription list. This presentation will describe how to use SciFinder Scholar to do an efficient journal-use/citation analysis for your institution.
Meeting the challenge of "The New Biology" for college libraries and librarians in the post-genomic era
Frederick Stoss, email@example.com, Science and Engineering Library, University at Buffalo - SUNY, Buffalo, NY 14260
Sequencing the Human Genome was one of the greatest scientific achievements in history. The results unleashed a wave of research in molecular and structural biology, giving rise to "The New Biology" of genomics, proteomics, bioinformatics, and systems biology. Research universities have larger budgets than colleges to support libraries addressing these new endeavors. Four-year liberal arts colleges often serve as academic incubators for incoming cohorts of students into Masters and Doctoral programs at larger universities and must adequately prepare these undergraduates. This presentation provides insights for college librarians on accessing the new generation of genomic databases, describes development of essential monograph and journal collections, addresses reference services and librarian expertise in these subjects, and describes roles college librarians can play in innovative librarian-faculty collaborations. The latter aspect of this presentation will outline various strategies for library and librarian outreach to college faculties and students, particularly among the life, physical, and computational sciences.
Analysis of documents pertaining to the phenomena of RNA interference
Brian Sweet, firstname.lastname@example.org, Product Marketing, CAS, Olentangy River Rd., Columbus, OH 43210
RNA interference (RNAi) prevents genes from being transcribed into proteins. The therapeutic potential of RNA interference for infectious diseases, cancer and a variety of illnesses is being actively explored by biotechnology and pharmaceutical companies throughout the world. The promise of this technology was recently recognized when the 2006 Nobel Prize in Physiology or Medicine was awarded to the discoverers of RNA interference, Andrew Fire and Craig Mello, only eight years after they made their discovery based on research involving the roundworm Caenorhabditis elegans. Based on the body of literature citing these authors' seminal paper on RNAi, and using analysis and visualization technology, we'll review the institutional leaders in exploiting their discovery, the leading researchers in this area, and show how this technology is being applied to pharmacology and medicine.
Uncovering competitive technology intelligence from chemical information in patent databases
Bob Stewart, email@example.com, Dialog, Thomson Scientific, 3501 Market Street, Philadelphia, PA 19104-3302
There is a large amount of chemical information contained in commercially available patent databases. Uncovering that information is relatively straightforward, but converting it into intelligence is often more challenging. This presentation will examine several methods for turning chemical information from patents into actionable intelligence. At one end of the spectrum are tools that are readily available to the majority of information professionals, such as Microsoft Excel and simple analysis tools that are built into commercial search engines. At the other end are sophisticated text and data mining tools that can uncover hidden intelligence from lists, matrices and relationship maps. The presentation will show that it is often possible to uncover intelligence with the simpler tools, while discussing situations where the more sophisticated tools add value.
Text visualization in chemistry: Roadblocks and rewards
Jeffrey D. Saffer, firstname.lastname@example.org, OmniViz, Inc, Two Clock Tower Place, Suite 600, Maynard, MA 01754
Decision-making in chemistry can be effective only when done in the context of all relevant information. With at least 75% of that information in the form of text documents - such as journal articles and patents - the volume of information that must be assimilated is huge. Data visualization thus becomes a required step in making effective decisions. However, the ambiguities of language and the sheer volume of data are roadblocks to the process. I will discuss recent efforts to visualize extreme volumes of data and work to disambiguate the chemical literature. These efforts lead to the reward of useful visualizations that support integrated analysis of text with experimental and clinical data.
A new key-based molecular fingerprinter for visualization and data analysis in compound clustering, similarity searching, and substructure commonality analysis
Norah E. MacCuish, email@example.com and John D. MacCuish, firstname.lastname@example.org. Mesa Analytics & Computing, LLC, 212 Corona St., Santa Fe, NM 87501
We present a new key-based fingerprinter designed specifically for visualization and data analysis in compound clustering, similarity searching, and substructure commonality analysis. The use and efficacy of the fingerprinter in clustering and similarity searching are shown using an interactive hierarchical clustering tool and cost-analysis ROC curves, respectively. Visualization and data analysis for substructure commonality analysis are explored with the interactive ChemTattoo program.
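The abstract does not give implementation details for the fingerprinter, but similarity searching over key-based fingerprints is commonly scored with the Tanimoto coefficient over the keys that are set. A minimal sketch (the key indices below are invented):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two key-based fingerprints,
    each represented as the set of key indices that are 'on'."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 1.0  # two empty fingerprints are conventionally identical
    return len(a & b) / len(a | b)

# Two hypothetical fingerprints sharing 3 of 5 distinct keys.
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 9, 12}
similarity = tanimoto(fp1, fp2)  # 3 shared / 5 total = 0.6
```

A similarity search then reduces to comparing such scores against a threshold, and hierarchical clustering can use 1 - similarity as its distance.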
Selection of commercially available lead discovery compounds potentially active against P. falciparum methionine aminopeptidase by substance analysis and clustering
Anthony J. Trippe, email@example.com, New Product Development, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202-1505
This presentation will demonstrate new developments in pharmaceutically intuitive methods for visualizing chemical space by organizing and clustering large numbers of chemical substances. A practical application of this technology will be provided in the form of a case study examining a recently identified methionine aminopeptidase enzyme. An inhibitor of this enzyme has been identified using high throughput screening. Additional, commercially available candidates will be identified using the substance clustering and visualization methods recently developed by CAS.
Effective teaching requires comprehensive reaction databases
Valentina Eigner Pitto, firstname.lastname@example.org, Josef Eiblmaier, Hans Kraut, Heinz Saller, and Peter Loew. InfoChem GmbH, Landsberger Straße 408, D-81241 München, Germany
SPRESIweb is a Web application that provides access to the SPRESI data: a collection of 5 million molecules, 3.7 million reactions, and 28 million factual data items abstracted from the most representative journals in the field of organic chemistry. In this lecture, the experiences and opinions of teachers who are using SPRESIweb at undergraduate institutions in Europe and the US will be presented. Focusing on structure, reaction, and reference searches in chemistry, we will show the features of the Web application that are used most and, in particular, those aspects that help faculty teach students scientific information literacy skills. Examples given are the advanced options for defining specific query features (such as lists/not lists or R-groups) and the effect of optimized query submission on the retrieved hit list. Another significant feature is a new tool, Name Reactions, which provides implicit definitions of complex reaction substructure queries and assists in teaching and in retrieving examples of reaction mechanisms.
ChemgaPedia Enzyclopedia: A new electronic visualization program for teaching and learning organic chemistry
Guenter Grethe, email@example.com, Consultant, 352 Channing Way, Alameda, CA 94502-7409
ChemgaPedia Enzyclopedia, a web-based program for e-learning and e-teaching, evolved from the German government-sponsored project “Vernetztes Studium,” to which 16 groups from European universities contributed their expertise and knowledge. The German version, covering all aspects of chemistry and related topics, features more than 15,000 pages with 25,000 media elements and 900 exercises as well as glossaries and biographical entries. Further development of this project is now in the hands of FIZ CHEMIE Berlin. Translation of the Organic Chemistry part into English is nearly complete. The informative text is illustrated with graphic representations and 3D models and supported by video clips, animations, and exercises to enhance the learning process. The material is designed for self-study at the undergraduate level as well as for instructors, who can include their own material for a more personalized course. We will provide an overview of the system and discuss some examples.
Creation of an instructional module for small college science librarians highlighting free chemistry resources and their use in undergraduate instruction
Susan K. Cardinal, firstname.lastname@example.org, Carlson Science & Engineering Library, University of Rochester, Carlson Library, Rochester, NY 14627 and Carrie L. Newsom, email@example.com, Marston Science Library, University of Florida, PO Box 117011, Gainesville, FL 32611.
The Chemical Education Committee of ACS CINF is creating modular instructional materials for the web. One module will teach science librarians at small colleges about free or nearly free resources and how those resources may be used to teach chemistry undergraduates about chemical information. We will report on our progress and give a preview of the format and contents of the module.
One answer: Online access to chemical information at community colleges, but what are the questions?
R. G. Landolt, firstname.lastname@example.org, Department of Chemistry, Texas Wesleyan University, 1201 Wesleyan Street, Fort Worth, TX 76105
From the ACS Guidelines for Chemistry Programs in Two-Year Colleges: “Because of the increasing volume and complexity of chemical literature, students are no longer able to acquire skills in information retrieval without some formal instruction. … This can be accomplished in many ways, such as cooperative library arrangements and electronic access.” Project UCAIR (Undergraduate Cooperative Access to Information Resources) has demonstrated how students may relate concepts learned in chemistry to published research in order to attack "real-world" problems. A cost-effective Chemical Abstracts (CA) search process will be described in which Internet-based access has been used in undergraduate laboratories. Experience gained through UCAIR will be described, and feedback will be sought, to optimize the benefits of using electronic journals as well as CA in the community college environment.
A novel cheminformatics study of non-peptidic HIV protease inhibitors using machine learning and statistical tools
Barun Bhhatarai, email@example.com, Srinivas Alla, firstname.lastname@example.org, Chad R Bernier, email@example.com, Rajni Garg, firstname.lastname@example.org, and Sunil Kumar (2). (1) Department of Chemistry, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699-5810, (2) Department of Electrical and Computer Engineering, San Diego State University, San Diego, CA
A combinatorial approach (QSARomics) has been applied to study a large dataset of non-peptidic HIV protease inhibitors retrieved from the literature. This fusion-based cheminformatic study integrates the results of models obtained using different statistical and machine-learning techniques. Descriptors were calculated (CODESSA, MOE) using SMILES input. Genetic Algorithm (GA) and Principal Component Analysis (PCA) methods were used for feature selection. Several linear and nonlinear QSAR models were developed using CODESSA, C-QSAR, WEKA, Matlab, and other packages. The relationship between biological activity and the important descriptors/components obtained after feature selection was analyzed using neural network models. A comparative analysis of these results, with emphasis on similarities to and differences from Multiple Linear Regression (MLR) and Partial Least Squares (PLS) results, will be presented. We hope that the results of this research will help further optimize the structure of non-peptidic HIV-PIs and provide leads for the development of new drugs active against emerging mutant viruses.
Pharmacokinetic modeling of anti-HIV protease ritonavir analogs
Raghava Chaitanya Kasara, email@example.com, Barun Bhhatarai, firstname.lastname@example.org, and Rajni Garg, email@example.com. (1) Chemistry Department, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13676, (2) Department of Chemistry, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699-5810
Quantitative Structure-Activity Relationship (QSAR) studies are increasingly used to predict and rationalize the pharmacokinetic and pharmacodynamic profiles of drugs. Extension of the QSAR technique to pharmacokinetic data has led to the emergence of a new tool, quantitative structure-pharmacokinetic relationship (QSPkR) studies, which can be employed at an early stage of drug development to select potent molecules from a series of biologically active congeners. QSAR and QSPkR models were developed using the multiple linear regression (MLR) analysis method on antiviral and bioavailability data for HIV protease inhibitors (ritonavir analogues). The biological data were taken from the literature. Our results show that the antiviral activity and bioavailability of these inhibitors are highly dependent on their hydrophobicity, along with some other important parameters. Besides providing mechanistic insight, these models also have the potential to be used as in-silico virtual screening tools for predicting the pharmacokinetic profiles of HIV protease inhibitors.
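A one-descriptor MLR fit of the kind underlying such QSPkR models can be sketched in a few lines of Python; the logP and activity values below are invented for illustration and are not the ritonavir data:

```python
def fit_mlr(xs, ys):
    """Ordinary least-squares fit of y = a + b*x, the simplest
    QSAR/QSPkR model (activity regressed on one descriptor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Hypothetical congeners: hydrophobicity (logP) vs. activity (pEC50).
logp = [1.0, 2.0, 3.0, 4.0]
pec50 = [5.1, 5.9, 7.1, 7.9]
intercept, slope = fit_mlr(logp, pec50)  # positive slope: activity rises with logP
```

A real QSPkR model regresses on several descriptors at once, but the dependence of activity on hydrophobicity reported in the abstract is exactly this kind of fitted slope.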
Understanding the effect of benchmark dataset composition on the validation and optimization of ligand based virtual screening using self-organizing maps
Sebastian G. Rohrer, firstname.lastname@example.org and Knut Baumann. Institute of Pharmaceutical Chemistry, Technical University of Braunschweig, Beethovenstr. 55, 38106 Braunschweig, Germany
A common finding of many reports evaluating virtual screening (VS) methods is that validation results vary considerably with changing datasets. It is assumed that these dataset-specific effects are caused by the self-similarity and cluster structure inherent to those datasets. Self-organizing maps (SOMs) were used to analyze the structure of several published benchmark datasets. Utilizing the fact that SOMs preserve dataset topology, a SOM-based quantitative measure for dataset diversity is introduced. It is shown that the redundancy and inherent self-similarity of the datasets lead to a general overestimation of all figures of merit. We demonstrate a linear relationship between the stability of VS validation results and dataset diversity, which can be used to quantify the robustness of a method. Furthermore, a quick and intuitive way to detect cases where a method is not suited to model datasets with certain properties is provided. Our finding that a method's stability decreases linearly with dataset diversity has an important implication: when knowledge of active substances is sparse, as is usually the case at the beginning of real-life VS campaigns, applying VS parameters optimized on self-similar datasets is misleading. We propose a procedure that finds a robust method for a given target by utilizing knowledge of a similar target protein with sufficient activity data. Target similarity is quantified by a spatial, pharmacophore-hotspot-guided alignment of binding pockets. Moreover, a methodology for conservative estimation of both the expected hit rate and the scaffold-hopping potential is provided by statistical analysis of published case studies.
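The SOM machinery itself is standard; the sketch below is our own toy one-dimensional implementation operating on a single scalar descriptor (it is not the authors' code, and the diversity measure built on top of the trained map is omitted):

```python
import math
import random

def train_som(data, n_nodes=4, epochs=50, lr=0.3, seed=0):
    """Train a tiny one-dimensional self-organizing map on scalar
    descriptor values and return the learned codebook weights."""
    rng = random.Random(seed)
    w = [rng.uniform(min(data), max(data)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        # Neighborhood radius shrinks over training.
        radius = max(1.0 - epoch / epochs, 0.02) * n_nodes / 2
        for x in data:
            # Best-matching unit: node whose weight is closest to the sample.
            bmu = min(range(n_nodes), key=lambda i: abs(w[i] - x))
            for i in range(n_nodes):
                # Gaussian neighborhood: nearby nodes get correlated updates.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                w[i] += lr * h * (x - w[i])
    return w

codebook = train_som([0.0, 0.1, 0.9, 1.0])
```

Because neighboring nodes receive correlated updates, the trained codebook preserves the topology of the input space, which is the property the proposed diversity measure exploits.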
Data Mining of NIH DTP Human Tumor Cell Line Screen Data for Anticancer Drug Discovery
Huijun Wang, email@example.com and David J Wild, firstname.lastname@example.org. School of Informatics, Indiana University, 1200 S Rolling Ridge Way, #1001, Bloomington, IN 47403
The National Cancer Institute Developmental Therapeutics Program (DTP) maintains a database of compounds (currently over 40,000) that have been screened for activity as potential anticancer agents in 60 human tumor cell lines. Each of these cell lines has also been tested in a microarray assay to generate gene expression profiles. These data are potentially useful for identifying lead compounds for a specific molecular target and for studying the molecular mechanism of action of a drug by appropriate data-mining methods. In our work, various statistical and artificial intelligence methods are used to analyze the screening data together with compound fingerprint data and the microarray gene expression data. Mining these databases, which bridge chemical, biological, and genomic information, can provide useful information for finding correlations between chemical substructure and biological activity, selecting compounds most likely to interact with a specific molecular target, and developing a genomics-based approach to the prediction of drug response.
A method for calculating the pKa values of small and large molecules
Jozsef Szegezdi and Ferenc Csizmadia, email@example.com. ChemAxon Ltd, Maramaros koz 3/a, 1037 Budapest, Hungary
pKa is an essential factor in many drug disposition and lead development studies and is considered a pivotal parameter by synthetic and analytical chemists as well. Predicting the pKa of potential drug candidates in a huge dataset, or of molecules with a large number of titratable groups, requires a fast and accurate method, and there is significant interest in developing one. Some approaches fragment a molecule into predefined substituents, which is not unambiguous. In our approach a molecule is therefore not considered a finite set of fragments, but a set of partially charged atoms connected through chemical bonds. Our purpose was to develop a new method for calculating the macro ionization constants (pKa) of organic molecules in aqueous solution, paying special attention to molecules with a large number of ionizable groups. The ionization constants of the microspecies are estimated from empirically calculated partial charge distributions and the polarizabilities of the atoms surrounding the ionizable centers (O, N, S, C). The number of microspecies is 2^N for a molecule containing N ionizable atoms. The calculation is very time-consuming for large N, so we have also developed an efficient method for N > 8: instead of fixed microspecies, we define a protocol for creating abstract groups of microspecies, and pKa values are calculated from this microspecies distribution at different pH values. Predicted and experimental values are in good correlation.
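The 2^N microspecies bookkeeping can be illustrated for the simple case of independent ionizable sites (ignoring the site-site interactions that the authors capture through partial charges and polarizabilities); each site is protonated with the probability given by the Henderson-Hasselbalch relation:

```python
from itertools import product

def microspecies_fractions(pkas, ph):
    """Fraction of each protonation microstate at a given pH, assuming
    independent ionizable sites.  Per site:
    P(protonated) = 1 / (1 + 10**(pH - pKa))."""
    p_prot = [1.0 / (1.0 + 10 ** (ph - pka)) for pka in pkas]
    fractions = {}
    for state in product((1, 0), repeat=len(pkas)):  # 1 = protonated
        f = 1.0
        for site, s in enumerate(state):
            f *= p_prot[site] if s else (1.0 - p_prot[site])
        fractions[state] = f
    return fractions

# Two sites (hypothetical pKa values 7 and 9) -> 2**2 = 4 microspecies.
fr = microspecies_fractions([7.0, 9.0], ph=7.0)
```

This full enumeration grows as 2^N, which is what motivates the grouped-microspecies protocol described in the abstract for N > 8.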
NCL-3D: a 3D natural compound library for computer-aided anticancer drug discovery
Zengjian Hu, firstname.lastname@example.org and William M. Southerland, email@example.com. Department of Biochemistry and Molecular Biology, Howard University College of Medicine, 520 W Street, NW, Washington, DC 20059
Natural compounds and their derivatives have historically been invaluable as a source of therapeutic agents and have played a significant role in the anticancer drug discovery and development process. To facilitate the application of natural compounds in the modern anticancer drug discovery process, we have developed NCL-3D, a searchable three-dimensional (3D) structure library of natural compounds. NCL-3D will be useful for structure-based virtual screening to find lead compounds in anticancer drug discovery. It can also be used in ligand-based virtual screening when the structure of the target protein is not available. A third application of NCL-3D is ligand-protein inverse docking to find the potential protein targets of a small molecule, applicable to identifying the multiple proteins to which a small molecule can bind or weakly bind. We anticipate that NCL-3D will serve as a powerful research tool for discovering novel lead compounds in modern anticancer drug discovery projects and as a useful and inexpensive source of potentially therapeutic compounds. This work was supported by grant 2 G12 RR003048 from the RCMI Program, Division of Research Infrastructure, National Center for Research Resources, NIH.
Classification of proteomics data by kernel methods
Kailin Tang, firstname.lastname@example.org and Tonghua Li, email@example.com. Department of Chemistry, Tongji University, Shanghai, China
High-resolution mass spectrometry instruments are increasingly used for disease classification and therapeutic guidance. However, the analysis of the immense amount of data they produce poses considerable challenges. Here a kernel PLS algorithm is presented and applied to the classification of SELDI-TOF data from ovarian cancer and normal samples. This algorithm is a robust, nonlinear version of the popular partial least squares (PLS) method. In processing SELDI-TOF data, dimensionality reduction is a critical stage before discrimination. We show that the kernel-PLS method is capable of classifying SELDI-TOF proteomics data, and that dimensionality reduction and classification can be carried out simultaneously. The method achieves an average sensitivity of 0.9833 and an average specificity of 1.0000 in leave-one-out cross-validation. This study demonstrates the potential application of this algorithm to tumor diagnosis and the identification of candidate biomarkers.
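The abstract does not specify which kernel is used; as an illustration, the Gaussian (RBF) kernel, a common choice, turns a set of spectra into the Gram matrix on which a kernel-PLS regression then operates:

```python
import math

def rbf_kernel_matrix(spectra, gamma=0.5):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) over a list
    of feature vectors (e.g. binned m/z intensities)."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [[math.exp(-gamma * sqdist(a, b)) for b in spectra]
            for a in spectra]

# Three hypothetical two-bin "spectra" for illustration.
X = [[0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
K = rbf_kernel_matrix(X)
```

Working in this kernel space is what makes the method nonlinear while keeping the PLS machinery itself linear in the kernel features.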
Fabrication of Chemical and Engineering Devices
K. M. Choi, Lucent Technologies, Bell Laboratories, 600 Mountain Avenue, Room 1D-357, Murray Hill, NJ 07974
As industry pursues advanced nanotechnology, the development of new nanomaterials and nanofabrication techniques for information technology is a key contribution to this area. We present here a technological merging of engineering and chemistry to explore new advances in nanotechnology through the development of new materials. As an example, soft lithography has been widely used in the replication and fabrication of small features to produce engineering and chemical devices at the nanoscale with high performance. However, the commercial stamp materials used in current soft lithography are limited in their capability to fabricate nanoscale devices because of their low moduli, since conventional PDMS materials were originally produced for other purposes. We therefore developed a new generation of PDMS stamp materials to overcome these limitations and extend the technology to the nanoscale regime. We also fabricate ‘elastomeric photopatterns' at the microscale using the photocurable PDMS prepolymer for integrated circuits, to produce devices with specific functions for information technology.
Gas chromatographic determination of 2-methylnaphthalene and 2-methyl-1,4-naphthoquinone in electrosynthesis reaction solution
Song Chengying, firstname.lastname@example.org, Liu Zhisheng, email@example.com, Wang Liucheng, firstname.lastname@example.org, Zhao Jianhong, email@example.com, and Zhao Mingxing, firstname.lastname@example.org. Chemical engineering, Zhengzhou University, Wenhua 97th Road, Zhengzhou, 450002, China
A GC method for the determination of 2-methylnaphthalene and 2-methyl-1,4-naphthoquinone in electrosynthesis reaction solution was studied, with naphthalene used as an internal standard. The chromatographic conditions were as follows: column temperature 200°C, detector temperature 280°C, vaporizer temperature 280°C, and injection volume 0.5 µL. The title compounds were determined on a DB-1 capillary column with FID detection. Under the optimized conditions, each component was well separated. The contents of 2-methylnaphthalene and 2-methyl-1,4-naphthoquinone in the reaction solution were calculated by the internal standard method, and their concentrations showed good linear relationships with their peak areas. The mean relative standard deviations for 2-methylnaphthalene and 2-methyl-1,4-naphthoquinone are 0.95% and 1.18%, and their recoveries are 98.65% and 101.58%, respectively.
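The internal-standard calculation behind such quantitation can be sketched as follows; the areas, concentration, and response factor below are invented, and in practice the response factor comes from the calibration standards described above:

```python
def internal_standard_conc(area_analyte, area_is, conc_is, rf):
    """Internal-standard quantitation:
    C_analyte = (A_analyte / A_IS) * C_IS / RF,
    where RF is the response factor of the analyte relative to the
    internal standard, determined from calibration mixtures."""
    return (area_analyte / area_is) * conc_is / rf

# Hypothetical peak areas and internal-standard concentration.
c = internal_standard_conc(area_analyte=1500.0, area_is=1000.0,
                           conc_is=2.0, rf=1.2)
# (1500 / 1000) * 2.0 / 1.2 = 2.5 (same units as conc_is)
```

Because the analyte and internal-standard peaks share the same injection, ratioing the areas cancels injection-volume variability, which is why the method tolerates a nominal 0.5 µL injection.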
Loop fitting with a combined force field and shape potential
Brian P. Kelley, email@example.com, Geoffrey Skillman (2), Matthew Stahl, firstname.lastname@example.org, Stanislaw Wlodek, email@example.com, and Anthony Nicholls (4). (1) OpenEye Scientific, American Twine Building, 222 Third St, Suite 3211, Cambridge, MA 02142, (2) OpenEye Scientific Software Inc, Suite 1107, 3600 Cerrillos Road, Santa Fe, NM 87507, (3) OpenEye Software, 3600 Cerrillos Road, Suite 1107, Santa Fe, NM 87507, (4) OpenEye Scientific Software, 3600 Cerrillos Road, Suite 1107, Santa Fe, NM 87507
Rapid solution of protein-ligand co-crystal structures is critical to successful structure-based drug design. Drug-like ligands often contain complex chemical motifs that are poorly handled by typical crystal refinement protocols, particularly those with force-fields designed toward protein refinement. Building high-quality, low-strain models from electron density can become a bottleneck in the process of garnering structural insights for ligand design. Thus reliable, automated methods for fitting ligands into electron density have long been sought. We have demonstrated that adiabatic mixing of a high-quality small molecule force field (MMFF) with a shape-based fit to electron density can reliably solve this problem*. Here we extend this method to fitting loop structures, as well as simultaneous fitting of ligands and nearby loops that have moved during ligand binding.
* "Automated Ligand Placement and Refinement with a Combined Force Field and Shape Potential", S. Wlodek, A.G. Skillman and A. Nicholls, Acta Crystallographica D, D62, pp. 741-749 (2006).
Optimization of LC/APCI-MS quinone isomer separation
Polycyclic aromatic hydrocarbons (PAHs) are of health concern due to their mutagenic and carcinogenic properties. Incomplete organic combustion processes are the main source of this class of ubiquitous compounds. Oxidation of PAHs occurs in the natural environment and produces various products, including isomers. A challenge faced while studying these compounds and their oxidized derivatives is their chromatographic separation; isomers are the most difficult to separate because of their slight structural differences. The high performance liquid chromatography separation of benzo[a]pyrene-1,6- and -3,6-quinones, oxidation products of the PAH model compound benzo[a]pyrene, has been optimized with highly effective chemometric response surface designs. In particular, a Box-Behnken design was used to study the effect of eluent composition, flow rate, and column temperature on the experimental response. The response was measured as a combination of chromatographic resolution and retention time and was interpreted with the aid of the Box-Behnken model results. The optimum predicted conditions were tested experimentally and yielded resolved quinones, opening the possibility of quantitative analysis by peak integration.
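In coded units, a Box-Behnken design places each pair of factors at their +/-1 levels while holding the remaining factors at the center, plus replicate center runs. A short sketch (the three-center-point choice is our own, not taken from the abstract):

```python
from itertools import combinations, product

def box_behnken(n_factors, n_center=3):
    """Coded Box-Behnken design: all +/-1 combinations for every pair
    of factors with the remaining factors held at 0, plus center runs."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(run)
    runs.extend([[0] * n_factors for _ in range(n_center)])
    return runs

# Three factors (eluent composition, flow rate, column temperature):
# 3 pairs * 4 combinations = 12 edge runs + 3 center runs = 15 runs.
design = box_behnken(3)
```

Fitting a quadratic response surface to the measured responses at these 15 runs is what yields the predicted optimum tested in the study.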
Implementation of scientific "blogging" into chemical laboratory research
Albert C. Fahrenbach, firstname.lastname@example.org and Amar H Flood, email@example.com. Department of Chemistry, Indiana University, 800 East Kirkwood Avenue, Bloomington, IN 47405
The advent of the computer age in the latter half of the 20th century has revolutionized the way people communicate and how information is stored and presented, particularly with regard to science and engineering. The blogging of scientific research, such as experimental results and conclusions, along with the incorporation of digitized laboratory notebooks and other high-tech devices into research activity, has been proposed by many as an expedient way to achieve greater productivity and cooperation within the scientific community. Implementing these new technologies in the everyday life of a research group would involve digitized laboratory notebooks, RSS (Really Simple Syndication) feeds of group progress into the scientific communal web, virtual meeting rooms, and computerized data storage and record keeping of experiments. We have shown that it is possible to create a platform hosting several of these technologies while maintaining a high standard of professionalism and security at no extra cost to the research group.
Filling the void: Organizations and social networking
Dennis S. Loney, firstname.lastname@example.org, Department of Member Research and Technology, American Chemical Society, 1155 16th St., NW, Washington, DC 20036
Nonprofit organizations, specifically membership organizations, are actively seeking ways to engage and attract members via online community building and collaboration tools. In November 2006, the American Chemical Society launched BiotechExchange.org, a social networking site targeting the biotechnology community. This presentation will describe how the social networking community was established, how successful it was in attracting its target audience, how success is measured, lessons learned, and future versions and implementations.
New global communication process in thermodynamics and its impact on quality of published experimental data
Michael Frenkel, email@example.com, Robert D. Chirico (1), Vladimir V. Diky, firstname.lastname@example.org, Chris D. Muzny, email@example.com, Qian Dong, firstname.lastname@example.org, Kenneth N. Marsh, email@example.com, John H. Dymond (3), William A. Wakeham (4), Stephen E. Stein, firstname.lastname@example.org, Erich Koenigsberger (6), Anthony R. H. Goodwin (7), Joseph W. Magee (1), Michiel S. Thijssen, email@example.com, William M. Haynes (1), Suphat Watanasiri, firstname.lastname@example.org, Marco Satyro, email@example.com, Martin Schmidt, firstname.lastname@example.org, Andrew I. Johns, email@example.com, and Gary R. Hardin (1). (1) Physical and Chemical Properties Division, National Institute of Standards and Technology, Boulder, CO 80305, (2) Department of Chemical and Process Engineering, University of Canterbury, Private Bag 4800, Christchurch, New Zealand, (3) Chemistry Department, University of Glasgow, Glasgow, G12 8QQ, United Kingdom, (4) School of Engineering Sciences, University of Southampton, Southampton, SO17 1BJ, United Kingdom, (5) Physical and Chemical Properties Division, NIST, 100 Bureau Dr Stop 8380, Gaithersburg, MD 20899-8380, (6) Division of Science and Engineering, School of Mathematical and Physical Sciences, Murdoch University, Murdoch, WA 6150, Australia, (7) Schlumberger Technology Corporation, 125 Industrial Blvd., Sugar Land, TX 77478, (8) Acquisitions, STM, BRILL, Plantijnstraat 2, Leiden, NL-2321 JC, Netherlands, (9) Aspen Technology Corporation, Ten Canal Park, Cambridge, MA 02141-2201, (10) Virtual Materials Group, Inc, 657 Hawkside Mews NW, Calgary, AB T3G 3S1, Canada, (11) Software development, FIZ Chemie Berlin, Franklinstr. 11, Berlin, 10587, Germany, (12) Oil, Gas & Chemicals Group, TUV NEL Ltd, Scottish Enterprise Technology Park, East Kilbride, Glasgow, G75 0QU, United Kingdom
Thermodynamic data are a key resource in the search for new relationships between properties of chemical systems, a search that constitutes the basis of the scientific discovery process. In addition, thermodynamic information is critical for the development and improvement of all chemical process technologies. Historically, peer-reviewed journals have been the major source of this information, which is obtained by experimental measurement or prediction. Technological advances in measurement science have propelled enormous growth in the scale of published thermodynamic data (almost doubling every 10 years). This expansion has created new challenges in data validation at all stages of the data delivery process. Despite the peer-review process, problems in data validation have led, in many instances, to the publication of data that are grossly erroneous and, at times, inconsistent with the fundamental laws of nature. A new global data communication process in thermodynamics, and its impact in addressing these challenges as well as in streamlining the delivery of thermodynamic data from “data producers” to “data users,” will be discussed.
Semantic chemical publishing
Nick E Day, firstname.lastname@example.org, Peter T. Corbett, email@example.com, and Peter Murray-Rust, firstname.lastname@example.org. (1) Department of Chemistry, Unilever Centre for Molecular Sciences Informatics, Lensfield Road, CB2 1EW Cambridge, United Kingdom, (2) Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, CB2 1EW Cambridge, United Kingdom
Modern informatics tools support, in principle, the complete publication of a chemical experiment: substances, procedures, timelines, observations, and analysis. This is being rapidly enhanced by new social computing (blogs and wikis, Connotea, del.icio.us), feeds (RSS), reference works and collections (Wikipedia, PubChem), and ontologies (Goldbook-XML, GO, OBO, ChEBI, MeSH). These now support XML representations, and we have enhanced all of them with chemistry through Chemical Markup Language. The challenge is populating them: through OSCAR3/OPSIN we use chemical linguistics to extract information from free text. CIF2CML and CMLSpect support crystallography and analytical data. With Open publications, machines can extract a large amount of semantic chemistry, and we shall show live demos. This is merely an interim approach, however; the main requirement is for publishers to wake up to the power of this and support the Open publishing of semantic chemistry.
Data lifecycle and curation of laboratory experimental data
Tony Hey, Tony.Hey@Microsoft.com, Microsoft Corporation, 1 Microsoft Way, Redmond, WA 98052-6399
There is an ongoing revolution in the collection of large volumes of experimental data in many fields of science, including chemistry. Early capture of digital data is vital if scientists are to integrate those data with other data and to search and analyze large amounts of such scientific data. This talk will use the UK e-Science ‘CombeChem' project as an exemplar of a variety of technologies for dealing with the entire data lifecycle, from acquisition through curation, publication, and preservation.
The semantic wiki as a model for an intelligent chemistry journal
Henry S. Rzepa, email@example.com, Department of Chemistry, Imperial College London, Exhibition Road, London, SW7 2AZ, United Kingdom
Wikis are now recognized as excellent collaborative content authoring environments (provided the appropriate level of author authentication is achieved). In chemistry, however, one also needs to capture both data and metadata in a formal manner, with correct datatyping and an associated ontology (machine-processable vocabulary). Potentially at least, the so-called semantic wiki offers just such an environment in which to explore how an intelligent chemistry journal article might be authored, to achieve a so-called SPARQL endpoint for logical analysis. Initial experiences using the Semantic MediaWiki environment will be reported, as applied to capturing relationships and attributes in an article on mauveine. Careful logical analysis (by a human) of an article first published in 1879 by William Perkin reveals that the erroneous molecular structure of this species, as widely reported in the literature up to 1994, could have been inferred as wrong by making use of the facts reported in the original article, with the addition of only a minimum of modern knowledge. The analysis centres on what would be needed for an intelligent software agent, armed with a semantic expression of the original facts as reported by Perkin, to reach the same conclusion, and on whether issues of scale, and communal agreement on the appropriate chemical ontologies to deploy for this purpose, are in fact a reasonable and achievable goal for the chemistry community over the next 24 years.
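As a sketch of what a machine-processable version of such facts might look like, statements can be reduced to subject-predicate-object triples and matched with a wildcard pattern, the essence of a SPARQL basic graph pattern. The predicate names and facts below are simplified illustrations of our own, not an actual chemical ontology:

```python
# Toy triple store: facts about mauveine as (subject, predicate, object).
triples = [
    ("mauveine", "reportedBy", "William Perkin"),
    ("mauveine", "articleYear", "1879"),
    ("mauveine", "hasComponent", "mauveine A"),
    ("mauveine", "hasComponent", "mauveine B"),
]

def query(store, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    analogous to a variable in a SPARQL basic graph pattern."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

components = query(triples, p="hasComponent")  # both component facts
```

An intelligent agent reasoning over Perkin's article would chain many such pattern matches against a much richer, communally agreed ontology; the point of the SPARQL endpoint is that this querying becomes possible at all.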
Standard domain ontologies: The rate limiting step for the "Next Big Change" in scientific communication
Allen Renear, firstname.lastname@example.org, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, 501 E. Daniel St, Champaign, IL 61820
The long awaited emergence of high-function scientific publishing may, finally, be near. There will soon be the tools, structured data, and communication infrastructure that will allow researchers to use new and innovative strategies for taking advantage of computationally available representations of scientific information. As this happens, the use of traditional publishing artifacts like journals, abstracts and articles will shift increasingly away from simple finding and reading and towards more direct and efficient computer-supported exploitation. Several important social and technological trends are converging to make this possible. We focus here on the role of standard domain ontologies and their potential for interaction with changing user behavior in search environments.
Designing a new industry for sustainability: Life cycle analysis for the emerging bioeconomy
Bruce E. Dale, email@example.com, Dept of Chemical Engineering and Materials Science, Michigan State University, 2527 EB, East Lansing, MI 48824
Strong evidence exists that we are in the early phases of a truly historic transition--from an economy based largely on petroleum to a more diversified economy in which renewable plant biomass will become a significant feedstock for both fuel and chemical production. The development of the petroleum refining industry over the past 150 years provides many instructive lessons for the future biobased economy...and also many reasons for supposing that the new biobased economy will be different from the hydrocarbon economy in many crucial ways.
We assume a mature biobased economy--as the petroleum economy is mature today--and from that assumption extrapolate its likely features. Among the technosocioeconomic forces that will drive the mature biobased economy we consider: 1) yield (using the whole "barrel of biomass"), 2) gradual diversification of biobased products, 3) the great diversity of biomass resources combined with their considerable compositional similarity, 4) possible or likely limits on agricultural productivity, 5) integration of biorefining and agricultural ecosystems in a local social and political context (the "all biomass is local" paradigm), and 6) the sustainability of the mature biobased economy and its most important underlying resource--productive soils.
This presentation emphasizes the use of life cycle analysis to evaluate the sustainability of the emerging biobased economy. Life cycle analysis is a systems level tool to evaluate the environmental impacts of processes and products. For the first time in human history, by properly using life cycle analysis and related tools, we have the ability to analyze and design a major emerging industry so that it satisfies both environmental and economic criteria.
Emerging technologies for renewable materials in the UK and EU
Jeremy Tomkinson, J.Tomkinson@nnfcc.co.uk, Chief Executive Officer, National Non-Food Crops Centre (NNFCC), NNFCC Biocentre, York Science Park, Innovation Way, Heslington, York, YO10 5DG, United Kingdom and Alison Hamer, firstname.lastname@example.org, Communications and Information Manager, National Non-Food Crops Centre (NNFCC), NNFCC Biocentre, York Science Park, Innovation Way, Heslington, York, YO10 5DG.
The development of first- and second-generation biofuels is driving a shift towards the wider use of renewable materials across a broadening industry base. The pharmaceuticals, biopolymers and biofuels markets are all showing increased market acceptance and uptake of renewable materials, mainly through enhanced material properties or through the provision of new functionality in traditional formulations. This presentation will introduce some of the emerging technologies in the UK and EU, indicating where real market uptake is occurring and also highlighting some of the future issues with respect to land availability and the management of new waste streams.
Biofuels: From an information perspective
Kathleen Sands, email@example.com, Information Centers, Agricultural Research Service, Room 108A, National Agricultural Library Building, 10301 Baltimore Blvd., Beltsville, MD 20705-2351
Interest in biofuels is sweeping the nation due to a growing demand to relieve our nation's dependence on oil and to ease the strains on the environment produced by burning fossil fuels. In order for this industry to grow and become an integral part of our transportation and energy systems, people must be able to inform themselves of the issues, ideas and research activities associated with biofuels. Some of the leading public resources for obtaining the latest information on biofuels include the World Wide Web, libraries, online and print journals, and government and university research projects. The breadth of information found in these resources continues to expand because of the urgent need for research and development, advances in technology, and the push for additional public policy. These driving forces demand coordinated information management, which, because of the subject's recent rapid growth, is not yet well established. This makes it difficult for researchers and the general public to navigate easily through the resources that are currently available. In addition, there is a broad group of stakeholders with a wide range of information needs on biofuels: chemists, biologists, farmers, private companies, engineers, environmentalists, economists, government officials and the general public all want information on the topic. To best serve all of these interested parties, a coordinated information system would create and provide central, organized, and easily accessible information resources. For many years, the National Agricultural Library (NAL) has served as a central hub of current agricultural information for a diverse group of patrons. Recent focus areas include rural issues, alternative farming systems and technology transfer. This talk will present NAL's plans for improving the availability of information on biofuels as this important industry continues to grow.
Survey of information resources covering renewable fuels, chemicals and energy
Samantha Swann, SSwann@wiley.co.uk, Business Acquisition Editor, John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
The international Society of Chemical Industry (SCI) and John Wiley & Sons Ltd have undertaken a survey of information resources relating to renewable chemicals, fuels and energy. This paper describes the motivation for carrying out the survey, the methodology used, the results and the outcome of the study. The presentation will be illustrated with examples of print and online information resources relevant to this topic area. Statistics pertaining to the recent growth of literature on this subject will also be presented.
Enhancing the web experience with ACS journals
Evelyn Jabri, firstname.lastname@example.org and Sarah Tegen, email@example.com. ACS Chemical Biology, American Chemical Society, 1155 16th St NW, Washington, DC 20036
Enhancing our understanding of scientific concepts will require creative and effective uses of web functionalities. At the journal level, we use the web to keep our readers up-to-date on cutting-edge research. With the continued growth of new areas of science such as chemical biology, we must make our content accessible to a larger, more scientifically diverse set of readers. At ACS Chemical Biology we are using many web tools, such as podcasts, forums and wikis, to engage readers with our content. We have also improved our HTML pages with web-enhanced objects such as interactive figures and 3D images of macromolecular structures. These enhancements, and others coming soon, improve our users' experience and enhance their ability to understand the subject matter.
Podcasting and social bookmarking at Nature
Joanna C Scott, firstname.lastname@example.org and Timo Hannay, email@example.com. Web Publishing, Nature Publishing Group, 4 Crinan Street, N1 9XW, London, United Kingdom
The Internet has created an array of options for publishers to support scientific communication. This talk will present two examples from Nature Publishing Group (NPG).
Creating and distributing audio content is easier than ever before, allowing organisations that previously issued only written information to explore this new medium. The Nature Podcast was released in October 2005 and quickly gained tens of thousands of listeners. Since then the range and variety of NPG's audio output has increased greatly (e.g., the Chemistry Podcast), reflecting behind-the-scenes changes at NPG that have made audio a natural part of the editorial mix.
The Internet also offers new opportunities for publishers to help scientists to collaborate with each other through participative websites. An example of this is NPG's own social bookmarking service, Connotea. This includes many features tailor-made for scientists, including automatic extraction of bibliographic details, support for DOIs and OpenURL, and privacy options.
Beyond searching: Adding increased value to today's scientific databases
Michael Dennis, firstname.lastname@example.org, Planning & Development, Chemical Abstracts Service, P.O. Box 3012, Columbus, OH 43210
At the beginning of the twentieth century, secondary information sources such as Chemisches Zentralblatt and Chemical Abstracts helped scientists derive value from the published research of their colleagues. Even in the Internet era, search and retrieval of well-indexed, timely, comprehensive databases remains an important component of the research process. But information technology now affords new means of helping researchers and information professionals not only to identify relevant papers and patents and access the primary documents but also to recognize trends and patterns in the shifting infosphere that continues to grow in size and complexity. From the perspective of Chemical Abstracts Service's hundred years in the information industry, an overview of new directions in the visualization, analysis, and processing of data, text and substance information will be outlined. Technology has changed markedly, but the objective of helping scientists assimilate and apply the wealth of available information remains essential and can be achieved more effectively than ever before. Reliable, trusted, and well-organized databases are an essential platform for further advances.
Google Scholar: The adventure continues
Anurag Acharya, email@example.com, Google, Inc, 1600 Amphitheatre Parkway, Mountain View, CA 94043
Google Scholar is a fresh look at the traditional problems of discovering and accessing scholarly literature. It is currently being used by researchers all over the world. I will describe the general principles that underlie its design and will share some operational experiences. I will also present lessons learnt - both from building and running such a service and from seeing how people use it.
Recommendation systems for research
Marc F. Krellenstein, firstname.lastname@example.org, Elsevier, 30 Corporate Drive, Burlington, MA 01803
As the relevant research literature expands at a rate beyond the ability of any one person to process, technology to recommend work related to your own interests is a valuable tool to help users discover possibly useful information. Collaborative recommendation systems leverage what similar researchers have viewed or done and can provide excellent and reliable suggestions, though there are some obstacles to their use due to privacy considerations and the need for usage history. Content-driven recommendations, perhaps augmented with chemical structure or other similarity measures, do not leverage other (human) judgements but also don't share privacy and usage history limitations, relying only on the existence of available research content and appropriate statistical and natural language technology. These technologies have a good track record of success for similarity searching and are being enhanced to produce ever-better suggestions for related work of interest.
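The content-driven recommendation approach described above can be sketched as a simple term-vector similarity ranking; the corpus and scoring below are a toy illustration under the assumption of bag-of-words cosine similarity, not Elsevier's actual system.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term-frequency vector for a document."""
    return Counter(text.lower().split())

def cosine(v1, v2):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(v1[t] * v2[t] for t in v1)
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def recommend(query_doc, corpus, k=2):
    """Rank corpus documents by textual similarity to the query document."""
    qv = term_vector(query_doc)
    ranked = sorted(corpus.items(),
                    key=lambda kv: cosine(qv, term_vector(kv[1])),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

Real systems would add stemming, weighting (e.g. tf-idf), and possibly chemical-structure similarity, but the ranking principle is the same.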
Collaborative filtering in a scholarly context
Georgios Papadopoulos, email@example.com, Atypon Systems, Inc, 5201 Great America Parkway, Suite 510, Santa Clara, CA 95054
Collaborative filtering is the technical term for the "other users who bought this book also bought these books" feature popularized by Amazon. Atypon has been tracking user behavior and experimenting with various algorithms since July 2005 and deployed the first implementation in July 2006 and a full implementation in late 2006. We have gathered ample data on the usefulness of such a feature specifically for scientific publications and how it can be best utilized in conjunction with other discovery mechanisms such as search.
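The "users who viewed this also viewed" mechanism described above can be sketched as a session co-occurrence count; the sessions and item names below are invented examples, and production systems add normalization and decay that this sketch omits.

```python
from collections import Counter
from itertools import combinations

# Each session is the set of articles one user accessed together.
sessions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"B", "D"},
]

# Count how often each ordered pair of items co-occurs in a session.
co = Counter()
for items in sessions:
    for x, y in combinations(sorted(items), 2):
        co[(x, y)] += 1
        co[(y, x)] += 1

def also_viewed(item, k=2):
    """Items most often accessed in the same session as `item`."""
    scores = Counter({y: n for (x, y), n in co.items() if x == item})
    return [y for y, _ in scores.most_common(k)]
```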
Modeling the scholarly community from usage data
Johan Bollen, firstname.lastname@example.org, Research Library, Los Alamos National Laboratory, TA03 - P362 - STBPO-RL, Los Alamos, NM 87545
This presentation outlines efforts at Los Alamos National Laboratory to construct models of the scholarly community from usage data that has been recorded by online information services. I will discuss an architecture we developed to collect and aggregate usage data at a large scale. Associative networks of resource relationships are extracted from the resulting usage data. These networks then form the substrate for the definition of usage-based metrics of scholarly impact, the implementation of recommender systems, and mapping of the social network structure of the community for which usage was recorded. Results obtained from usage recorded at the California State University system will serve as a case-study. The presentation will conclude with a summary of the main issues in this emerging domain and an overview of the MESUR project, an Andrew W. Mellon Foundation funded project at Los Alamos National Laboratory to support the development of usage-based impact metrics.
Ligand binding and circular permutation modify the residue interaction network in DHFR
Zengjian Hu, email@example.com, Donnell Bowen (1), William M. Southerland, firstname.lastname@example.org, Yongping Pan (2), Antonio del Sol (3), Ruth Nussinov (2), and Buyong Ma, email@example.com. (1) Department of Biochemistry and Molecular Biology, Howard University College of Medicine, 520 W Street, NW, Washington, DC 20059, (2) Center for Cancer Research Nanobiology Program, Basic Research Program, SAIC, NCI-FCRDC, Frederick, MD 21702, (3) Research & Development Division, Fujirebio Inc, Tokyo, Japan
Residue interaction networks and loop motions are important for catalysis in dihydrofolate reductase (DHFR). Here we investigate the effects of ligand binding and chain connectivity on network communication in DHFR. We carry out systematic network analysis and molecular dynamics simulations of native DHFR and 19 of its circularly permuted variants, breaking the chain connections in 10 experimentally observed folding-element regions and in 9 non-folding-element regions. Our studies suggest that even though cutting in a folding-element region may not destroy the protein structure, chain cleavage in these regions may deactivate DHFR through large perturbations of the network properties near the active site. Protected areas are often associated with protein folding; however, our study indicates that chain connection in protected areas may also be important for network interactions. Further, our network analysis reveals that ligand binding has “network bridging effects” on the DHFR structure. The protein active site is near or coincides with residues through which the shortest paths in the residue interaction network tend to pass. Our results suggest that ligand binding modifies the network, with most interaction paths now passing through the cofactor, shortening the average shortest path. Ligand binding at the active site has profound effects on network centrality, especially closeness. This work was supported by grant 2 G12 RR003048 from the RCMI Program, Division of Research Infrastructure, National Center for Research Resources, NIH.
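The closeness centrality and shortest-path measures used in the network analysis above can be computed by breadth-first search on an unweighted residue graph; the small star graph in the test is an invented illustration.

```python
from collections import deque

def closeness(graph, node):
    """Closeness centrality of `node`: the inverse of the mean
    shortest-path length to every other reachable node, computed
    by breadth-first search on an unweighted adjacency dict."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = [d for n, d in dist.items() if n != node]
    return len(others) / sum(others) if others else 0.0
```

In a residue interaction network, nodes would be residues (plus the cofactor) and edges would be contacts; a binding ligand that bridges distant residues raises the closeness of nearby nodes, as the abstract describes.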
Chemical superposition and pharmacophore elucidation by SCAPFOld: Self-Consistent Atomic Property Field Optimization
Maxim Totrov, Modeling and Drug Design, Molsoft, LLC, 3366 North Torrey Pines Court, S. 300, La Jolla, CA 92037
Accurate multiple ligand superposition and subsequent elucidation of pharmacophoric features are key steps in the ligand-based drug design process. The proposed method is based on iterative optimization of a composite 7-component atomic property field. Ligand conformations and positions are optimized by a Monte Carlo minimization procedure in internal coordinates, in the property field potentials combined with the internal force field energy. Up to several hundred ligands can be simultaneously and flexibly superimposed. Rigid 'seed' structures can be included in the ligand set to drive the process towards preferred conformations deduced from experimental data such as X-ray structures. The resulting optimal self-consistent atomic property field can be used to elucidate a pharmacophoric model by locating the maxima of the field components corresponding to the classical pharmacophoric properties. The results are illustrated in Figure 1, which depicts the conformations of 25 inhibitors of CDK2 flexibly superimposed by the SCAPFOld procedure. Yellow, blue and red blobs are isosurfaces contouring high-potential regions in space for, respectively, the hydrophobic, hydrogen bond donor and hydrogen bond acceptor components of the property field. Test results for several datasets will be presented and discussed.
Surface interaction property based similarity searching with the eHiTS Filter
Darryl Reid, Zsolt Zsoldos, Bashir Sadjad, firstname.lastname@example.org, and Aniko Simon. SimBioSys Inc, 135 Queen's Plate Dr, Suite 520, Toronto, ON M9W 6V1, Canada
Ligand-based similarity searching is a key process in many drug discovery pipelines. The goal is often to find structurally diverse ligands that have similar binding properties and thus provide new lead scaffolds for further optimization. The eHiTS Filter tool uses different interaction surface point types to capture the chemical properties on the surface of ligands and thus describe each ligand. A neural network is then trained on known actives and used to search for similar ligands in a screening database. The training process will be described, along with results from several test sets. The goodness of the hit lists obtained will be measured by the GH score.
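The GH ("goodness of hit") score mentioned above combines the yield of actives in a hit list with a penalty for false positives. A minimal sketch, assuming one common formulation of the Güner-Henry score (the abstract does not spell out which variant is used):

```python
def gh_score(ha, ht, a, d):
    """Güner-Henry 'goodness of hit' score, one common formulation:
    ha = actives retrieved in the hit list,
    ht = total compounds retrieved,
    a  = actives in the database,
    d  = total database size.
    Returns 1.0 for a perfect hit list (all actives, no inactives)."""
    yield_term = ha * (3 * a + ht) / (4 * ht * a)
    false_positive_term = 1 - (ht - ha) / (d - a)
    return yield_term * false_positive_term
```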
Effect of query structure on specificity for flexible 3D searching
Philippa R. N. Wolohan, email@example.com and Robert D. Clark, firstname.lastname@example.org. Informatics Research Center, Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144
Pharmacophores were originally defined as distributions of generalized features in space that are required for activity against a particular biochemical target. Every active compound must exhibit every component feature in such an “essential” pharmacophore, which has the virtue of making clique detection and other deductive approaches to pharmacophore elucidation feasible. Unfortunately, the corresponding 3D search queries are often comprised of too few features too closely spaced to effectively discriminate between active and inactive ligands - i.e., queries that have a large false positive "hit" rate. But such queries can also be so complex that they are too specific. Searches based on them will only recover ligands very similar to those in the training set and will have a prohibitively high false negative hit rate. Including partial match constraints in a query makes it possible to strike a useful balance between these two extremes, especially when a mix of stringent and permissive constraints is used. In the former, most or all features are required to "hit", whereas in the latter only a few are required. This talk will describe a model for predicting the search specificity of a query from its geometry, feature composition and constituent partial match constraints.
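The mix of stringent and permissive partial match constraints described above can be sketched as a simple two-tier test; the feature names below are invented examples, and a real 3D search would also check feature geometry, which this sketch ignores.

```python
def passes_query(feature_hits, stringent, permissive, k_permissive):
    """Two-tier partial-match test for a pharmacophore query.
    feature_hits: set of query features the molecule matched,
    stringent:    features that must ALL be matched,
    permissive:   features of which at least k_permissive must match."""
    if not set(stringent) <= feature_hits:
        return False
    return len(feature_hits & set(permissive)) >= k_permissive
```

Loosening `k_permissive` trades false negatives for false positives, which is exactly the balance the abstract's specificity model aims to predict.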
Adventures in Shape Space
Paul Hawkins, email@example.com, OpenEye Scientific Software, 3600 Cerrillos Road, Suite 1107, Santa Fe, NM 87507
A bedeviling problem in ligand-based design has been the nature of the conformation of the ligand(s) used to build a query and the conformations of the database compounds to be searched. Many assume the query should be the bioactive conformer or a similar low-energy conformer. We will present data to challenge this notion. Further, an approximation to an aqueous-phase ensemble is often used in multi-conformer databases, though evidence exists suggesting that ensembles of moderately unfolded conformers better reproduce crystal structures. In this paper the inter-relationship between conformer space and shape space of both queries and database compounds will be explored in the context of virtual screening with the shape-based similarity tool ROCS. The results will focus particularly on the impact of different conformer generation and sampling methods on success in shape-similarity virtual screening.
New self-organizing algorithm for molecular alignment and pharmacophore development
Deepak Bandyopadhyay, firstname.lastname@example.org and Dimitris K. Agrafiotis. Johnson & Johnson Pharmaceutical Research & Development, L.L.C, 665 Stockton Dr, Exton, PA 19341
We present a method for simultaneous conformational analysis and pharmacophore-based molecular alignment using a self-organizing algorithm called Stochastic Proximity Embedding (SPE). Earlier (2003) we used SPE to generate 3D structures by iterative rule-based adjustment of atom coordinates, achieving higher speed and conformational diversity than earlier programs. Here, we run SPE on an ensemble of 2D molecules to be aligned, with 1D correspondence between atoms/groups coming from automatically generated pharmacophore hypotheses or specified manually. We add distance terms to SPE to bring corresponding pharmacophore points and their associated direction vectors closer. Atoms/groups may also be constrained to lie near external coordinates from a binding site. The 3D structures of each molecule in the resulting alignment are nearly correct if the pharmacophore hypothesis was chemically feasible; post-processing by distance and energy minimization further improves the structures and weeds out infeasible hypotheses. 3D pharmacophores extracted from a successful alignment can be used for database searching.
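The core SPE update rule is simple: repeatedly pick a random pair of points and nudge their embedded distance toward the target distance under a decaying learning rate. A minimal sketch (distances between abstract points, not the full molecular implementation with pharmacophore terms):

```python
import random

def spe(distances, n_points, dim=2, cycles=2000, lr=1.0, seed=0):
    """Minimal Stochastic Proximity Embedding sketch.
    distances[(i, j)] (with i < j) is the target distance for each pair;
    returns embedded coordinates after stochastic pairwise updates."""
    rng = random.Random(seed)
    coords = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    for c in range(cycles):
        rate = lr * (1 - c / cycles)              # learning rate decays to 0
        i, j = rng.sample(range(n_points), 2)     # random pair
        target = distances[(min(i, j), max(i, j))]
        delta = [coords[i][k] - coords[j][k] for k in range(dim)]
        d = max(sum(x * x for x in delta) ** 0.5, 1e-9)
        step = rate * 0.5 * (target - d) / d      # move both points
        for k in range(dim):
            coords[i][k] += step * delta[k]
            coords[j][k] -= step * delta[k]
    return coords
```

The alignment method in the abstract adds further terms (pharmacophore-point attraction, direction vectors, binding-site constraints) on top of this same stochastic update.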
Analyzing docking results by substructure search in Euclidean space
Thomas Zuhl, Thomas.Zuhl@biosolveit.de, Marcus Gastreich, email@example.com, Christian Lemmen, and Holger Claußen, Holger.Claussen@biosolveit.de. BioSolveIT GmbH, An der Ziegelei 75, 53757 Sankt Augustin, Germany
The scoring problem remains unsolved for molecular docking. Target-specific scoring and other tweaks in the docking process are current work-arounds for this problem. Our Docking Database (DDB) provides a powerful analysis tool for this task.
We have introduced a DDB substructure search in 3D that makes it possible to analyze the spatial arrangements of substructures in docking poses. In addition to simple selection and filtering of particular substructures, the new mechanisms enable users to generate statistics of, for example, the distribution of functional groups within the active site, and to test the effects of FlexX-Pharm constraints by eliminating unwanted results.
In order to yield fast analyses, all 3D coordinates of both protein structure and all ligand poses are preprocessed and stored in an underlying ORACLE database. The substructures are specified as SMARTS expressions.
We will present first test results and an outlook on further potential applications.
Using text mining software to identify drug, compound, and disease relationships in the literature
Darryl A. León, firstname.lastname@example.org, Active Motif, 1914 Palomar Oaks Way, Suite 150, Carlsbad, CA 92008
Organizations ranging from national security agencies to drug discovery companies find text mining an excellent approach for information retrieval and knowledge extraction. However, when analyzing life science abstracts, many text mining methods are challenged by issues such as multiple synonyms, inconsistent homonyms, and ambiguous acronyms. In the drug discovery arena, most scientists do not want to become experts in text mining techniques; they simply want to find key published information and relationships about a drug, compound, or disease. This talk will introduce basic text mining approaches, describe how text mining software can help extract relevant biochemical knowledge from life science publications, and give a short survey of selected text mining tools for the life sciences. A few examples will be provided to show how text mining software is being used in research and discovery.
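The synonym problem and the extraction of drug-disease relationships described above can be sketched with a dictionary-based normalizer and sentence-level co-occurrence; the synonym table and sentence are toy inventions, and real systems use curated thesauri and proper tokenization rather than naive substring matching.

```python
# Toy synonym table mapping surface forms to a canonical name.
SYNONYMS = {
    "acetylsalicylic acid": "aspirin",
    "asa": "aspirin",
    "aspirin": "aspirin",
    "myocardial infarction": "heart attack",
    "heart attack": "heart attack",
}

def find_entities(sentence):
    """Normalize every recognized surface form to its canonical entity."""
    text = sentence.lower()
    return {canon for form, canon in SYNONYMS.items() if form in text}

def cooccurrences(sentences):
    """Pairs of normalized entities mentioned in the same sentence --
    a crude proxy for a drug-disease relationship."""
    pairs = set()
    for s in sentences:
        ents = sorted(find_entities(s))
        for i in range(len(ents)):
            for j in range(i + 1, len(ents)):
                pairs.add((ents[i], ents[j]))
    return pairs
```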
Descriptive and predictive models for in-vitro human cancer cell growth screens
Richard Kho, email@example.com, Mick Correll, and Jonathan Ratcliffe. InforSense, 25 Moulton St, Cambridge, MA 02138
The recent explosion of data in the life sciences has served as an impetus for the application of classic data mining methods in analysis and prediction. Here, we compare and contrast various data mining techniques for analyzing and predicting the inhibition of cancer cell growth by small molecules. Data consisting of more than 40,000 screening results were obtained from the NCI's Human Tumor Cell Lines Screen of the Developmental Therapeutics Program. Cheminformatic methods were used to calculate descriptors for the screened compounds, and data pre-processing was performed to cull redundant features. A suite of data mining techniques was applied to a random subset (the training set) and validated using leave-one-out or stratified three-fold cross-validation. The performance of these models in predicting cancer cell growth inhibition is described for potential use in early virtual screening of chemical libraries.
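The stratified three-fold cross-validation used above can be sketched as dealing each class's examples round-robin across folds, so that every fold preserves the overall class proportions (a simplified version of what library implementations do):

```python
from collections import defaultdict

def stratified_kfold(labels, k=3):
    """Split example indices into k folds, preserving class proportions
    by dealing each class's examples round-robin across the folds."""
    folds = [[] for _ in range(k)]
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds
```

Each fold then serves once as the validation set while the remaining folds train the model.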
Towards linking small molecules to biological processes in RSC publications
Colin R Batchelor, firstname.lastname@example.org, Royal Society of Chemistry, Thomas Graham House, Milton Road, Cambridge CB4 0WF, United Kingdom
Chemical text mining is unlike biomedical text mining for all sorts of reasons. We contrast the main challenges and current techniques in both fields and evaluate their applicability to chemical publishing, as well as the impact of developments in chemoinformatics and large databases such as PubChem. We look at what will be needed to go beyond drug discovery into general chemistry and discuss the possibilities for deep parsing of chemical text from publishers' and end users' perspectives.
Applying data mining approaches to further understanding chemical effects on biological systems
Chihae Yang, email@example.com, Leadscope, Inc, 1393 Dublin Road, Columbus, OH 43215 and Ann M. Richard, firstname.lastname@example.org, US EPA, MD B143-06, Research Triangle Park, NC 27711.
Data mining methods require the technological framework of a relational database based on a rigorous data model, flexible searching and retrieval functions, and data analysis and visualization tools. A data model, consisting of a schema (hierarchy) and controlled vocabulary, provides the foundation for meaningful data mining, enabling mechanistic hypotheses to be generated and validated. Advances in the field of computational toxicology are being driven by expanding capabilities for mining the domains of biology and chemistry simultaneously. To break away from the current paradigm of analog searching based solely on chemical similarity, this paper presents informatics methods for finding chemical structures with biologically similar functions. A chemical stressor with particular biological attributes will seed the biology domain. The resulting biological profile will then be projected onto the chemical structure domain to broaden the concept of “analogs” and to assist in the understanding of hazard potential through iterative exploration of both chemical and biological analog space. The National Toxicology Program recently conducted high throughput screening of over 1400 chemicals in a series of cell-viability assays and made the data available through PubChem. This dataset will be used to illustrate various data mining techniques for biologically profiling the chemical space. This abstract does not necessarily reflect EPA policy.
Pharmaceutically intuitive chemical space visualization--enabling the discovery of structural relationships and associated biologically relevant properties between substances
Anthony J. Trippe, email@example.com, New Product Development, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202-1505
This presentation will demonstrate new developments in pharmaceutically intuitive methods for visualizing chemical space by organizing and clustering large numbers of chemical substances. In addition, data mining experiments to provide individual bioactivity profiles to substances will also be discussed. These two activities will be brought together by way of a pharmaceutical industry case study to demonstrate how the combination can provide immediate and unique insight into a related substance collection.
Mining and visualizing the chemical content of large databases
Hugo O. Villar, firstname.lastname@example.org, Mark R. Hansen, email@example.com, and Jason Hodges. Altoris, Inc, 11575 Sorrento Valley Rd, Suite 214, San Diego, CA 92121
The dominant paradigm in drug discovery emphasizes techniques that generate large amounts of data. What was possible by simple inspection in the past cannot be effectively achieved nowadays without the aid of informatics techniques. The selection of compounds for synthesis in a medicinal chemistry program is founded on the identification of structural or physicochemical patterns that correlate with activity data, or negatively correlate with undesirable effects. The task is made extremely taxing when researchers are confronted with massive amounts of biological information, where instead of a few compounds, entire chemical classes may have to be assessed simultaneously. The assimilation of these data is a prerequisite for the identification of patterns and gathering of information. Therefore, there is a need in modern chemistry to organize data for the bench chemist in a manner that facilitates incorporation of all available information into their research in the simplest manner that retains accuracy. In this way, information from high throughput screening or other tests can be fully exploited to improve library design and lead optimization. Most tools for analysis currently in vogue are highly complex and limited to the hands of experts. We will discuss some methods and statistical approaches that can aid in the assimilation of the information by bench scientists.
Developing Semantic Web Service for Chemical Informatics
Xiao Dong, firstname.lastname@example.org and David J Wild, email@example.com. School of Informatics, Indiana University, Bloomington, IN 47408
As the use of web service technology has become more prevalent in chemical informatics, it has also posed new challenges in the organization and discovery of available information and computation services. At the same time, semantic web developments aim to make web content machine-processable, allowing semantic interoperability and meta-level exploitation of data and computation. Here we explain how to use semantic web service technology to support an existing chemical informatics cyberinfrastructure. More specifically, we use OWL-S, the web ontology language for web services, to describe the generic properties of our chemical informatics web services, which we extend to develop a domain-specific ontology for those services. In this presentation we will also discuss how to utilize autonomous agent technology within the semantic web service infrastructure to allow automatic discovery, composition, invocation and execution of workflows that are meaningful in early-stage drug discovery.
A tiered screening protocol for the discovery of structurally diverse HIV Integrase inhibitors
Rajarshi Guha, firstname.lastname@example.org, Debojyoti Dutta, email@example.com, David J Wild, firstname.lastname@example.org, and Ting Chen (2). (1) School of Informatics, Indiana University, 1130 Eigenmann Hall, 1900 E 10th Street, Bloomington, IN 47406, (2) Department of Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089
We report a virtual screening protocol for the identification of structurally diverse HIV integrase inhibitors. We developed linear and non-linear classification models based on a 900-compound training set. The models were then used to predict the activity class of a large vendor library. The vendor compounds predicted to be active were then filtered based on similarity to the most active compound in the training set, and the final hit list was prioritized according to similarity to the most outlying active compound in the training set. Our initial results did not lead to a significantly diverse set of hits. Furthermore, none of our hits were common to the set obtained previously using a pharmacophore model. However, relaxing the similarity constraints identified four compounds that were very similar to a known inhibitor. We discuss possible reasons why this was the case and describe docking results for the hits obtained using our tiered screening protocol.
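The filtering and ranking tier of a protocol like the one above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the fingerprint representation (sets of "on" bit indices), the compound identifiers, and the threshold value are all invented for the example.

```python
# Illustrative sketch: filter predicted actives by Tanimoto similarity
# to a reference compound, then rank survivors by that similarity.
# Fingerprints are modeled as plain Python sets of "on" bit indices.

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two bit-index sets."""
    if not fp_a and not fp_b:
        return 0.0
    inter = len(fp_a & fp_b)
    return inter / (len(fp_a) + len(fp_b) - inter)

def tiered_screen(predicted_actives, reference_fp, threshold=0.7):
    """Keep predicted actives whose similarity to the reference meets
    the threshold; rank the survivors by decreasing similarity."""
    hits = []
    for cid, fp in predicted_actives.items():
        sim = tanimoto(fp, reference_fp)
        if sim >= threshold:
            hits.append((cid, sim))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy data: compound IDs mapped to fingerprint bit sets.
library = {"cmpd1": {1, 2, 3, 4}, "cmpd2": {1, 2, 9}, "cmpd3": {7, 8}}
ref = {1, 2, 3, 5}
print(tiered_screen(library, ref, threshold=0.4))  # cmpd1 ranks first
```

Relaxing the `threshold` parameter widens the hit list, which is the "relaxed similarity constraints" step mentioned in the abstract.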
Something old, something new: creating an undergraduate chemical information seminar
Teri M. Vogel, email@example.com, Science & Engineering Library, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0175E and Barbara A. Sawrey, firstname.lastname@example.org, Department of Chemistry & Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0303.
For Spring quarter 2006, we taught the first undergraduate chemical information seminar at UC San Diego as a potential complement to efforts to increase course-integrated instruction. Twelve students, freshmen to seniors, took the six-session, one-credit course. We used ACS' undergraduate guidelines on chemical information retrieval as a guiding framework, while experimenting heavily with instructional technologies and teaching techniques to promote active learning. This presentation will summarize the challenges and opportunities we faced in designing and teaching the seminar; present a detailed breakdown of the learning objectives and chemical information resources covered during the seminar, including instructional technologies and in-class handouts; share the student evaluations; and describe the changes that will be incorporated into the Spring 2007 seminar.
Mmm...vanillin: Reaching graduate students through ice cream seminars
Jeremy R Garritano, email@example.com, Mellon Library of Chemistry, Purdue University, 504 W. State St., West Lafayette, IN 47907
While there is no required chemical literature course for graduate students at Purdue University, a series of library ice cream seminars is offered throughout the semester as an alternative. These seminars are targeted toward new graduate students to help them adjust to the wide variety of library resources now available to them. Each seminar focuses on either a particular resource (SciFinder Scholar, Beilstein, etc.) or a topical theme (patents, citation searching, spectral information, etc.). The implementation of the seminar series will be explained, along with the content presented and effective marketing methods. Results of evaluations and future directions will also be discussed.
Hands-on remote training in chemical information
Peg Renery, P.Pontier-Renery@mdl.com, Educational Services, Elsevier MDL, 2440 Camino Ramon, Suite 300, San Ramon, CA 94583
The development and deployment of new training formats and delivery options can be challenging. At the same time, these initiatives offer significant organizational and productivity benefits. Over the last several years, Elsevier MDL Educational Services has expanded its training offerings and broadened its role to include: decision making in software/database acquisition; enabling others to support in-house training efforts; creating a self-service environment with multiple points of access; and delivering hands-on training using Web conferencing tools. In this presentation you will see demonstrations of our Learning Centers and Video Libraries showcasing interactive simulations. In addition, we will discuss our live, instructor-led training delivered via Webex and address the challenges of delivering successful hands-on training remotely. Our goal remains—to offer more scientists, librarians, faculty and students worldwide the benefits of our proven instructional methodology of “tell me, show me, let me do.”
Taking the graduate classroom teaching a step further
Monica Shokeen, firstname.lastname@example.org, Kenya T. Powell (2), Karen L. Wooley (2), and Carolyn J. Anderson (1). (1) Division of Radiological Sciences, Washington University School of Medicine, 510 Kingshighway Blvd., Campus Box 8225, St. Louis, MO 63110, (2) Department of Chemistry, Washington University in Saint Louis, One Brookings Drive, Campus Box 1134, St. Louis, MO 63130
The scope of graduate classroom teaching can be significantly improved by increasing the interaction between different departments and schools. Present-day science can metaphorically be described as a melting pot of different subject areas, so at the college and graduate level it seems highly appropriate to offer courses that overlap several areas of science. The synergistic effect can be compounded by the extensive use of multimedia-based communication tools. These principles were brought to fruition by offering a course on the emerging field of nanomedicine that was available to students at different universities. The lectures were delivered live via teleconferencing and also videotaped, and all the course material, including the videotaped lectures, was made available to the students via internet-based resources. The methodology, challenges, successes, and lessons learned will be discussed.
Educating graduate students in chemical information
Engelbert Zass, email@example.com, Informationszentrum Chemie Biologie Pharmazie, ETH Zuerich, HCI J 57.5, CH-8093 Zuerich, Switzerland
We have a long tradition (dating back to 1984) of providing formal courses for graduate students. In the last few years we have established a program for Bachelor students that starts in their first year. The instruction provided at that level has a direct influence on what one can and must do at the graduate level, and we have already reconfigured the old Ph.D. course along these lines.
Librarian office hours: An old tool with a new use to improve graduate education
Bing Wang, firstname.lastname@example.org, Library and Information Center, Georgia Institute of Technology, 704 Cherry Street NW, Atlanta, GA 30332
With more and more digitized information available on their desktops, graduate students tend to visit the library building less and less. Although phone, email, and other telecommunication tools make it easy for students to ask questions, face-to-face reference interviews are still very helpful, especially for those with language challenges. Moreover, students hesitate to attend library instructional classes because of the physical distance between their labs and the library. At the Georgia Institute of Technology, the subject librarian for the School of Materials Science and Engineering (MSE) has been holding weekly office hours in the MSE building. Instead of waiting for questions during office hours, the librarian starts with short demonstrations of various databases and resources, followed by question-and-answer sessions. This approach not only pushes information out to more graduate students but also attracts the interest of faculty and undergraduates.
Deconstructing molecules in an organic information course
Judith N. Currano, email@example.com, Chemistry Library, University of Pennsylvania, 3301 Spruce St. 5th Floor, Philadelphia, PA 19104
The Chemistry Department at the University of Pennsylvania requires all PhD students to complete a course in chemical information during their first year of study. The twelve-week course, taught by the Chemistry Librarian, divides the students into sections based on their subject interest (organic, inorganic, biological, or physical). The organic section focuses on resources that support synthetic organic chemistry and on techniques for locating substances and reactions in the literature. It relies heavily on substructure searching in many of the databases, and three to five lectures are devoted to this topic. The students are taught to analyze molecules, create generic substructures, adapt them to be more or less specific, and apply these theoretical skills to each resource studied. The duration of the course gives students time to learn advanced substructure skills, including the use of complex generic groups, repeating units, reaction mapping, and the combination of substructure search sets.
Fifty years of the International Association for Great Lakes Research
Matt F. Simcik, firstname.lastname@example.org, Division of Environmental Health Sciences, University of Minnesota, MMC 807, 420 Delaware Street SE, Minneapolis, MN 55455
The International Association for Great Lakes Research is a scientific organization made up of researchers studying the Laurentian Great Lakes and other large lakes of the world, as well as those with an interest in such research. It was established in 1975 as a society of scientists interested in studying the Great Lakes. Its members come from fields as diverse as biology, economics, physical oceanography, and chemistry. Much of the early work on environmental chemistry was published in its journal, and countless presentations on the chemistry of the Great Lakes ecosystem are given every year at its annual meeting. The history, mission, and seminal findings of research presented at its conference and in its journal will be reviewed.
Mass balance models for persistent, bioaccumulative, toxic chemicals (PBTs) in the Great Lakes: Application to Lake Ontario
Joseph V. DePinto, email@example.com, Limno-Tech, Inc, 501 Avis Drive, Ann Arbor, MI 48108 and Russell G. Kreis Jr., firstname.lastname@example.org, National Health and Environmental Effects Research Laboratory, USEPA Office of Research and Development, Mid-Continent Ecology Division, Large Lakes and Rivers Forecasting Research Branch, Grosse Ile, MI 48138.
Over the past 20 years, the Great Lakes research and modeling community has made great strides in the development and application of process-oriented models to support the assessment and management of PBTs in large lakes. These models have been used to develop a quantitative relationship between the loads of chemicals such as PCBs from various sources and the concentrations of those chemicals in the water, sediments, and biota of the lake. Mass balance modeling for PCBs in Lake Ontario will be used to illustrate how mass balance models are developed by blending research, monitoring, and modeling. We will also present insights that have been gained from the models. For example, current external sources of PCBs to Lake Ontario are mainly atmospheric deposition and upstream loading through the Niagara River, but the response of fish PCB concentrations, while continuing to decline, is controlled by feedback from sediments that still carry high levels from historical loads.
Contaminant mass balance model applications in the Great Lakes: Lower Fox River/Green Bay and Lake Michigan
Russell G. Kreis Jr., email@example.com, National Health and Environmental Effects Research Laboratory, USEPA Office of Research and Development, Mid-Continent Ecology Division, Large Lakes and Rivers Forecasting Research Branch, Grosse Ile, MI 48138 and Joseph V. DePinto, firstname.lastname@example.org, Limno-Tech, Inc, 501 Avis Drive, Ann Arbor, MI 48108.
Multimedia mass balance forecast models have been applied to determine the sources, transport, fate, and effects of contaminants in the Great Lakes and to aid managers in the decision-making process. This presentation provides an overview of applications to the lower Fox River/Green Bay complex for PCBs, and to Lake Michigan, on a lake-wide basis, for PCBs and atrazine. The lower Fox River/Green Bay complex has a long history of PCB contamination. Modeling results indicated that sediments were the primary source of PCBs and that, if they were remediated, fish consumption advisories could be relaxed after approximately 10-20 years. The Lake Michigan Mass Balance Study indicated that the primary source of PCBs to the system was atmospheric, whereas atrazine inputs were primarily from tributaries. Forecasts indicated that PCB levels in lake trout will continue to decline and that this decline could be accelerated; forecasts for atrazine, however, suggest continued increases at present usage and input rates.
PBDEs and PCBs in the sediments of the Great Lakes: Distributions, trends, influencing factors, and implications
An Li, email@example.com, Karl Rockne, firstname.lastname@example.org, Neil C. Sturchio (3), Wenlu Song (1), Justin C. Ford (1), Dave R. Buckley (4), and William J. Mills (1). (1) School of Public Health, University of Illinois at Chicago, 2121 W Taylor St, MC 922, Chicago, IL 60612, (2) Department of Civil and Materials Engineering, University of Illinois at Chicago, Chicago, IL 60607, (3) Department of Earth and Environmental Sciences, University of Illinois at Chicago, Chicago, IL 60607, (4) Department of Civil and Materials Engineering, University of Illinois at Chicago, Chicago, IL 60607
The spatial distribution and temporal trends of PBDEs and PCBs in the sediments of the Great Lakes were investigated by retrieving sedimentary records. The accumulated amounts of the nine tri- through hepta-BDE congeners (Σ9BDEs), of BDE209, and of the sum of 11 PCB congeners were 5.2±1.1, 92±13, and 69±10 tonnes, respectively, as of about 2002. The inventories of both PBDEs and PCBs show a strong dependence on the latitude, and to a lesser extent the longitude, of the sampling sites. From the 1970s to 2002, the increase in PBDE input flux was exponential at all locations; over the same period, PCB fluxes decreased dramatically or leveled off, depending on location. The year of deposition, latitude, and organic matter content of the sediments account for about 70% of the variation in PBDEs. For PCBs, changes in congener patterns with sediment depth differ among lakes, and evidence of in situ dechlorination was observed in Lake Ontario.
The Great Lakes offshore biological desert and the nearshore slime around the tub
David C. Rockwell, Rockwell.David@epa.gov, Great Lakes National Program Office, United States Environmental Protection Agency, 77 West Jackson Blvd, Chicago, IL 60604
Annex 3 of the Great Lakes Water Quality Agreement calls for the development and implementation of phosphorus control programs and measures to reduce algal biomass and eliminate nuisance conditions, especially in Lakes Erie, Michigan, and Ontario. The primary objective of reducing phosphorus loadings to the Great Lakes was to control algal abundance and species composition. Based on total phosphorus (TP) trends observed in GLNPO and Environment Canada long-term monitoring data, spring total P has declined in all five lakes over the last three to four decades. At the same time, green slime conditions have reemerged in the nearshore zones and are reported in various places in all lakes except Lake Superior. This talk will discuss the chemical and biological conditions observed in the nutrient-depleted offshore and nutrient-enriched nearshore zones.
Moving the region towards meaningful Great Lakes restoration
Kristy Meyer, Kristy@TheOEC.org, Lake Erie Program Director, Ohio Environmental Council, 1207 Grandview Ave., Ste. 201, Columbus, OH 43212
In 2004, President Bush signed an Executive Order establishing the Great Lakes Regional Collaboration (GLRC) to address nationally significant environmental and natural resource issues involving the Great Lakes. The GLRC comprises more than 1,500 people representing federal, state, local, and Tribal governments, non-governmental entities, and concerned citizens, who worked together to develop strong recommendations tied to funding targets for restoring the Great Lakes. On December 12, 2005, the GLRC Executive Committee released the GLRC Restoration Strategy. Since then, the Healing Our Waters® - Great Lakes Coalition, a coalition of 86 national, regional, state, and local groups founded in 2004, has been working with Congress, state, local, and Tribal governments, non-governmental entities, and concerned citizens to push for funding for critical projects in the Great Lakes basin that help the region meet the goals and objectives laid out in the GLRC Restoration Strategy.
Index to physical, chemical and other property data – What's next?
Olivia Bautista Sparks and Linda Shackle, email@example.com. Noble Science and Engineering Library, Arizona State University, PO BOX 871006, Tempe, AZ 85287
The Index to Physical, Chemical and other Property Data webpage was created in 1998 to help Arizona State University Libraries staff and students find property information. At ASU, the page is used in instruction sessions for chemistry courses such as organic and biophysical chemistry, and is highlighted in materials science and chemical engineering library orientations. Current work is focused on the expansion of definitions and symbols and the addition of a WorldCat link for the resources listed. What is the future direction of this resource? Many libraries link to the website from their chemistry subject guide or their own property page. Is there a way to combine our efforts into one global resource? Librarians unite! This presentation will highlight the history of the webpage and describe the evolution of a new concept in property indexing.
Open software and open standards may help cease the fire
Tobias Helmus (1), Stefan Kuhn (1), Peter Murray-Rust, firstname.lastname@example.org, Miguel Rojas Cherto (1), Henry S. Rzepa, email@example.com, Ola Spjuth (4), Christoph Steinbeck, firstname.lastname@example.org, Jarl E. S. Wikberg, Jarl.Wikberg@farmbio.uu.se (4), and Egon Willighagen (1). (1) Research Group for Molecular Informatics, Cologne University Bioinformatics Center (CUBIC), Zuelpicher Str. 47, D-50674 Cologne, Germany, (2) Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, CB2 1EW Cambridge, United Kingdom, (3) Department of Chemistry, Imperial College of Science, Technology and Medicine, Exhibition Road, South Kensington, London SW7 2AY, United Kingdom, (4) Department of Pharmaceutical Biosciences, Uppsala University, Husargatan 3, 751 24 Uppsala
Recent open source developments in chemical informatics, together with open standards for encoding analytical information, now allow for the easy creation of open source electronic lab notebook software. An ideal candidate for such an endeavor is our rich client workbench Bioclipse (http://www.bioclipse.net), an open workbench for I/O, display, and processing of molecular, analytical, and other types of data relevant for electronic lab notebooks [1]. As a second component we have developed the open standard CMLSpect, an extension of Chemical Markup Language (CML) for representing spectral data [2]. In this presentation, we demonstrate the use of Bioclipse for authoring semantically rich XML documents, integrating textual, factual, molecular, and analytical data into electronic laboratory notebook documents.
[1] Spjuth, O.; Helmus, T.; Willighagen, E. L.; Kuhn, S.; Eklund, M.; Steinbeck, C.; Wikberg, J. E. Bioclipse: An open rich client workbench for chemo- and bioinformatics. 2006, submitted.
[2] Kuhn, S.; Murray-Rust, P.; Lancashire, R. J.; Rzepa, H.; Helmus, T.; Steinbeck, C. Chemical Markup, XML, and the World Wide Web: CMLSpect, an XML vocabulary for spectral data. 2006, submitted.
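The central idea of the Bioclipse/CMLSpect abstract above, pairing a molecule with its spectral data inside one XML document, can be sketched with Python's standard library. The element and attribute names below are simplified stand-ins chosen for illustration; they are not the actual CMLSpect schema.

```python
# Illustrative sketch: assemble a minimal CML-style document that pairs
# a molecule with a peak list, in the spirit of CMLSpect. Element and
# attribute names are hypothetical simplifications, not the real schema.
import xml.etree.ElementTree as ET

root = ET.Element("cml")
mol = ET.SubElement(root, "molecule", id="m1", title="ethanol")
spectrum = ET.SubElement(root, "spectrum", id="s1", type="NMR",
                         moleculeRef="m1")  # link spectrum back to molecule
peaks = ET.SubElement(spectrum, "peakList")
for shift, height in [(1.2, 0.9), (3.7, 0.6)]:  # invented peak data
    ET.SubElement(peaks, "peak", xValue=str(shift), yValue=str(height))

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The `moleculeRef` link is the key design point: spectral data remains machine-processable because it is tied to a structured molecule record rather than embedded as free text.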
IUPAC name generation: challenges and evaluation
Daniel Bonniot, email@example.com, ChemAxon, Budapest, Hungary
Names are widely used to describe molecules in a familiar and easily understandable way, and their construction has been formalized by IUPAC. We introduce a new implementation of an automatic converter from molecular structures to IUPAC names that can generate the Preferred IUPAC Name in most cases. This generator is available as a module integrated into the chemical software tools provided by ChemAxon. We present challenging cases that needed to be addressed in several areas, including parent selection, principal chain selection, and optimal numbering computation. We also demonstrate the naming of large bridged ring systems with optimal locants, and we consider the specific challenges of naming structures with aromatic rings, where dearomatization is required for naming and has to be performed in a controlled way so as to lead to minimal locants for the double bonds. Finally, we evaluate the rate of correct names and the performance of the generator on various publicly available collections of structures.
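One of the sub-problems named above, optimal numbering, can be illustrated for the simplest case of an unbranched parent chain: number the chain from whichever end gives the lower set of substituent locants, compared at the first point of difference. This is a toy sketch of that single rule, not ChemAxon's implementation.

```python
# Illustrative sketch of locant minimization on an unbranched chain:
# compare the locant tuples from numbering in each direction and keep
# the lower one (tuples compare at the first point of difference).

def best_locants(chain_length, substituent_positions):
    """Return the lower of the two locant tuples obtained by numbering
    the parent chain from either end."""
    forward = tuple(sorted(substituent_positions))
    reverse = tuple(sorted(chain_length + 1 - p
                           for p in substituent_positions))
    return min(forward, reverse)

# A hexane substituted at positions 3 and 5 (numbered from one end)
# is properly numbered from the other end, giving locants 2 and 4:
print(best_locants(6, [3, 5]))  # -> (2, 4)
```

Real Preferred IUPAC Name generation layers many such tie-breaking rules (parent selection, principal chain selection, ring numbering) on top of this basic comparison, which is what makes the problem hard.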
Information content in organic molecules: Brownian processing of ribonucleases
Daniel J. Graham, firstname.lastname@example.org and Jessica L. Greminger, email@example.com. Department of Chemistry, Loyola University Chicago, 6525 North Sheridan Road, Chicago, IL 60626
The informatic properties of organic compounds have been the subject of research in this laboratory for the past several years. The present study focuses on an important class of globular proteins (RNases) whose information can be expressed and quantified using Brownian processing. Proteins are like other organic molecules in that their informatics hinge on the details of the atom/covalent-bond network and the possible collision sequences in disordered environments. The folding and chemical function of a given sample thus correlate with the message space articulated by the network: its size, structure, and subtleties. In Brownian processing, this space is charted via elementary random walks applied to structure graphs representative of the protein. We present informatic data for several members of the extended RNase family and examine correlations between an enzyme's catalytic specificity and its information distribution functions. Further, the informatic ramifications of selected amino acid residue mutations are discussed. Overall, the research investigates protein structure/function relationships from a Brownian computational perspective. The results are significant because RNases control, in part, information transport in the organism.
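The general idea of charting a structure graph with elementary random walks can be sketched generically: walk the graph at random, tally atom visits, and summarize the resulting distribution with a Shannon entropy. This is a crude illustration of random walks on molecular graphs under invented parameters, not the authors' actual Brownian processing procedure or their information measure.

```python
# Illustrative sketch: random walks on a molecular structure graph
# (adjacency lists), with Shannon entropy of the atom-visit
# distribution as a simple information-style summary.
import math
import random

def walk_entropy(adjacency, n_walks=2000, walk_len=10, seed=0):
    """Entropy (bits) of the atom-visit distribution from random walks."""
    rng = random.Random(seed)
    atoms = list(adjacency)
    visits = {a: 0 for a in atoms}
    for _ in range(n_walks):
        node = rng.choice(atoms)        # start each walk at a random atom
        for _ in range(walk_len):
            visits[node] += 1
            node = rng.choice(adjacency[node])  # step to a random neighbor
    total = sum(visits.values())
    probs = [v / total for v in visits.values() if v]
    return -sum(p * math.log2(p) for p in probs)

# Toy graph: a 4-atom chain (e.g. the carbon skeleton of n-butane).
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(round(walk_entropy(chain), 3))
```

For a 4-node graph the entropy is bounded above by 2 bits; interior atoms of the chain are visited more often than the ends, so the value falls below that bound.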
Chemical Terms, a language for cheminformatics
György Pirok, Nóra Máté, Jozsef Szegezdi, Zsolt Mohacsi, Szabolcs Csepregi, István Cseh, Attila Szabo, Miklos Vargyas, firstname.lastname@example.org, and Ferenc Csizmadia, email@example.com. ChemAxon Ltd, Maramaros koz 3/a, 1037 Budapest, Hungary
Flexibility and ease of use are important yet conflicting requirements in cheminformatics systems; the complex search criteria routinely used in database queries are an obvious example. Beyond flexibility and ease of use, efficient computability sets a third requirement.
To meet all these demands we have designed and developed Chemical Terms, a chemical computation language comprising hundreds of chemical functions (such as pKa, logP, and many other chemical and structural calculations) together with common mathematical and logical operators to combine them. The predefined set of functions can be extended through an open plug-in architecture.
Successful applications of the language, including advanced search filters in chemical databases, virtual reaction design, and pharmacophore perception, are demonstrated. The Visual Chemical Terms Editor, which embeds the language in a user-friendly graphical environment, is also presented.
Future development of the Chemical Terms language will introduce strong typing, functional abstraction, and containers.
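The core idea of the abstract above, composing chemical property functions with logical and mathematical operators into a filter, can be sketched conceptually in plain Python. This is not the ChemAxon language or API; the function names mirror common property calculations, and the property values are invented for the example.

```python
# Conceptual sketch of combining property functions with logical
# operators into a search filter, analogous in spirit to an expression
# like "mass() <= 500 && logP() <= 5". All values below are made up.

# Hypothetical per-compound property table.
properties = {
    "cmpdA": {"mass": 320.4, "logP": 2.1, "rotatable_bonds": 4},
    "cmpdB": {"mass": 612.8, "logP": 5.9, "rotatable_bonds": 11},
}

def mass(c): return properties[c]["mass"]
def logP(c): return properties[c]["logP"]
def rotatable_bonds(c): return properties[c]["rotatable_bonds"]

# A filter expression built from the property functions above.
def drug_like(c):
    return mass(c) <= 500 and logP(c) <= 5 and rotatable_bonds(c) <= 10

hits = [c for c in properties if drug_like(c)]
print(hits)  # -> ['cmpdA']
```

The appeal of a dedicated language over ad hoc code like this is that the same filter expression can be typed by an end user, validated, and evaluated efficiently server-side against a whole database.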
Accounting for 3D descriptors of conformers in QSAR modeling
Shaillay K Dogra, firstname.lastname@example.org, Achintya Das, email@example.com, and Kalyanasundaram Subramanian, firstname.lastname@example.org. Cheminformatics, Strand Life Sciences Pvt. Ltd, 237, C. V. Raman Avenue, Raj Mahal Vilas, Bangalore, India
3D descriptors computed from a single conformer are commonly used in QSAR modeling, yet a molecule may exist as many conformers. How do we incorporate this conformational variation of the 3D descriptors into the modeling process, and does it affect the models significantly? Averaging descriptors with probability weights derived from a Boltzmann distribution has been used to address this issue. In an extensive study on a solubility dataset (1305 compounds, ~45k conformers, 634 3D descriptors), we found that this approach did not make a significant difference: comparable models were obtained for the two sets, one using Boltzmann-averaged descriptors and one using descriptors of the minimum-energy conformer. Although the values of a given descriptor differed between the two sets, they changed in a strongly correlated manner, eventually giving similar statistical models. Is there a better measure to account for the differences among conformers? We investigate different means of sampling and averaging and check their performance.
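The Boltzmann-averaging scheme compared in the study above can be sketched directly: weight each conformer's descriptor value by exp(-E_i/RT) and normalize. The conformer energies and descriptor values below are invented for illustration.

```python
# Sketch of Boltzmann-weighted averaging of a 3D descriptor over
# conformers, with weights w_i proportional to exp(-E_i / RT).
# Energies and descriptor values are hypothetical.
import math

def boltzmann_average(energies_kcal, descriptor_values, T=298.15):
    """Boltzmann-weighted mean of a descriptor; energies in kcal/mol."""
    R = 0.0019872  # gas constant in kcal/(mol*K)
    e_min = min(energies_kcal)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / (R * T)) for e in energies_kcal]
    z = sum(weights)  # partition-function-like normalizer
    return sum(w * d for w, d in zip(weights, descriptor_values)) / z

# Three hypothetical conformers: relative energies and one 3D descriptor.
energies = [0.0, 0.5, 2.0]
descriptor = [10.0, 12.0, 30.0]
avg = boltzmann_average(energies, descriptor)
print(round(avg, 2))
```

Because the lowest-energy conformer dominates the weights at room temperature, the average stays close to the minimum-conformer value, which is consistent with the study's finding that the two descriptor sets give comparable models.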