Titles link to slides when available. Please note: Presentations given at CINF symposia have been posted to the CINF website with express permission granted by the authors who retain the original copyright. These presentations are for information purposes only and cannot be further disseminated without the author's prior written permission.
CINF 1 Fate of chemistry branch libraries: Onward toward 2015
Jeremy R Garritano, Mellon Library of Chemistry, Purdue University,
504 W. State St., West Lafayette, IN 47907, jgarrita@purdue.edu
The pressures of technology, multidisciplinary research, and shrinking budgets
have caused many librarians to rethink the roles of chemistry branch
libraries in recent decades. Some of these libraries have reinvented
themselves, while others have been consolidated into general science and
technology libraries. The author will report on the results of a 2005 survey
of Association of Research Libraries (ARL) institutions and the status of
their chemistry-related library resources and facilities. The survey will
look at the past, present and future of their chemical information
resources, paying particular attention to those that have been or will be
combined with other facilities. The reasons for consolidation will be
discussed, as well as what other disciplines are included within the
combined collection, and other issues regarding administration and outreach.
CINF 2 The Harvard chemistry library: Ghosts aboard the starship
Marcia L. Chapin, Chemistry & Chemical Biology, Harvard
University, 12 Oxford St., Cambridge, MA 02138, Fax: 617-495-0788, chapin@chemistry.harvard.edu
The Harvard Chemistry Library has played a quiet but profound role in chemical
education and research at Harvard. Since 1927, the Library, located in the
heart of the Chemistry & Chemical Biology Department complex, has served
as a focal point for chemical information resources, chemical contemplation,
and a host of Harvard chemistry community gatherings. The spirit of many an
illustrious faculty member is to be felt there. The reading room embodies
what students have come to expect from Harvard, a sense of history and
elegance. With the advent of digital access to chemical information, the
space occupied by the Library is beginning to be scrutinized very closely.
Is it reasonable to harvest the current Library space for laboratories and
create a small “starship” information center, a new paradigm in which almost
everything would be online? A perfect storm of university politics, space
competition, and financial constraints has come to bear on these decisions.
How important is the historic space to 21st century teaching and research?
CINF 3 Adaptation of a chemistry library: The University of Chicago experience
Andrea Twiss-Brooks, John Crerar Library, University of Chicago, 5730
S. Ellis Ave, Chicago, IL 60637-1403, atbrooks@uchicago.edu
“You cannot step twice into the same river.” Heraclitus (c. 540 - c. 480 BC)
During its eight decades of existence, the Chemistry Library at the University of
Chicago has undergone considerable change. This change has been driven by many
factors, including advances in library technology, construction of new
library and research buildings, space planning challenges, migration to
electronic information resources, and more. In summer 2005, the Chemistry
Library was completely closed and the collections merged into the holdings
of the main science library. This presentation will explore some of the
driving forces behind the closure of the Chemistry Library, the changing
role of the Chemistry Librarian, and chemical information reference and
instructional services in the context of a centralized science library
environment. Effects of the closure on staff, resource reallocation, the
process of moving the collections, and service marketing in the new
environment will also be addressed.
CINF 4 Metamorphosis of the chemistry library: What will emerge?
William W Armstrong, LSU Libraries, Chemistry Library, Louisiana
State University, Baton Rouge, LA 70803, Fax: 225-578-2760, notwwa@lsu.edu
Forces ranging from institutional financial pressures and space constraints to
rapid technological advances are acting on the chemistry library, causing a
metamorphosis. Technological advances have revolutionized the way scientists
communicate with one another and the way this information is disseminated.
Has the library's role in the flow of information changed in response to
these new developments? Have the needs of patrons changed as a result? What
shape or role will the library have in the future? What should its role be?
The author will provide an overview of some of the changes occurring or
likely to occur, while highlighting any positive or negative aspects these
changes might entail. We must balance an ideal against a knowledge of the
realities that act as constraints, or parameters, within which these changes
will take place. Change will occur. Will we merely react, or will we direct
this change?
CINF 5 Changing mission, strengthened focus: A new use for the Current Periodicals Room at the University of California, Santa Cruz
Catherine B. Soehner, Christy Hightower, and Wei Wei, Science &
Engineering Library, University of California, Santa Cruz, 1156 High Street,
Santa Cruz, CA 95064, Fax: 831-459-2797, soehner@ucsc.edu
The Science & Engineering Library at UC Santa Cruz was built in 1991 and
included a beautiful room dedicated to a print collection of current
periodicals. During the past two years we have systematically canceled all
print journals for which there was an electronic counterpart, thus
diminishing the number of journals in the Current Periodicals Room. During a
strategic planning effort, the Library determined that it should be
identified as the 'Information Center' of the campus and be the 'destination
of choice' for students, faculty, staff, and members of our greater
community even in this digital age. As a first step toward realizing this
goal, the library staff began a lecture series entitled Synergy:
Explorations in Science and Society, held in the Current Periodicals Room.
This new lecture series highlights research, teaching and grants in science
and engineering at UCSC and brings these efforts to the attention of the
UCSC and greater Santa Cruz community. The response to this lecture series
has been overwhelmingly positive with record attendance. This venture marks
the beginning of a successful move toward integrating the library more fully
into the mission of the University and strengthens the library's
connection with its faculty.
CINF 6 Planning a combined engineering, computer sciences, and physics library at Stanford University
Grace A. Baysinger, Swain Library of Chemistry & Chemical
Engineering, Stanford University, Organic Chemistry Building, 364 Lomita
Drive, Stanford, CA 94305-5080, Fax: 650-725-2274, graceb@stanford.edu
A new library for the Engineering, Computer Sciences, and Physics communities
at Stanford University is slated to open in 2012. It will be a
state-of-the-art facility designed to be “stackless,” that is, without
book stacks. Planning efforts include reviewing trends, assessing
issues, and developing future visions for the facility, including its
collections, services, and staffing. User needs are being assessed via
surveys and interviews. Technical, financial, and legal opportunities and
challenges are also being evaluated. This presentation will provide an
overview of the vision and planning efforts going into this new library.
CINF 7 Knowledge management at Cytec Industries: Building the library of the future
David A. Breiner1, Joseph J. Kozakiewicz1,
Jeanne L. Courter1, Leonard Davis1, Raymond S.
Farinato1, Steven Greenhouse1, John H. Hillhouse2,
Nimal Jayasuriya1, James A. Jubinsky1, Dana B. Moore1,
J. Wilfredo Perez1, and Gary Walters1. (1) Cytec
Industries, 1937 West Main Street, Stamford, CT 06904, david.breiner@cytec.com,
(2) Phosphine Technical Center, Cytec Industries
Since 2003, the Cytec Technical Information Center (TIC) in Stamford, Connecticut,
has undergone a radical transformation. From moving its physical location to
hiring a new staff to launching a virtual library, the Cytec TIC has become
a center of excellence for learning, idea exchange, and innovation. As its
mission, the TIC partners with Cytec R&D to leverage appropriate
technology in order to search, archive, and disseminate internal and
external information in a cost-effective, user-friendly manner. To achieve
its mission, the Cytec TIC has designed and implemented a simple web portal
for instant “one-stop” global access to technical information. Primary
resources for external information include ACS, MicroPatent, Knovel,
Elsevier ScienceDirect, Teltech, and SRI Consulting, while a web-based
document management system is utilized for retrieving important internal
information. In addition, the Cytec TIC has become a hub for
cross-functional R&D activity by hosting scientific discussion forums
and weekly poster sessions. This presentation will highlight experiences
encountered during a Knowledge Management initiative including identifying
system requirements, process design, implementation issues, cultural
challenges, and lessons learned.
CINF 8 Virtually virtual: The postmodern pharmaceutical library
Mary Laskow, Lou Ann Di Nallo, and Mary Talmadge-Grebenar,
Information & Knowledge Integration, Bristol-Myers Squibb, Rt. 206 &
Province Line Rd., PO Box 4000 J12-01, Princeton, NJ 08543, Fax:
609-252-6280, mary.laskow@bms.com
The Research Libraries at Bristol-Myers Squibb serve a wide and varied audience,
with chemists as one of the main user groups. Historically we have
given them primary focus from a collection and service viewpoint. As early
but sometimes reluctant adopters of ejournals and other electronic
resources, BMS chemists have, over time, become comfortable in the virtual
world. Increasing demands on our physical library spaces from other parts of
the organization have fortunately led to the opportunity to rethink our use
of space in a thoughtful fashion. Some of the areas we are addressing
include: increasing opportunities for collaboration, aligning chemical
information professionals with clients that they serve, and reducing our
collection footprint. The physical library will remain, at least for the near
future, until key chemistry reference resources either become available
electronically or become more affordable as pricing models evolve.
CINF 9 Copyright basics
Eric S. Slater, Publications Division, Copyright Office, American
Chemical Society, 1155 Sixteenth Street, NW, Washington, DC 20036, Fax:
202-776-8112, e_slater@acs.org
This session will feature a general discussion of basic United States Copyright
Law, including, but not limited to, such topics as subject matter of
copyright, exclusive rights of copyright, and duration of copyright.
Additionally, there will be a detailed discussion of ACS Publications
Division Copyright Policy and how United States Copyright Law ties into ACS
Policy. In this regard, the speaker will cover why ACS requires transfer of
copyright from authors and discuss why this approach is beneficial to all
parties involved. Other related topics will include a detailed explanation
of the ACS Copyright Status Form and, specifically, of the rights that ACS
grants back to authors and their employers. Finally, the session will
conclude with a primer on the permissions process, and why it is important
to be aware of copyright when using material posted on the Internet.
CINF 10 Teaching copyright to chemistry students
S. Scott Zimmerman, Department of Chemistry and Biochemistry, Brigham
Young University, C205 BNSN, Provo, UT 84602-5700, Fax: 801-422-0153,
scott.zimmerman@byu.edu
As chemistry professors and students, we might ask the following questions
about copyright: Who owns the copyright to students' research reports and
laboratory notebooks? Can instructors make copies of a JACS paper and
distribute them to the students in their classes? What published materials
can instructors legally include in their course packets? Can graduate
students publish papers in scientific journals and then publish the same
papers in their theses or dissertations? In this presentation, I will try to
answer these and other questions about copyright. I will also outline
suggested topics and list online resources that instructors can use in
teaching copyright to chemistry students.
CINF 11 Solution provider perspective: A brief case study in serving the customer and their end-users
Robert Weiner, Senior Vice President, Copyright Clearance Center, 222
Rosewood Drive, Danvers, MA 01923, Fax: 978-750-0347, bweiner@copyright.com
The demand for digital content is greater than ever, forcing both information
content users and rights holders to search for new ways to engender
compliance with U.S. copyright law. Rights holders want to maintain control
over how their intellectual property is used and at what cost, while
information consumers want to reproduce and disseminate material without
putting their institutions at risk of infringement litigation. Fortunately,
there are solutions.
CINF 12 Intellectual property agreements
Gianna Arnold, Epstein Becker and Green, 1227 25th Street, NW, Suite
700, Washington, DC 20037-1175, Fax: 202-296-2882, garnold@ebglaw.com
Intellectual property assets are critical for technology companies and often account for
a large percentage of such companies' capital. Accordingly, appropriate
protection and leveraging of such assets can greatly enhance value and can
be crucial to success. Patents, trademarks, copyrights, trade secrets and
contracts are used to protect and leverage intellectual property assets.
This presentation will focus upon the use of contracts – both in-house
agreements and strategic alliances. Whether such contracts are used to
protect intellectual property rights, improve in-house capability or garner
revenue, the goal is to enhance the strength and value of the corporate
entity. Items discussed will include types of contracts, the licensing
process, and drafting considerations.
CINF 13 Publish and your patent rights may perish
Alan M. Ehrlich, Weiss, Moy & Harris, P.C., 1101 Fourteenth St.,
N.W., Suite 500, Washington, DC 20005, Fax: 202-216-0083, aehrlich@weissmoyharris.com
Patents are awarded for inventions of articles, methods and compositions that are
useful, novel, and not obvious to one ordinarily skilled in the art. A
patent's value stems from the fact that a patent owner initially holds the
right to exclude others from making, using, selling or importing
the invention, and the owner can sell that exclusive right in whole or in
part. Novelty can be lost if the invention is published before a patent
application is filed. Thus, there is a potential conflict between
researchers' interests in publishing and their employers' desires to
maintain that exclusivity. This paper will outline those disclosures that
destroy patentability and ways to balance the interests of publication and
commercialization.
CINF 14 Harvesting the scientific information in patent documents: What non-patent specialists should know
William M. Mercier and Jan Williams, Chemical Abstracts Service,
Columbus, OH 43210, Fax: 703-435-0827, wmercier@cas.org
CAS databases offer millions of patent references from more than 50 active
patent-issuing authorities around the world. These patents can be viewed not
only as documents of legal significance, but also as rich sources of scientific
information; in fact, over 60 percent of the new small molecules CAS adds
each year to the CAS REGISTRY(SM) come from patent documents rather than
journal literature. The scientific information contained in these patent
records makes a broader scope of data available for research and data
analysis. Patent records that meet CAS selection criteria (those
covering chemistry, biochemistry, and chemical engineering) are analyzed and
fully indexed by CAS scientists in fewer than 27 days from the date of issue.
Complementary to patent information, CAS references a wealth of journal
literature dating back to 1907. This information can assist in making
business-critical decisions, directing research projects, or assessing prior art
for patentability.
CINF 15 Text search anomalies and how to cope with the "tough" searches in PubMed for your just-in-time knowledge needs
Soaring Bear, MeSH, NLM/NIH, 8600 Rockville Pike B2E17, Bethesda, MD
20894, Fax: 301-402-2002, soaringbear@nih.gov
As much as one fifth of the Medical Subject Headings (MeSH) indexing vocabulary
(http://www.nlm.nih.gov/mesh/MBrowser.html) is modified each year to keep up
with additions and changes in science. Recent changes in MeSH will be
presented along with three easy steps you can follow to help you keep up
with and use the changes for better and faster search results.
Changes in MeSH usually improve search results but can sometimes confuse searchers
and automated informatics tools. For instance, why does a search on the word
‘sweetening’ fail to deliver 100 thousand citations on ‘sweetening
agents’? Why does a search on benzo[a]pyrene give a syntax error? Why does a
search on ‘plants’ fail to find 20 thousand citations about ‘plant
extracts’? Why does a search on ‘anti-inflammatory’ fail to get 60
thousand citations about ‘antiinflammatories’? We at MeSH are doing the best we
can to help provide good search results, but the multiplicity of word
meanings and budget constraints limit what any categorization scheme can do. You've
got to do the rest. Here's how.
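One workaround implied by the hyphenation examples above, searching several spellings of a term at once, can be sketched in a few lines. This is a hypothetical helper, not an NLM tool: the variant rules (hyphen kept, removed, or replaced by a space) are assumptions, and the code only builds a query string, with each variant quoted as a phrase and joined by PubMed's OR operator, without contacting PubMed.

```python
def spelling_variants(term: str) -> list[str]:
    """Return hypothetical spelling variants of a (possibly hyphenated) term."""
    variants = [term]
    if "-" in term:
        variants.append(term.replace("-", ""))   # anti-inflammatory -> antiinflammatory
        variants.append(term.replace("-", " "))  # anti-inflammatory -> anti inflammatory
    # Remove duplicates while keeping the original order.
    return list(dict.fromkeys(variants))


def build_query(term: str) -> str:
    """Quote each variant as a phrase and OR them together."""
    return " OR ".join(f'"{v}"' for v in spelling_variants(term))


print(build_query("anti-inflammatory"))
# "anti-inflammatory" OR "antiinflammatory" OR "anti inflammatory"
```

This still misses plural forms such as ‘antiinflammatories’; PubMed's trailing-asterisk truncation (for example, antiinflammator*) is one way to cover such endings.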
CINF 16 Text and data mining: Together at last!
Anthony J. Trippe, Science IP/Chemical Abstracts Service, 2540
Olentangy River Rd., Columbus, OH 43210, atrippe@cas.org
Many techniques and tools have long been available to information professionals
for statistical analysis of fielded (structured) data. Lately, there has
been an increased focus on the analysis of textual (unstructured) data.
Traditionally, these forms of analysis have been conducted separately, and
it was generally not possible to combine the strengths of the two approaches.
New software now allows the application of rigorous data
mining tools, e.g., data grouping and clean-up, to the creation of bar
charts and 2-D matrix charts from fielded data. It also allows the use of
text mining elements, including data harmonization, for the creation of
concept clusters and maps from unstructured data. Output from both is linked
and dynamically interactive. A brief discussion of the software's
capabilities will be followed by a case study on how the marriage of text
and data mining supports strategic business research by providing rapid,
insightful analyses.
CINF 17 Knowing when to say "When..."
Farhad Soltanshahi, Michael S. Brusati, and Robert D. Clark, Tripos,
Inc, 1699 South Hanley Road, St. Louis, MO 63144
Sampling large data sets efficiently is a computational challenge, but it can also be
a philosophical one. Keeping structural diversity within the selected subset
high is important, but so is maintaining representativeness of the data set
as a whole. As the fraction of the data set selected increases, enhancing
diversity becomes increasingly expensive in computational terms, but of
progressively less value in practical terms. So when does it make sense to
stop worrying about diversity and shift over to straight random sampling?
Optimizable k-dissimilarity (OptiSim) is a stochastic selection method that
is uniquely positioned for addressing this question, in part because it
returns an ordered selection set in which the earlier selections are, on
average, measurably more distinctive and more representative than later
ones.
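The abstract does not spell out the algorithm. As an illustration only, here is a simplified sketch following the general outline of OptiSim-style selection: draw random candidates, discard any lying within a dissimilarity radius of an already-selected item, and from each subsample of k acceptable candidates keep the one farthest from the current selection. The function name, the Euclidean metric, and returning unchosen candidates directly to the pool (rather than to a separate recycling bin) are assumptions, not the Tripos implementation.

```python
import math
import random


def optisim(points, n_select, k, radius, seed=0):
    """Simplified OptiSim-style selection; returns point indices in selection order.

    points   -- list of coordinate tuples (Euclidean distance is assumed)
    n_select -- number of items to select
    k        -- subsample size: k=1 behaves like random sampling,
                a large k approaches maximum-dissimilarity selection
    radius   -- items closer than this to any selected item are excluded
    """
    if not points:
        return []
    rng = random.Random(seed)
    pool = list(range(len(points)))
    selected = [pool.pop(rng.randrange(len(pool)))]  # random first pick

    def min_dist(i):
        # Distance from candidate i to its nearest already-selected item.
        return min(math.dist(points[i], points[j]) for j in selected)

    while len(selected) < n_select and pool:
        subsample = []
        # Draw random candidates until k acceptable ones are found.
        while len(subsample) < k and pool:
            i = pool.pop(rng.randrange(len(pool)))
            if min_dist(i) >= radius:
                subsample.append(i)
            # Candidates inside the exclusion radius are dropped for good.
        if not subsample:
            break
        best = max(subsample, key=min_dist)  # most dissimilar candidate wins
        selected.append(best)
        # Simplification: unchosen candidates return straight to the pool.
        pool.extend(i for i in subsample if i != best)
    return selected
```

Because picks are appended in order, truncating the returned list keeps the earlier, more diverse selections, and lowering k shifts the behavior toward random sampling.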
CINF 18 Maximizing chemical knowledge: New approaches in spectral data mining and search via the successful consolidation of multi-technique spectral data
Gregory M. Banik1, Deborah Kernan2, Kevin
Scully3, and Marie Scandone3. (1) Bio-Rad
Laboratories, Informatics Division, 3316 Spring Garden Street, Philadelphia,
PA 19104, gregory_banik@bio-rad.com, (2) Bio-Rad Laboratories, Informatics
Division, (3) Informatics Division, Bio-Rad Laboratories, Inc
It has become standard practice in multiple applications, such as compound
verification or unknown sample identification, for scientists to run a
sample and, using spectral search software, compare it to commercial and/or
proprietary reference databases of spectra. The software mines the reference
data and calculates a score or hit quality index (HQI) to describe the
correlation or “closeness” of the match between the spectrum being
examined and the spectra of known compounds in reference databases.
This paper describes a new approach to spectral searching which gives scientists
who analyze samples using multiple spectral techniques the ability to
simultaneously combine all spectral information available to yield a single
search result. In a series of case studies, we will demonstrate how this
approach enables the optimization of chemical similarity and maximizes
chemical knowledge in order to identify several unknown samples.
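As a rough illustration of scoring, the sketch below uses a squared-correlation hit quality index (one common style of HQI; the abstract does not say which index the authors use) and then combines per-technique scores by a weighted average. The averaging rule, the function names, and the technique labels are assumptions for illustration, not Bio-Rad's algorithm.

```python
def hqi(sample, reference):
    """Correlation-style hit quality index in [0, 1].

    Squared Pearson correlation of the two spectra, with anti-correlated
    pairs scored 0. Both spectra must be sampled on the same x-axis grid.
    """
    n = len(sample)
    if n == 0 or n != len(reference):
        raise ValueError("spectra must be non-empty and share the same grid")
    ms, mr = sum(sample) / n, sum(reference) / n
    s = [v - ms for v in sample]
    r = [v - mr for v in reference]
    dot = sum(a * b for a, b in zip(s, r))
    den = sum(a * a for a in s) * sum(b * b for b in r)
    if dot <= 0 or den == 0:
        return 0.0
    return dot * dot / den


def combined_score(sample, reference, weights=None):
    """Hypothetical multi-technique score: weighted mean of per-technique HQIs.

    sample, reference -- dicts mapping a technique name (e.g. "IR") to a spectrum.
    Only techniques present in both dicts contribute.
    """
    weights = weights or {}
    shared = [t for t in sample if t in reference]
    total = sum(weights.get(t, 1.0) for t in shared)
    if not shared or total == 0:
        return 0.0
    return sum(weights.get(t, 1.0) * hqi(sample[t], reference[t]) for t in shared) / total


def rank_library(sample, library, weights=None):
    """Return library compound names sorted from best to worst combined score."""
    return sorted(library,
                  key=lambda name: combined_score(sample, library[name], weights),
                  reverse=True)
```

A weighted mean is only one choice; it lets a technique known to be more discriminating for a given sample type count for more without discarding the others.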
CINF 19 Hierarchical k-means clustering using principal components to solve the unsupervised multi-class classification problem
James F. Rathman1, Syed B. Mohiddin1, and
Chihae Yang2. (1) Department of Chemical and Biomolecular
Engineering, The Ohio State University, Koffolt Laboratories, 140 West 19th
Avenue, Columbus, OH 43210-1110, Fax: 614-292-3769, rathman.1@osu.edu, (2)
Leadscope, Inc
Current classification techniques can be grouped as either supervised or unsupervised.
In a supervised method, each observation in the training dataset is
pre-assigned to a class based on prior knowledge, while an unsupervised
method uses no prior knowledge of the class distinction. Numerous supervised
techniques have been demonstrated to work well for binary classification and
a few of these are reasonably good at making supervised multi-class
predictions. However, techniques for unsupervised binary and multi-class
predictions have not been fully developed. In this work, we present an
analysis technique based on hierarchical k-means using differentially
weighted principal component analysis to address unsupervised classification
for both binary and multi-class problems. We demonstrate the methodology on
both biological (NCI 60 cancer cell lines dataset and acute leukemia
dataset) as well as chemical datasets with the objectives of predicting
class membership and identifying non-redundant features most responsible for
differentiating the observed classes.
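The sketch below shows only the hierarchical (bisecting) k-means part. It assumes each row already holds a sample's principal component scores, weighted however the analyst chooses, because the abstract does not give the weighting scheme; the PCA step itself is not shown. Splitting the cluster with the largest within-cluster sum of squares, and seeding each split with two mutually distant points, are assumptions for illustration rather than the authors' procedure.

```python
def _mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _sse(rows):
    """Within-cluster sum of squared distances to the centroid."""
    c = _mean(rows)
    return sum(_sq_dist(r, c) for r in rows)


def two_means(rows, iters=20):
    """Split rows into two groups with Lloyd's algorithm and deterministic seeding."""
    c = _mean(rows)
    a = max(rows, key=lambda r: _sq_dist(r, c))  # farthest from the centroid
    b = max(rows, key=lambda r: _sq_dist(r, a))  # farthest from that point
    centers = [list(a), list(b)]
    groups = [list(rows), []]
    for _ in range(iters):
        groups = [[], []]
        for r in rows:
            nearer = 0 if _sq_dist(r, centers[0]) <= _sq_dist(r, centers[1]) else 1
            groups[nearer].append(r)
        if not groups[0] or not groups[1]:
            break
        new_centers = [_mean(g) for g in groups]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return groups


def hierarchical_kmeans(rows, n_clusters):
    """Repeatedly bisect the cluster with the largest SSE until n_clusters exist."""
    clusters = [list(rows)]
    while len(clusters) < n_clusters:
        splittable = [c for c in clusters if len(c) > 1 and _sse(c) > 0]
        if not splittable:
            break
        target = max(splittable, key=_sse)
        left, right = two_means(target)
        if not left or not right:
            break  # cannot split further
        clusters.remove(target)
        clusters += [left, right]
    return clusters
```

Because the number of clusters is chosen by the caller, the same routine covers binary (n_clusters=2) and multi-class partitions.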
CINF 20 Dynamic equation of state evaluation with ThermoData Engine
Chris D. Muzny1, Eric W. Lemmon1, Robert D.
Chirico2, Vladimir V. Diky2, Qian Dong1,
and Michael Frenkel2. (1) Physical and Chemical Properties
Division, National Institute of Standards and Technology, 325 Broadway,
Boulder, CO 80305-3328, Fax: 303-497-5044, chris.muzny@nist.gov, (2)
Thermodynamics Research Center (TRC), National Institute of Standards and
Technology (NIST)
ThermoData Engine (TDE) is a software tool recently released by the Thermodynamics
Research Center at the National Institute of Standards and Technology that
for the first time implements the concept of dynamic data evaluation for
thermodynamic property data. In this talk we will present an extension of
TDE that implements the dynamic data evaluation concept for pure fluid
equations of state. We will detail the performance of TDE in comparison to
established equations of state based on individual static data evaluations.
The specific equations of state we compare against are those presented in
NIST REFPROP, a software tool that delivers recent, state-of-the-art
equations of state for over 80 fluids. Full implementation of the dynamic
data evaluation concept requires continuous acquisition and storage of new
data. Toward this end we will also present an extension of TDE that allows
for on-demand TDE local database updates from a central server.
CINF 21 Leveraging open access chemical information with Text Influenced Molecular Indexing
Richard D. Hull, Axontologic, Inc, 12565 Research Parkway, Suite 300,
Orlando, FL 32826
Research and development of new text mining algorithms for drug discovery have been
hampered by the restricted availability of large, open access chemical
databases. Recent efforts to make more chemical information available to
researchers are opening promising new avenues of research. Text Influenced
Molecular Indexing (TIMI) is a process that discovers correlations between
structural components of chemical structures and the textual contexts in which
these structures are described, namely, the scientific literature,
internal research reports, and chemical patents. TIMI can identify
recognized and novel latent relationships between compounds, proteins,
genes, diseases and other domain concepts that are expressed across very
large textual corpora. A linchpin of this technique is the ability to
recognize chemical names within these texts and access their corresponding
chemical structures. We describe our work with TIMI as an example of what
can be done when large numbers of chemical structures are made available for
text mining purposes.
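TIMI's actual statistics are not described in the abstract. One simple, generic way to score a correlation between a structural fragment and a text term is document-level pointwise mutual information, PMI(f, t) = log[P(f, t) / (P(f) P(t))], sketched below. The fragment and term labels are hypothetical stand-ins for whatever substructure keys and chemical-name recognition output a real pipeline would produce.

```python
import math
from collections import Counter
from itertools import product


def fragment_term_pmi(documents, min_count=1):
    """Document-level pointwise mutual information between fragments and terms.

    documents -- iterable of (fragments, terms) pairs, each an iterable of strings
    Returns {(fragment, term): pmi} for pairs co-occurring in >= min_count documents.
    Positive PMI means the pair co-occurs more often than chance would predict.
    """
    n_docs = 0
    frag_df, term_df, pair_df = Counter(), Counter(), Counter()
    for fragments, terms in documents:
        frags, words = set(fragments), set(terms)
        n_docs += 1
        frag_df.update(frags)
        term_df.update(words)
        pair_df.update(product(frags, words))
    return {
        (f, t): math.log(c * n_docs / (frag_df[f] * term_df[t]))
        for (f, t), c in pair_df.items()
        if c >= min_count
    }
```

Raising min_count suppresses the unstable, very high scores that PMI assigns to pairs seen only once.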
CINF 22 PubChem
Stephen H. Bryant, Computational Biology Branch, National Center for
Biotechnology Information, National Institutes of Health, Bldg. 38A, Rm.
5S504, Bethesda, MD 20894, Fax: 301-480-9241, bryant@ncbi.nlm.nih.gov
PubChem is a new online information resource from NCBI. The system provides open
access to information on the biological properties of chemical substances.
Following the sequence-deposition model used by GenBank, PubChem's
content is derived from user depositions of chemical structure and bioassay
data, including data from NIH's Molecular Libraries Roadmap initiative. The
PubChem retrieval system supports searches based on chemical names and
chemical structure, as well as searches based on bioassay descriptions and
activity values. It furthermore provides links to depositor sites, for
further information on each substance, as well as links to other NIH
resources such as the PubMed biomedical literature database and Entrez's
protein 3D structure database.
CINF 23 The ZINC database as a new research tool for ligand discovery
John Irwin and Brian Shoichet, Department of Pharmaceutical
Chemistry, University of California, San Francisco, 1700 4th St, San
Francisco, CA 94143,
jji at cgl.ucsf.edu
(email address altered at author's request)
ZINC is a free database of commercially available compounds for virtual
screening, available on the web at http://zinc.docking.org. ZINC represents
small molecules as biologically relevant models suitable for virtual
screening and other related applications. To make the database useful we
have focused on addressing commercial availability, "drug
likeness", stereochemical and regiochemical ambiguity of many supplier
catalogs, physical properties, protonation, charge and tautomeric equilibria.
The database may be searched and subsets created using on-line tools. Parts
of ZINC have been downloaded by thousands of institutions worldwide in
academia, government, and industry. ZINC continues to evolve: a dozen new
compound suppliers and millions of new compounds have been added over the
past year via quarterly releases. Numerous errors have been corrected thanks
to alert and helpful users. This presentation will discuss some applications
of ZINC as well as some of the ways we are trying to make ZINC better. ZINC
relies extensively upon vendor catalogs, commercial software and GPLed
software which are acknowledged on our website. The delicate balance of
providing a freely available service based partly on commercial software
will be discussed.
CINF 24 MOLTABLE: An open access initiative on molecular informatics
M Karthikeyan, Information Division (Digital information Resource
Centre), National Chemical Laboratory, Dr. Homi Bhabha Road, Pune 411008,
India, Fax: +91-20-5893973, karthi@ems.ncl.res.in, and S Krishnan,
Information Division, National Chemical Laboratory
MolTable is an open access initiative [1] to collect, compute, and distribute data
to the academic and research community. Through this portal one can query a large
number of molecules for similarity, computed molecular properties, etc., and
download the results in .csv format [2]. Since molecular
descriptors are extensively used for QSAR, QSPR, and QSTR studies, it was
proposed to compute topological and electronic descriptors and property
data for all the molecules [3,4]. These data, in combination with activity,
property, or toxicity data, can be used for building predictive models with
the aid of statistical tools (PLS, PCR, kNN, SVM, ANN, etc.). Some of the
molecules are linked with Dspace@NCL, an open access initiative [5,6].
Molecular data can be downloaded in standard SMILES format. Visualization
of the molecules is achieved with the help of ChemAxon's MarvinViewer. Details
will be presented.
1. http://moltable.ncl.res.in/index.htm
2. http://moltable.ncl.res.in/nrm/sample.txt
3. http://moltable.ncl.res.in/nrm/moltable.jsp
4. http://moltable.ncl.res.in/nrm/molprop.jsp
5. http://dspace.ncl.res.in/
6. http://moltable.ncl.res.in/public/thesis_1130.jsp
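To illustrate one of the model types listed in the MolTable abstract, here is a minimal k-nearest-neighbor (kNN) regression sketch over descriptor vectors. The data layout (a list of descriptor-vector and property-value pairs) is an assumption, since the abstract does not specify the .csv columns; the example values in the test are made up.

```python
import math


def knn_predict(train, query, k=3):
    """Predict a property as the mean value of the k nearest training compounds.

    train -- list of (descriptor_vector, value) pairs
    query -- descriptor vector of the compound to predict
    Descriptors are assumed to be pre-scaled so Euclidean distance is meaningful.
    """
    if not train:
        raise ValueError("training set is empty")
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return sum(value for _, value in nearest) / len(nearest)
```

Descriptors on very different scales (for example, molecular weight next to a topological index) should be standardized first, or the largest-valued descriptor will dominate the distance.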
CINF 25 Open access chemical-information and computer-aided drug design resources
Marc C Nicklaus, Laboratory of Medicinal Chemistry, CCR, NCI, NIH,
Bldg. 376, Boyles Street, Frederick, MD 21702, Fax: 301-846-6033, mn1@helix.nih.gov,
Markus Sitzmann, Laboratory of Medicinal Chemistry, CCR, National Cancer
Institute/Frederick, NIH, DHHS, Igor V. Filippov, Laboratory of Medicinal
Chemistry, National Cancer Institute, and Wolf-Dietrich Ihlenfeldt, Xemistry
GmbH
We present an update on the tools and resources used in the drug design and in
silico screening work of the CADD Group at LMC, CCR, NCI. Many of these
chemoinformatics resources are implemented in the form of web services, and
open access is granted to the public for most of them at http://cactus.nci.nih.gov.
Web-based search interfaces are presented for databases with millions of
compounds using a search engine operating in distributed mode across a Linux
cluster. Many of these databases are being made publicly available,
including multi-million collections of commercial screening samples, as well
as data sets from various U.S. Government agencies. Also presented are new
automated tools for generating such web services, as well as tools and
services utilizing new calculable CACTVS hash code-based identifiers useful
for rapid compound identification and database overlap analyses.
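CACTVS hashcodes themselves are computed by the CACTVS toolkit and are not reproduced here. The sketch below only illustrates the idea of hash-based database overlap analysis, using a truncated SHA-256 digest of an already-canonical identifier (for example, a standard InChI) as a hypothetical stand-in; unlike a true structure hash, it is only as reliable as the canonicalization of its input.

```python
import hashlib


def structure_key(canonical_identifier: str, length: int = 16) -> str:
    """Hypothetical stand-in for a structure hash: truncated SHA-256 hex digest.

    Assumes the input is already canonical (e.g. a standard InChI string),
    so identical structures always yield identical keys.
    """
    return hashlib.sha256(canonical_identifier.encode("utf-8")).hexdigest()[:length]


def overlap_report(db_a, db_b):
    """Count shared and unique structures between two collections of identifiers."""
    keys_a = {structure_key(s) for s in db_a}
    keys_b = {structure_key(s) for s in db_b}
    return {
        "shared": len(keys_a & keys_b),
        "only_a": len(keys_a - keys_b),
        "only_b": len(keys_b - keys_a),
    }
```

Fixed-length keys make the set operations cheap even for multi-million-compound collections, and duplicates within one collection collapse automatically.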
CINF 26 Automatic aggregation of open chemical data
Nick E Day1, Peter Murray-Rust2, Henry S. Rzepa3,
Simon M. Tyrrell4, and Yong Zhang4. (1) Department of
Chemistry, Unilever Centre for Molecular Sciences Informatics, Lensfield
Road, CB2 1EW Cambridge, United Kingdom, Fax: +44-1223-763076, ned24@cam.ac.uk,
(2) Unilever Centre for Molecular Informatics, University of Cambridge, (3)
Department of Chemistry, Imperial College of Science, Technology and
Medicine, (4) Unilever Centre for Molecular Science Informatics, University
of Cambridge
Most experimental chemical data (e.g., crystal structures (80%), spectra (99%),
computational chemistry (>99%)) is never published in machine-understandable form and
is effectively lost. However, where authors deposit it alongside publication,
either in repositories or as supplemental data to journal articles or
theses, we show that it can be extracted and preserved.
The components of our process have been automated and are:
- a workflow to manage the process;
- conversion of legacy structural formulae (MOL, ChemDraw, SMI, etc.) to InChI (the IUPAC chemical identifier);
- conversion of crystallography (CIF), spectra (JCAMP), and computational chemistry (MOPAC, GAMESS, etc.) files to CML;
- archival in an Open XML-aware repository;
- publication of metadata through Open academic repository systems (e.g., DSpace, EPrints), disseminated using RSS and RDF.
The primary data object is the chemical compound, indexed by its InChI, together with its properties (with standard CML/RDF metadata). Robots can search collections for compounds and properties and compile indexes of different degrees of comprehensiveness or specialisation. We have shown that these are well indexed by conventional search engines (Google(TM), MSN(TM)), thus removing the need for specialised chemical software on the Chemical Semantic Web. The search results are highly customisable and, as they are Open, can be used directly for further scientific research or re-dissemination.
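As a rough illustration of how a robot might compile an InChI-keyed index from deposited data, the sketch below merges property records harvested from several repositories under each compound's InChI. The `(inchi, property, value, source)` record layout and the function names are hypothetical; the WWMM software itself works with CML and RDF, which are not modelled here.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, float, str]  # (InChI, property name, value, source repository)
Index = Dict[str, Dict[str, List[Tuple[float, str]]]]

def build_index(records: Iterable[Record]) -> Index:
    """Group harvested property values by compound (InChI), then by property."""
    index: Index = {}
    for inchi, prop, value, source in records:
        # Values from different repositories are kept side by side with their source.
        index.setdefault(inchi, {}).setdefault(prop, []).append((value, source))
    return index

def compounds_with(index: Index, prop: str) -> List[str]:
    """Return the InChIs that have at least one value for the given property."""
    return sorted(inchi for inchi, props in index.items() if prop in props)
```

Keeping the source next to each value lets a more specialised index later choose, compare or discard values by provenance.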
All software in this system ("WorldWideMolecularMatrix", WWMM) is available as Open Source.
CINF
27 Predictive models for genotoxicity based on discriminating structural features and reassembled medicinal chemistry building blocks
Constantine Kreatsoulas1, Chihae Yang2, Glenn
J. Myatt2, and James F. Rathman3. (1) BMS, Princeton,
NJ 08543, constantine_kreatsoulas@merck.com, (2) Leadscope, Inc, (3)
Department of Chemical and Biomolecular Engineering, The Ohio State
University
A chemical structure-based strategy is used to develop two classes of predictive models of genetic toxicity as determined by the SOS Chromotest assay. The SOS assay has high concordance with the standard Ames assay and has been used successfully for numerous diverse compound classes. In one approach, the MultiCASE algorithm was used to automate the extraction of substructures for the prediction of genotoxicity. This model was then applied to data sets for which SOS data are available.
In addition to modeling the global results, models for chemically similar subsets were also developed. For each specific dataset and endpoint, predictive scaffolds were then constructed using structural features from a library of 27,000 medicinal chemistry building blocks. Scaffolds were built separately for the global dataset and each subset. Results are compared for models built using partial logistic regression for both binomial and multinomial ordinal toxicity endpoints.
CINF
28 Building and using an in-house platform for data mining and analysis integrating open source and proprietary software: I. Designing and constructing the framework
Erik Evensen, Hans E. Purkey, Ken Lind, and Erin K. Bradley,
Computational Sciences, Sunesis Pharmaceuticals Inc, 341 Oyster Point Blvd.,
South San Francisco, CA 94080, Fax: 650-266-3501, ee@sunesis.com
A common problem faced by computational chemists is integrating and transferring data among numerous and disparate systems. This often involves managing and translating multiple flat files, an approach that does not scale well to complex workflows with large data sets. We have constructed a database-backed platform utilizing open source software, primarily MySQL and Python, that enables building complicated data management and analysis processes incorporating data generated by both open and closed source software. In addition, we have developed internal protocols based on open standards such as XML-RPC to make computational results available both within and outside our platform. By using well-known, open standards, we are able to leverage widely available knowledge and experience. We will present lessons learned and wisdom gained during the development of this platform.
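Exposing results over XML-RPC can be sketched with Python's standard library alone. In this hypothetical example an in-memory SQLite table stands in for the MySQL results database, and a single function, `get_results`, is made available to any XML-RPC client; the table layout, compound IDs and function names are assumptions, not the protocol described in the talk.

```python
import sqlite3
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def build_db() -> sqlite3.Connection:
    """In-memory stand-in for a MySQL results table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)  # queried from the server thread
    conn.execute("CREATE TABLE results (compound_id TEXT, method TEXT, score REAL)")
    conn.executemany(
        "INSERT INTO results VALUES (?, ?, ?)",
        [("CMPD-1", "docking", -7.2), ("CMPD-1", "logP", 2.1), ("CMPD-2", "docking", -5.4)],
    )
    return conn

def make_server(conn: sqlite3.Connection, host: str = "127.0.0.1", port: int = 0) -> SimpleXMLRPCServer:
    """Expose stored results over XML-RPC; port 0 lets the OS choose a free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)

    def get_results(compound_id: str) -> dict:
        rows = conn.execute(
            "SELECT method, score FROM results WHERE compound_id = ? ORDER BY method",
            (compound_id,),
        ).fetchall()
        return {method: score for method, score in rows}  # sent as an XML-RPC struct

    server.register_function(get_results, "get_results")
    return server

if __name__ == "__main__":
    server = make_server(build_db())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    proxy = ServerProxy(f"http://127.0.0.1:{server.server_address[1]}/")
    print(proxy.get_results("CMPD-1"))  # {'docking': -7.2, 'logP': 2.1}
    server.shutdown()
    server.server_close()
```

Because the client only needs the XML-RPC standard, the same call works from any language with an XML-RPC library, which is the interoperability argument the abstract makes for open standards.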
CINF
29 ABCD: From data to insight
Dimitris K. Agrafiotis, Johnson & Johnson Pharmaceutical Research
& Development, L.L.C, 665 Stockton Drive, Exton, PA 19341, Fax:
610-458-8249
Johnson & Johnson has recently unveiled ABCD (http://www.bioitworld.com/archive/061704/discovery.html), an informatics platform that bridges multiple continents, data systems and cultures using modern information technology, and provides researchers with an environment that helps them make better decisions. The system consists of three major components: 1) a data warehouse, which combines data from multiple chemical and pharmacological transactional databases, organized using dimensional modelling principles to support high query performance; 2) a state-of-the-art application suite, which facilitates data upload, retrieval, mining and reporting; and 3) a workspace, which facilitates collaboration by allowing users to share queries, templates, results and reports across project teams, campuses, and other organizational units. A central goal of ABCD is to provide users with the means to retrieve, view and analyze multifactorial SAR data. Key to the success of this effort is the ability to combine fast substructure and similarity searching with conventional relational queries, and to deliver the results in an expedient and visually compelling format. In this presentation, we give an overview of ABCD and focus on a few core components that represent the system's "chemical intelligence", including the chemical cartridge, sketcher, molecular spreadsheet and interactive data mining components.
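Combining similarity searching with relational queries can be illustrated, very loosely, with SQLite: a Python Tanimoto function registered through `sqlite3.Connection.create_function` lets one SQL statement filter on both fingerprint similarity and assay values in a small star schema (one fact table, two dimension tables). The schema, the integer-bitmask fingerprints, the assay name `kinase-X` and the thresholds are all hypothetical; the abstract does not describe how ABCD's chemical cartridge works.

```python
import sqlite3
from typing import List, Tuple

def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bitmasks."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0

def build_warehouse() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.create_function("tanimoto", 2, tanimoto)  # now callable from SQL
    conn.executescript("""
        CREATE TABLE dim_compound (compound_id TEXT PRIMARY KEY, fp INTEGER);
        CREATE TABLE dim_assay (assay_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_result (compound_id TEXT, assay_id INTEGER, ic50_um REAL);
    """)
    conn.executemany("INSERT INTO dim_compound VALUES (?, ?)",
                     [("C1", 0b1111), ("C2", 0b0111), ("C3", 0b11110000)])
    conn.execute("INSERT INTO dim_assay VALUES (1, 'kinase-X')")
    conn.executemany("INSERT INTO fact_result VALUES (?, 1, ?)",
                     [("C1", 0.5), ("C2", 5.0), ("C3", 0.1)])
    return conn

def similar_actives(conn: sqlite3.Connection, query_fp: int, assay: str,
                    max_ic50: float, min_sim: float) -> List[Tuple[str, float, float]]:
    """Compounds similar to query_fp that are also potent in the named assay."""
    sql = """
        SELECT c.compound_id, tanimoto(c.fp, :q) AS sim, r.ic50_um
        FROM fact_result AS r
        JOIN dim_compound AS c ON c.compound_id = r.compound_id
        JOIN dim_assay AS a ON a.assay_id = r.assay_id
        WHERE a.name = :assay AND r.ic50_um < :max_ic50
          AND tanimoto(c.fp, :q) >= :min_sim
        ORDER BY sim DESC
    """
    params = {"q": query_fp, "assay": assay, "max_ic50": max_ic50, "min_sim": min_sim}
    return conn.execute(sql, params).fetchall()
```

A real cartridge would index fingerprints rather than scan every row with a Python callback; the sketch only shows how the two kinds of criteria meet in one query.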
CINF
30 Double focusing by molecular bioactivity and drug likeness
Anwar Rayan, David Marcus, Ohad Givaty, Dinorah Barasch, and Amiram
Goldblum, Medicinal Chemistry and Natural Products, Hebrew University of
Jerusalem, School of Pharmacy, Jerusalem 91120, Israel, Fax: 972-2-675-8925,
anvarr@md.huji.ac.il, amiram@vms.huji.ac.il
We have developed an Iterative Stochastic Elimination (ISE) algorithm to construct sets of best results for highly complex combinatorial problems (1-4). The ISE was used to construct sets of molecular descriptor ranges that serve as filters for distinguishing between drugs and non-drugs. Other methods suggest filters that produce a binary result: acceptance or rejection of a molecule as a drug candidate. We employ large sets of best filters to assign a Drug Like Index (DLI) to any molecule, which reflects its likelihood of belonging to a database of drugs. A similar approach is applied to databases of biological activity, producing a Molecular Bioactivity Index (MBI) for any specific activity. We find many molecules with high DLI values in large databases of non-drugs, and propose to examine their bioactivity by assigning them MBI values for a specific activity. This double focusing approach with DLI and MBI is proposed as a process for discovering molecules with specific biological activities in large databases of known or virtual molecules.
References:
(1) Glick, M.; Rayan, A.; Goldblum, A. Proceedings of the National Academy of Sciences of the United States of America 2002, 99, 703-708.
(2) Glick, M.; Goldblum, A. Proteins: Structure, Function, and Genetics 2000, 38, 273-287.
(3) Rayan, A.; Noy, E.; Chema, D.; Levitzki, A.; Goldblum, A. Current Medicinal Chemistry 2004, 11, 675-692.
(4) Rayan, A.; Senderowitz, H.; Goldblum, A. Journal of Molecular Graphics and Modelling 2004, 22, 319-333.
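The abstract does not say how the many ISE filters are combined into a DLI score. Purely as an assumption, the sketch below treats each filter as a set of descriptor ranges and takes the DLI as the fraction of filters a molecule passes, which turns binary accept/reject filters into a graded score. The descriptor names, ranges and function names are invented for illustration and are not the ISE-derived filters.

```python
from typing import Dict, List, Tuple

# A hypothetical filter: descriptor name -> (min, max) allowed range.
Filter = Dict[str, Tuple[float, float]]

def passes(descriptors: Dict[str, float], flt: Filter) -> bool:
    """True if every descriptor constrained by the filter lies within its range."""
    return all(lo <= descriptors[name] <= hi for name, (lo, hi) in flt.items())

def drug_like_index(descriptors: Dict[str, float], filters: List[Filter]) -> float:
    """Assumed DLI: the fraction of filters that the molecule passes (0.0 to 1.0)."""
    if not filters:
        raise ValueError("at least one filter is required")
    return sum(passes(descriptors, f) for f in filters) / len(filters)
```

A molecule that fails one of three filters scores 2/3 rather than being rejected outright, which is the kind of graded ranking the abstract contrasts with binary filters.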
CINF
31 Chemical datamining approach to scaffold based QSAR studies of NCI anti-tumor dataset
M Karthikeyan, Information Division (Digital information Resource
Centre), National Chemical Laboratory, Dr. Homi Bhabha Road, Pune 411008,
India, Fax: +91-20-5893973, karthi@ems.ncl.res.in, Letha Sebastian, Dept of
Bioinformatics, Amman College, and Alexander Tropsha, Laboratory for
Molecular Modeling, School of Pharmacy, University of North Carolina
The National Cancer Institute (NCI) has been screening compounds in vitro for inhibition of cell growth in its panel of 60 human cancer cell lines for the purpose of anticancer drug discovery. The chemical structures and their activity data were processed to remove duplicate molecules and erroneous structures. In this process, about 32,000 molecules with their reported biological activity data (NLOGGI50, NLOGTGI, NLOGLC50) for the 60 human tumor cell lines were organized in an Oracle database table. Each molecule and its biological activity data were linked to the corresponding molecular descriptors through a common identifier for querying the database. Various topological, electronic, quantum mechanical, 2D and 3D molecular descriptors, along with predicted properties such as molar refractivity, solubility and the octanol/water partition coefficient (logP(o/w)), and drug-likeness information related to Lipinski's rule of five (number of rotatable bonds, number of hydrogen bond acceptors, number of hydrogen bond donors, total polar surface area, etc.), were calculated for all these molecules, as required for QSAR/QSPR analysis. Scaffold and functional group analysis was conducted on the NCI data set to identify the number of common scaffolds [Fig-1]. Selected sets of scaffolds were used for QSAR studies using MOE descriptors and the built-in PLS, PCR and other statistical methods. The data-mining methods and computational results are presented.
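For reference, Lipinski's rule of five flags a compound whose molecular weight exceeds 500, whose calculated logP exceeds 5, or which has more than 5 hydrogen bond donors or more than 10 hydrogen bond acceptors; the rotatable-bond and polar-surface-area cutoffs (at most 10 rotatable bonds, polar surface area at most 140 Å²) come from the related criteria of Veber et al. The sketch below assumes the descriptors have already been computed (for example with MOE) and simply applies these cutoffs; the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    mw: float        # molecular weight (Da)
    logp: float      # calculated octanol/water logP
    hbd: int         # hydrogen bond donors
    hba: int         # hydrogen bond acceptors
    rot_bonds: int   # rotatable bonds
    tpsa: float      # topological polar surface area (Å^2)

def lipinski_violations(d: Descriptors) -> int:
    """Number of rule-of-five criteria violated (0 to 4)."""
    return sum([d.mw > 500, d.logp > 5, d.hbd > 5, d.hba > 10])

def passes_veber(d: Descriptors) -> bool:
    """Veber criteria: at most 10 rotatable bonds and PSA of at most 140 Å^2."""
    return d.rot_bonds <= 10 and d.tpsa <= 140.0
```

Storing these counts next to each structure makes them available as simple filters or as extra columns in the QSAR descriptor table.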
CINF
32 The use of Random Forests for modeling in vitro ADMET endpoints
Jason D Hughes, Molecular Informatics, Pfizer, 620 Memorial Dr,
Cambridge, MA 02139
A framework for molecular property/activity prediction consisting of a Random Forest model coupled with a custom set of descriptors has been found to be very effective across a variety of endpoints, including kinetic solubility, membrane permeability, metabolic stability, and dofetilide binding. Random Forests are bagged decision tree ensembles that are trained and applied normally with one exception: only a small, randomly selected subset of descriptors is considered when selecting the best split at each node during tree construction. The descriptors used here are all simple molecular substructure or feature counts encoded as Daylight SMARTS queries. Some mathematical properties of these RF-based models have been explored, including the impact of descriptor and training set selection schemes, nearest neighbor effects, etc. Additionally, examples will be given to demonstrate that this modeling paradigm compares favorably with a selection of alternatives.
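The one difference the abstract highlights, trying only a random subset of descriptors at each split, is visible in the minimal pure-Python regression forest below. It is a teaching sketch under stated assumptions (variance-reduction splits, mean-valued leaves, `mtry` descriptors drawn per node, bootstrap bagging), not the implementation used in this work; in the setting described the features would be SMARTS substructure counts, but any numeric features work here.

```python
import random
from statistics import mean
from typing import List, Sequence

def _sse(ys: List[float]) -> float:
    """Sum of squared deviations from the mean: the impurity minimised at each split."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def _best_split(X, y, idx, features):
    """Best (feature, threshold, left, right) over the given features, or None."""
    best, best_score = None, _sse([y[i] for i in idx])
    for f in features:
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            score = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if score < best_score:
                best, best_score = (f, t, left, right), score
    return best

def _grow(X, y, idx, mtry, min_leaf, rng):
    if len(idx) <= min_leaf:
        return mean(y[i] for i in idx)
    # The Random Forest twist: only mtry randomly chosen descriptors are tried here.
    features = rng.sample(range(len(X[0])), mtry)
    split = _best_split(X, y, idx, features)
    if split is None:  # no improving split among the sampled descriptors
        return mean(y[i] for i in idx)
    f, t, left, right = split
    return (f, t, _grow(X, y, left, mtry, min_leaf, rng),
            _grow(X, y, right, mtry, min_leaf, rng))

def _predict_tree(node, x: Sequence[float]) -> float:
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

class RandomForest:
    def __init__(self, n_trees: int = 50, mtry: int = 1, min_leaf: int = 2, seed: int = 0):
        self.n_trees, self.mtry, self.min_leaf = n_trees, mtry, min_leaf
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X: List[Sequence[float]], y: List[float]) -> "RandomForest":
        n, mtry = len(X), min(self.mtry, len(X[0]))
        self.trees = []
        for _ in range(self.n_trees):
            boot = [self.rng.randrange(n) for _ in range(n)]  # bagging: bootstrap sample
            self.trees.append(_grow(X, y, boot, mtry, self.min_leaf, self.rng))
        return self

    def predict(self, x: Sequence[float]) -> float:
        return mean(_predict_tree(t, x) for t in self.trees)  # average over all trees
```

Setting `mtry` to the full descriptor count turns this back into plain bagged trees, which is a convenient baseline when checking what the random descriptor subsets contribute.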
CINF
33 Web services as integrators of public chemistry databases
Gary Wiggins, School of Informatics, Indiana University, 901 E. Tenth
Street, Bloomington, IN 47408-3912, Fax: 812-856-4764, wiggins@indiana.edu
PubChem and other chemistry databases on the Web will provide a wealth of chemical and biological information. We are embarking on a series of projects that will utilize computer simulation and visualization environments to create an integrated chemical informatics cyberinfrastructure built on modern distributed service architectures. The projects will use the emerging high-capacity computer networks, powerful data repositories, and computers that comprise the Grid, thus ensuring scalability, computational efficiency, and interoperability among heterogeneous components. A description of the overall architecture of the projects and the planned links to the databases will be presented.
CINF
34 Chemical and biological data from DTP/NCI
Daniel W Zaharevitz, Information Technology Branch, Developmental
Therapeutics Program, National Cancer Institute, EPN, Room 8010, 6130
Executive Blvd, Bethesda, MD 20892, Fax: 301-480-4808, zaharevitz@dtpax2.ncifcrf.gov
The Developmental Therapeutics Program (DTP) at the National Cancer Institute has been acquiring compounds for testing since 1955. This effort has resulted in the accumulation of a wealth of chemical and biological information. DTP has made this information useful to the research community by making the data publicly available and by developing tools that search and analyze the data. Over 250,000 chemical structures and over 10 million biological data points are available. Biological data include measurements of growth inhibition in sixty human tumor cell lines, growth inhibition in yeast strains with defined mutations, protection from HIV in cell culture, anti-tumor activity in numerous mouse tumor models in vivo, and several other assays. Searches can be done by NSC number, CAS registry number, chemical name, or chemical substructure. Development of a data architecture for organizing these data will be discussed, as well as plans for future additions to the data.
CINF
35 Public information databases for virtual screening
John Irwin and Brian Shoichet, Department of Pharmaceutical
Chemistry, University of California, San Francisco, 1700 4th St, San
Francisco, CA 94143, jji@cgl.ucsf.edu
Investigators wishing to apply computational methods such as virtual screening to discover novel ligands for proteins require a database of molecules suitable for docking. To shorten the hypothesis-testing cycle, these compounds should be commercially available and broadly "drug-like". To address this need, which has been a barrier to entry to this field, we developed the ZINC database of purchasable compounds for virtual screening, currently a collection of 3.3 million compounds available from over 20 vendors. Although our original goal was to serve the virtual screening community, ZINC has attracted the attention of cheminformaticists more generally as a source of publicly available chemical structures for research. By the time of this meeting, a large part of ZINC should have been loaded into PubChem, the new database of chemical structures and screening data from NCBI that is tightly linked to the chemical and biological literature. This link from PubChem to ZINC complements the existing links from ZINC into PubChem and to compound vendor websites. We hope this growth of a web of publicly available chemical information, linking the literature to 3D structures, properties, and chemical suppliers, will be a boon to investigators, particularly those who have hitherto not had access to this information. ZINC is on the web at http://zinc.docking.org.
CINF
36 NIST Computational chemistry comparison and benchmark database
Russell D. Johnson III, Computational Chemistry Group, National
Institute of Standards and Technology, 100 Bureau Drive Stop 8380,
Gaithersburg, MD 20899, Fax: 301-869-4020, russell.johnson@nist.gov
The NIST Computational Chemistry Comparison and Benchmark Database (CCCBDB) is a website and database that allows users to compare ideal-gas thermochemical properties determined by experiment or by quantum chemical calculations. The database contains experimental data for more than 640 small molecules and over 100,000 calculations. Types of data include enthalpies of formation, entropies, geometries, vibrational frequencies, and dipole moments. The primary goal of the CCCBDB is to allow comparisons of thermochemistry and related properties (entropies, geometries, vibrational frequencies). Through many examples, the CCCBDB addresses the question “How good is that calculation?”. This talk will describe the data in the CCCBDB, the comparison tools available through the website, and future plans for the CCCBDB. The CCCBDB is accessible at http://srdata.nist.gov/cccbdb.
CINF
37 Chemical information databases for environmental fate and exposure assessments
Suzanne Bogaczyk1, Philip H. Howard2, William
M. Meylan2, Amy Hueber2, and Jay Tunkel2.
(1) Syracuse Research Corporation, 1215 South Clark Street, Suite 405,
Arlington, VA 22202, Fax: 703-418-1044, sbogaczyk@syrres.com, (2)
Environmental Science Center, Syracuse Research Corporation
Accurate and dependable sources of chemical information are of great importance in the assessment of chemicals for environmental purposes. Syracuse Research Corporation (SRC) produces and maintains several databases of this type, including the Environmental Fate Database (EFDB) and the physical properties database (PHYSPROP). The EFDB, which is continually updated and maintained at SRC, was developed in conjunction with the EPA to allow rapid access to available environmental fate and physical/chemical property data on chemical substances. PHYSPROP contains a recommended single value for water solubility, octanol-water partition coefficient, melting and boiling point, vapor pressure, Henry's Law constant, and hydroxyl radical rate constant for over 25,000 chemicals. SRC also developed ChemS3, a web-based search engine that allows substructure searches to be combined with queries of text and numeric data. The compilation of these databases, and their versatility in searching for environmental fate and exposure information on chemical substances, will be discussed.
CINF
38 3-D Database search queries for colchicine binding site inhibitors
Ann Hermone, Tam Luong Nyguyen, James Burnett, Connor McGrath, Ernest
Hamel, Daniel W Zaharevitz, and Rick Gussio, Information Technology Branch,
Developmental Therapeutics Program, PO Box B, FVC 310, Frederick, MD 21702,
Fax: 301-846-6106, hermone@dtpax2.ncifcrf.gov
Microtubules, which are linear arrays of alternating alpha and beta tubulin, are critical for cellular proliferation and are therefore a target of cancer chemotherapy. Colchicine was the first compound found to bind at the interface of alpha and beta tubulin and to destabilize microtubules. Over the years, a large number of structurally diverse small molecules have been shown to bind at the colchicine site of tubulin and inhibit tubulin polymerization. In other work by our group, docking studies involving the recently determined X-ray structure of the alpha,beta tubulin/colchicine complex were used to construct binding models for a set of structurally diverse colchicine site inhibitors, which subsequently formed the basis for a common pharmacophore. This study expands on that work by developing internally consistent Catalyst search queries that can discriminate between colchicine site inhibitors and their inactive congeners.
CINF
39 Algorithms and cancer drugs: In silico design of S100B ligands to block p53 binding
John L. Whitlow, Department of Chemistry, East Carolina University,
300 Science and Technology Building, Greenville, NC 27858, Fax:
206-424-1645, john@johnwhitlow.com
Cancer is the leading cause of death for persons under the age of 85. Elevated levels of S100B are associated with cancer. This research focused on interactions between S100B and the tumor suppressor protein, p53. S100B disrupts p53's protective function by inhibiting p53's C-terminal regulatory domain phosphorylation. This study designed compounds to block the effects of S100B on p53. Compounds that enhance p53's cellular function may provide potent anticancer therapies.
Accelrys's Cerius2 software was used for de novo drug design. The three-dimensional structure of S100B was analyzed to resolve its main interaction sites. Fragment molecules were screened against interaction sites in the S100B active site. Top fragment molecules were used as scaffolds to design complete ligand molecules. Additionally, public and private molecular libraries were run through docking algorithms to locate existing molecules with high affinities for the S100B active site. ADME and toxicity properties were also investigated.
CINF
40 Framework for integrating transcriptomic and proteomic profiles in Escherichia coli
Kunal Aggarwal, Leila H. Choe, and Kelvin H. Lee, School of Chemical
and Biomolecular Engineering, Cornell University, 120 Olin Hall, Ithaca, NY
14853, Fax: 607-255-9166, ka62@cornell.edu
We have developed a model experimental system to study the relationship between mRNA and protein expression profiles in genetically perturbed E. coli. Experimental data at the genomic, transcriptomic and proteomic levels from these cells are integrated on a common platform to understand the effects of the introduced genetic and environmental perturbations at the molecular level. The cells are perturbed to overexpress fragments of rhsA in the presence of IPTG and are observed to have a reduced growth rate. Gene expression and protein abundance data from these cells suggest a perturbed translation machinery and a nonlinear correlation between mRNA and protein levels in rhsA-overexpressing E. coli cells. The gene expression data are integrated with the connectivity information between genes and their transcription factors using network component analysis, to gain information on altered levels of transcription factor activity and to identify parameters that may cause the observed nonlinearity between mRNA and protein levels.
CINF
41 3-D-QSAR CoMFA and CoMSIA studies of novel alkoxylated and hydroxylated chalcones as potential anti-malarial agents
Devendra S Puntambekar and Mange Ram Yadav, Department of
Pharmaceutical Chemistry, The M.S University of Baroda, Pharmacy department,
Faculty of Technology & Engineering, Kalabhavan, P.O Box - 51, Vadodara,
Gujrat, India, Baroda 390 001, India, Fax: +91-0265-2423898/2418927,
devendra_res@yahoo.co.uk
Comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) were performed on a series of novel alkoxylated and hydroxylated chalcones as antimalarial agents. The ligands were superimposed on a template structure using atom- and shape-based RMS fit methods. Removing outliers from the initial set of 69 compounds improved the predictivity of the models. Statistically significant models were established from 52 compounds and validated on a test set of 14 compounds. The atom- and shape-based alignment yielded the best predictive CoMFA model (r²cv = 0.674, r²ncv = 0.957, r²pred = 0.670, F value = 83.040, r²bs = 0.992 with six components), while the CoMSIA model gave r²cv = 0.610, r²ncv = 0.913, r²pred = 0.726, F value = 50.115 and r²bs = 0.947 with seven components. The contour maps obtained from the 3D-QSAR studies were used to interpret the activity trends of the molecules. The results indicate that steric, electrostatic, hydrophobic and hydrogen bond donor substituents play a significant role in the antimalarial activity of these compounds.
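For readers comparing the statistics above, the definitions conventionally used in CoMFA/CoMSIA work are given below, with y the measured activity, ŷ a model prediction and ȳ_train the training-set mean. The abstract does not state its cross-validation scheme, so leave-one-out cross-validation (each training compound predicted by a model built without it) is an assumption here; r²ncv is the ordinary r² of the model fitted to all training compounds.

```latex
q^2 \equiv r^2_{\mathrm{cv}} = 1 - \frac{\sum_{i \in \mathrm{train}} \left( y_i - \hat{y}_{i,\mathrm{cv}} \right)^2}{\sum_{i \in \mathrm{train}} \left( y_i - \bar{y}_{\mathrm{train}} \right)^2},
\qquad
r^2_{\mathrm{pred}} = 1 - \frac{\sum_{j \in \mathrm{test}} \left( y_j - \hat{y}_j \right)^2}{\sum_{j \in \mathrm{test}} \left( y_j - \bar{y}_{\mathrm{train}} \right)^2}
```

Because r²pred is measured on compounds the model never saw, it is usually the more informative of the two when judging predictive power.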
CINF
42 Automatic molecular library generation of processed bioenzymes by proteolysis methods for bioremediation processes
Vito Librando, Danilo Gullotto, and Zelica Minniti, Department of
Chemistry, University of Catania, via Andrea Doria 6, Catania, Catania,
Italy, vlibrando@unict.it, envch3@unict.it
The goal of the present work is the implementation of informatics procedures able to interface with application software environments. The procedures were developed for computer processing in molecular modeling and allow the generation of molecular libraries, including data on the sequence and structural configurations of bio-enzymes. Each library contains molecular structures that differ by several amino acid deletions within specified regions of the molecule, so that a collection of molecular fragments derived from the ancestral protein can be obtained. The protein side chains obtained by this strategy were compatible with the enzymatic proteolysis methods used in conventional laboratory protocols, which helped decrease the time required to apply the experimental procedures. The methodology was able to identify many physicochemical properties of the source molecule, guiding the selection procedure towards the residues most suitable for proteolysis. The program takes into consideration a set of indices and parameters related both to amino acid sequence properties (hydrophobicity) and to the occurrence of amino acid types within secondary structures (helices, sheets and loops). The criteria used to choose residues suitable for proteolysis methods were based on the ability to recognize many features in a protein sequence. The advantage of such a strategy is that it allows proteins to maintain their structural and energetic features, avoiding conformational changes in the secondary structure and, consequently, a probable loss of protein activity. Finally, this method allows the generation of a wide set of optimised fragment structures suitable for testing and application in subsequent molecular modeling environments.
Acknowledgements
The Authors are grateful to MIUR for the financial support
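The library-generation step in abstract 42, contiguous amino acid deletions within a chosen region, can be sketched in a few lines. This hypothetical illustration works on one-letter sequences only (the authors work on 3D structures), and it annotates each variant with the mean Kyte-Doolittle hydropathy of the deleted segment as one example of a sequence-based index; how the authors actually weight such indices is not stated, so no ranking is implied.

```python
from statistics import mean
from typing import List, NamedTuple

# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

class Variant(NamedTuple):
    start: int         # 0-based index of the first deleted residue
    deleted: str       # the removed segment
    sequence: str      # the shortened sequence
    hydropathy: float  # mean Kyte-Doolittle value of the deleted segment

def deletion_library(seq: str, region_start: int, region_end: int, length: int) -> List[Variant]:
    """All contiguous deletions of `length` residues lying inside seq[region_start:region_end]."""
    variants = []
    for s in range(region_start, region_end - length + 1):
        segment = seq[s:s + length]
        variants.append(Variant(s, segment, seq[:s] + seq[s + length:],
                                mean(KYTE_DOOLITTLE[aa] for aa in segment)))
    return variants
```

Restricting deletions to a user-specified region mirrors the abstract's "specified molecule regions"; each shortened sequence would then still need to be built and relaxed in 3D before any modeling.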
CINF
43 Library generation and lead selection for optimal laboratory procedure of environmental biocatalysts
Vito Librando, Danilo Gullotto, and Zelica Minniti, Department of
Chemistry, University of Catania, Viale Andrea Doria, 6, Catania 95127,
Italy, Fax: +39-095-580138, vlibrando@unict.it
Among contaminated sites in Sicily, particularly Siracuse Bay, little attention has been given to pollution and remediation. Petroleum products that remain as long-term contaminants include polycyclic aromatic hydrocarbons (PAHs), a family of ubiquitous pollutants with similar biological activity, high toxicity, and mutagenic and carcinogenic potential. This paper describes preliminary results of an in situ treatment strategy using engineered enzymes extracted from selected bacteria for low-cost bioremediation of petroleum products that are poorly degraded by naturally occurring bacteria. The effects of sequence modifications can be predicted using dedicated algorithms, making it possible to design and test numerous different active molecules derived from the original ones. Multiple virtual deletions in the amino acid sequence were generated from the original PDB file, and the new sequences were annealed using force fields in molecular dynamics simulations that took real environmental parameters into account. The structures were analyzed to find those with the best configuration of the active site and of the selective channels for the substrate; multiple docking simulations were then performed for all the different substrates, giving information on the extent of the interactions between the enzymes and substrates of environmental interest. A complete scan of the protein surface was carried out using naphthalene as a probe to find any additional inactive binding sites that could hold the substrate away from the active site.
CINF
44 Modeling vs. X-ray crystallography: The basal activity of constitutive androstane receptor (CAR)
Björn Windshügel, Institute for Pharmaceutical Chemistry,
Martin-Luther-University Halle-Wittenberg, Wolfgang-Langenbeck-Str. 4, Halle
(Saale) 06120, Germany, bjoern.windshuegel@pharmazie.uni-halle.de
Abstract text not available.
CINF
45 Mok: A domain-specific language for molecular information processing
Ivan Tubert-Brohman and William L. Jorgensen, Department of
Chemistry, Yale University, 225 Prospect St., New Haven, CT 06520, Fax:
203-432-6299, Ivan.Tubert-Brohman@yale.edu
Mok is a domain-specific language for molecular information processing, based on the same execution paradigm as the AWK programming language. It is derived from Perl and includes specialized functions and command-line options for molecular file input and output, substructure matching, bond perception from 3D coordinates, and an object model for accessing and modifying various properties of the atoms and bonds in a structure. It is freely available on CPAN under the same license as Perl itself.
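Bond perception from 3D coordinates is commonly done with a distance criterion. The Python sketch below is not Mok's implementation (Mok is written in Perl); it bonds two atoms when their distance is at most the sum of their covalent radii plus a tolerance. The 0.45 Å tolerance and the small radius table (single-bond values from Cordero et al., 2008) are assumptions, and real perception code handles many more elements and usually adds valence checks.

```python
import math
from typing import List, Sequence, Tuple

# Covalent radii in angstroms (Cordero et al., 2008) for a few common elements.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}
TOLERANCE = 0.45  # slack added to the radius sum (assumed value)

Atom = Tuple[str, Sequence[float]]  # (element symbol, (x, y, z) in angstroms)

def perceive_bonds(atoms: List[Atom]) -> List[Tuple[int, int]]:
    """Bond atoms i < j whose distance is within r_i + r_j + TOLERANCE."""
    bonds = []
    for i, (elem_i, pos_i) in enumerate(atoms):
        for j in range(i + 1, len(atoms)):
            elem_j, pos_j = atoms[j]
            cutoff = COVALENT_RADII[elem_i] + COVALENT_RADII[elem_j] + TOLERANCE
            if math.dist(pos_i, pos_j) <= cutoff:
                bonds.append((i, j))
    return bonds
```

For a water molecule with O-H distances of about 0.96 Å the two O-H pairs fall under the 1.42 Å cutoff, while the H-H separation of about 1.51 Å exceeds the 1.07 Å H-H cutoff, so only the two O-H bonds are perceived.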
CINF
46 WinDock: An integrated structure-based drug discovery environment using a graphical user interface
Zengjian Hu1, Donnell Bowen2, Shaomeng Wang3,
and William M. Southerland1. (1) Department of Biochemistry and
Molecular Biology, Howard University College of Medicine and the Howard
University Drug Discovery Unit, 520 West Street, Northwest, Room 324,
Washington, DC 20059, zhu@howard.edu, (2) Department of Pharmacology, Howard
University College of Medicine, (3) Comprehensive Cancer Center and
Department of Internal Medicine, The University of Michigan
In recent years, virtual database screening using high-throughput molecular docking (HTD) has emerged as an important method for finding new leads in the drug discovery process. Most HTD efforts utilize expensive workstations and hard-to-master Unix-like operating systems. With the advent of powerful and inexpensive personal computers (PCs), it is now possible to perform HTD investigations on Windows-based PCs. To make HTD accessible to a broad community, we present WinDock, an integrated structure-based drug discovery environment for Windows-based PCs. WinDock integrates small-molecule searchable 3D databases, homology modeling tools, ligand-protein docking programs, and consensus scoring functions into a cohesive system that provides a general tool for a wide range of computer-aided drug discovery applications, including protein homology modeling, lead identification, and lead optimization. WinDock is written in C++ and is distributed free of charge to all users.
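The abstract does not say how WinDock combines its scoring functions. One common consensus scheme, shown here only as an assumed example, ranks compounds separately under each scoring function and then orders them by mean rank. Lower scores are taken to be better, as for docking energies, and every scoring function is assumed to score the same compounds; the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List

Scores = Dict[str, float]  # compound ID -> score (lower = better)

def ranks(scores: Scores) -> Dict[str, int]:
    """1-based ranks, best (lowest) score first."""
    ordered = sorted(scores, key=scores.get)
    return {cid: r for r, cid in enumerate(ordered, start=1)}

def consensus(tables: Dict[str, Scores]) -> List[str]:
    """Order compounds by their mean rank across all scoring functions."""
    per_function = [ranks(t) for t in tables.values()]
    cids = list(per_function[0])
    avg = {cid: mean(r[cid] for r in per_function) for cid in cids}
    return sorted(avg, key=avg.get)
```

Working with ranks rather than raw scores sidesteps the fact that different scoring functions report values on incompatible scales.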
CINF
47 Turbo similarity searching
Jérôme Hert1, Peter Willett1, David J. Wilton1,
Kamal Azzaoui2, Edgar Jacoby2, and Ansgar
Schuffenhauer2. (1) Department of Information Studies, University
of Sheffield, Western Bank, Sheffield S10 2TN, United Kingdom, j.hert@sheffield.ac.uk,
(2) Discovery Technologies, Novartis Institute for Biomedical Research
Previous work has shown that fusing the outputs of similarity searches carried out using different isoactive reference compounds produces a more effective ranking than one based on just a single reference compound. Turbo similarity searching applies this strategy using a reference molecule and its nearest neighbours. The similar property principle implies that these neighbour compounds are likely to have a bioactivity profile similar to that of the reference; accordingly, it may be worth including them in a fusion procedure. The effectiveness of this method is investigated by means of simulated virtual screening experiments using the MDL Drug Data Report Database. Extensive searches carried out for eleven diverse activity classes consistently demonstrate the superiority of turbo similarity searching over conventional similarity searching. This method hence represents a simple way of enhancing similarity-based virtual screening.
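A minimal sketch of the idea, assuming fingerprints held as Python sets of on-bits, the Tanimoto coefficient, and MAX group fusion (an assumed choice; SUM fusion is another common option): run a conventional search, take the top `k` hits as extra references, then re-rank the database by each compound's best similarity to any member of that group. The function names are hypothetical.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # set of "on" bit positions

def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def similarity_search(ref: Fingerprint, db: Dict[str, Fingerprint]) -> List[str]:
    """Conventional search: rank database IDs by similarity to one reference."""
    return sorted(db, key=lambda cid: tanimoto(ref, db[cid]), reverse=True)

def turbo_search(ref: Fingerprint, db: Dict[str, Fingerprint], k: int = 2) -> List[str]:
    """Rank by MAX fusion over the reference plus its k nearest neighbours."""
    neighbours = similarity_search(ref, db)[:k]
    group = [ref] + [db[cid] for cid in neighbours]
    best = {cid: max(tanimoto(q, db[cid]) for q in group) for cid in db}
    return sorted(db, key=best.get, reverse=True)

if __name__ == "__main__":
    ref = frozenset({1, 2, 3, 4})
    db = {
        "A": frozenset({1, 2, 3, 4, 5}),
        "B": frozenset({1, 2, 3, 6}),
        "D": frozenset({1, 9, 10, 11}),
        "C": frozenset({3, 6, 7, 8}),
    }
    print(similarity_search(ref, db))  # ['A', 'B', 'D', 'C']
    print(turbo_search(ref, db, k=2))  # ['A', 'B', 'C', 'D']
```

In this toy data, C and D are equally dissimilar to the reference, but C resembles neighbour B, so fusion moves it above D. Note that the neighbours themselves score 1.0 after fusion because each is compared with its own fingerprint.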
CINF
48 On-line submission and peer review systems
William G Town, Kilmorie Consulting, 24A Elsinore Road, London SE23
2SL, United Kingdom, bill.town@kilmorie.com
In the last ten years, electronic publishing of the results of scientific research has developed from being a novelty to being accepted as the normal method of publishing. Systems for online submission of articles, for peer review and for transmission of approved articles into the production workflow systems which manage both print and electronic publishing are now commonplace. This paper will review the technologies which have made this transition possible and the impact of these systems on authors' and peer reviewers' experience of publishing and on the timeliness of peer review and publishing. The impact of preprint servers will also be discussed.
CINF
49 Path to document recommendation services: Technologies that enabled the development of on-line information systems
Gerry Grenier, Publishing Technologies, IEEE, Inc, 445 Hoes Lane,
Piscataway, NJ 08855, g.grenier@ieee.org
Online information services are a 40-year-old phenomenon. The evolution of these services has been ongoing, with perhaps the most significant period of change occurring over the past 10 years. The development of the internet infrastructure and the rise of the World Wide Web and the HTTP protocol spurred an explosion of online information services that has dissolved temporal and spatial barriers to information. Lost in the excitement over the internet are the developments of the previous 30 years that contributed significantly to the search and discovery capabilities of online information services. Among those developments are full text search, relevancy ranking, and markup languages. This paper will look at the development of these three technologies, and then review the state of the art of the nascent service of document recommendation, a service built upon the three aforementioned technologies.
CINF
50
Clustering and meta-search as enabling technologies for rapid creation of vertical web portals
Raul E. Valdes-Perez, Vivisimo Inc, 2435 Beechwood Blvd, Pittsburgh,
PA 15217, Fax: 412-422-2495, valdes@vivisimo.com
Specialized web portals, or vortals, provide a comprehensive gateway for information on
a scientific specialty. The vortal fad of the late 1990s stalled because of
costs, both human and technological, and the lack of a web business model. The
situation has now changed: the new technologies of search, meta-search, and
clustering enable rapid deployment of vortals that index one's own or public
information, meta-search partner search engines, and cluster the combined
information into categories. This opens up radical new possibilities for
publishers, as well as for companies that want to deploy internal vortals for
their scientists and engineers.
CINF
51
Why your library doesn’t do what you want it to
Stuart L. Weibel, Office of Research, OCLC Online Computer Library
Center, 6565 Frantz Road, Dublin, OH 43017, Fax: 614 764 2344, weibel@oclc.org
The success and appeal of Lego blocks is more than esthetics; it is rooted in
engineering principles that form the foundation of the industrial age,
and are essential for systems engineering. The Lego metaphor gives us a
sound conceptual model for the design of information systems including the
principles of standardized interfaces, modular design, and extensibility.
There is a darker aspect of information systems that can be modeled with Lego as
well, however, and this metaphor sheds some light on the difficulties we
have in the design and use of electronic publishing systems and digital
library technology in general. The Week-After-Christmas metaphor evokes a
box of dozens or hundreds of unique parts that, while interoperable in some
sense, can be recombined in a staggering array of configurations… not all
of which make sense. Developments in a broadly distributed information
environment arrive so rapidly that they are impossible to anticipate and
difficult even to accommodate coherently. The result is constant change
and a requirement for adaptation that is a new feature of the education and
research process.
The challenge for designers and users is to recognize the useful bits that work
together, and configure them in sustainable, cost-effective systems that
meet the functional requirements of libraries and their constituents.
CINF
52
CAS Registry: An evolving resource for science
Roger J. Schenck, Chemical Abstracts Service, 2540 Olentangy River
Road, Columbus, OH 43202, Fax: 614-461-7140
From the inception of CA in 1907 and the publication of the first CA Substance
Index in 1920, the identification and storage of substance information from
the publicly disclosed literature has always been a major focus at CAS.
Starting from a manually curated 3×5 card file, the CAS Registry has evolved
into a computer-based collection approaching 80 million records for organic
and inorganic molecules, proteins, and sequences. The CAS Registry began as
a tool to serve the needs of CAS Production Operations but soon became, and
remains today, an essential adjunct to the work of researchers in academia,
industry, and government agencies around the world. This talk will
concentrate on the history of the CAS Registry, focusing on the changes in
computer technology that have enabled the evolution and huge growth of this
world resource.
CINF
53
Why are we still reading "papers" in a digital world? Can papers become digital, too?
David P Martinsen, ACS Publications, American Chemical Society, 1155
16th Street NW, Washington, DC 20036, Fax: 202-872-4389, d_martinsen@acs.org
The last ten years have witnessed a revolution in the way scientists receive
information, with a remarkable impact on discovery and delivery. It is now
easier than ever before to find articles of interest and to obtain those
articles without ever leaving the lab or office. However, the method by
which scientists read articles has been evolutionary at best. Most of us
still print out a copy of the article to read and make notes. This talk will
examine some of the challenges to making reading a more digital experience,
so as to better realize the promise of enhanced digital editions of
articles.
CINF
54
Electronic data standards for spectroscopy and analytical procedures
Antony N. Davies, Waters Informatics, Europaallee 27-29, 50226
Frechen, Germany, Fax: +49-2234-9207-99
With the ever-increasing availability of information in electronic form, such
as peer-reviewed journal articles, electronic patent submissions, and
pharmaceutical submissions to regulatory bodies, comes equally increasing
pressure on standards bodies such as IUPAC and ASTM International to
ensure that the associated data can be made available in open standard
formats. This talk will review the state of the art and identify the
good practices that scientists around the globe should adopt.
CINF
55
Science online: Bridging scientific disciplines
Monica M. Bradford, Science, AAAS, 1200 New York Avenue, NW,
Washington, DC 20005, Fax: 202-289-7562, Mbradfor@aaas.org
Science, the world's premier interdisciplinary journal, has been helping
scientists communicate their peer-reviewed results to the scientific
community for over 125 years. For the last 10 of those years, Science has
embraced electronic publishing as a means of enhancing scientific
communication and helping scientists make their results more accessible.
Moving to an entirely electronic workflow for the journal has led to
decreased processing times, increased submissions, and a more international
review process. Forward and backward reference linking, multi-media
enhancements, online supplemental material, and a suite of tools have made
the online version of the journal a richer resource. Creation of two online
knowledge environments has allowed the staff to experiment with creating
online resources that expand beyond the traditional journal. Over the next
five years, the challenge will be to match technology to researchers'
behaviors so that the communication vehicles fit their work styles and
information needs.
CINF
56
Publishing innovation at the Royal Society of Chemistry
Richard Kidd and Robert Parker, Royal Society of Chemistry, Thomas
Graham House, Science Park, Milton Road, Cambridge CB4 0WF, United Kingdom,
Fax: +44 1223 420247, kiddr@rsc.org
The RSC implemented an XML