RE: V3000 molfile format
- From: Keith Taylor <K.Taylor;at;mdl.com>
- Subject: RE: V3000 molfile format
- Date: Thu, 5 Dec 2002 12:42:19 -0800
The V3000 molfile format is not new. It was first published in 1995 and its
first role was to remove the arbitrary atom and bond limits that are built
into the V2000 format. Unfortunately the V2000 structure is very limiting
and is not extensible.
Large structures tend not to be distributed as molfiles and therefore there
have been very few of them in circulation. MDL's enhanced stereochemistry
introduces a number of new representation features that could only be fitted
in the V3000 format. We expect that the enhanced stereochemical
representation will have a noticeable impact on the number of V3000 format
molfiles in circulation.
The V2000 format has served us well and will continue to serve us for many
years. We have no plans to drop support for it in our products. Structures
that can be fully represented in the V2000 format will continue to be written
as V2000 format files. A V3000 format file will be produced only if the
structure contains features that cannot be represented in the V2000 format.
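As a rough illustration of the rule described above, a writer could choose the format along the following lines. This is only a sketch, not MDL's actual logic: the StructureSummary type, its field names, and the choose_molfile_version function are hypothetical, and the 999 cap assumes the 3-character atom and bond count fields of the V2000 counts line.

```python
from dataclasses import dataclass

# Assumption: the V2000 counts line stores the atom and bond counts in
# 3-character fields, so each count is capped at 999.
V2000_MAX_COUNT = 999


@dataclass
class StructureSummary:
    """Hypothetical summary of a structure; not an MDL data type."""
    n_atoms: int
    n_bonds: int
    uses_enhanced_stereo: bool = False


def choose_molfile_version(s: StructureSummary) -> str:
    """Return 'V2000' when the structure fits that format, else 'V3000'."""
    if s.n_atoms > V2000_MAX_COUNT or s.n_bonds > V2000_MAX_COUNT:
        return "V3000"  # counts do not fit the fixed-width fields
    if s.uses_enhanced_stereo:
        return "V3000"  # enhanced stereochemistry needs V3000
    return "V2000"
```

A real writer would have to check every feature that V2000 cannot express; the two conditions here are only the ones mentioned in this thread.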
MDL is researching chemical structure file formats. XML is part of that
research and compatibility with CML is under consideration.
XML formats are very verbose and this imposes a large overhead when it comes
to parsing them. If you are dealing with a small number of structures this
overhead is tolerable. It is, however, common in our user base to need to
work on structure sets that contain 100,000+ entries. The overhead then
becomes significant and a more compact representation is required. This is
why applications that consume large numbers of structures tend to read and
write SDfiles or concatenated SMILES strings.
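To make the point about large structure sets concrete, here is a minimal sketch of reading an SDfile one record at a time, so that memory use stays flat however many entries the file holds. It relies on the SDfile convention that each record ends with a line containing only "$$$$". The function name iter_sd_records is made up for this example.

```python
from typing import Iterable, Iterator


def iter_sd_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one SDfile record at a time, without its '$$$$' delimiter line."""
    buf = []
    for line in lines:
        if line.rstrip("\r\n") == "$$$$":
            # End of the current record: hand it over and start a new one.
            yield "".join(buf)
            buf = []
        else:
            buf.append(line)
    # Tolerate a final record that lacks the closing delimiter.
    if any(l.strip() for l in buf):
        yield "".join(buf)
```

Passing an open file object as the lines argument streams the file line by line, so only one record is held in memory at any time.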
If anyone would like to engage in a more detailed discussion about this
issue or anything else connected with chemical structure representation
please contact me directly at k.taylor;at;mdl.com.
-----Original Message-----
From: Alberto Gobbi [mailto:agobbi;at;anadyspharma.com]
Sent: Wednesday, December 04, 2002 10:47 PM
To: CHEMISTRY;at;ccl.net
Subject: CCL:V3000 molfile format
Hi Everybody,
without wanting to be offensive I would like to ask if you have really
considered all the options before creating a new file structure to store
structures.
The V2000 molfile format is well established and handles most of the cases
required so far. There are thousands of applications which can read and
write molfiles which would need to be modified. As one of your customers I
do not consider that you are really doing us a good service in creating a
new proprietary format.
Also XML is becoming the standard for persistently storing and transmitting
any kind of information worldwide in all different kinds of areas. There are
a lot of standard, open, well tested and robust applications and libraries
to read, write, and check the consistency of XML files. There is even an open
standard CML (http://www.xml-cml.org/) for storing chemical structures and
data based on XML. XML is carefully designed to be both flexible and
extensible. It's certainly more extensible and flexible than the V3000
format and would surely meet not just MDL's present needs but their future
needs as well. So if you really think that there is a need for storing
additional information I feel you would do your customers a better service
in supporting CML instead of creating a new standard which will cause a lot
of headaches to people who would like to exchange structures or simply
import them into their applications.
With best regards,
Alberto
===============================================
Alberto Gobbi
Anadys Pharmaceuticals
9050 Camino Santa Fe
San Diego CA, 92121
USA
-----------------------------------------------
AGobbi;at;AnadysPharma.com
Tel.: +1 858 530 3657
-----Original Message-----
From: Keith Taylor [mailto:K.Taylor;at;mdl.com]
Sent: Tuesday, December 03, 2002 8:24 AM
To: Computational Chemistry List
Subject: CCL:V3000 molfile format
If you use molfiles to transport structure information between applications,
you need to be aware that MDL is introducing an enhancement to its
stereochemical representation and this has an impact on the format of the
molfile. The enhanced stereochemical representation will require the use of
V3000 format molfiles and your molfile readers and writers will need to be
updated to handle this information.
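For readers that need updating, a first step might be to tell the two formats apart. The sketch below assumes the usual molfile layout, in which three header lines are followed by a counts line that ends with a "V2000" or "V3000" version stamp. The function name molfile_version is invented for this example, and the check does not validate the rest of the file.

```python
def molfile_version(molfile_text: str) -> str:
    """Return 'V2000', 'V3000', or 'unknown' based on the counts line."""
    lines = molfile_text.splitlines()
    if len(lines) < 4:
        raise ValueError("molfile too short: no counts line")
    # Line 4 (index 3) is the counts line; the version stamp sits at its end.
    counts = lines[3].rstrip()
    if counts.endswith("V3000"):
        return "V3000"
    if counts.endswith("V2000"):
        return "V2000"
    # Some files carry no version stamp at all.
    return "unknown"
```

A reader could dispatch on the returned value and reject or skip V3000 input until support for it has been added.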
MDL publishes the molfile format and the latest version of the document
(August 2002) can be downloaded from:
http://www.mdl.com/downloads/ctfile/ctfile_subs.html