CCL: SUMMARY: Some Molecular File Format Descriptions
- From: jesus : at : canarylab.chem.nyu.edu (Jesus M. Castagnetto M.)
- Organization: New York University, Department of Chemistry
- Subject: CCL: SUMMARY: Some Molecular File Format Descriptions
- Date: Mon, 21 Oct 1996 15:27:43 -0500 (EDT)
This is a summary of the information I already had and of the responses
I received to my inquiry on molecular file formats.
The original message said:
> I have searched the CCL archives and several other web search
> engines/sites, but I could not find that anybody has compiled
> a list of the currently available molecular structure file
> formats, along with their respective format description.
> I know about the PDB, XYZ, MacroModel and some other formats,
> but not many. My search was by no means exhaustive, so if anyone
> knows of good pointers related to this, I would appreciate
> your input. I will summarize to the list the info I receive.
> Greetings and TIA for the help/hints/ideas/pointers.
Most people pointed to BABEL(++) as a source of information on file
formats. I use it (almost) every day and can testify that it is a fine
program, but what I was looking for were *descriptions* of the formats,
in order to find out why some files I had were broken. Thanks to
Pat Walters and Matt Stahl for a good program.
(++) BABEL: http://mercury.aichem.arizona.edu/babel.html
Here goes the info:
[1] I had gathered the info below about packages and file formats:
(a) Macromodel and related info:
http://www.columbia.edu/cu/chemistry/mmod/mmod.html
also the manual that accompanies the package describes
the formats in extenso.
(b) PDB format and related info:
http://pdb.pdb.bnl.gov
especially: "The Protein Data Bank Contents Guide: Atomic Coordinate"
http://pdb.pdb.bnl.gov/Format.doc/Format_Home.html
(c) XMol general info (Minnesota Supercomputer Center, Inc.)
http://www.msc.edu/msc/docs/xmol/XMol.html
and the man page for XYZ (part of XMol)
XYZ(5MSC) Unix Programmer's Manual XYZ(5MSC)
NAME
XYZ - Cartesian molecular model file format
COPYRIGHT
Copyright 1991 Research Equipment Inc. dba Minnesota Supercomputer Center
RESTRICTED RIGHTS LEGEND
Use, duplication, or disclosure of this software and its documentation by
the Government is subject to restrictions as set forth in subdivision
(b)(3)(ii) of the Rights in Technical Data and Computer Software
clause at 52.227-7013.
DESCRIPTION
XYZ datafiles specify molecular geometries using a Cartesian coordinate
system. This simple, stripped-down, ASCII-readable format is intended to
serve as a "transition" format for the XMol series of applications. For
example, suppose a molecular datafile was in a format not supported by
XMol. In order to read the data into XMol, it would be possible to
modify the datafile, perhaps by creating a shell script, so that it fit
the relatively lenient requirements of the XYZ format specification.
Once data is in XYZ format, it may be examined by XMol, or converted to
yet another format.
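As an illustration of the kind of conversion described above (written in Python rather than as a shell script), the following minimal sketch turns the ATOM/HETATM records of a PDB file into a single-step XYZ file. It assumes the standard fixed-column PDB layout (x, y, z in columns 31-54, element symbol in columns 77-78, atom name in columns 13-16); the function name pdb_to_xyz is hypothetical and is not part of XMol.

```python
def pdb_to_xyz(pdb_text: str, comment: str = "") -> str:
    """Convert PDB ATOM/HETATM records into a one-step XYZ file (a sketch)."""
    atom_lines = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip HEADER, REMARK, CONECT, END and other records
        # Fixed PDB columns (1-based): x 31-38, y 39-46, z 47-54.
        x = float(line[30:38])
        y = float(line[38:46])
        z = float(line[46:54])
        # The element symbol lives in columns 77-78; older files may leave it
        # blank, so fall back to the first letter of the atom name (13-16).
        symbol = line[76:78].strip()
        if not symbol:
            name = line[12:16]
            symbol = next((c for c in name if c.isalpha()), "X")
        atom_lines.append(f"{symbol} {x:.3f} {y:.3f} {z:.3f}")
    # Two-line header: the atom count, then a (possibly empty) comment line.
    return "\n".join([str(len(atom_lines)), comment, *atom_lines]) + "\n"
```

The fallback on the atom name is crude (a PDB atom named "CA" is an alpha carbon, not calcium, but the first letter happens to be right; other names may not be), so a file with a filled element column is the safer input.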
The XYZ format supports multi-step datasets. Each step is represented by
a two-line "header," followed by one line for each atom.
The first line of a step's header is the number of atoms in that step.
This integer may be preceded by whitespace; anything on the line after
the integer is ignored. The second line of the header leaves room for a
descriptive string. This line may be blank, or it may contain some
information pertinent to that particular step, but it must exist, and it
must be just one line long.
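The header rules above are enough to split a multi-step file into its steps, and checking them is a quick way to find out why an XYZ file is "broken". The sketch below is a minimal reader under those rules; tolerating blank lines between steps is an assumption not stated in the man page, and split_xyz_steps is a hypothetical name.

```python
import re
from typing import List, Tuple

# One step: (comment line, list of raw atom lines).
Step = Tuple[str, List[str]]


def split_xyz_steps(text: str) -> List[Step]:
    """Split a (possibly multi-step) XYZ file into its steps."""
    lines = text.splitlines()
    steps: List[Step] = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # assumption: tolerate blank lines between steps
            continue
        # Header line 1: an integer, optionally preceded by whitespace;
        # anything after the integer is ignored.
        match = re.match(r"\s*(\d+)", lines[i])
        if match is None:
            raise ValueError(f"line {i + 1}: expected atom count, got {lines[i]!r}")
        natoms = int(match.group(1))
        # Header line 2 (the comment) must exist, followed by natoms atom lines.
        if i + 2 + natoms > len(lines):
            raise ValueError(
                f"line {i + 1}: step declares {natoms} atoms but the file ends early")
        comment = lines[i + 1]
        steps.append((comment, lines[i + 2:i + 2 + natoms]))
        i += 2 + natoms
    return steps
```

A missing comment line or a wrong atom count shows up either as a non-integer where a count is expected or as a step that runs past the end of the file.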
Each line of text describing a single atom must contain at least four
fields of information, separated by whitespace: the atom's type (a short
string of alphanumeric characters), and its x-, y-, and z-positions.
Optionally, extra fields may be used to specify a charge for the atom,
and/or a vector associated with the atom. If an input line contains five
or eight fields, the fifth field is interpreted as the atom's charge;
otherwise, a charge of zero is assumed. If an input line contains seven
or eight fields, the last three fields are interpreted as the components
of a vector. These components should be specified in angstroms.
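The field-count rules for an atom line can be sketched as follows. The XYZAtom class and parse_atom_line function are hypothetical names. Lines with a field count the rules do not mention (for example six fields) are accepted with the extra fields ignored, which is one literal reading of "at least four fields".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class XYZAtom:
    symbol: str
    x: float
    y: float
    z: float
    charge: float = 0.0
    vector: Optional[Tuple[float, ...]] = None


def parse_atom_line(line: str) -> XYZAtom:
    """Parse one XYZ atom line following the 4/5/7/8-field rules."""
    fields = line.split()
    n = len(fields)
    if n < 4:
        raise ValueError(f"need at least 4 fields, got {n}: {line!r}")
    symbol = fields[0]
    x, y, z = (float(f) for f in fields[1:4])
    # Five or eight fields: the fifth field is the charge; otherwise zero.
    charge = float(fields[4]) if n in (5, 8) else 0.0
    # Seven or eight fields: the last three fields are a vector.
    vector = tuple(float(f) for f in fields[-3:]) if n in (7, 8) else None
    return XYZAtom(symbol, x, y, z, charge, vector)
```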
Note that the XYZ format doesn't contain connectivity information. This
intentional omission allows for greater flexibility: to create an XYZ
file, you don't need to know where a molecule's bonds are; you just need
to know where its atoms are. Connectivity information is generated
automatically for XYZ files as they are read into XMol-related
applications. Briefly, if the distance between two atoms is less than
the sum of their covalent radii, they are considered bonded.
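To make the bonding rule concrete, here is a minimal sketch of distance-based bond perception exactly as stated: two atoms are bonded if their distance is less than the sum of their covalent radii. The radii table is an assumption (approximate single-bond values for a few elements), not the contents of XMol's own data files, and perceive_bonds is a hypothetical name.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Approximate covalent radii in angstroms (an assumption for illustration,
# not XMol's table).
COVALENT_RADII: Dict[str, float] = {"H": 0.37, "C": 0.77, "N": 0.75, "O": 0.73}

# (symbol, x, y, z)
Atom = Tuple[str, float, float, float]


def perceive_bonds(atoms: Sequence[Atom],
                   radii: Dict[str, float] = COVALENT_RADII) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose distance < r_i + r_j."""
    bonds = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        dist = math.dist(a[1:], b[1:])
        if dist < radii[a[0]] + radii[b[0]]:
            bonds.append((i, j))
    return bonds


water = [("O", 0.0, 0.0, 0.117), ("H", 0.0, 0.757, -0.467), ("H", 0.0, -0.757, -0.467)]
print(perceive_bonds(water))  # [(0, 1), (0, 2)]
```

With these radii the C-C sum (1.54 angstroms) equals a typical C-C single-bond length, so the strict comparison can miss borderline bonds; a practical reader would probably add a small tolerance, which the man page does not describe. The pairwise loop is quadratic in the number of atoms, which is fine for small molecules.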
FILES
/usr/local/etc/xmol/examples/*
sample datafiles
/usr/local/etc/xmol/xyz.types
table of atom types supported by XYZ format
/usr/local/etc/xmol/xyz.cnvt
conversion table for XYZ format
SEE ALSO
xmol(1MSC)
AUTHORS
Carolyn Wasikowski
Stefan Klemm
27 Apr 1993
(d) AMBER related info:
http://www.amber.ucsf.edu/amber/amber.html
and the AMBER file specifications:
http://www.amber.ucsf.edu/amber/formats.html
(e) CSD info in general at CCDC
http://csdvx2.ccdc.cam.ac.uk/
also the documentation that comes with the CD-ROM distribution.
(f) SPARTAN (from Wavefunction): uses a Cartesian coordinate representation
similar to the one used for XYZ files in its output file, minus
the charge (listed separately).
[2] From the responses I got the following pointers:
(a) MDL formats (there is a PDF file with lots of info here)
http://www.mdli.com/prod/fileformats.html
(b) another PDB info site
http://www.mi.uni-erlangen.de/~dosche/casihp.htm
Thank you to all who responded (listed below in no particular
order; I hope I am not missing anyone). Sorry I didn't
get to answer each of you individually:
Soaring Bear <bear : at : ellington.pharm.arizona.edu>
Pat Walters <pwalters : at : portal.vpharm.com>
Jonathan Baell <J.Baell : at : chem.csiro.au>
Dale Braden <genghis : at : darkwing.uoregon.edu>
Henry Chermette <CHERM : at : frcpn11.in2p3.fr>
Stefan Grzybek <grzybek : at : athena.chemie.uni-erlangen.de>
Bill Ross <ross : at : cgl.ucsf.EDU>
Ralph Puchta <Puchta : at : GWUP.org>
Willie Cui <microsim : at : nis.net>
Jasna Klicic <jasna : at : chem.columbia.edu>
Greetings.
P.S. Below is a list of the file formats Babel understands and
converts.
Babel 1.5 BETA -- Sep 29 1996 -- 22:48:48
for menus type -- babel -m
Usage is :
babel [-v] -i<input-type> <name> -o<output-type> <name> "<keywords>"
Currently supported input types
alc -- Alchemy file
prep -- AMBER PREP file
bs -- Ball and Stick file
bgf -- MSI BGF file
car -- Biosym .CAR file
boog -- Boogie file
caccrt -- Cacao Cartesian file
cadpac -- Cambridge CADPAC file
charmm -- CHARMm file
c3d1 -- Chem3D Cartesian 1 file
c3d2 -- Chem3D Cartesian 2 file
cssr -- CSD CSSR file
fdat -- CSD FDAT file
gstat -- CSD GSTAT file
dock -- Dock Database file
dpdb -- Dock PDB file
feat -- Feature file
fract -- Free Form Fractional file
gamout -- GAMESS Output file
gzmat -- Gaussian Z-Matrix file
gauout -- Gaussian 92 Output file
g94 -- Gaussian 94 Output file
hin -- Hyperchem HIN file
sdf -- MDL Isis SDF file
m3d -- M3D file
macmol -- Mac Molecule file
macmod -- Macromodel file
micro -- Micro World file
mm2in -- MM2 Input file
mm2out -- MM2 Output file
mm3 -- MM3 file
mmads -- MMADS file
mdl -- MDL MOLfile file
molen -- MOLIN file
mopcrt -- Mopac Cartesian file
mopint -- Mopac Internal file
mopout -- Mopac Output file
pcmod -- PC Model file
pdb -- PDB file
psin -- PS-GVB Input file
psout -- PS-GVB Output file
msf -- Quanta MSF file
schakal -- Schakal file
shelx -- ShelX file
smiles -- SMILES file
spar -- Spartan file
semi -- Spartan Semi-Empirical file
spmm -- Spartan Mol. Mechanics file
mol -- Sybyl Mol file
mol2 -- Sybyl Mol2 file
wiz -- Conjure file
unixyz -- UniChem XYZ file
xyz -- XYZ file
xed -- XED file
Currently supported output types
diag -- DIAGNOSTICS file
alc -- Alchemy file
bs -- Ball and Stick file
bgf -- BGF file
bmin -- Batchmin Command file
caccrt -- Cacao Cartesian file
cacint -- Cacao Internal file
cache -- CAChe MolStruct file
c3d1 -- Chem3D Cartesian 1 file
c3d2 -- Chem3D Cartesian 2 file
cdct -- ChemDraw Conn. Table file
dock -- Dock Database file
wiz -- Wizard file
contmp -- Conjure Template file
cssr -- CSD CSSR file
dpdb -- Dock PDB file
feat -- Feature file
fhz -- Fenske-Hall ZMatrix file
gamin -- Gamess Input file
gcart -- Gaussian Cartesian file
gzmat -- Gaussian Z-matrix file
gotmp -- Gaussian Z-matrix tmplt file
hin -- Hyperchem HIN file
icon -- Icon 8 file
idatm -- IDATM file
sdf -- MDL Isis SDF file
m3d -- M3D file
macmol -- Mac Molecule file
macmod -- Macromodel file
micro -- Micro World file
mm2in -- MM2 Input file
mm2out -- MM2 Output file
mm3 -- MM3 file
mmads -- MMADS file
mdl -- MDL Molfile file
miv -- MolInventor file
mopcrt -- Mopac Cartesian file
mopint -- Mopac Internal file
csr -- MSI Quanta CSR file
pcmod -- PC Model file
pdb -- PDB file
psz -- PS-GVB Z-Matrix file
psc -- PS-GVB Cartesian file
report -- Report file
smiles -- SMILES file
spar -- Spartan file
mol -- Sybyl Mol file
mol2 -- Sybyl Mol2 file
maccs -- MDL Maccs file
torlist -- Torsion List file
unixyz -- UniChem XYZ file
xyz -- XYZ file
xed -- XED file
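For scripted batch conversions, the Babel usage line can be turned into an argument list. This is a minimal sketch that only uses the -v, -i and -o options shown in the usage line, checks the type codes against a small subset copied from the lists above (not the complete sets), and passes any keywords string through unchanged as one argument. The function name babel_command is hypothetical, and the sketch does not run Babel itself.

```python
from typing import List, Optional

# A few codes copied from the lists above; these are NOT the complete sets.
INPUT_TYPES = {"pdb", "xyz", "mol2", "mdl", "smiles", "gamout", "g94"}
OUTPUT_TYPES = {"pdb", "xyz", "mol2", "mdl", "smiles", "gzmat", "gamin"}


def babel_command(in_type: str, in_name: str, out_type: str, out_name: str,
                  keywords: Optional[str] = None, verbose: bool = False) -> List[str]:
    """Build an argument list following the Babel 1.5 usage line."""
    if in_type not in INPUT_TYPES:
        raise ValueError(f"unknown or unlisted input type: {in_type}")
    if out_type not in OUTPUT_TYPES:
        raise ValueError(f"unknown or unlisted output type: {out_type}")
    cmd = ["babel"]
    if verbose:
        cmd.append("-v")
    # The type code is attached directly to the flag, e.g. -ipdb, -oxyz.
    cmd += [f"-i{in_type}", in_name, f"-o{out_type}", out_name]
    if keywords:
        cmd.append(keywords)  # one argument, like "<keywords>" in the usage line
    return cmd
```

Assuming a babel executable is on the PATH, subprocess.run(babel_command("pdb", "in.pdb", "xyz", "out.xyz"), check=True) would perform the conversion; building a list rather than a shell string avoids quoting problems with file names.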
-----
Jesus M. Castagnetto M. | "Organic Chemistry: The practice
Dep.of Chemistry - New York University | of transmuting vile substances
4 Washington Pl, Room 514. NY 10003 | into publications" (The Last Word-
jesus : at : canarylab.chem.nyu.edu | The Ultimate Scientific Dictionary)