From chemistry-request@ccl.net Sat Jun 26 00:17:00 1993
Date: Fri, 25 Jun 1993 12:22:07 -0400
From: mckelvey@Kodak.COM
Subject: Gaussian92 on an SGI CRIMSON
To: chemistry@ccl.net
Message-Id: <9306251622.AA10210@Kodak.COM>


We would very much like to know what operating system and compiler levels
have been used on SGI CRIMSONS, with optimisation turned on, for G92.

We would like to know if the "hitches in our gitalongs" are of our own
doing, or have their causes lying elsewhere.  We have to omit optimisation
to get correct results.

I will gladly post the results!!

John McKelvey
Res Labs, Eastman Kodak
Rochester, NY
Fax 716-722-2327
Voice 716-477-3335


---
Administrivia: This message is automatically appended by the mail exploder:
CHEMISTRY@ccl.net --- everyone      CHEMISTRY-REQUEST@ccl.net --- coordinator
OSCPOST@ccl.net  send help from chemistry            Anon. ftp www.ccl.net
CHEMISTRY-SEARCH@ccl.net --- search the archives, read help.search file first
---


From SML108@PSUVM.PSU.EDU  Fri Jun 25 20:34:00 1993
Date: Sat, 26 Jun 1993 00:34 -0400 (EDT)
From: SML108@PSUVM.PSU.EDU
Subject: Re: Genetic algorithms for conformational searching
To: rsjuds@ca.sandia.gov
Message-Id: <01GZTCGG907M9GZN8Q@phem3.acs.ohio-state.edu>


Hi, the reference to my paper is
 
Le Grand, S. M., and Merz, K. M., Jr. (1993) The Application of the
Genetic Algorithm to the Minimization of Potential Energy Functions,
Journal of Global Optimization 3:49-66.
 
Thanks for all the references to the rest of the GA papers.  If anyone
else has any more, please post them to this list as I am putting together
a review on genetic algorithms and conformational search.
 
Scott Le Grand

From DSMITH@uoft02.utoledo.edu  Sat Jun 26 06:21:56 1993
Date: Sat, 26 Jun 1993 11:21:56 -0500 (EST)
From: "DR. DOUGLAS A. SMITH, UNIVERSITY OF TOLEDO" <DSMITH@uoft02.utoledo.edu>
Subject: full disclosure of methods?
To: chemistry@ccl.net
Message-Id: <01GZTY7NYTFM000SZH@UOFT02.UTOLEDO.EDU>


Mark Thompson recently wrote, regarding the SAM1 topic currently under 
discussion:

>Let me address the more fundamental issue that this topic 
>brings forth.  I share Graham Hurst's concerns. One of the 
>basic tenets of good science is that of reproducibility,
>and independent verification.

This is, I think, universally true and accepted.  However, it is rarely
followed.  For example, there has been talk over the years that people
who use molecular mechanics for their research should publish the parameters
used for each study as part of the paper (or at least they should publish
the differences between their parameters and the "standard" parameters in
the set used, e.g. MM2, AMBER, etc.).  I think that this issue was raised
in a paper by Peter Kollman a few years ago.  This is a particular problem
when people use MM2, which has been parameterized by many, many people in
addition to Lou Allinger for special situations and molecules.  Another 
place this occurs is in programs such as MacroModel, where parameters of all
types and qualities are available through user set switches.  Similarly,
in other commercial codes such as Hyperchem (with which I have experience)
many parameters other than the "standard" Allinger MM2 parameters exist.
(I will not discuss other vendors' codes because my experience is much more
limited with them.)  A similar problem occurred a few years ago when the
MMX force field in PCMODEL was being developed and expanded.  In house 
testing showed us that some of the parameters, particularly for organometallic
species, were not giving reasonable results.  (I believe that the current
MMX force field is much improved, and do not mean to cast any doubts on it.
My apologies to Kevin Gilbert.)

>If the results of a new method are published without 
>sufficiently describing the method to fulfill the above 
>criteria, then I personally could not take the results
>seriously.  Furthermore, I would never have recommended
>such work for publication.

While this is a real problem and a good argument for standardization, it is,
in my opinion, a goal that is utopian and most likely not practical.  Part
of the problem is the codes and the proprietary nature of commercial
software.  Some of the problem is user naivete (i.e. the black box problem).
A question arises: is this the reason that results using commercial software
are so rarely published in most fields?  I almost never see modeling results
based on BioGraf, HyperChem, etc.  SYBYL results do appear, and so do
polymer modeling results from a wide variety of commercial codes, and even
MacroModel results (mostly from the academic community).  Or is the reason
that academics can't afford many commercial codes so don't use or publish
with them, while companies that purchase and use commercial codes keep 
their results in house and proprietary?

In addition, I do not agree that we should never recommend "such work" for
publication.  Often, as Andy Holder seems to be indicating, rapid 
communication of preliminary results with the promise of a more complete or
full disclosure of a method is very reasonable.  In the synthetic community
this is common -- look at how little experimental detail is provided in a
typical J. Am. Chem. Soc. or Tet. Letters communication.  

>I feel very strongly that when a new method is developed
>and implemented that it must pass the peer review process
>to gain legitimacy in the scientific community, regardless
>of whether most other scientists care to reimplement that
>method or not.

Again, in the specific case of SAM1, the method is publicly available in
a Ph.D. dissertation from 1990 (if I remember Andy's posting correctly).
Besides, who ever said we had to reveal all our secrets and make them
readily available and accessible?  When software copyrights and patents 
really provide adequate protection, maybe I will agree with that attitude.

>Proprietary methods are fine, as long as it is openly
>known that they are proprietary.  Results of proprietary
>methods do not belong in the open scientific literature.

Then where do they belong?  Comparison of these results with "standard" and
commonly available "academic" results is healthy and stimulating.  And, not
to tweak Mark Thompson, who freely distributes Argus, what about Gaussian?
Many people no longer have access to G92 source code due to recent and
commercially driven changes.  Does that mean we cannot accept their results
in the open literature -- or must we decide based on whether or not their
results are from previously available pieces of the code rather than from
newer, proprietary sections?  Or what about the difference between someone
in industry who paid for the source code for MacroModel as compared to
the academic, such as myself, who only gets binaries?  Are my results to
be less acceptable because I don't have the absolute method available?  Or
are the industrial results less acceptable because they can be the results
of tweaking the code?

There are many, many issues hidden in this beast.  The scientific community
is just realizing that this beast is a tiger and that the tiger may have
a tail.  We still need to locate and identify the tail, grab it, and hang
on while figuring out how to keep the tiger from biting us.  My own conclusion
is that keeping the tiger in a dark cage called censorship would be the
worst thing we could do, and limiting access to the scientific literature
because someone's results came from what we thought might be a tiger but
had not proven to be one is not the best course of action.

Doug

Douglas A. Smith
Assistant Professor of Chemistry
The University of Toledo
Toledo, OH  43606-3390

voice    419-537-2116
fax      419-537-4033
email    dsmith@uoft02.utoledo.edu


From d3f012@pellucidar.pnl.gov  Sat Jun 26 01:51:42 1993
Date: Sat, 26 Jun 93 08:51:42 -0700
From: d3f012@pellucidar.pnl.gov
Subject: SAM1 reference, AM1 reference?
To: chemistry@ccl.net
Message-Id: <9306261551.AA05410@pellucidar.pnl.gov>


Andy Holder writes...

>4.  It should be noted that, whatever is stated to the contrary,
>    sufficient detail will be published on SAM1 so that other establishments
>    and individuals will be able to generate reproducible code.  Note
>    that none of MJS Dewar's previous methods could have been coded
>    from scratch, as many have claimed (see above), without reference
>    to the code itself, either MOPAC or AMPAC.  It is impossible to
>    obtain completely correct results for AM1 and MNDO from the papers
>    alone.  Certain special corrections were omitted from publication
>    due to an oversight.  (These corrections form the subject of
>    another paper to be released shortly.)


Do I read this statement correctly?  It has been 8 years since the
original AM1 reference.  I presume these "special corrections"
were known at the time the method was coded?  This seems like 
an inordinately long time to wait for a complete description of
the method.  I will give you the benefit of the doubt and assume
I read this incorrectly!!

I would be very interested to see what these corrections are.
I encourage you to describe these corrections via the Internet
asap.

A couple of years ago, I coded the MNDO, AM1, and PM3 methods, 
in Argus, completely from scratch.  This included all the relevant 
integrals, using the local symmetry-based method as the papers 
suggested, etc.  

I have never had the benefit of inspecting the source code of either
AMPAC or MOPAC.  Later on, I did use an executable of MOPAC to 
compare some of my integral values.

Of course one should never merely code directly from the paper without
rederiving and verifying all the published equations, as I did.  
I did find some typos in the published work, as well as some 
inconsistencies in the way units were handled in some of the matrix 
element expressions.  I also included the fix suggested by Stewart in 
J. Comp. Chem. 10, p. 221 (1989) to remove the rotational variance in some 
of the (pp|pp) integrals.

I recall one specific instance that left me a little breathless:
I was trying to work out the units used in the published values
of the parameters used in nuc-nuc repulsions (especially the
K,L,M params used in the gaussian terms).  It turns out that,
to get consistent answers, one had to take the K,L,M
parameters directly from the literature at face value, 
use distances in angstroms, nuclear charges in atomic units,
and one ended up with energy in eV. It's as if all the relevant 
conversion factors are somehow buried in these parameters.  After 
some gymnastics, I did manage to coax everything into atomic units, 
which is what Argus uses internally.
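
The unit bookkeeping described above can be sketched in a few lines of
Python.  This is only an illustration, not the Argus code: it assumes the
commonly quoted form of the Gaussian correction to the core-core repulsion,
(Z_A Z_B / R) times the sum over K exp(-L (R - M)^2), with R in angstroms
and the result in eV.  The function names, the conversion factors, and the
demo parameters are all assumptions made for the example; the demo K, L, M
values are placeholders and are not published AM1 parameters.

```python
import math

# Approximate conversion factors (assumed here; use your own code's values).
EV_PER_HARTREE = 27.2114
ANGSTROM_PER_BOHR = 0.529177


def gaussian_core_term_ev(z_a, z_b, r_angstrom, gaussians_a, gaussians_b):
    """Gaussian part of the core-core repulsion, returned in eV.

    z_a, z_b       -- core charges in units of the elementary charge
    r_angstrom     -- internuclear distance in angstroms
    gaussians_a/b  -- lists of (K, L, M) tuples, taken at face value
    """
    total = 0.0
    for k, l, m in list(gaussians_a) + list(gaussians_b):
        total += k * math.exp(-l * (r_angstrom - m) ** 2)
    return z_a * z_b / r_angstrom * total


def gaussian_core_term_hartree(z_a, z_b, r_bohr, gaussians_a, gaussians_b):
    """Same term for a code that works internally in atomic units."""
    # Convert the distance to angstroms, evaluate in eV, convert the energy.
    r_angstrom = r_bohr * ANGSTROM_PER_BOHR
    e_ev = gaussian_core_term_ev(z_a, z_b, r_angstrom, gaussians_a, gaussians_b)
    return e_ev / EV_PER_HARTREE


# Placeholder parameters for illustration only (NOT published AM1 values).
demo_a = [(0.1, 5.0, 1.2)]
demo_b = [(-0.05, 3.0, 2.0)]
print(gaussian_core_term_hartree(1.0, 1.0, 2.5, demo_a, demo_b))
```

Keeping the conversion at the boundary of the routine, rather than rescaling
the K, L, M values themselves, leaves the literature parameters untouched and
makes it easier to compare intermediate numbers against another program.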

I also recall that the published expressions for nuc-nuc term
were different in the original AM1 reference and Jim Stewart's
subsequent PM3 references.  I believe the AM1 reference was
wrong due to a typo. These are just a few of my experiences.

All tests I have carried out of geometry optimizations, dipole 
moments, etc. have agreed well with the published values.  Of course 
I have not exhaustively tried all parameterized atoms, or published 
structures. I use cartesian coordinates rather than z-matrices.

In all fairness to Andy's statements, I have not yet distributed
my implementation of MNDO, AM1, and PM3.  I now feel encouraged
to release it and I'm sure more exhaustive use by a larger
group of chemists may indeed uncover some bugs.

Would anyone else out there who has implemented the MNDO-family
of methods care to comment on their experiences?  


**************************************************************************
Mark A. Thompson                    
Sr. Research Scientist              email:  d3f012@pnlg.pnl.gov
Molecular Science Research Center   FAX  :  509-375-6631
Pacific Northwest Laboratory        voice:  509-375-6734
PO Box 999, Mail Stop K1-90
Richland, WA.  99352

Argus available via anonymous ftp from pnlg.pnl.gov (130.20.64.11) (in the
argus directory).  Download the README file first.

Disclaimer:  The views expressed in this message are solely my own and
             do not represent Battelle Memorial Institute, Pacific 
             Northwest Laboratory, or any of its clients.
**************************************************************************


From st-amant@cgl.ucsf.EDU  Sat Jun 26 05:21:56 1993
Date: Sat, 26 Jun 93 12:21:56 -0700
Message-Id: <9306261921.AA12399@socrates.ucsf.EDU>
From: st-amant@cgl.ucsf.edu (Alain St-Amant)
To: DSMITH@uoft02.utoledo.edu, chemistry@ccl.net
Subject: Re: full disclosure of methods?


Douglas Smith recently wrote in the current discussion on the disclosure
of parameters:

> Besides, who ever said we had to reveal all our secrets and make them
> readily available and accessible?  When software copyrights and patents
> really provide adequate protection, maybe I will agree with that attitude.

I'll assume that Dr. Smith is referring to the specific algorithms that are
implemented that make the program more efficient but do not affect the final
results.  In which case, I might agree.

Of course, I couldn't disagree more if he is referring to some development
in the methodology that actually affects the final results in any way.

The point that interests me however, is the question of software copyrights
and patents to which Dr. Smith alludes.  I have been trying to get a feel
for what can be copyrighted and patented and I get a different answer from
everyone.  Can only specific code be copyrighted or can the structure and
algorithms be copyrighted as well?  How 'modified' should code be before
it can be called legally (and ethically if anyone is interested in expressing
an opinion) a new program?  Or is it simply forbidden for code to "evolve"
into a new program?

I will summarize to the net any e-mail sent to me, but I think that this would
make for an interesting discussion and it would be as interesting to hear how
people feel it "should be" as opposed to how it "is".

Sincerely,

Alain St-Amant
Department of Pharmaceutical Chemistry
University of California, San Francisco

From states@ibc.wustl.edu  Sat Jun 26 12:59:46 1993
Date: Sat, 26 Jun 93 17:59:46 CDT
From: states@ibc.wustl.edu (David J. States)
Message-Id: <9306262259.AA04271@ibc.WUStL.EDU>
To: chemistry@ccl.net, DSMITH@uoft02.utoledo.edu
Subject: Re: full disclosure of methods?


Mark Thompson recently wrote:

>Proprietary methods are fine, as long as it is openly
>known that they are proprietary.  Results of proprietary
>methods do not belong in the open scientific literature.
>...

and Douglas A. Smith replied:

	While this is a real problem and a good argument for
	standardization, it is in my opinion, a goal that is utopian 
	and most likely not practical.  Part of the problem is the 
	codes and the proprietary nature of commercial software. 
	...

These issues apply to results as well as methods.  Many scientific
results have significant commercial value purely as data (the crystal
structure of a receptor, the sequence of a disease gene, etc.), and
many scientists have been lax in distributing their results in a timely
manner, even after publication.  In the case of protein crystal
structures and molecular sequence data, a combination of peer pressure,
journal publication requirements, and some strong arm tactics on the
part of funding agencies have been necessary to see that data are
promptly submitted to the public databases.  As a result of this
pressure we now have macromolecular structure and sequence databases
that are extremely valuable public resources.

So let's consider some guidelines:

1) Is the method or program available to most scientists?

	Most of us have no problem accepting results based on code
	compiled with a proprietary Fortran compiler.  The product is
	available to most people purchasing a workstation or PC of a
	particular architecture.  As long as other scientists can obtain
	the compiler to use your code, the use of a proprietary 
	compiler seems acceptable.

2) Is the method or function well defined?

	Again, the example of a compiler.  If the function is well
	defined then the use of a proprietary product seems acceptable.
	Conversely, consider a proprietary artificial intelligence engine.
	Most reviewers would find it hard to accept results based on a 
	program whose inner workings were not fully explained or were not
	publicly available for examination.

3) Is it published?

	The usual implication of "publication" is that the results are
	publicly available.  The academic currency of publications
	and promotions depends on others being able to make use of your
	results.  Prior to publication, your work is your own and you
	have the luxury of exploiting it as you see fit, but once you
	have published a paper, the data and code that were the basis
	for that paper need to be accessible to others so they can
	evaluate, test, and extend your work.  If you are not willing
	to give access to others, don't publish.

	Ancillary distribution mechanisms are particularly important in
	fields like computational chemistry where it may not be
	feasible to fully elaborate the details of a calculation in a
	typical manuscript.  Coincident electronic distribution of
	supporting code and data through anonymous FTP sites, public
	database submissions, etc. is an implicit part of the
	publication process.

4) Who paid for it?

	This is the crux of the data issue, but the same reasoning
	can be applied to code development.  If something is the
	result of work performed under a publicly funded research
	grant or award from a charitable foundation, it seems like the
	fruits of that research ought to be fully accessible to the
	academic community.  Diverting the rewards of work funded by
	public moneys to obtain substantial personal gain is hard to
	defend.

David States
Institute for Biomedical Computing / Washington University in St. Louis

From states@ibc.wustl.edu  Sat Jun 26 13:26:22 1993
Date: Sat, 26 Jun 93 18:26:22 CDT
From: states@ibc.wustl.edu (David J. States)
Message-Id: <9306262326.AA04289@ibc.WUStL.EDU>
To: st-amant@cgl.ucsf.edu, chemistry@ccl.net
Subject: Re: full disclosure of methods? (patents and copyrights)


|> Alain St-Amant writes:
|> The point that interests me however, is the question of software copyrights
|> and patents to which Dr. Smith alludes.  I have been trying to get a feel
|> for what can be copyrighted and patented and I get a different answer from
|> everyone.  Can only specific code be copyrighted or can the structure and
|> algorithms be copyrighted as well?  How 'modified' should code be before
|> it can be called legally (and ethically if anyone is interested in expressing
|> an opinion) a new program?  Or is it simply forbidden for code to "evolve"
|> into a new program?

The issues surrounding software patents are extremely complex and
precedents are often conflicting.  There is a USENET newsgroup devoted
solely to this subject (comp.patents).

A copyright protects the reproduction of a particular form of a piece
of work.  The structure of an algorithm would, therefore, not generally
be considered copyrightable.  On the other hand, simply changing the
variable names, or even disassembling object code and incorporating the
resulting sources into your own work is still basically reproducing the
previous representation of the work and therefore is covered by
copyright.

The structure of an algorithm may be patentable, and if a patent has
been issued you may be bound by it even if you independently derive the
algorithm or implement the code.  A well-known example is RSA public
key encryption.

David States
Institute for Biomedical Computing / Washington University in St. Louis