From chemistry-request@ccl.net Sat Jun 26 00:17:00 1993
Date: Fri, 25 Jun 1993 12:22:07 -0400
From: mckelvey@Kodak.COM
Subject: Gaussian92 on an SGI CRIMSON
To: chemistry@ccl.net
Message-Id: <9306251622.AA10210@Kodak.COM>

We would very much like to know what operating system and compiler levels have been used on SGI CRIMSONS, with optimisation turned on, for G92. We would like to know if the "hitches in our gitalongs" are of our own doing, or have their causes lying elsewhere... We have to omit optimisation for proper results.

I will gladly post the results!!

John McKelvey
Res Labs, Eastman Kodak
Rochester, NY
Fax 716-722-2327
Voice 716-477-3335

---
Administrivia: This message is automatically appended by the mail exploder:
CHEMISTRY@ccl.net --- everyone
CHEMISTRY-REQUEST@ccl.net --- coordinator
OSCPOST@ccl.net send help from chemistry
Anon. ftp www.ccl.net
CHEMISTRY-SEARCH@ccl.net --- search the archives, read help.search file first
---

From SML108@PSUVM.PSU.EDU Fri Jun 25 20:34:00 1993
Date: Sat, 26 Jun 1993 00:34 -0400 (EDT)
From: SML108@PSUVM.PSU.EDU
Subject: Re: Genetic algorithms for conformational searching
To: rsjuds@ca.sandia.gov
Message-Id: <01GZTCGG907M9GZN8Q@phem3.acs.ohio-state.edu>

Hi, the reference to my paper is

Le Grand, S. M., and Merz, K. M., Jr. (1993) The Application of the Genetic Algorithm to the Minimization of Potential Energy Functions, Journal of Global Optimization 3:49-66.

Thanks for all the references to the rest of the GA papers. If anyone else has any more, please post them to this list, as I am putting together a review on genetic algorithms and conformational search.

Scott Le Grand

From DSMITH@uoft02.utoledo.edu Sat Jun 26 06:21:56 1993
Date: Sat, 26 Jun 1993 11:21:56 -0500 (EST)
From: "DR. DOUGLAS A. SMITH, UNIVERSITY OF TOLEDO"
Subject: full disclosure of methods?
To: chemistry@ccl.net
Message-Id: <01GZTY7NYTFM000SZH@UOFT02.UTOLEDO.EDU>

Mark Thompson recently wrote, regarding the SAM1 topic currently under discussion:

>Let me address the more fundamental issue that this topic
>brings forth. I share Graham Hurst's concerns. One of the
>basic tenets of good science is that of reproducibility,
>and independent verification.

This is, I think, universally true and accepted. However, it is rarely followed. For example, there has been talk over the years that people who use molecular mechanics for their research should publish the parameters used for each study as part of the paper (or at least they should publish the differences between their parameters and the "standard" parameters in the set used, e.g. MM2, AMBER, etc.). I think that this issue was raised in a paper by Peter Kollman a few years ago.

This is a particular problem when people use MM2, which has been parameterized by many, many people in addition to Lou Allinger for special situations and molecules. Another place this occurs is in programs such as MacroModel, where parameters of all types and qualities are available through user-set switches. Similarly, in other commercial codes such as HyperChem (with which I have experience) many parameters other than the "standard" Allinger MM2 parameters exist. (I will not discuss other vendors' codes because my experience is much more limited with them.)

A similar problem occurred a few years ago when the MMX force field in PCMODEL was being developed and expanded. In-house testing showed us that some of the parameters, particularly for organometallic species, were not giving reasonable results. (I believe that the current MMX force field is much improved, and do not mean to cast any doubts on it. My apologies to Kevin Gilbert.)

>If the results of a new method are published without
>sufficiently describing the method to fulfill the above
>criteria, then I personally could not take the results
>seriously.
>Furthermore, I would never have recommended
>such work for publication.

While this is a real problem and a good argument for standardization, it is, in my opinion, a goal that is utopian and most likely not practical. Part of the problem is the codes and the proprietary nature of commercial software. Some of the problem is user naivete (i.e. the black box problem).

A question arises: is this the reason that results using commercial software are so rarely published in most fields? I almost never see modeling results based on BioGraf, HyperChem, etc. SYBYL results do appear, and so do polymer modeling results from a wide variety of commercial codes, and even MacroModel results (mostly from the academic community). Or is the reason that academics can't afford many commercial codes and so don't use or publish with them, while companies that purchase and use commercial codes keep their results in house and proprietary?

In addition, I do not agree that we should never recommend "such work" for publication. Often, as Andy Holder seems to be indicating, rapid communication of preliminary results with the promise of a more complete or full disclosure of a method is very reasonable. In the synthetic community this is common -- look at how little experimental detail is provided in a typical J. Am. Chem. Soc. or Tet. Letters communication.

>I feel very strongly that when a new method is developed
>and implemented that it must pass the peer review process
>to gain legitimacy in the scientific community, regardless
>of whether most other scientists care to reimplement that
>method or not.

Again, in the specific case of SAM1, the method is publicly available in a Ph.D. dissertation from 1990 (if I remember Andy's posting correctly). Besides, who ever said we had to reveal all our secrets and make them readily available and accessible? When software copyrights and patents really provide adequate protection, maybe I will agree with that attitude.
>Proprietary methods are fine, as long as it is openly
>known that they are proprietary. Results of proprietary
>methods do not belong in the open scientific literature.

Then where do they belong? Comparison of these results with "standard" and commonly available "academic" results is healthy and stimulating. And, not to tweak Mark Thompson, who freely distributes Argus, what about Gaussian? Many people no longer have access to G92 source code due to recent and commercially driven changes. Does that mean we cannot accept their results in the open literature -- or must we decide based on whether or not their results are from previously available pieces of the code rather than from newer, proprietary sections?

Or what about the difference between someone in industry who paid for the source code for MacroModel as compared to the academic, such as myself, who only gets binaries? Are my results to be less acceptable because I don't have the absolute method available? Or are the industrial results less acceptable because they can be the results of tweaking the code?

There are many, many issues hidden in this beast. The scientific community is just realizing that this beast is a tiger and that the tiger may have a tail. We still need to locate and identify the tail, grab it, and hang on while figuring out how to keep the tiger from biting us. My own conclusion is that keeping the tiger in a dark cage called censorship would be the worst thing we could do, and limiting access to the scientific literature because someone's results came from what we thought might be a tiger but had not proven to be one is not the best course of action.

Doug

Douglas A. Smith
Assistant Professor of Chemistry
The University of Toledo
Toledo, OH 43606-3390
voice 419-537-2116
fax 419-537-4033
email dsmith@uoft02.utoledo.edu

From d3f012@pellucidar.pnl.gov Sat Jun 26 01:51:42 1993
Date: Sat, 26 Jun 93 08:51:42 -0700
From: d3f012@pellucidar.pnl.gov
Subject: SAM1 reference, AM1 reference?
To: chemistry@ccl.net
Message-Id: <9306261551.AA05410@pellucidar.pnl.gov>

Andy Holder writes...

>4. It should be noted that, whatever is stated to the contrary, sufficient
> detail will be published on SAM1 so that other establishments
> and individuals will be able to generate reproducible code. Note
> that none of MJS Dewar's previous methods could have been coded
> from scratch as many have claimed, (see above) without reference
> to the code itself, either MOPAC or AMPAC. It is impossible to
> obtain completely correct results for AM1
> and MNDO from the papers alone. Certain special corrections were
> omitted from publication due to an oversight. (These corrections
> form the subject of another paper to be released shortly.)

Do I read this statement correctly? It has been 8 years since the original AM1 reference. I presume these "special corrections" were known at the time the method was coded? This seems like an inordinately long time to wait for a complete description of the method. I will give you the benefit of the doubt and assume I read this incorrectly!! I would be very interested to see what these corrections are. I encourage you to describe these corrections via the Internet asap.

A couple of years ago, I coded the MNDO, AM1, and PM3 methods, in Argus, completely from scratch. This included all the relevant integrals, using the local symmetry-based method as the papers suggested, etc. I have never had the benefit of inspecting the source code of either AMPAC or MOPAC. Later on, I did use an executable of MOPAC to compare some of my integral values.

Of course one should never merely code directly from the paper without rederiving and verifying all the published equations, as I did. I did find some typos in the published work, as well as some inconsistencies in the way units were handled in some of the matrix element expressions. I also included the fix suggested by Stewart in J. Comp.
Chem. 10, p. 221 (1989) to fix the rotational variance in some of the (pp|pp) integrals.

I recall one specific instance that left me a little breathless: I was trying to work out the units used in the published values of the parameters used in nuc-nuc repulsions (especially the K,L,M params used in the gaussian terms). It turns out that, to get consistent answers, one had to take the K,L,M parameters directly from the literature at face value, use distances in angstroms and nuclear charges in atomic units, and one ended up with energy in eV. It's as if all the relevant conversion factors are somehow buried in these parameters. After some gymnastics, I did manage to coax everything into atomic units, which is what Argus uses internally. I also recall that the published expressions for the nuc-nuc term were different in the original AM1 reference and Jim Stewart's subsequent PM3 references. I believe the AM1 reference was wrong due to a typo.

These are just a few of my experiences. All tests I have carried out of geometry optimizations, dipole moments, etc., have agreed well with the published values. Of course I have not exhaustively tried all parameterized atoms, or published structures. I use cartesian coordinates rather than z-matrices.

In all fairness to Andy's statements, I have not yet distributed my implementation of MNDO, AM1, and PM3. I now feel encouraged to release it, and I'm sure more exhaustive use by a larger group of chemists may indeed uncover some bugs.

Would anyone else out there who has implemented the MNDO family of methods care to comment on their experiences?

**************************************************************************
Mark A. Thompson
Sr. Research Scientist                  email: d3f012@pnlg.pnl.gov
Molecular Science Research Center       FAX  : 509-375-6631
Pacific Northwest Laboratory            voice: 509-375-6734
PO Box 999, Mail Stop K1-90
Richland, WA. 99352

Argus available via anonymous ftp from pnlg.pnl.gov (130.20.64.11)
(in the argus directory).
Download the README file first.

Disclaimer: The views expressed in this message are solely my own and do not represent Battelle Memorial Institute, Pacific Northwest Laboratory, or any of its clients.
**************************************************************************

From st-amant@cgl.ucsf.EDU Sat Jun 26 05:21:56 1993
Date: Sat, 26 Jun 93 12:21:56 -0700
Message-Id: <9306261921.AA12399@socrates.ucsf.EDU>
From: st-amant@cgl.ucsf.edu (Alain St-Amant)
To: DSMITH@uoft02.utoledo.edu, chemistry@ccl.net
Subject: Re: full disclosure of methods?

Douglas Smith recently wrote in the current discussion on the disclosure of parameters:

> Besides, who ever said we had to reveal all our secrets and make them
> readily available and accessible? When software copyrights and patents
> really provide adequate protection, maybe I will agree with that attitude.

I'll assume that Dr. Smith is referring to the specific algorithms that are implemented that make the program more efficient but do not affect the final results. In which case, I might agree. Of course, I couldn't disagree more if he is referring to some development in the methodology that actually affects the final results in any way.

The point that interests me however, is the question of software copyrights and patents to which Dr. Smith alludes. I have been trying to get a feel for what can be copyrighted and patented and I get a different answer from everyone. Can only specific code be copyrighted or can the structure and algorithms be copyrighted as well? How 'modified' should code be before it can be called legally (and ethically if anyone is interested in expressing an opinion) a new program? Or is it simply forbidden for code to "evolve" into a new program?

I will summarize to the net any e-mail sent to me, but I think that this would make for an interesting discussion and it would be as interesting to hear how people feel it "should be" as opposed to how it "is".
Sincerely,

Alain St-Amant
Department of Pharmaceutical Chemistry
University of California, San Francisco

From states@ibc.wustl.edu Sat Jun 26 12:59:46 1993
Date: Sat, 26 Jun 93 17:59:46 CDT
From: states@ibc.wustl.edu (David J. States)
Message-Id: <9306262259.AA04271@ibc.WUStL.EDU>
To: chemistry@ccl.net, DSMITH@uoft02.utoledo.edu
Subject: Re: full disclosure of methods?

Mark Thompson recently wrote:

>Proprietary methods are fine, as long as it is openly
>known that they are proprietary. Results of proprietary
>methods do not belong in the open scientific literature.
>...

and Douglas A. Smith replied:

While this is a real problem and a good argument for standardization, it is, in my opinion, a goal that is utopian and most likely not practical. Part of the problem is the codes and the proprietary nature of commercial software. ...

These issues apply to results as well as methods. Many scientific results have significant commercial value purely as data (the crystal structure of a receptor, the sequence of a disease gene, etc.), and many scientists have been lax in distributing their results in a timely manner, even after publication. In the case of protein crystal structures and molecular sequence data, a combination of peer pressure, journal publication requirements, and some strong-arm tactics on the part of funding agencies has been necessary to see that data are promptly submitted to the public databases. As a result of this pressure we now have macromolecular structure and sequence databases that are extremely valuable public resources.

So let's consider some guidelines:

1) Is the method or program available to most scientists?

Most of us have no problem accepting results based on code compiled with a proprietary Fortran compiler. The product is available to most people purchasing a workstation or PC of a particular architecture. As long as other scientists can obtain the compiler to use your code, the use of a proprietary compiler seems acceptable.
2) Is the method or function well defined?

Again, the example of a compiler. If the function is well defined then the use of a proprietary product seems acceptable. Conversely, consider a proprietary artificial intelligence engine. Most reviewers would find it hard to accept results based on a program whose inner workings were not fully explained or were not publicly available for examination.

3) Is it published?

The usual implication of "publication" is that the results are publicly available. The academic currency of publications and promotions depends on others being able to make use of your results. Prior to publication, your work is your own and you have the luxury of exploiting it as you see fit, but once you have published a paper, the data and code that were the basis for that paper need to be accessible to others so they can evaluate, test, and extend your work. If you are not willing to give access to others, don't publish.

Ancillary distribution mechanisms are particularly important in fields like computational chemistry where it may not be feasible to fully elaborate the details of a calculation in a typical manuscript. Coincident electronic distribution of supporting code and data through anonymous FTP sites, public database submissions, etc. is an implicit part of the publication process.

4) Who paid for it?

This is the crux of the data issue, but the same reasoning can be applied to code development. If something is the result of work performed under a publicly funded research grant or award from a charitable foundation, it seems like the fruits of that research ought to be fully accessible to the academic community. Diverting the rewards of work funded by public moneys to obtain substantial personal gain is hard to defend.

David States
Institute for Biomedical Computing / Washington University in St. Louis

From states@ibc.wustl.edu Sat Jun 26 13:26:22 1993
Date: Sat, 26 Jun 93 18:26:22 CDT
From: states@ibc.wustl.edu (David J.
States)
Message-Id: <9306262326.AA04289@ibc.WUStL.EDU>
To: st-amant@cgl.ucsf.edu, chemistry@ccl.net
Subject: Re: full disclosure of methods? (patents and copyrights)

|> Alain St-Amant writes:
|> The point that interests me however, is the question of software copyrights
|> and patents to which Dr. Smith alludes. I have been trying to get a feel
|> for what can be copyrighted and patented and I get a different answer from
|> everyone. Can only specific code be copyrighted or can the structure and
|> algorithms be copyrighted as well? How 'modified' should code be before
|> it can be called legally (and ethically if anyone is interested in expressing
|> an opinion) a new program? Or is it simply forbidden for code to "evolve"
|> into a new program?

The issues surrounding software patents are extremely complex and precedents are often conflicting. There is a USENET newsgroup devoted solely to this subject (comp.patents).

A copyright protects the reproduction of a particular form of a piece of work. The structure of an algorithm would, therefore, not generally be considered copyrightable. On the other hand, simply changing the variable names, or even disassembling object code and incorporating the resulting sources into your own work, is still basically reproducing the previous representation of the work and therefore is covered by copyright.

The structure of an algorithm may be patentable, and if a patent has been issued you may be bound by it even if you independently derive the algorithm or implement the code. A well-known example is RSA public-key encryption.

David States
Institute for Biomedical Computing / Washington University in St. Louis