Semiempirical parameterization yet again...



 Netters,
    What a firestorm a few innocent comments in answer to a relatively
 simple question started up!  I'd like to very carefully (flame-proof
 clothing, I hope) take the opportunity to answer some of the issues
 raised in several postings last week in response to my discussion of
 AM1 vs. PM3.  (An item I posted in reply to a direct question, I
 might add.)
    First, what seem to be the issues?  I have identified a few from
 the postings by Joe Leonard (JL) of Wavefunction, Craig Burkhart (CB)
 at Goodyear, John McKelvey at Kodak (JM), and Chris Cramer (CC) at
 the University of Minnesota.  (I don't want to put words in anyone's
 mouth, but below seems to be an accurate rendering to me.  I have also
 "shaded" them perhaps with questions I have been asked and comments
 I have heard in the course of my travels and correspondence, so there
 may be some inferences here they didn't intend.)
 1.  The parameterization of semiempirical methods is a relatively
 simple "number-twiddling" task that is only limited by the speed
 of the computers at hand.  (JL and lots of other people)
 2.  Larger molecules should be used in the molecular basis sets for
 parameterization (we call this the MBSP).  (JL, CB)
 3.  More molecules and problem systems should be included in the MBSP
 for better results.  (JL, CB)
 4.  Better experimental data is now available to parameterize against
 and should be used.  (JL, CB)
 5.  Re-parameterization could give better results within the present
 semiempirical models of AM1 and PM3.  (JL)
 6.  Property-specific semiempirical methods should perhaps be developed
 to compute only what we are interested in and to allow us to forget the
 rest.  (JL, CB, CC)
 7.  Perhaps a new model is needed.  (CC)
 8.  Effort at semiempirical development is being duplicated.  (CC, JM)
 ("The snake oil salesman mounted the soapbox to confront the angry
 crowd.  He began to speak:")
 I'd like to try to answer these more or less one at a time, but first
 a few general comments, which I clearly identify as MY SCIENTIFIC
 OPINION.  As such, I would ask that it be respected just as you would
 anyone else's opinion.  This includes not calling me nasty names or
 accusing me of evil intentions.
   The development of a semiempirical quantum mechanical model is indeed
 similar to the development of a molecular mechanics force field in that
 the objective is to reproduce experimental data.  That is about where
 the similarity ends.  The underlying guts of the models are totally
 different!  I can safely say that MM has nothing to do with chemistry,
 but is more or less a fit to a convenient set of functions.  It
 does have chemical validity in that it gives us good answers to some
 questions.
   QM is an entirely different situation, in that the actual arbiters of
 chemistry, the electrons, are treated directly by mathematical functions.
 The chemistry comes OUT of a model that describes the electrons.
 This may seem to be a matter of degree to some, but I think that it is an
 important distinction, especially as it finally applies to parameterization
 of quantum chemical models.
   Now, to answer the numbered items above.
 (1), (2), (3)  The speed of computers has indeed greatly enhanced the
 efficiency of parameterization.  Rather than develop "Several/many sets
 of params" quickly, my group has elected to use this power to do
 parameterizations more carefully than was ever before possible.  This involves
 including larger systems and more molecules in the MBSP, a concept with
 which I heartily agree!  There are some things that will never be modeled
 properly unless we include them in the MBSP as larger molecules.  One
 of these gentlemen, John McKelvey, pointed this out to me quite directly
 when referring to the poor reproduction of the twist angle in the diphenyl
 system with AM1.  We have also taken more care in examining the structure
 and complexity of the parameter hypersurface than ever before.  This
 hypersurface is an incredibly complex multidimensional mathematical
 construct.  One must examine it very carefully to find good starting
 points for further refinement.  This concept is directly analogous to
 the multiple-minima problem for conformer searches.  This discussion is
 also somewhat applicable to item (5).
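 The multiple-minima analogy can be illustrated with a small sketch.  This is
 only a toy, not any group's actual parameterization code: the two-parameter
 "error" surface, the function names (toy_error, toy_gradient, refine), the
 step size, and the starting points are all invented for illustration.  A real
 parameter hypersurface has many more dimensions and each error evaluation
 requires running calculations on the whole MBSP.

```python
# Toy illustration only: a made-up two-parameter "error" surface with two
# minima, standing in for a real (far higher-dimensional) parameter hypersurface.

def toy_error(p):
    """Hypothetical fitting error as a function of two parameters."""
    a, b = p
    return (a * a - 1.0) ** 2 + 0.3 * a + (b - 0.5) ** 2 + 0.5


def toy_gradient(p):
    """Analytic gradient of toy_error."""
    a, b = p
    return (4.0 * a * (a * a - 1.0) + 0.3, 2.0 * (b - 0.5))


def refine(start, step=0.02, iterations=5000):
    """Plain gradient descent: converges to the minimum of whichever
    basin contains the starting point, not necessarily the best one."""
    a, b = start
    for _ in range(iterations):
        ga, gb = toy_gradient((a, b))
        a -= step * ga
        b -= step * gb
    return (a, b)


# Several starting points, as in a multi-start conformer search.
starts = [(-1.5, 0.0), (0.2, 1.0), (1.5, 0.0)]
refined = [refine(s) for s in starts]
best = min(refined, key=toy_error)

for s, r in zip(starts, refined):
    print(s, "->", tuple(round(x, 3) for x in r), "error", round(toy_error(r), 3))
print("best:", tuple(round(x, 3) for x in best))
```

 Two of the three starts settle into the shallower minimum; only the start on
 the other side of the barrier reaches the lower one.  That is why the choice
 of starting points matters so much before any refinement begins.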
 (4)  Agreed, more and better experimental data is available now.  We
 are trying to make use of this as much as possible.  It should be noted
 that this is only a marginal improvement over what went before rather than
 a quantum enhancement.
 (5), (7)  PM3 was essentially a re-parameterization of AM1 using a more
 mathematical approach (as opposed to chemical intuition and knowledge)
 to deriving parameters and applying very powerful computers.  It was
 better for some items than AM1, but as my previous posting showed, it
 was only marginally better.  It also possessed some severe problems.
 Chris is correct: to get better results, we need a new MODEL.  The work
 I am aware of along these lines is underway here in my lab at the
 University of Missouri-Kansas City (with C. Jie and R. Dennington)
 and under the direction of Prof. Walter Thiel at the Univ. of Zurich,
 in collaboration with A. Voityuk.
 Our work involves SAM1, which is a new model developed primarily under the
 guidance of Michael Dewar.  It is indeed a new model and it uses a new
 approach to compute the two-electron/two-center repulsion integrals (TERIs)
 for all systems.  This new approach allows us to also treat d-orbitals
 explicitly.  Results have been published for C, H, O, N, F, Cl, Br, and I
 in our new model and we intend to publish a method paper as soon as the
 model for transition metals is finalized and tested.  (M. J. S. Dewar,
 C. Jie, G. Yu, Tetrahedron  49, 5003 (1993);  A. J. Holder, R. D.
 Dennington,  C. Jie, Tetrahedron  50, 627 (1994))
 Prof. Thiel's work involves MNDO/d, which uses a new version of the
 multipole expansion method to circumvent the TERI problem with d-orbitals.
 MNDO/d is an addition to the MNDO model developed by Prof. Thiel in collab-
 oration with Prof. Dewar in 1977.  (W. Thiel,  A. A. Voityuk, Theoretica
 Chimica Acta  81, 391 (1992);  W. Thiel,  A. A. Voityuk, International
 Journal of Quantum Chemistry  44, 807 (1993))
 (8)  As shown above, at least for the cases of Prof. Thiel's work and that
 work being done in my lab, effort is NOT being duplicated.  These are two
 clearly different approaches to the same problem, and I suspect that
 both will find users and adherents.
 (6)  I have left the property-specific semiempirical method question for
 last.  This one is also perhaps the one most prone to debate and opinion.
 Personally, I disagree with this entire concept.  Chris Cramer
 makes the point that one of the strengths of ab initio methods is their
 generality.  I would like to make this same case for the more popular
 semiempirical methods such as SINDO1, MINDO/3, MNDO, AM1, PM3, SAM1, and
 MNDO/d.  Contrast these with INDO/S, which is acknowledged to really work
 only for spectra.  (Yes, I know that papers have shown it works for other
 things sometimes, but even the authors of the method think of that as a
 fortuitous circumstance.)  Dewar's methods have found such wide approval,
 application, and acceptance simply because they are GENERAL.  If we
 parameterize semiempirical methods for heats of formation only or for a
 single class of compounds only, we may gain in accuracy, but we lose the
 capability to have a single approach to many problems.
   Now, back to what I was saying about models above.  (Bet you thought I
 wasn't going to get back to that, huh?)  Semiempirical models are chemical
 models based on a somewhat direct relationship to nature.  In my way of
 thinking, stretching this model away from all correspondence with some
 physical quantities (which are actually supposed to be IN THE MODEL) to
 gain an additional modicum of accuracy for the property YOU consider critical
 is not model development, it is "number-twiddling" of the highest
 order!
 Semiempirical parameters must fit into chemistry.  This is why we include
 dipole moments in the parameterization of a method that some people use
 to only compute heats of formation.  If the method cannot do some sort of
 systematic treatment of dipole moments and is based on the same quantum
 mechanical approach that brings us the heats of formation, how can we
 trust it?
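 The role of dipole moments in the fit can be sketched as a weighted,
 multi-property error function.  This is a hedged illustration and not the
 error function of AM1, PM3, or SAM1: the weights, the two "parameter sets",
 and every number below are invented placeholders, chosen only to show how the
 ranking of two candidate parameter sets can flip once a second property is
 included.

```python
# Toy multi-property error function. All values are invented placeholders,
# not experimental data and not output of any real semiempirical program.

# Each entry: (property name, calculated value, reference value)
# Hypothetical units: heats of formation in kcal/mol, dipole moments in Debye.
RESULTS_A = [  # hypothetical parameter set A: good heat, poor dipole
    ("heat_of_formation", -20.5, -20.0),
    ("dipole_moment", 2.9, 1.8),
]
RESULTS_B = [  # hypothetical parameter set B: slightly worse heat, good dipole
    ("heat_of_formation", -21.0, -20.0),
    ("dipole_moment", 1.9, 1.8),
]

# Hypothetical weights; a property missing from the dict is simply ignored.
HEATS_ONLY = {"heat_of_formation": 1.0}
GENERAL = {"heat_of_formation": 1.0, "dipole_moment": 20.0}


def weighted_error(results, weights):
    """Sum of weighted squared deviations over all listed properties."""
    return sum(
        weights.get(prop, 0.0) * (calc - ref) ** 2
        for prop, calc, ref in results
    )


for label, weights in (("heats only", HEATS_ONLY), ("general", GENERAL)):
    err_a = weighted_error(RESULTS_A, weights)
    err_b = weighted_error(RESULTS_B, weights)
    preferred = "A" if err_a < err_b else "B"
    print(label, "-> A:", round(err_a, 2), "B:", round(err_b, 2),
          "preferred:", preferred)
```

 With heats alone, set A looks better; once the dipole term is weighted in,
 set B wins.  A fit that never sees the dipoles has no way to notice that the
 underlying electron distribution is wrong.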
   Finally, (bet you thought I'd never quit) where are the calls for special
 ab initio basis sets to compute reaction energetics?  What about special
 basis sets for fluoroethers?  Remember, basis sets are "parameterized"
 just like semiempirical methods.  (Where do you think those exponents come
 from?)  We don't ask for these because they would destroy the generality
 of the approach.  While some may choose the track of specialized semiempirical
 methods, we will proceed with developing general approaches.
 ("The snake oil salesman descends from the soapbox to thunderous
 applause.")
   Andy Holder
     4/18/94
 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
                           DR. ANDREW HOLDER
          Assistant Professor of Computational/Organic Chemistry
 Department of Chemistry          ||  Internet Addr: aholder ^at^ vax1.umkc.edu
 Univ. of Missouri - Kansas City  ||  Phone Number:  (816) 235-2293
 Spencer Chemistry, Room 315      ||  FAX Number:    (816) 235-5502
 Kansas City, Missouri 64110      ||
 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=