From owner-chemistry@ccl.net Sun Sep 25 00:48:24 2005
From: "CCL"
To: CCL
Subject: CCL: Cleaning up dusty deck fortran and converting to C/C++
Message-Id: <-29292-050925001339-10992-dttFXCZCQ7JFRv3q5So+OA]_[server.ccl.net>
X-Original-From: "Perry E. Metzger"
Content-Type: text/plain; charset=us-ascii
Date: Sun, 25 Sep 2005 00:13:33 -0400
MIME-Version: 1.0
Sent to CCL by: "Perry E. Metzger" [perry]_[piermont.com]
--Replace strange characters with the "at" sign to recover email address--.

>> Sadly, the more appropriate languages for this sort of work have become
>> desperately unpopular...
>
> I gotta ask: what languages would you consider most appropriate for
> this sort of work (computational chemistry calculations)?

In general, I think it is better to work in strongly typed languages which do full run-time error checking. Unfortunately, as I said, most such languages are now highly unpopular.

Anyway, any of the Algol descendants (Pascal, Modula-2, Modula-3, even Ada) would work quite nicely from the point of view of people used to ordinary procedural languages. Unfortunately, compilers for such languages are not well maintained these days, because no one cares about them any longer. Because of that, C is a reasonable compromise for numerically intensive code. The compilers are generally excellent and the tools are very good -- but you have to be a damn careful programmer not to cut yourself on the sharp edges.

I wouldn't recommend Java, both because of the performance of a VM-based system on numerically intensive code, and because it does not automatically detect numerical overflow/underflow, though it does get upset about array bounds violations.

If your code isn't computationally bound and the slight holes in the safety aren't an issue, Java certainly is better for you than C. Of course, if you aren't concerned about performance, Python seems even nicer. Python is a lot of fun, if a bit odd, but the interpreter is very slow. I do not recommend Perl for this sort of thing. (I think the bioinformatics people who use it are not using the right tool for their job.)

Believe it or not, I'd actually say that Lisp is a pretty good choice, especially the implementations with very good compilers for numerical work like CMUCL, SBCL, and various commercial compilers like Allegro Common Lisp. Lisp is very alien to non-computer-science types, and even to many computer scientists, but if you learn it well, it allows you to do a whole lot in the way of high-quality abstraction -- you can write a lot of code very fast, and if you're not using an interpreter, the code is often as fast as you can get in any other compiled language. Were Lisp not so unloved and so ill-supported in many environments, I'd push it even harder.

A word for a moment about tools.

If you are a cabinet maker, the difference between a good tool and a bad tool, the difference between knowing which tool is good and which is bad, and the difference between knowing how to use the good tool and not knowing how to use it all have a very obvious impact both on the quality of the furniture you build and on how fast you can build it. This is entirely obvious.

However, many computational chemists try to "go cheap" on learning about their tools and picking good ones. That means the difference between building fast, flexible and maintainable systems quickly, and building not-so-fast, not-so-flexible and not-so-maintainable systems not so quickly. It is no less obvious in computational chemistry than in cabinetmaking that you need to know your tools and know them well.

Computational chemistry is a two-part discipline. You really can't neglect the computer science side of things any more than you can neglect the chemistry side of things. It makes all the difference.
Perry

From owner-chemistry@ccl.net Sun Sep 25 00:51:49 2005
From: "CCL"
To: CCL
Subject: CCL: Cleaning up dusty deck fortran and converting to C/C++
Message-Id: <-29293-050925003925-28058-MqiLeptf2pgZpabwFjDh6w]*[server.ccl.net>
X-Original-From: "Perry E. Metzger"
Content-Type: text/plain; charset=us-ascii
Date: Sun, 25 Sep 2005 00:39:21 -0400
MIME-Version: 1.0
Sent to CCL by: "Perry E. Metzger" [perry]*[piermont.com]
--Replace strange characters with the "at" sign to recover email address--.

>> I've seen many efforts in my Computer Science professional career in
>> which people attempted to keep old code alive at all costs -- and in
>> each case, it ended up costing far more human labor and far more money
>> than a straight rewrite would cost. Rewriting, using the original code
>> as a model and as a mechanism for validating regression tests, is
>> often a superior solution.
>
> Alas, I have also seen some very high profile cases where old Fortran
> codes were discarded in favor of new "object orientated C++" codes
> that still have yet to match the old "legacy" codes they were supposed
> to replace.

One tip -- we computer science types like the words "programs", "software" or "code" (in the singular). "Codes" generally says to us "this person isn't a member of the guild" -- if you're a consultant, the moment the client says "codes" is the point at which you start adding zeros to your estimate. "Softwares", by the way, is even worse.

Sure, you can write really crappy code in any language. However, some languages really make it much harder to write good code. Fortran and Cobol are at the top of my list for that.

> Good or bad code can be written in many languages.

Sure. To quote an old saw in CS, "You can write Fortran in any language", and many people have. You can write stuff in C that might as well have been written in Fortran -- no recursion, no data structures but arrays, no function pointers, etc. -- and many people who don't know better do. You can also use all those things but use them incorrectly and end up with a mess. There is never any substitute for knowing what you are doing. However, people who know what they're doing stay far from Fortran.

We used to make fun of people who still wrote in Fortran when I was an undergraduate decades ago, and we also said "and they'll still be writing in Fortran in 1990, I bet" as though that were an impossibly funny joke. Little did we know they'd still be using Fortran in 2005.

Writing software in Fortran vs. writing software in a language that has things like structs, pointers and recursion is like cooking using only a small Bunsen burner, a dull knife and a cheap aluminum pan, versus having a professional kitchen complete with a Viking range, knives from Global, good cookware and a 650hp food processor. Yes, both will get the job done, but what a difference in the difficulty of getting the job finished and the pleasure taken in the task.

Incidentally, proper editors, debuggers, etc. are also important, as is knowing your way around the OS you use. Know your tools.

> Besides, rewriting 200,000 lines of Fortran into a new language while
> simultaneously changing all of the data structures is a pretty darn
> big task that doesn't result in any publications.

Yup, but if you know what you're doing, it pays a dividend in the end. There are algorithms you just can't express cleanly in Fortran. I used to do high-performance computer graphics work -- you couldn't even express your models reasonably in Fortran (though of course people often made do anyway). A lot of problems in computational chemistry look pretty similar, by the way.

When I see the circumlocutions people use in Fortran to do what would be trivial in a better language -- shudder. It is much like watching someone trying to make furniture with an axe. Yes, if you work really hard at it, you can produce something -- but it won't work as well as using tools meant for the job.
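To make the point about recursion and non-array data structures concrete, here is a minimal Python sketch (not from the original thread) of a routine that standard FORTRAN 77, which does not allow recursion, pushes into workarounds: splitting a bond graph into connected fragments. The molecule representation (a dict mapping each atom index to its bonded neighbours) and the function names are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical representation: atom index -> indices of the atoms bonded to it.
# Every atom must appear as a key, even if it has no bonds.
BondGraph = Dict[int, List[int]]


def collect_fragment(bonds: BondGraph, atom: int, seen: Set[int]) -> List[int]:
    """Recursively gather every atom reachable from `atom` through bonds."""
    seen.add(atom)
    members = [atom]
    for neighbour in bonds[atom]:
        if neighbour not in seen:
            members.extend(collect_fragment(bonds, neighbour, seen))
    return members


def fragments(bonds: BondGraph) -> List[List[int]]:
    """Split a bond graph into connected fragments (e.g. separate molecules)."""
    seen: Set[int] = set()
    result = []
    for atom in sorted(bonds):
        if atom not in seen:
            result.append(sorted(collect_fragment(bonds, atom, seen)))
    return result


# A water molecule (O0-H1, O0-H2) next to a hydrogen molecule (H3-H4).
print(fragments({0: [1, 2], 1: [0], 2: [0], 3: [4], 4: [3]}))
```

CPython caps recursion depth (1000 frames by default), so for very long chains an explicit stack is the safer variant of the same depth-first search.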
Sure, rewriting your code won't get you publications, but when you're done you can do new things faster. The name of the game, after all, is economizing on manpower and getting the computer to subsume more and more of your task and to do its work as fast as possible. Failing to invest in your tools is pennywise but pound foolish.

> If I were ever to replace my Fortran code with C I would initially
> translate and then begin changing some of the data structures as
> desired within the new framework. Arrays are still the most efficient
> data structure for many tasks, but it would be nice to have a few
> linked lists and some memory allocation once in a while.

Arrays are rarely the first tool I think of -- or even ever a tool I think of. They're fine tools for building other data structures -- they're often hidden deep underneath the covers of things like hash tables and such -- but what you want to be thinking of is *abstractions*, not *implementations*. The non-professional thinks first of what to implement, the professional thinks first about what abstractions to build.

In any case, I can't imagine writing most software without real data structures. If you don't know why you want to be able to build clean hash tables, priority queues, search trees, etc., then you don't know why your programs are running orders of magnitude slower than they need to. The difference between the right and wrong data structure is the difference between trying to cut down a tree with a hand saw and trying to cut it down with a chain saw.

--
Perry E. Metzger    perry]*[piermont.com

From owner-chemistry@ccl.net Sun Sep 25 12:31:49 2005
From: "CCL"
To: CCL
Subject: CCL: Cleaning up dusty deck fortran and converting to C/C++
Message-Id: <-29294-050925122433-17831-9JUtfnUunPL3G2OzvntyQw#server.ccl.net>
X-Original-From: "Robert Duke"
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original
Date: Sun, 25 Sep 2005 11:29:56 -0400
MIME-Version: 1.0
Sent to CCL by: "Robert Duke" [rduke#email.unc.edu]
--Replace strange characters with the "at" sign to recover email address--.

Folks -

In discussing how to move forward from f77 to a language with more versatility, I think it is a mistake for the computer science and scientific communities to overlook the virtues of f90/95. There are great prejudices against the old Fortran standards in the computer science community, with good reason. Also, that community is now rather heavily steeped in the religion of C/C++, because that family of languages has been tremendously successful, partly due to its versatility and partly due to the fact that it was pretty easy to get a decent C compiler implemented on a wide variety of platforms.

While f90/95 does not have all the capabilities of C/C++, it does nonetheless introduce a lot of the features for better data structures, algorithms, and software engineering practice that people whine (justifiably) about f77 lacking. Giving C/C++ to someone who is not heavily focused on software engineering is like giving a chain saw to a kid. Sharp edges for sure. The nice thing about f90/95 is that f77 ports easily to it, and then the ported code can be cleaned up and better algorithms/data structures can be gradually implemented. Also, the facilities for handling mathematical problems are, at least in my opinion, more intuitive and versatile. Pure computer science guys can moan about the array being a brain-dead data structure all they want; it is a useful data structure, and the f95 implementation of multidimensional arrays is easier to use. And f90/95 can do just about anything in the realm of dynamic data structures as one becomes more adept at using such things.

To be sure, f90/95 is not perfect; C/C++ is better tied to fundamental OS capabilities, C++ has a real object model, and intermodule dependencies in f90/95 can be awkward to handle. But it is a great next step in a progression from f77.

In my view, the CS community has really failed the community of users in the area of practical language development; there is a tendency either to develop beautiful and useless abstractions like Pascal, or languages like C which have a combination of sharp edges and subtleties that make them nonideal for someone not devoted to using the tool full time.

My personal experience on this is that of someone who has lived in both the CS and science communities, and I have many years of experience developing systems in C/C++. I would personally rather work in C/C++, because it is more fun if you know what you are doing. If I have to maintain or extend code from someone who does not have a good appreciation of the finer points of software engineering, however, I would much rather have that code in f90/95. When the code has been in C, I have ended up spending a lot of my time fixing the bugs/inefficiencies caused by the fact that the original author did not really understand the subtleties of the language.

With the advent of g95, there is now a practical free f90/95 compiler within the reach of everyone; I mostly use proprietary f90/95 implementations because of their better performance, but each user can make the choice. I support s/w on just about every type of platform out there for HPC, and I don't have problems with compiler availability.
Regards - Bob Duke
UNC-CH Chem Dept., NIEHS
From owner-chemistry@ccl.net Sun Sep 25 13:00:10 2005
From: "CCL"
To: CCL
Subject: CCL: Cleaning up dusty deck fortran...
Message-Id: <-29295-050925123431-26434-yDgS9mZjHVKqnOFjb4M0/Q#%#server.ccl.net>
X-Original-From: jle
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII; format=flowed
Date: Sun, 25 Sep 2005 12:39:50 -0400
Mime-Version: 1.0 (Apple Message framework v623)
Sent to CCL by: jle [jle#%#theworld.com]
--Replace strange characters with the "at" sign to recover email address--.

"Perry E. Metzger" [perry]*[piermont.com] wrote two extensive emails which I can't reply to in the usual fashion without generating two replies that repeat each other. Therefore, I'll try to do it in a "collected ideas" email...

I'll start with a premise - computer time's no longer a critical resource, developer time is. Therefore, anything being created today should make the latter efficient, not the former. One can always add iron.

Perry mentions interpreted languages such as Perl and Python, and I think he thinks well of them - for their purposes. If one's code isn't highly CPU-intensive, they're a great way to go. Code can be prototyped and even deployed quickly, and there is a wide range of existing libraries/modules/whatever available to draw from. IP issues seem stable, so commercial and non-commercial code can be mixed without concern. People like to tinker/script, so we're at a good point - there's a growing number of accessible toolkits and a growing number of people who look to use them. For this reason, I would be less quick to choose edit/compile languages like C/C++/Java/... today for anything other than "system requirements" or CPU-intensive routines.
Chemists, on the whole, don't pay attention to their tools and probably don't wish to - they've got other things to do and merely want tools that work.

> Computational chemistry is a two part discipline. You really can't
> neglect the computer science side of things any more than you can
> neglect the chemistry side of things. It makes all the difference.

Actually, there are more parts than this, if both "molecular modeling" and "computational chemistry" are considered. Medicinal chemistry expertise is probably the most valuable for users - those actually trying to DO something with the software. Aside from scripting, I'd say that 60-80% of the "customer base" really doesn't want to and shouldn't have to think about what's going on under the hood. It's up to the developers to deliver on this hope. We're not dealing with Vaxen anymore, and memory's not dear.

Perry's right, though:

> Sure, rewriting your code won't get you publications, but when you're
> done you can do new things faster. The name of the game, after all, is
> economizing on manpower and getting the computer to subsume more and
> more of your task and to do its work as fast as possible. Failing to
> invest in your tools is pennywise but pound foolish.

Users won't pay for this rewrite, and the prime source of bulk labor (2nd-year grad students) might not have the time, training or inclination to think before coding. It has to be done, though, if we're going to use the new machines, architectures and the like that Moore's Law has presented us with.

> Sure, you can write really crappy code in any language. However, some
> languages really make it much harder to write good code. Fortran and
> Cobol are at the top of my list for that.

Well said, although you can also write good code in any language. I suspect if you're going to do linear algebra, it's going to look like Fortran no matter what language you select... If you're going to process paychecks, Cobol does the trick.
From what I recall from the '70s, there's way more structure in Cobol than in F77. Hasn't Cobol gone through a "modernization" effort like Fortran?

Choose the tool for the task. Perry stresses tools and tool selection extensively and well. If you're doing I/O, or doing graphics or doing embedded systems, Fortran's not necessarily a good choice. If you're crunching numbers, go for it. I'd argue, and have argued, that Perl or Python (my preference is Python) is the language of choice, where one can find or create the lower-level tools required by one's application.

> Writing software in Fortran vs. writing software in a language that
> has things like structs, pointers and recursion is like cooking using
> only a small Bunsen burner, a dull knife and a cheap aluminum pan,

True, but if I've got a nail, I look for a hammer, not a Leatherman.

> Incidentally, proper editors, debuggers etc. are also important, as is
> knowing your way around the OS you use. Know your tools.

Also true. As stated above, developer time is precious. However, as one essay in Joel Spolsky's essay collection stated, "if it's not tested, it's broken". Without a firm idea of what's "correct", and a test suite which supports that idea, you're wasting a lot of time and effort screwin' around with code.

> Arrays are rarely the first tool I think of -- or even ever a tool I
> think of. They're fine tools for building other data structures --
> they're often hidden deep underneath the covers of things like hash
> tables and such -- but what you want to be thinking of is
> *abstractions*, not *implementations*. The non-professional thinks
> first of what to implement, the professional thinks first about what
> abstractions to build.

Well, I view myself as a professional developer, and yet my first thought is "what am I trying to do?". I think Perry and I are probably saying the same thing, since the next decision tends to be "how important is it to do it well?".
If the answer is "throwaway", use whatever you're most adept with. If the answer is "it's important", then looking at the tools, data abstractions and the like is critical. If I've got to think "cache coherency", I'll be thinking things like arrays. Otherwise, it's things like "molecules", or better, "things". The more abstract the better (as there are fewer lines of code to debug/test/maintain).

> In any case, I can't imagine writing most software without real data
> structures. If you don't know why you want to be able to build clean
> hash tables, priority queues, search trees, etc., then you don't know
> why your programs are running orders of magnitude slower than they
> need to.

Most chemists haven't a clue what these things are and don't care. They've not studied the literature, nor studied development as a discipline. Bad move, really bad move, for those paid to develop software, but we're a (small) subset of the whole. I think it was Rob Pike who said "The first rule of optimization is 'don't'". If it's not important to work quickly (and it's usually not), merely work correctly (you should get the latter, and its tests, done before the former anyway).

In case people think I'm being too hard on chemists, I'm sure there are a fair number of CS people who aren't up on QM, the latest advances in Brownian Dynamics, molecular similarity (is 2D or 3D better?), ... There's way too much to try and keep track of; we merely have to make a go of it.

It's sorta fun living in a time of bloody well infinite computer power. We've got to figure out how to develop for these systems, and more importantly rid ourselves of the biases we've grown up with. There's way less which is "too slow" anymore, and the sooner we shed that notion the better. We don't understand all the physics, and we need newer code and implementations, but it's WAY better running 3-minute than 3-day test jobs :-).

Sorry to run on...
Thanks for reading if you've made it this far (and thanks, Perry, for the contributions).

Joe Leonard
jle#%#theworld.com

From owner-chemistry@ccl.net Sun Sep 25 16:32:32 2005
From: "CCL"
To: CCL
Subject: CCL: Cleaning up dusty deck fortran...
Message-Id: <-29296-050925162324-13268-yt6V6YAqI5inGaBY4pGGww(a)server.ccl.net>
X-Original-From: "Perry E. Metzger"
Content-Type: text/plain; charset=us-ascii
Date: Sun, 25 Sep 2005 16:23:15 -0400
MIME-Version: 1.0
Sent to CCL by: "Perry E. Metzger" [perry(a)piermont.com]
--Replace strange characters with the "at" sign to recover email address--.

> It's sorta fun living in a time of bloody well infinite computer power.
> We've got to figure out how to develop for these systems, and more
> importantly rid ourselves of the biases we've grown up with. There's
> way less which is "too slow" anymore, and the sooner we shed that
> notion the better. We don't understand all the physics, and we need
> newer code and implementations, but it's WAY better running 3-minute
> than 3-day test jobs :-).

I generally agree with what you said, but I will make one final comment.

In most fields that computers are applied to, computer time is not precious. If, however, your code takes four days to run (as it can if you're doing a complicated simulation), or it can run fast but only on the one Beowulf cluster you share with many other people, computer time *is* precious. Weather prediction, computational chemistry and high-end computer graphics are areas where optimization does indeed count.

Pike's comment on optimization is a common piece of advice: do not optimize prematurely. However, if a simulation is going to take 30 hours and you want to do hundreds of them over the coming years, getting a factor of 10 out of your run time is worth a day of your human time many times over. Know *when* to optimize, and know *how*.
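The break-even argument can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: it takes "hundreds" of runs as a conservative 100, and the function name is hypothetical.

```python
def machine_hours_saved(runs: int, hours_per_run: float, speedup: float) -> float:
    """Run time saved over `runs` jobs when each job becomes `speedup` times faster."""
    return runs * (hours_per_run - hours_per_run / speedup)


# The scenario in the text: 30-hour simulations, a 10x speedup, and (conservatively) 100 runs.
saved = machine_hours_saved(runs=100, hours_per_run=30.0, speedup=10.0)
print(f"{saved:.0f} hours of run time saved")  # 100 * (30 - 3) = 2700
```

Even if the "day" of programmer effort is counted as a full 24 hours, that is well over a hundredfold return, before counting the faster turnaround on each individual result.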
I will note that, generally speaking, what counts is not saving two cycles here and three cycles there, but rather using the right tools to find out where your code is spending most of its time, and also picking the right algorithms. Know HOW to optimize. The difference between the right algorithm and the wrong one is enormous, and, most importantly, it is often not a constant factor but a difference in complexity order. Given the nature of what we're talking about, understanding numerical analysis tricks and the nature of problems like numerical stability is often also crucial.

Sometimes it is also important to understand how to cycle-shave -- knowing when you can use single precision instead of double precision, or how to use the vector units on modern Pentium/Athlon hardware, can actually make a difference, but that's a rarer discipline and usually only worth it if it makes the difference between a 300-day computer run and a 3-day run.

Joe mentioned (correctly) that for mere USERS of this sort of software a lot of what I've said isn't relevant, and that is true. I'm implicitly directing my comments at those who actually write code.

If you do write code, and that code is the core of what you do for a living, you owe it to yourself to learn the fundamentals of computer science -- data structures, algorithms, and clean software engineering technique. It may seem like a distraction if what you are trying to do is modeling and not computer work per se, but you *are* doing computer work to do the modeling, and the time spent reading good books on the subject and getting comfortable with your tools *will* reduce both the overall human time spent dealing with the software and the overall time your software burns up on supercomputer clusters.

Don't be pennywise and pound foolish. Learn how your tools work. It will save you endless amounts of heartache in the long run -- and you *do* plan on having a long career, yes?
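As a concrete illustration of an algorithmic change that alters the complexity order rather than a constant factor, here is a minimal Python sketch (not from the thread) of finding all atom pairs within a cutoff distance. The brute-force version compares every pair, which is O(N^2); the cell-list version, a standard technique swapped in here purely for illustration, hashes atoms into cutoff-sized grid cells in a dict (the kind of hash table mentioned earlier in the thread) and compares only atoms in neighbouring cells, which is roughly O(N) at uniform density. The coordinates, cutoff and function names are hypothetical, and periodic boundary conditions are ignored.

```python
import itertools
import math
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def pairs_brute_force(coords: List[Point], cutoff: float) -> Set[Tuple[int, int]]:
    """O(N^2): test the distance of every pair of atoms."""
    return {
        (i, j)
        for i, j in itertools.combinations(range(len(coords)), 2)
        if math.dist(coords[i], coords[j]) <= cutoff
    }


def pairs_cell_list(coords: List[Point], cutoff: float) -> Set[Tuple[int, int]]:
    """Roughly O(N) at uniform density: bin atoms into cubic cells of edge `cutoff`
    (a hash table keyed by cell index) and compare only atoms in adjacent cells.
    Assumes cutoff > 0 and no periodic boundaries."""
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        cell = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells[cell].append(idx)

    found: Set[Tuple[int, int]] = set()
    for (cx, cy, cz), members in cells.items():
        # Two atoms within `cutoff` of each other differ by at most one cell per axis.
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            # .get() does not insert into the defaultdict, so iteration stays safe.
            for j in cells.get((cx + dx, cy + dy, cz + dz), []):
                for i in members:
                    if i < j and math.dist(coords[i], coords[j]) <= cutoff:
                        found.add((i, j))
    return found


# Random "atoms" in a 20 x 20 x 20 box; both methods must find the same pairs.
random.seed(0)
atoms = [tuple(random.uniform(0.0, 20.0) for _ in range(3)) for _ in range(500)]
assert pairs_cell_list(atoms, 3.0) == pairs_brute_force(atoms, 3.0)
print(len(pairs_cell_list(atoms, 3.0)), "pairs within 3.0")
```

Both functions return the same answer; the gap between them widens as N grows, because the brute-force cost grows quadratically while the cell-list cost grows roughly linearly.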
Perry

From owner-chemistry@ccl.net Sun Sep 25 20:57:45 2005
From: "CCL"
To: CCL
Subject: CCL: PCM-ONIOM
Message-Id: <-29297-050925205623-6935-hFghMQZNHynPfvB3j5EYlg*_*server.ccl.net>
X-Original-From: ying xiong
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=gb2312
Date: Mon, 26 Sep 2005 08:56:11 +0800 (CST)
MIME-Version: 1.0
Sent to CCL by: ying xiong [xiongying96*_*yahoo.com.cn]
--Replace strange characters with the "at" sign to recover email address--.

Dear sir,

I have a question about how to do a PCM calculation with the ONIOM method. I asked this question on CCL earlier, and a kind person told me that I can add "IOp(5/94=0)" to do a PCM/ONIOM calculation. However, I still have some questions about PCM-ONIOM.

(1) 0 stands for "Standard PCM calculation". However, what does "Standard PCM calculation" mean? Can I set the solvent in the input, like "solvent=water"?

(2) I have tried two calculations to compare the results:

a: # oniom(B3LYP/sto-3g:amber=softfirst) geom=connectivity IOp(5/94=0)
b: # oniom(B3LYP/sto-3g:amber=softfirst) geom=connectivity

The other information in the two input files, such as the coordinates, is identical, but I find that the outputs are also the same. Where is the output of the PCM calculation?

Ying Xiong
xiongying96*_*yahoo.com.cn
2005-09-26