From ross@cgl.ucsf.EDU Sun Sep  6 03:07:01 1992
Date: Sun, 6 Sep 92 10:07:01 -0700
From: ross@cgl.ucsf.edu (Bill Ross)
To: chemistry@ccl.net
Subject: benchmark correction/update

This posting is an update of my previous benchmarks posting. I correct
an error, include a new machine (Cray C90), and announce the availability
of the test cases by anonymous ftp.

The error was in my assessment of the speedup afforded by compilation
on a Silicon Graphics machine using the "fast math" -lfastm library.
The observed speedup of 2.3 was actually due to an upgrade from a
310vgx to a Crimson while I was running the benchmark. The corrected
numbers for Silicon Graphics machines are given below.

FTP ACCESS

Thanks to Jan Labanowski and OSC for providing this facility.

The files are of two sorts: PDB files for use with non-Amber programs,
and a suite of files for generating and running the benchmarks under
Amber. The coordinates were generated from published crystal parameters
for B DNA (Langridge).

The files have been placed on the anonymous ftp site of the
Computational Chemistry List at www.ccl.net [128.146.36.48]. After
changing to the directory where you want to store the files on your
computer, follow the instructions below:

  ftp www.ccl.net    (or, if you have problems, use: ftp 128.146.36.48)
  Login: anonymous
  Password: your_email_address
  ftp> cd pub/chemistry/DNA_benchmark
  ftp> ascii
  ftp> get BenchDNA.README
  ftp> binary
  ftp> mget *.Z
  ftp> quit

The total size of all files is about 300 kBytes. After downloading the
files, read the instructions in BenchDNA.README to find out how to
uncompress them and process them through tar.

----------------------------------------------

The whole set of benchmark numbers follows. The order is roughly that
of performance for the fastest machine in a product line. All times are
CPU seconds measured by system calls in the programs; wallclock times
may not correspond. Times are given for single/double precision.

                      dna/Run.bench                       dna/Run.bench2

                      DNA hexamer in periodic             68 DNA base pairs in vacuum.
                      water box, constant volume.         4282 atoms, 10A cutoff on all
                      7682 atoms: 274 dna, 10             nonbonded pairs. Distance-
                      counterions, 2466 waters.           dependent dielectric.
                      All solute interactions;
                      8A cutoff otherwise.
                      Constant dielectric.
                      ______________________________      ______________________________
                      min        min+md     sander        sander     sander     gibbs
                                                          min        md
                      ______________________________      ______________________________
Cray C90              - /49      - /56      - /57         - /25      - /25      59
  Y-MP(ncsc)          - /80      - /92      - /90         - /44      - /44      96
  Y-MP(sdsc)          - /91      - /104     - /104        - /44      - /44      98
  Y-MP EL             - /445     - /498     - /500        - /282     - /278     535
Fujitsu VP2200        52/54      62/64      63/64         26/24      26/25      92
HP 730                336/327    367/363    337/363       205/220    200/216    409
  720/50MHz           434/462    480/512    476/503
iris Crimson          352/421    341/452    379/468       224/281    226/282    479
  w/fastm**           337/416    358/447    325/439       219/273    216/268    481
  4d/410vgx           730/1129   779/1180   768/1156
  indigo3000          868/1253   881/1391   866/1300      490/629    461/623    1269
  4d/310vgx           956/1618   1015/1704  993/1560      578/809    578/804    1244
  personal            1722/2830  1724/3371  1724/2846
  4d/80gt             1901/3542  1996/4006  2006/3201
vax 9000 vector       365/468    399/520    390/524       229/311    219/296
         no vector    654/774     /865      948/917       462/497    454/794    789
convex c2             479/516    549/597    562/603       279/303    277/304    767
fps 500               744/774    855/865    921/915
mips rc6280           723/1133   758/1191   731/1101      565/869    564/867    888
rs6000 530            859/844    912/895    915/858       516/391    501/378    630
decstation 5000/200   1112/1585  1173/1657  1168/1638     670/884    663/871    1325
alliant FX/8*         1772/1876  2016/2160  2034/2184
  1-process           4270
IBM 3090 200J vector  - /1999    - /2051
         200J scalar  - /6059    - /6143
sun sparc2            1798/2145  1834/2312  1627/2299
    sparc 4/280       2528/3830  2700/4062  2708/4018

PROGRAM NOTES

All the programs are in the Amber 4.0 release. The minmd program
contains the traditional energy minimization and molecular dynamics
capabilities of Amber. Sander is essentially the same, as used here.
(Both programs have significant other features which are not exercised
by the benchmarks.) Gibbs is the Amber free energy perturbation program.

Run.bench
    min:        100 steps minimization
    minmd:      20 steps min, 80 steps md
    sander:     100 steps gradual warming

Run.bench2
    sander/min: 100 steps minimization
    sander/md:  100 steps gradual warming
    gibbs:      100 steps of dynamic windows perturbation
                (double-wide sampling)
        note: gibbs4 does not have vectorization directives
        note: gibbs4 is double precision

One interesting thing that came to light when developing bench2 was
that the distance-dependent (1/r^2) dielectric was significantly faster
than the normal (1/r) one. This effect, attributed to the square root
that the 1/r form requires, was more pronounced when hardware
arithmetic support was lacking. Representative results (double
precision sander minimization, CPU seconds):

         SGI      Crim32M  MIPS     IBM      Convex   HP       Cray     Fujitsu
diel     Crim32M  -lfastm  RC6280   530      C2       730      Y-MP     VP2200
1/r^2    347      346      616      391      303      220      44       24
1/r      594      495      869      576      313      240      49       29
ratio    .584     .699     .709     .679     .968     .917     .898     .828

When the SGI Crimson32M used -lfastm, the double precision version was
faster than the single for the 1/r^2 dielectric: 566 single, 495 double
on minimization.

MACHINE NOTES

The Fujitsu VP2200 is a 32-bit machine with 64-bit arithmetic.

The Cray Y-MP & C90 are 64-bit machines, so single precision results
are irrelevant.
    ncsc = North Carolina Supercomputing Center
    sdsc = San Diego Supercomputer Center

The Convex C2 was running under IEEE Floating Point default mode.
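
As an illustration of the dielectric observation in the program notes,
here is a minimal Python sketch of why the distance-dependent form can
skip the square root, followed by a recomputation of the "ratio" row
from the representative timings. The function names are hypothetical
and are not Amber routines; units and the usual electrostatic
conversion constant are omitted, and a dielectric of exactly eps = r is
assumed for the distance-dependent case.

```python
import math


def coulomb_constant_dielectric(qi, qj, dx, dy, dz, eps=1.0):
    """Constant dielectric: E = qi*qj / (eps * r).

    The pair distance r must be obtained with a square root.
    """
    r2 = dx * dx + dy * dy + dz * dz
    return qi * qj / (eps * math.sqrt(r2))


def coulomb_distance_dependent(qi, qj, dx, dy, dz):
    """Distance-dependent dielectric (assumed eps = r): E = qi*qj / r^2.

    r^2 comes straight from the coordinate differences, so no square
    root is needed.
    """
    r2 = dx * dx + dy * dy + dz * dz
    return qi * qj / r2


# Double precision sander minimization times in CPU seconds, copied from
# the representative results table as (1/r^2 time, 1/r time).
TIMES = {
    "SGI Crim32M": (347, 594),
    "SGI Crim32M -lfastm": (346, 495),
    "MIPS RC6280": (616, 869),
    "IBM 530": (391, 576),
    "Convex C2": (303, 313),
    "HP 730": (220, 240),
    "Cray Y-MP": (44, 49),
    "Fujitsu VP2200": (24, 29),
}


def ratio(t_r2, t_r):
    """Time with the 1/r^2 dielectric divided by time with 1/r."""
    return round(t_r2 / t_r, 3)


RATIOS = {name: ratio(*pair) for name, pair in TIMES.items()}

if __name__ == "__main__":
    for name, value in RATIOS.items():
        print(f"{name:22s} {value:.3f}")
```

A ratio near 1 (Convex C2, HP 730, Cray Y-MP) means the square root
costs little on that machine; the smaller ratios are the machines where
skipping it pays off most.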

-----
Cray hpm (hardware performance monitor) results

                 ________ Y-MP ncsc___     ________ C90 ________
                 Run.bench   Run.bench2    Run.bench   Run.bench2
MFlops             61.5        72.7         109.0       116.6
MIPS               36.9        37.8          59.0        63.2
M_Mem/sec          70.9        91.2         115.1       142.3
ClockCyc/Inst       4.5         4.4           4.1         3.8

-----
                 Memory (Mb)   Data Cache   Instruction Cache   Cache
c2                  1024
fps500               128
iris4d/410vgx        128          64K             64K           Sec Cache 1MB
alliant               64                                        512K
mips                  64
iris4d/310vgx         32          64K             64K
iris4d/Crim32         32          8K              8K            Sec Cache 1MB
rs6000/530            16
irisPersonal          12          32K             64K
iris4d/80gt            8          32K             64K
dec5000/200           32

*Automatic parallelization directives were invoked in the Alliant
compilation. The machine has 8 processors. I do not know what the
parallel timings mean, but am impressed that correct results were
obtained on all tests except polarization. -Bill Ross

**compiled with the -lfastm "fastmath" lib.

----
Bill Ross