From ross@cgl.ucsf.EDU  Sun Sep  6 03:07:01 1992
Date: Sun, 6 Sep 92 10:07:01 -0700
From: ross@cgl.ucsf.edu (Bill Ross)
To: chemistry@ccl.net
Subject: benchmark correction/update


This posting updates my previous benchmarks posting. It corrects an
error, adds a new machine (Cray C90), and announces the availability
of the test cases by anonymous ftp.

The error was in my assessment of the speedup afforded by compilation on 
a Silicon Graphics machine using the "fast math" -lfastm library. The 
observed speedup of 2.3 was actually due to an upgrade from a 310vgx to 
a Crimson while I was running the benchmark. The corrected numbers for
Silicon Graphics machines are given below.

FTP ACCESS

Thanks to Jan Labanowski and OSC for providing this facility. The files
are of two sorts: PDB files for use with non-Amber programs, and a suite 
of files for generating and running the benchmarks under Amber. 

The coordinates were generated from published crystal parameters for B 
DNA (Langridge).

The files have been placed on the anonymous ftp site of the Computational
Chemistry List at www.ccl.net [128.146.36.48]. After changing directory
to the one where you want to store the files on your computer, follow
the instructions below:

ftp www.ccl.net      (or if you have problems use ftp 128.146.36.48)
Login: anonymous
Password: your_email_address
ftp> cd pub/chemistry/DNA_benchmark
ftp> ascii
ftp> get BenchDNA.README
ftp> binary
ftp> mget *.Z
ftp> quit

The total size of all files is about 300 kBytes. After downloading the files,
read the instructions in BenchDNA.README to find out how to uncompress the
files and process them through tar.
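
For anyone who prefers to script the transfer, the session above can be
sketched with Python's standard ftplib module. This is only an illustration
under assumptions: the host, directory and file names are those given in this
posting, the function name fetch_benchmark is hypothetical, and nothing here
checks whether the server still accepts anonymous logins.

```python
import os

HOST = "www.ccl.net"                        # host named in this posting
REMOTE_DIR = "pub/chemistry/DNA_benchmark"  # directory named in this posting


def fetch_benchmark(ftp, dest="."):
    """Fetch the README in text mode and every *.Z file in binary mode.

    `ftp` is a connected, logged-in ftplib.FTP object (or any object with
    the same cwd/nlst/retrlines/retrbinary methods). Returns the list of
    local paths written. The function name is hypothetical.
    """
    ftp.cwd(REMOTE_DIR)
    written = []

    # Equivalent of: ascii / get BenchDNA.README
    readme = os.path.join(dest, "BenchDNA.README")
    with open(readme, "w") as out:
        ftp.retrlines("RETR BenchDNA.README",
                      lambda line: out.write(line + "\n"))
    written.append(readme)

    # Equivalent of: binary / mget *.Z
    for name in ftp.nlst():
        if not name.endswith(".Z"):
            continue
        local = os.path.join(dest, os.path.basename(name))
        with open(local, "wb") as out:
            ftp.retrbinary("RETR " + name, out.write)
        written.append(local)
    return written


# Usage (performs a real network connection, so it is left commented out):
#   from ftplib import FTP
#   with FTP(HOST) as ftp:
#       ftp.login("anonymous", "your_email_address")
#       print(fetch_benchmark(ftp))
```

Passing the connection in as an argument keeps the download logic separate
from the login step, so it can be tried against a stand-in object first.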
----------------------------------------------

The whole set of benchmark numbers follows.

The order is roughly that of performance for the fastest machine in a
product line. All times are CPU seconds measured by system calls in the 
programs; wallclock times may not correspond.  Times are given for
single/double precision.

		dna/Run.bench			      dna/Run.bench2

	   DNA hexamer in periodic 	        68 DNA base pairs in vacuum.
	   water box, constant volume.	        4282 atoms, 10A cutoff on all
	   7682 atoms: 274 dna, 10 	        nonbonded pairs. Distance-
	   counterions, 2466 waters.		dependent dielectric.
	   All solute interactions;
	   8A cutoff otherwise. Constant
	   dielectric.
	   ______________________________        ______________________________
	   min	      min+md	  sander                sander          gibbs
					           min         md
	   ______________________________        ______________________________

Cray
C90          - /49      - /56      - /57           - /25     - /25       59
Y-MP(ncsc)   - /80      - /92      - /90           - /44     - /44       96
Y-MP(sdsc)   - /91      - /104     - /104	   - /44     - /44       98
Y-MP EL      - /445     - /498     - /500          - /282    - /278     535

Fujitsu
VP2200	     52/54      62/64      63/64           26/24     26/25       92

HP
730	    336/327    367/363    337/363	  205/220   200/216     409
720/50MHz   434/462    480/512    476/503

iris
Crimson     352/421    341/452 	  379/468	  224/281   226/282     479
 w/fastm**  337/416    358/447    325/439         219/273   216/268     481
4d/410vgx   730/1129   779/1180   768/1156
indigo3000  868/1253   881/1391   866/1300	  490/629   461/623    1269
4d/310vgx   956/1618  1015/1704   993/1560	  578/809   578/804    1244
personal   1722/2830  1724/3371  1724/2846
4d/80gt    1901/3542  1996/4006  2006/3201

vax 9000    
vector	    365/468    399/520    390/524	  229/311   219/296
no vector   654/774   	  /865	  948/917	  462/497   454/794	789

convex
c2	    479/516    549/597    562/603	  279/303   277/304     767

fps
500	    744/774    855/865	  921/915

mips
rc6280	    723/1133   758/1191   731/1101        565/869   564/867     888

rs6000
530	    859/844    912/895 	  915/858	  516/391   501/378     630

decstation
5000/200   1112/1585  1173/1657  1168/1638	  670/884   663/871    1325

alliant
FX/8*      1772/1876  2016/2160  2034/2184
1-process       4270

IBM 3090
200J vector  - /1999    - /2051
200J scalar  - /6059    - /6143

sun
sparc2	   1798/2145  1834/2312	 1627/2299
sparc
4/280	   2528/3830  2700/4062  2708/4018
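
The single/double cells in the table above are easy to pull apart if you
want to compare precisions across machines. The following is a minimal
sketch, not part of the benchmark suite; the function names are hypothetical,
and the sample cells are copied from the Run.bench "min" column.

```python
def _to_num(text):
    """Convert one half of a cell to float; '-' or blank means no value."""
    text = text.strip()
    return float(text) if text not in ("", "-") else None


def parse_cell(cell):
    """Split a 'single/double' timing cell into (single, double).

    A cell with no slash (as in the gibbs column, which is double
    precision only) is returned as (None, value).
    """
    if "/" not in cell:
        return None, _to_num(cell)
    single, _, double = cell.partition("/")
    return _to_num(single), _to_num(double)


def double_over_single(cell):
    """Ratio of double- to single-precision CPU time, or None if unknown."""
    single, double = parse_cell(cell)
    if single is None or double is None:
        return None
    return double / single


# Sample cells copied from the Run.bench "min" column.
cells = {
    "Cray C90": "- /49",
    "Fujitsu VP2200": "52/54",
    "HP 730": "336/327",
    "iris Crimson": "352/421",
    "sun sparc2": "1798/2145",
}

for machine, cell in cells.items():
    ratio = double_over_single(cell)
    shown = "n/a" if ratio is None else "%.2f" % ratio
    print("%-16s %-10s double/single = %s" % (machine, cell, shown))
```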


PROGRAM NOTES

All the programs are in the Amber 4.0 release.

The minmd program contains the traditional energy minimization and
molecular dynamics capabilities of Amber. Sander is essentially the same, 
as used here. (Both programs have significant other features which are not 
exercised by the benchmarks.) Gibbs is the Amber free energy perturbation 
program.

Run.bench
    min: 	100 steps minimization
    minmd: 	20 steps min, 80 steps md
    sander: 	100 steps gradual warming
Run.bench2
    sander/min: 100 steps minimization
    sander/md:  100 steps gradual warming
    gibbs:	100 steps of dynamic windows perturbation (double-wide sampling)
		note: gibbs4 does not have vectorization directives
		note: gibbs4 is double precision

    One interesting thing that came to light when developing bench2
    was that the distance-dependent (1/r^2) dielectric was significantly 
    faster than the normal (1/r) one. This effect, attributed to the taking 
    of the square root, was more pronounced when hardware arithmetic
    support was lacking. Representative results (double precision sander
    minimization):

		 SGI    Crim32M  MIPS    IBM	Convex	 HP     Cray	Fujitsu
	diel   Crim32M  -lfastm	 RC6280	 530	 C2	 730	Y-MP	VP2200

	1/r^2	 347	 346     616	 391	 303	 220	  44	  24
	1/r	 594	 495     869	 576	 313	 240	  49	  29
	ratio	.584	.699    .709	.679	.968	.917	.898	.828
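
The ratio row is simply the 1/r^2 time divided by the 1/r time. A short
sketch that recomputes it from the two rows above and ranks the machines by
how much they gain (the variable names are illustrative only):

```python
machines = ["SGI Crim32M", "Crim32M -lfastm", "MIPS RC6280", "IBM 530",
            "Convex C2", "HP 730", "Cray Y-MP", "Fujitsu VP2200"]

# CPU seconds copied from the table above (double precision sander minimization).
t_r2 = [347, 346, 616, 391, 303, 220, 44, 24]   # distance-dependent dielectric
t_r = [594, 495, 869, 576, 313, 240, 49, 29]    # constant dielectric

# ratio = time(1/r^2) / time(1/r); smaller means a larger gain from 1/r^2.
ratios = [round(a / b, 3) for a, b in zip(t_r2, t_r)]

for name, ratio in sorted(zip(machines, ratios), key=lambda pair: pair[1]):
    print("%-16s %.3f" % (name, ratio))
```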

    When the SGI Crimson32M used -lfastm, the double precision version
    was faster than the single for the 1/r dielectric: 566 single,
    495 double on minimization.
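
To illustrate why the distance-dependent form can skip the square root,
here is a minimal sketch of the two Coulomb terms. It is not Amber's code:
the function names are hypothetical, units and conversion constants are
omitted, and only the electrostatic pair term inside a cutoff is shown. The
squared distance comes straight from the coordinates; the constant
dielectric needs r itself, while a dielectric proportional to r leaves only
r^2 in the denominator.

```python
import math


def coulomb_constant_dielectric(qi, qj, r2, eps=1.0):
    """q_i*q_j / (eps * r): needs r, hence a square root of r^2."""
    return qi * qj / (eps * math.sqrt(r2))


def coulomb_distance_dependent(qi, qj, r2, eps=1.0):
    """Dielectric eps*r gives q_i*q_j / (eps * r^2): no square root."""
    return qi * qj / (eps * r2)


def pair_energy(coords, charges, kernel, cutoff):
    """Sum kernel(qi, qj, r^2) over all pairs closer than the cutoff."""
    cutoff2 = cutoff * cutoff          # compare squared distances
    total = 0.0
    n = len(coords)
    for i in range(n - 1):
        xi, yi, zi = coords[i]
        for j in range(i + 1, n):
            xj, yj, zj = coords[j]
            r2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if r2 <= cutoff2:
                total += kernel(charges[i], charges[j], r2)
    return total


# Two opposite unit charges 5 length units apart (arbitrary units).
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
charges = [1.0, -1.0]
print(pair_energy(coords, charges, coulomb_constant_dielectric, 10.0))
print(pair_energy(coords, charges, coulomb_distance_dependent, 10.0))
```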

MACHINE NOTES

The Fujitsu VP2200 is a 32-bit machine with 64-bit arithmetic.
The Cray Y-MP & C90 are 64-bit machines, so single precision results are 
irrelevant.
	ncsc = North Carolina Supercomputing Center
	sdsc = San Diego Supercomputer Center
The Convex C2 was running under IEEE Floating Point default mode.

-----

        Cray hpm (hardware performance monitor) results

             ________ Y-MP ncsc___              ________ C90 ________
             Run.bench  Run.bench2              Run.bench  Run.bench2

MFlops          61.5      72.7                    109.0       116.6
MIPS            36.9      37.8                     59.0        63.2
M_Mem/sec       70.9      91.2                    115.1       142.3
ClockCyc/Inst    4.5       4.4                      4.1         3.8
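
The ClockCyc/Inst row can be cross-checked against the MIPS row once a
clock period is known. The sketch below assumes clock periods of 6.0 ns for
the Y-MP and 4.167 ns for the C90; those two figures are assumptions brought
in for the check and are not stated in this posting. The MFlops and MIPS
values are copied from the table above.

```python
# Assumed clock periods in nanoseconds (NOT from this posting).
CLOCK_NS = {"Y-MP": 6.0, "C90": 4.167}

# Values copied from the hpm table above.
MIPS = {
    ("Y-MP", "Run.bench"): 36.9, ("Y-MP", "Run.bench2"): 37.8,
    ("C90", "Run.bench"): 59.0, ("C90", "Run.bench2"): 63.2,
}
MFLOPS = {
    ("Y-MP", "Run.bench"): 61.5, ("Y-MP", "Run.bench2"): 72.7,
    ("C90", "Run.bench"): 109.0, ("C90", "Run.bench2"): 116.6,
}


def cycles_per_instruction(clock_ns, mips):
    """Clock rate in MHz divided by millions of instructions per second."""
    clock_mhz = 1000.0 / clock_ns
    return clock_mhz / mips


for (machine, run), mips in sorted(MIPS.items()):
    cpi = cycles_per_instruction(CLOCK_NS[machine], mips)
    print("%-5s %-10s cycles/inst = %.1f" % (machine, run, cpi))

# C90 over Y-MP(ncsc) floating-point rate for each benchmark.
for run in ("Run.bench", "Run.bench2"):
    speedup = MFLOPS[("C90", run)] / MFLOPS[("Y-MP", run)]
    print("%-10s C90/Y-MP MFlops = %.2f" % (run, speedup))
```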

-----

		Memory (MB)	Data Cache	Instruction Cache	Cache

c2	          1024
fps500		   128
iris4d/410vgx      128          64K             64K  Sec Cache   1MB 
alliant		    64							512K
mips		    64
iris4d/310vgx	    32		64K		64K
iris4d/Crim32 	    32		 8K		 8K  Sec Cache   1MB
rs6000/530 	    16
irisPersonal        12		32K		64K
iris4d/80gt	     8		32K		64K
dec5000/200	    32

*Automatic parallelization directives were invoked in the 
Alliant compilation. The machine has 8 processors. I do not
know what the parallel timings mean, but am impressed that
correct results were obtained on all tests except polarization. -Bill Ross

**compiled with the -lfastm "fastmath" lib. 

----
Bill Ross