From chemistry-request@ccl.net Mon Nov 18 02:51:25 1991
Date: 18 Nov 91 18:23:03 U
From: "Frank Larkins" <frank_larkins@muwayf.unimelb.edu.au>
Subject: Post Doc Available Australia
To: CHEMISTRY@ccl.net
Status: R

  OFFICE MEMO          Subject: Post Doc Available Australia
                       Date: 11/18/91   Time: 5:08 PM
I have a postdoctoral fellowship available in Australia for at least two
years from early 1992 in the general area of theoretical chemical physics.
A sound knowledge of quantum chemistry, especially molecular orbital
methods, is essential. We are working on atomic and molecular problems in
photoelectron, Auger electron and X-ray emission spectroscopies; see, e.g.,
Physica Scripta 41, 827 (1990); J. Phys. B 24, 741 (1991); J. Chem. Phys. 86,
3239 (1987).
For more information contact:
Professor Frank Larkins,  School of Chemistry
University of Melbourne, Parkville 3052 Australia
Fax (613) 347 6883 PH (613) 344 4181
e-mail Frank_Larkins.REGISTRAR@muwayf.unimelb.edu.au


From chemistry-request@ccl.net Mon Nov 18 08:44:49 1991
From: senese@schug.larc.nasa.gov (Fred Senese)
Date: Mon, 18 Nov 1991 08:20:56 -0500
To: chemistry@ccl.net
Subject: Re: HP vs. IBM
Status: R

Mark Murcko <markm@predator.vpharm.com> writes:
>We are currently starting an evaluation of the HP 720 and 730 vs. the
>IBM 550.  Does anyone have any information that they would like to
>share with the group? 

Here are a few timings for the NDSU GAMESS codes ported to
the HP Cobra workstations.  The benchmarks are the standard BENCH*.INP
tests included on the 10/90 GAMESS distribution tape; they represent "typical"
quantum chemical computations using RHF, MCSCF, CI, and gradient methods. 

Identical codes were used on all machines, but native BLAS routines and
the highest level of compiler optimization were used whenever possible.
Machines were *not* dedicated to benchmarking, but except on the YMP
(see below) the additional load was very low. All jobs were run with a
limit of 750 000 words in the GAMESS dynamic memory pool.

Times are given as user/system times in seconds. The number in
parentheses is the total (user + system) time divided by the total
time for the same test on the Cray YMP, so the YMP is 1.0 and larger
numbers are slower.


      Sun         Sun         DEC         IBM         HP 9000     HP 9000     Cray
Test  SS1         SS2         DS5000      RS6000      720         730         YMP
----  ----------  ----------  ----------  ----------  ----------  ----------  ----------
0     4792/303    1099/145    ---         ---         737/39      ---         720/27
      (6.8)       (1.7)                               (1.0)                   (1.0)

1     253/32      104/16      80/21       46/7        36/4        32/3        18/2
      (14.3)      (6.0)       (5.1)       (2.7)       (2.0)       (1.8)       (1.0)

2     260/20      112/11      103/17      99/8        72/4        56/2        22/2
      (11.7)      (5.1)       (5.0)       (4.5)       (3.1)       (2.4)       (1.0)

3     794/168     354/92      266/108     210/36      145/22      139/17      78/7
      (11.3)      (5.2)       (4.4)       (2.9)       (2.0)       (1.8)       (1.0)

4     936/146     402/84      295/98      169/48      128/20      124/16      72/6
      (13.9)      (6.2)       (5.0)       (2.8)       (1.9)       (1.8)       (1.0)

5     12708/736   5612/424    3917/395    1737/191    2102/100    1592/82     316/47
      (37.0)      (16.6)      (11.9)      (5.3)       (6.1)       (4.6)       (1.0)

6     431/95      198/55      158/46      136/27      96/16       88/13       38/11
      (10.7)      (5.1)       (4.2)       (3.3)       (2.3)       (2.1)       (1.0)

7     5256/378    2203/268    1963/243    1124/117    1110/62     857/52      267/31
      (18.9)      (8.3)       (7.4)       (4.2)       (3.9)       (3.1)       (1.0)

8     22250/3148  9322/1775   ---         2903/816    2864/384    ---         1790/110
      (13.4)      (5.8)                   (2.0)       (1.7)                   (1.0)

10    1015/68     431/37      332/41      185/14      134/8       110/6       69/3
      (15.0)      (6.5)       (5.2)       (2.8)       (2.0)       (1.6)       (1.0)

12    7546/995    3338/572    ---         1137/255    942/110     ---         475/32
      (16.8)      (7.7)                   (2.7)       (2.1)                   (1.0)
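The parenthesized ratios can be reproduced from the user/system pairs. The sketch below assumes each ratio is (user + system) on the machine divided by (user + system) on the YMP, and recomputes the test 5 row; the function and variable names are illustrative only.

```python
def normalized_time(user, system, ref_user, ref_system):
    """Total (user + system) time relative to the reference machine's total."""
    return (user + system) / (ref_user + ref_system)

# Test 5 (MCSCF + gradient, C3H4): (user, system) seconds from the table above.
TEST5 = {
    "SS1": (12708, 736),
    "SS2": (5612, 424),
    "DS5000": (3917, 395),
    "RS6000": (1737, 191),
    "HP 720": (2102, 100),
    "HP 730": (1592, 82),
    "YMP": (316, 47),
}

ref_user, ref_sys = TEST5["YMP"]
ratios = {m: normalized_time(u, s, ref_user, ref_sys) for m, (u, s) in TEST5.items()}
for machine, ratio in ratios.items():
    print(f"{machine:7s} {ratio:5.1f}")
```

Rounded to one decimal, these match the published row (37.0, 16.6, 11.9, 5.3, 6.1, 4.6, 1.0).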

MACHINES

SS1,SS2 Sun Sparcstation 1, 2 (quantum, mermaid) running OpenWindows 2.0,
        SunOS 4.1.1,  f77 1.3.1 / cc 1.0, -cg89 -dalign -Bstatic -O3 -libmil

DS5000  DecStation 5000 (ultra)
        Ultrix 4.1, MIPS f77 2.10 -O2 -G 0.

RS6000  IBM RS6000 m530 (ibmr6000)
        AIX 3.1, xlf 1.01, -O -L/lib -lblas.
        GAMESS modules scflib, gamess, statpt, hss2b, and inputb
        gave incorrect results with xlf -O; these modules
        were compiled without optimization.

HP 9000/720 HPUX 8.01, f77 -O 
HP 9000/730 HPUX 8.05, f77 -O    

CRAY YMP Cray YMP/332 (sabre)
        UNICOS 5.1, cft77 3.1.1 -O full,nozeroinc -Zp, cc -O -h
        intrinsics,olevel_3. libsci BLAS were used. System load was such
        that these jobs received only 5-10% of a 3-CPU machine.

BENCHMARKS

1.   GAMESS RHF, SiC2H6, with a 61 AO basis set. Mostly scalar.
     Uses 14 MB of disk.
2.   GAMESS MCSCF, SiH2, with a 29 AO basis set and 51 CSFs. Vectorizable.
     Uses 5 MB of disk.
3.   GAMESS second-order CI, Si2H4, with a 46 AO basis and 4600
     CSFs. The calculation involves out-of-core sorts on large
     disk files. Uses 64 MB of disk.
4.   GAMESS RHF, SiC3H8, with an 80 AO basis set. Mostly scalar.
     Uses 38 MB of disk.
5.   GAMESS MCSCF + gradient, C3H4, with a 53 AO basis and 20 CSFs.
     The MCSCF is vectorizable; the gradient is mostly scalar. Uses
     84 MB of disk.
6.   GAMESS CI transition, O2+, 60 AO basis, 504 CSFs. Vectorizable.
     Uses 43 MB of disk.
7.   GAMESS MCSCF, OHBr, with 49 AOs and 110 CSFs. Vectorizable.
     Uses 55 MB of disk.
8.   GAMESS GVB-PP, SnC5H6, with 96 AOs and 6 CSFs. Uses 105 MB of disk.
10.  GAMESS ROHF gradient calculation on P2H4+, with a 56 AO basis set.
     Uses 14 MB of disk.
12.  GAMESS RHF + gradient, SbC4H4NO2, with 110 AOs. Mostly scalar.
     Uses 111 MB of disk.
-----
Fred Senese, MS 234 (804) 864-4777 | senese@schug.larc.nasa.gov (128.155.22.47) 
Speaking from (but not for) NASA-LaRC, Hampton VA 23665-5225

Subliminal Message: 1. Anonymously ftp to schug.larc.nasa.gov
                    2. cd ~/resume ; get resume.[ps|tex|ascii].         
                    3. Hire me.   


From chemistry-request@ccl.net Mon Nov 18 10:54:28 1991
Date: Mon, 18 Nov 91 10:33:08 EST
From: bernhold@qtp.ufl.edu
To: senese@schug.larc.nasa.gov (Fred Senese), chemistry@ccl.net
Subject: Re: HP vs. IBM
Status: R

It would be nice to also know the hardware configurations of the
machines (amt. of memory, type of disks, etc.) since this can influence
the performance as well.

-- 
David Bernholdt			bernhold@qtp.ufl.edu
Quantum Theory Project		bernhold@ufpine.bitnet
University of Florida
Gainesville, FL  32611		904/392 6365

From chemistry-request@ccl.net Mon Nov 18 13:43:30 1991
Date: Mon, 18 Nov 91 10:25:01 PST
From: ross@zeno.mmwb.ucsf.EDU (Bill Ross)
To: chemistry@ccl.net
Status: R

Here is my promised summary of responses to my question about SHAKE.
Another question was raised for implementors of molecular dynamics:
does your program have multiple time steps? If not, do you plan to 
implement them? I'll summarize any answers here.

Bill Ross
UCSF


From: ross@zeno.mmwb.ucsf.edu (Bill Ross)
To: chemistry@ccl.net
Subject: SHAKE failure

I'm collecting interesting stories of SHAKE failure in molecular
dynamics runs - cases that were never figured out as well as ones
that were. References to anything written on this subject would
be welcome too. I'll summarize to the reflector.

Bill Ross
UCSF

[The inspiration for this question was occasional SHAKE failure
in Amber. Dave Pearlman traced it to constant-pressure runs with
periodic boundary conditions in which the ions are treated as part
of the solute. Because all solute-solute interactions are included
(no cutoff is applied), the solute is not imaged with itself: when
an ion crosses the edge of the box it is translated to the other
side, and if another ion is close by there, the two suddenly "see"
each other. The resulting spike in the virial makes the box expand
to equalize the "pressure," which puts some bond lengths beyond
the ability of SHAKE to recover. In my opinion, the best ways
around this are either to run at constant volume or to treat the
ions as part of the solvent so that they are imaged. Neither is
completely satisfactory: at constant volume, ions will still
suddenly appear in proximity, and when ions are treated as part of
the solvent, cutoffs are applied and long-range electrostatics are
lost. I have always run with enough water that it hasn't happened
to me, but this is partly a matter of luck.]
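For readers who have not looked inside SHAKE, a minimal sketch of the textbook iteration (not Amber's code) shows what "beyond the ability of SHAKE to recover" means: each bond is corrected along its previous-step direction, and when a sudden rescaling or a large step swings a bond too far, the correction either fails to converge or becomes undefined. The tolerance, iteration limit and the 1e-6 perpendicularity cutoff are illustrative assumptions.

```python
def shake(r_old, r_new, masses, constraints, tol=1e-8, max_iter=500):
    """Iteratively correct r_new so each (i, j, d) bond-length constraint holds.

    r_old: positions from the previous step, assumed to satisfy the constraints.
    r_new: unconstrained positions after the integrator step; corrected in place.
    """
    for _ in range(max_iter):
        converged = True
        for i, j, d in constraints:
            diff = [r_new[i][k] - r_new[j][k] for k in range(3)]
            delta = d * d - sum(c * c for c in diff)
            if abs(delta) <= tol * d * d:
                continue  # this bond is already within tolerance
            converged = False
            ref = [r_old[i][k] - r_old[j][k] for k in range(3)]
            dot = sum(a * b for a, b in zip(ref, diff))
            if dot < 1e-6 * d * d:
                # The bond has swung (nearly) perpendicular to its old direction,
                # so a correction along the old bond vector cannot restore it.
                raise RuntimeError(f"SHAKE failure: bond {i}-{j} cannot be restored")
            inv_i, inv_j = 1.0 / masses[i], 1.0 / masses[j]
            # Linearized, mass-weighted correction along the old bond vector.
            g = delta / (2.0 * (inv_i + inv_j) * dot)
            for k in range(3):
                r_new[i][k] += g * inv_i * ref[k]
                r_new[j][k] -= g * inv_j * ref[k]
        if converged:
            return r_new
    raise RuntimeError("SHAKE failure: no convergence within max_iter")

# One bond of length 1.0 that the unconstrained step stretched to about 1.1.
r_old = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
r_new = [[0.0, 0.0, 0.0], [1.1, 0.05, 0.0]]
shake(r_old, r_new, masses=[1.0, 1.0], constraints=[(0, 1, 1.0)])
```

A bond pushed far off its previous direction (for example by an abrupt box rescaling) ends up in the `RuntimeError` branches rather than being corrected.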

From STOUTEN@embl-heidelberg.de Wed Nov 13 01:44:46 1991
Subject: Re: SHAKE failure
To: ross@zeno.mmwb.ucsf.edu

Hi Bill,

I don't have any interesting stories. I wonder, however, why you are doing
this investigation, since SHAKE is being phased out and will be replaced by
multiple time step algorithms. Even Wilfred takes this stand.

Cheers, Pieter Stouten

From STOUTEN@embl-heidelberg.de Thu Nov 14 16:26:56 1991
Subject: Re: SHAKE failure
To: ross@zeno.mmwb.ucsf.edu

Hi Bill,

On Wed, 13 Nov 91 09:11:32 PST you wrote:

>why even ask? - sociology/folklore of science.
>
That sounds like a good reason. Still, I don't have exciting stories. I
often had problems when far from equilibrium, or when applying heavy
torsion constraints (this seems unrelated but is not). In those cases I
simply did not use SHAKE.

>how soon do you expect multiple time steps to take over?
>
Hard to say. I am a bit removed from the field now. I know that people
were already talking about implementing it 3 years ago. As for me, I mainly
use GROMOS, and considering how busy Wilfred and co-workers are, I don't
know when the first official release after GROMOS 87 will see the light.

Cheers, Pieter Stouten.

#### #   # ###  #     European Molecular Biology Laboratory
#    ## ## #  # #     Biocomputing Programme
###  # # # ###  #     Meyerhofstrasse 1, D-6900 Heidelberg, Germany
#    #   # #  # #     e-mail: stouten@embl-heidelberg.de
#### #   # ###  ####  phone: +49-6221-387 472, fax: 387 517


From balbes@osiris.rti.org Wed Nov 13 05:40:27 1991
To: ross@zeno.mmwb.ucsf.edu (Bill Ross)
Subject: Re:  SHAKE failure

Okay, this won't be much help, but...

I had SHAKE fail after about 14 ps. I had Tom Darden look at it, and he
said that because I was saving the steps in binary form, any restart
would fail in exactly the same way. He said (I think) that he saves
things in ASCII, so that roundoff on restarting will get around any
failures of this type. I have since been using Tom's fast Amber 3a,
and haven't had any more problems of this type. Of course, the files
have long since been purged.

This was quite a while ago, so details are fuzzy.

			Lisa


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% standard disclaimer %%%%
 Lisa M. Balbes, Ph.D.     		        	phone: 919-541-6563
 Research Triangle Institute, PO Box 12194     vmail: 919-541-6767, xt 6563    
 Research Triangle Park,  NC 27709-2194        email: balbes@osiris.rti.org 
  					     
- This came directly from a computer and should not be doubted or disbelieved.-
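Tom Darden's point, as relayed above, is that a binary restart reproduces the coordinates bit for bit, so a deterministic run fails at the same step again, while an ASCII restart written with a fixed number of decimals perturbs the low-order digits. A small illustration; the 7-decimal format and the sample value are assumptions for illustration, not necessarily Amber's restart format.

```python
import struct

def binary_roundtrip(x):
    """Write x as an 8-byte IEEE double and read it back (exact)."""
    return struct.unpack("<d", struct.pack("<d", x))[0]

def ascii_roundtrip(x, decimals=7):
    """Write x as fixed-point text and read it back (rounded)."""
    return float(f"{x:.{decimals}f}")

x = 4.115226304115  # an arbitrary coordinate value
print(binary_roundtrip(x) == x)  # True: the restart is bit-for-bit identical
print(ascii_roundtrip(x) == x)   # False: the restart differs in the low-order digits
print(abs(ascii_roundtrip(x) - x))
```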


From harris@athena.mit.edu Wed Nov 13 05:47:02 1991
To: ross@zeno.mmwb.ucsf.edu (Bill Ross)
Subject: Re: SHAKE failure 

This example concerns SHAKE's cousin RATTLE, applied to a simulation of 50
hexane molecules at liquid density. It is more an example of the effects
of the timestep than a complete failure. With a timestep of 0.007 ps,
discrepancies are observed when the pressure and temperature computed with
the molecular virial are compared with the same quantities computed with
the atomic virial. Over 1.120 ns (160 000 time steps), the average
molecular temperature is 303.5 K, compared to an atomic temperature of
300.5 K. (A thermostat maintains the average temperature at 300 K by weak
coupling to an external bath; see Berendsen et al., JCP 81, 3684 (1984).)
The pressure from the molecular virial is also about 20 atm higher than
that computed from the atomic virial. Cutting the timestep in half reduces
the difference between the two methods of computing the temperature and
pressure to 1 degree and 5 atm. It appears that the atomic versions of the
temperature and pressure are more accurate, but the statistics are too poor
to settle the question.
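The weak-coupling thermostat cited here rescales velocities each step by a factor lambda = sqrt(1 + (dt/tau)(T0/T - 1)), where tau is the coupling time. A minimal sketch of that factor; the 0.007 ps step is the one quoted above, while the tau value is a hypothetical choice.

```python
import math

def berendsen_scale(T_inst, T_target, dt, tau):
    """Velocity scaling factor for Berendsen weak coupling to a heat bath."""
    return math.sqrt(1.0 + (dt / tau) * (T_target / T_inst - 1.0))

dt = 0.007   # ps, the time step quoted above
tau = 0.1    # ps, hypothetical coupling time
for T in (300.5, 303.5):
    # Slightly below 1: a system hotter than 300 K is gently cooled.
    print(T, berendsen_scale(T, 300.0, dt, tau))
```

Because tau is much larger than dt, each step nudges the temperature only slightly, which is what makes the coupling "weak."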


Jonathan G. Harris,   H. P. Meissner Assistant Professor,
Department of Chemical Engineering,  MIT Rm 66-450
25 Ames Street, Cambridge, MA 02139
harris@athena.mit.edu (617)253-5273  Fax 253-9695


-------

From: nobody@kodak.com
To: "amber@cgl.ucsf.edu"@kodak.com
Subject: RE: SHAKE nightmares.

From:	NAME: Adi M. Treasurywala
	FUNC: Biophys. & Compu. Chem.         
	TEL: (518)445-7042                    <TREASURYWALA@A1@DSRGVJ>
To:	NAME: Edward P. Jaeger <JAEGEREP@A1@DSRGVJ>,
	"amber@cgl.ucsf.edu"@KODAKR@MRGATE@WPC

          Bill,
          We have completed a fairly detailed study of this problem in
          AMBER 3.0A. We ran into it because we were trying to do REAL
          classical dynamics (i.e. not constant temperature!!!). In our
          initial runs on the molecule we were interested in, the total
          energy simply did not stabilize but decreased steadily, at a
          fairly precipitous rate, over the run. There seemed to be no way
          to stabilize it.
          We therefore took a small cyclic peptide, WFGLMQ, and essentially
          banged the heck out of it under many different conditions to
          narrow down the reason for this "bug". At least one of us was
          convinced that it must be something that WE were doing wrong,
          rather than a bug in AMBER so glaring!
          Anyway, we discovered a LOT of other interesting things along
          the way, such as objective criteria (vs. semi-religiously based
          recipes) for knowing when thermalization is done, and some real
          reasons for when to start and finish collecting MD data
          (incidentally, this is rather system-size dependent...). After
          much trial and error we found (thanks are due here to George
          Seibel, who REALLY put us on to this) that it was indeed the
          SHAKE option that had caused us all this grief.
             If we turned SHAKE OFF in an otherwise identical run that had
          failed with SHAKE ON, it went just fine!! Everything stabilized
          and we were able to run real classical dynamics (this is sort of
          an emotional issue for us here!!).
          We are in the process of preparing a manuscript covering this
          work. Let me just ask the gurus on this net... Would that be a
          good thing to do? Would it help anyone? Would it insult or offend
          anyone? Incidentally, I was told by someone that the real problem
          with SHAKE was the integrator (leapfrog just can't handle
          classical MD with SHAKE turned on).
          
          Hope to hear the other stories soon and to hear any discussions
          about classical vs constant temp runs!
          
          Adi T & Ed Jaeger.
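The symptom described above, a total energy that decreases steadily during a constant-energy run, is commonly quantified as a drift rate: the least-squares slope of total energy against time. A minimal sketch; the sample values are synthetic, chosen only to illustrate the calculation.

```python
def energy_drift(times, energies):
    """Least-squares slope of total energy vs. time (energy units per time unit)."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = sum(times) / n
    e_mean = sum(energies) / n
    num = sum((t - t_mean) * (e - e_mean) for t, e in zip(times, energies))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den

# Synthetic example: energy falling by about 0.5 units per ps with small noise.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
energies = [-1000.0, -1000.6, -1000.9, -1001.6, -1002.0]
print(energy_drift(times, energies))
```

Comparing this slope between otherwise identical runs with SHAKE on and off is one objective way to make the comparison described above.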


From chemistry-request@ccl.net Mon Nov 18 18:33:58 1991
From: senese@schug.larc.nasa.gov (Fred Senese)
Date: Mon, 18 Nov 1991 14:09:48 -0500
To: chemistry@ccl.net
Subject: Re: HP vs. IBM
Status: R

bernhold@qtp.ufl.edu writes:
>It would be nice to also know the hardware configurations of the
>machines (amt. of memory, type of disks, etc.) since this can influence
>the performance as well.

SS1: 24 Mbytes of 80-ns DRAM memory.   
     scratch on a tmpfs file system spanning
        a Wren VI CDC 94191-766 and a Fujitsu M2263 drive.***
     20 MHz Sparc-based CPU with a 20MHz Weitek 3170-based FPU.  

SS2: 24 Mbytes of 80-ns DRAM memory.
     internal SUN 207 Mb 3.5'' SCSI drives
     scratch on a Wren VI CDC 94191-766
     40 MHz Sparc-based CPU with a 40 MHz TI TMS390C602A-based FPU.

Decstation 5000: 24 Mbytes of DRAM memory.
     scratch on a Wren VI CDC 94191-766
     25 MHz MIPS-based CPU  

HP/720: 32 Mbytes of 80-ns DRAM memory
     two internal Quantum 210 Mb 3.5'' SCSI drives
     scratch on external Fujitsu 1.4 Gb drive. 
     50 MHz PA RISC 1.1 CPU

I hope this helps. I don't have information on the HP/730 or IBM Model
530's disk/memory configurations; they were on evaluation here a few months
ago and aren't available to me now.  

***I don't recommend this configuration for Sparcstations doing
ab initio work; a bug in the SunOS 4.1.1 tmpfs file system sometimes
put large jobs into an uninterruptible disk wait for very long periods.
(The 100174-01 OS patches haven't helped.) I now use ordinary 4.2 file
systems tuned for very large scratch files.

Here are some posts (clipped from comp.arch a few months ago) that
may be of interest:  

----- Begin Included Message -----
Article: 4642 of comp.arch
Xref: news.larc.nasa.gov comp.sys.hp:2596 comp.sys.apollo:2720 comp.arch:4642 comp.benchmarks:466
Path: news.larc.nasa.gov!uakari.primate.wisc.edu!zaphod.mps.ohio-state.edu!rpi!uupsi!sunic!kuling!irf
From: irf@kuling.UUCP (Bo Thide')
Newsgroups: comp.sys.hp,comp.sys.apollo,comp.arch,comp.benchmarks
Subject: Snakebytes (long -- and poisonous?).
Message-ID: <1998@kuling.UUCP>
Date: 27 Mar 91 00:48:19 GMT
Sender: news@kuling.UUCP
Reply-To: irf@kuling.DoCS.UU.SE (Bo Thide')
Organization: Dept. of Computer Systems, Uppsala University, Sweden
Lines: 95

Now that the Snakes (HP9000/700 series HP-PA 1.1 RISC workstations) are let
loose, the official HP info has become available.  Some of this info follows.

There are three models, the desktop (114mm*508mm*470mm) 720 (Cobra) and
730 (King Cobra) and the deskside (610mm*220mm*595mm) 750 (Coral). They
come initially with HP-UX 8.01 to be upgraded to HP-UX 8.05 in June. Later
OSF/1 will be available.

Clock: 50 MHz (720) or 66 MHz (730, 750)

Cache: 128 kB instr/256 kB data (720, 730), 256 kB instr/256 kB data (750).

Interfaces: SCSI-II, EISA, LAN, RS-232 (to 460.8 kbaud), HP-HIL, Centronics.
            HP-IB optional (via EISA!).

Monitors: 72 Hz, 19" 1280x1024 8-bit grayscale (GRX) or 8+8 color planes (CRX).

Software: X11R4, OSF/Motif1.2 (not 1.1!), VUE, NCS, NFS, 4.3BSD TCP/IP, ARPA.

Languages: C, C++, Pascal, FORTRAN, ANSI C, Assembler.  FORTRAN compiler
	   with "+800" option for series 800 compatibility. Series 800
	   binaries run on series 700 machines.


Performance (with HP-UX 8.05) and comparison with other workstations:
-----------------------------------------------------------------------------
                            SPEC        Khorner-       Linp2P  x11-  Dhry-
                        mark int  fp    stones   MIPS  MFLOPS  perf  stone2.0
-----------------------------------------------------------------------------
HP9000/730,750 G/CRX    72.2 51.0 91.0  143974   76    22.9    10460  114680
HP9000/720 G/CRX        55.5 39.0 70.2  119213   57    17.2     8244   87000
IBM 6000/550            54.3 34.5 73.5   n/a     56    23       n/a    n/a
IBM 6000/320            24.6 16.3 32.4   54661   29.5   8.5     1520   45250
Sun SPARCstation 2GX    21.0 20.2 21.5   27142   28.5   4.2     n/a    35590
DECstation 5000/200PXGT 18.5 19.0 18.5   26456   24.2   3.7     3256   38760
DECstation 3100         11.3 11.8 10.9   15285   14.9   1.6     1702   23470
Sun SPARCstation IPC    11.8 12.4 11.4   13329   15.7   1.7     n/a    22830
-----------------------------------------------------------------------------
Linp2P = Linpack Double precision, 100*100 FORTRAN BLAS, rolled.
x11perf = geometric mean of the x11perf1.2 component tests (excluding 1
	  and 500 pixel tests).
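The x11perf column is described as a geometric mean of component tests. A minimal sketch of that summary statistic; the input values in the usage line are hypothetical, not x11perf results.

```python
import math

def geometric_mean(values):
    """n-th root of the product, computed via logs to avoid overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

print(geometric_mean([1.0, 100.0]))  # about 10
```

Unlike an arithmetic mean, this keeps a single very fast component test (dots run at over a million per second) from dominating the summary.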


Selected x11perf Tests:
-----------------------------------------------------------------------------
			         10 pixel  10*10   TR      create & map
			Dots     lines     rects   text    subwins (50 kids)
-----------------------------------------------------------------------------
HP9000/730,750 G/CRX    1630000  911000    278000  273000  6000
HP9000/720 G/CRX        1260000  874000    272000  245000  4500
DECstation 5000/200PXGT  370000  455000    256000   90900  1750
Sun SPARCstation 2GX     101100  147000     83500   49000  1050
-----------------------------------------------------------------------------


Graphics Performance:
-----------------------------------------------------------------------------
                          2D floating       3D floating pt
		    	  pt vectors/s      vectors/s (peak)
-----------------------------------------------------------------------------
HP9000/730,750 G/CRX      1120000           1150000
HP9000/720 G/CRX          1120000           1150000
DECstation 5000/200PXGT    300000            300000
Sun SPARCstation 2GX       450000            240000
-----------------------------------------------------------------------------


Sequential Disk Access Rates:
-----------------------------------------------------------------------------
                                       Read (kB/s)       Write (kB/s)
-----------------------------------------------------------------------------
HP9000/700, 1*210MByte disk            1120              1140
HP9000/700, 1*420MByte disk            1520              1510
HP9000/700, 2*210MByte disk            2070              1800
HP9000/700, 2*420MByte disk            2460              2140
Sun SPARCstation 2, 207MByte disk       744               794
-----------------------------------------------------------------------------


ANSYS SP-3 results (smaller = better):
-----------------------------------------------------------------------------
                            CPU seconds
-----------------------------------------------------------------------------
Cray 2                       27
HP9000/730,750 G/CRX         49
DEC VAX9000                  65
HP9000/720 G/CRX             66
IBM 6000/540                 68
DECstation 5000             145
IBM 6000/320                107
Sun SPARCstation 1+         311
Sun SPARCstation 2          225
-----------------------------------------------------------------------------
HP numbers were measured with Series 800 compiler code. No Series
700-specific optimizations were used.


From news.larc.nasa.gov!elroy.jpl.nasa.gov!sdd.hp.com!hplabs!hpda!hpcuhb!hpcuhe!linley Wed Mar 27 10:03:36 EST 1991
Article: 4644 of comp.arch
Path: news.larc.nasa.gov!elroy.jpl.nasa.gov!sdd.hp.com!hplabs!hpda!hpcuhb!hpcuhe!linley
From: linley@hpcuhe.cup.hp.com (Linley Gwennap)
Newsgroups: comp.arch
Subject: Re: Snake
Message-ID: <32580006@hpcuhe.cup.hp.com>
Date: 26 Mar 91 22:35:14 GMT
References: <69465@brunix.UUCP>
Organization: PA-RISC Marketing Central
Lines: 104

Due to popular demand, here is an article comparing the new Snakes CPU to
IBM's "America" chip (used in the RS/6000 series).  I have deleted the
section on America.  I would be happy to post more info if this is useful.

						--Linley Gwennap
						  Hewlett-Packard
HP SNAKES CPU

HP's high-performance chip set consists of the "Snakes" CPU chip and a
floating point coprocessor ("FPC") jointly developed with Texas
Instruments[1]. These are the first chips to implement the PA-RISC 1.1
architecture. They use a traditional RISC approach to achieve
industry-leading performance of 72 SPECmarks with a 66 MHz clock.

PA-RISC 1.1, an extension to the original PA-RISC architecture, includes
several new instructions, many of which accelerate graphics operations[2].
A multiply-and-add instruction (as in IBM's POWER) is included. In
addition, the page size was doubled to 4 KB to reduce the TLB miss rate,
and eight "shadow" registers were added to provide quick context switching
for the TLB miss handler.

The CPU contains all integer instruction processing, cache control and
memory management functions. All cache memory is included in external
SRAMs connected directly to the CPU. Snakes has a 64-bit path to the
D-cache, just like the R4000. Both the I- and D-caches can be accessed
simultaneously, resulting in a total cache bandwidth of 792 MB per second
(peak). The FPC implements all floating point instructions. It receives
instructions and data from the caches at the same time as the CPU, and
duplicates parts of the CPU's instruction pipeline, eliminating the
penalties often incurred by separate CPU and FPC chips. Snakes is designed
to work with a variety of memory and I/O interfaces.

The CPU uses a five-stage pipeline to reduce cycle time. The penalties in
this pipeline have been minimized. For example, conditional branches are
executed with no delay if their outcome is predicted correctly, and with
only a single cycle penalty otherwise. The branch prediction algorithm,
more advanced than America's, predicts forward branches to be untaken and
backward branches taken, thus optimizing for loops. The load penalty is a
maximum of one cycle and the store penalty a maximum of two; these
penalties can usually be avoided by the compiler. All other integer
instructions (except a few rare system control functions) are always
executed in a single cycle. This uncomplicated design is reflected by a
simple, efficient compiler.
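The static prediction rule above (backward branches predicted taken, forward branches untaken) and the quoted penalties (none when predicted correctly, one cycle otherwise) fit in a few lines. A sketch; the branch trace is hypothetical and the model ignores everything except branch penalties.

```python
def predict_taken(branch_addr, target_addr):
    """Backward branches (loops) predicted taken; forward branches not taken."""
    return target_addr < branch_addr

def branch_penalty_cycles(trace):
    """trace: iterable of (branch_addr, target_addr, actually_taken).

    Charges 0 cycles for a correct prediction and 1 cycle for a miss.
    """
    return sum(0 if predict_taken(b, t) == taken else 1 for b, t, taken in trace)

# A loop-closing branch at 0x40 back to 0x10, taken 9 times, then falling through.
loop = [(0x40, 0x10, True)] * 9 + [(0x40, 0x10, False)]
print(branch_penalty_cycles(loop))  # 1
```

For a loop, only the final exit is mispredicted, which is why this rule is said to optimize for loops.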

Although Snakes is not superscalar, PA-RISC instructions such as ADD AND
BRANCH, MOVE AND BRANCH and COMPARE AND BRANCH allow about as much
parallelism as America for integer-only applications; in fact, the ratio of
Integer SPECmarks to MHz for Snakes (65/66) actually exceeds America's
(35/42).

FPC is a full 64-bit implementation. It contains two parallel execution
units: the ALU (addition, conversion) and the MPY unit (multiply, divide,
square root). Each unit can start a new operation on every other cycle, so
FPC can accept one floating point instruction per cycle provided that ALU
and MPY instructions are alternated.
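The issue rule above can be simulated directly. This sketch assumes in-order issue of at most one FP instruction per cycle, with each unit accepting a new operation only every other cycle; it deliberately ignores data dependencies and result latency, which the text does not describe.

```python
def fp_issue_cycles(ops):
    """Cycles needed to issue a sequence of 'ALU'/'MPY' ops in order."""
    next_free = {"ALU": 0, "MPY": 0}  # earliest cycle each unit can accept an op
    last_issue = -1
    for op in ops:
        issue = max(last_issue + 1, next_free[op])  # one issue per cycle, in order
        next_free[op] = issue + 2                   # unit busy for the next cycle
        last_issue = issue
    return last_issue + 1

print(fp_issue_cycles(["ALU", "MPY"] * 4))  # 8: one instruction per cycle
print(fp_issue_cycles(["ALU"] * 8))         # 15: a new ALU op only every other cycle
```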

The external caches are direct mapped and are protected by parity, making
them slightly less robust than America's ECC cache. Cache coherency flags
are included to facilitate multiprocessor operation. A write-back protocol
is used to reduce writes to main memory. Although Snakes does not
implement America's complex "critical word first" algorithm on cache
misses, it will begin processing as soon as the critical word is obtained,
reducing the miss penalty by as much as seven cycles. Snakes supports a
wide variety of off-the-shelf SRAMs and can be configured with anywhere
from 8 KB to 3 MB of external cache. At its maximum operating frequency of
66 MHz, it requires 12 ns SRAMs.

The I- and D-TLBs are fully associative and contain 96 entries each. In
addition, each TLB implements four variable size "block" entries capable of
mapping up to 16 MB each, which can be used for large portions of the
operating system and/or graphics frame buffers. The memory system supports
48 bits (256 terabytes) of virtual address space and 32 bits (4 gigabytes)
of real address space. (This is a subset of the full 64-bit virtual space
allowed by PA-RISC.) Two addressing modes support 1 GB or 4 GB data
segments, significantly larger than America's segments.

A separate bus provides access to memory, I/O and, if desired, graphics.
This bus is a synchronous, dedicated interface with a peak transfer rate of
264 MB per second, about one-half the speed of America's memory system.
The bus bandwidth is limited by its width of 32 bits, but a wider bus would
have required a larger, more expensive package. Snakes's cache miss
penalty, measured in cycles, is much higher than America's, due to the
shorter clock cycle time. Snakes compensates for these penalties by
allowing for large external caches to reduce the miss rate; the performance
numbers for Snakes assume a 128 KB instruction cache and 256 KB data cache.
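The peak bandwidth figures in this article follow from path width times clock rate. In the sketch, the 32-bit memory bus, the 64-bit D-cache path and the 66 MHz clock are from the text; the 32-bit (one instruction per cycle) I-cache path is an assumption, chosen because it reproduces the quoted 792 MB/s.

```python
def peak_mb_per_s(bits_per_cycle, clock_mhz):
    """Peak rate in MB/s (10**6 bytes/s), assuming one transfer per clock."""
    return bits_per_cycle // 8 * clock_mhz

CLOCK_MHZ = 66
memory_bus = peak_mb_per_s(32, CLOCK_MHZ)         # 32-bit memory/I-O bus
cache_total = (peak_mb_per_s(64, CLOCK_MHZ)        # 64-bit D-cache path
               + peak_mb_per_s(32, CLOCK_MHZ))     # assumed 32-bit I-cache path
print(memory_bus, cache_total)  # 264 792
```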

The CPU is fabricated in HP's CMOS-26 process (a 1.0 micron, three metal
layer process) and packaged in a 408-pin PGA. FPC is fabricated in TI's
0.8 micron CMOS process and placed in a 207-pin PGA. These PGAs were
custom-designed to allow high frequency operation with wide CMOS buses.
The CPU contains about 577,000 transistors, while FPC uses 640,000. For
lower-cost systems, the chip set is designed to run at frequencies below
66 MHz, allowing lower-speed SRAMs to be used. FPC can also be eliminated
to further reduce costs.

REFERENCES AND NOTES

[1] "CMOS PA-RISC Processor for a New Family of Workstations" by
M. Forsyth, S. Mangelsdorf, E. DeLano, C. Gleason and J. Yetter, COMPCON
Spring 91 Digest of Technical Papers, February 1991.

[2] "Architecture and Compiler Enhancements for PA-RISC Workstations" by
D. Odnert, R. Hansen, M. Dadoo and M. Laventhal, COMPCON Spring 91 Digest
of Technical Papers, February 1991.


----- End Included Message -----