From chemistry-request@ccl.net Mon Nov 18 02:51:25 1991
Date: 18 Nov 91 18:23:03 U
From: "Frank Larkins"
Subject: Post Doc Available Australi
To: CHEMISTRY@ccl.net
Status: R

Subject:                                          Time:5:08 PM
OFFICE MEMO     Post Doc Available Australia      Date:11/18/91

I have a post-doctoral fellowship available in Australia for at least
two years from early 1992 in the general area of theoretical chemical
physics. A sound knowledge of quantum chemistry especially molecular
orbital methods is essential. We are working on atomic and molecular
problems in photoelectron, Auger electron and X-ray emission
spectroscopies. E.g. see Physica Scripta 41, 827, 1990; J Phys B 24,
741, 1991; J Chem Phys 86, 3239, 1987

For more information contact:
Professor Frank Larkins, School of Chemistry
University of Melbourne, Parkville 3052 Australia
Fax (613) 347 6883   PH (613) 344 4181
e-mail Frank_Larkins.REGISTRAR@muwayf.unimelb.edu.au

From chemistry-request@ccl.net Mon Nov 18 08:44:49 1991
From: senese@schug.larc.nasa.gov (Fred Senese)
Date: Mon, 18 Nov 1991 08:20:56 -0500
To: chemistry@ccl.net
Subject: Re: HP vs. IBM
Status: R

Mark Murcko writes:
>We are currently starting an evaluation of the HP 720 and 730 vs. the
>IBM 550. Does anyone have any information that they would like to
>share with the group?

Here are a few timings for the NDSU GAMESS codes ported to the HP Cobra
workstations. The benchmarks are the standard BENCH*.INP tests included
on the 10/90 GAMESS distribution tape; they represent "typical" quantum
chemical computations using RHF, MCSCF, CI, and gradient methods.

Identical codes were used on all machines, but native blas routines and
the highest level of compiler optimization were used whenever possible.
Machines were *not* dedicated for benchmarking (but with the exception
of the YMP (see below) the additional load was very low). All jobs were
run with a limit of 750 000 words in the GAMESS dynamic memory pool.

Times are given as user/system times in seconds.
The number in parentheses is the total time normalized to the
performance of the Cray YMP.

       Sun Sparcstation           Dec        IBM    HP 9000    HP 9000       CRAY
Test        SS1        SS2     DS5000     RS6000        720        730        YMP
---- ---------- ---------- ---------- ---------- ---------- ---------- ----------
   0   4792/303   1099/145        ---        ---     737/39        ---     720/27
          (6.8)      (1.7)                            (1.0)                 (1.0)
   1     253/32     104/16      80/21       46/7       36/4       32/3       18/2
         (14.3)      (6.0)      (5.1)      (2.7)      (2.0)      (1.8)      (1.0)
   2     260/20     112/11     103/17       99/8       72/4       56/2       22/2
         (11.7)      (5.1)      (5.0)      (4.5)      (3.1)      (2.4)      (1.0)
   3    794/168     354/92    266/108     210/36     145/22     139/17       78/7
         (11.3)      (5.2)      (4.4)      (2.9)      (2.0)      (1.8)      (1.0)
   4    936/146     402/84     295/98     169/48     128/20     124/16       72/6
         (13.9)      (6.2)      (5.0)      (2.8)      (1.9)      (1.8)      (1.0)
   5  12708/736   5612/424   3917/395   1737/191   2102/100    1592/82     316/47
         (37.0)     (16.6)     (11.9)      (5.3)      (6.1)      (4.6)      (1.0)
   6     431/95     198/55     158/46     136/27      96/16      88/13      38/11
         (10.7)      (5.1)      (4.2)      (3.3)      (2.3)      (2.1)      (1.0)
   7   5256/378   2203/268   1963/243   1124/117    1110/62     857/52     267/31
         (18.9)      (8.3)      (7.4)      (4.2)      (3.9)      (3.1)      (1.0)
   8 22250/3148  9322/1775        ---   2903/816   2864/384        ---   1790/110
         (13.4)      (5.8)                 (2.0)      (1.7)                 (1.0)
  10    1015/68     431/37     332/41     185/14      134/8      110/6       69/3
         (15.0)      (6.5)      (5.2)      (2.8)      (2.0)      (1.6)      (1.0)
  12   7546/995   3338/572        ---   1137/255    942/110        ---     475/32
         (16.8)      (7.7)                 (2.7)      (2.1)                 (1.0)

MACHINES

SS1,SS2      Sun Sparcstation 1, 2 (quantum, mermaid) running
             OpenWindows 2.0, SunOS 4.1.1, f77 1.3.1 / cc 1.0,
             -cg89 -dalign -Bstatic -O3 -libmil
DS5000       DecStation 5000 (ultra) Ultrix 4.1, MIPS f77 2.10 -O2 -G 0.
RS6000       IBM RS6000 m530 (ibmr6000) AIX 3.1, xlf 1.01,
             -O -L/lib -lblas. GAMESS modules scflib, gamess, statpt,
             hss2b, and inputb gave incorrect results with xlf -O;
             these modules were compiled without optimization.
HP 9000/720  HPUX 8.01, f77 -O
HP 9000/730  HPUX 8.05, f77 -O
CRAYYMP      Cray YMP/332 (sabre) UniCOS 5.1, cft77 3.1.1
             -O full,nozeroinc -Zp, cc -O -h intrinsics,olevel_3.
             libsci blas were used. System load was such that these
             jobs received 5-10% of a 3-cpu machine.

BENCHMARKS

 1. GAMESS RHF SiC_2 H_6, with a 61 AO basis set. Mostly scalar.
    Uses 14 Mb of disk.
 2. GAMESS MCSCF, SiH2, with a 29 AO basis set and 51 CSF's.
    Vectorizable. Uses 5 Mb of disk.
 3. GAMESS second order CI, Si_2 H_4, with a 46 AO basis and 4600
    CSF's. The calculation involves out-of-core sorts on large disk
    files. Uses 64 Mb of disk.
 4. GAMESS RHF SiC_3 H_8, with an 80 AO basis set. Mostly scalar.
    Uses 38 Mb of disk.
 5. GAMESS MCSCF + gradient, C_3H_4, with a 53 AO basis and 20 CSFs.
    MCSCF is vectorizable, gradient is mostly scalar. Uses 84 Mb of
    disk.
 6. GAMESS CI transition, O2+, 60 AO basis, 504 CSF's. Vectorizable.
    Uses 43 Mb of disk.
 7. GAMESS MCSCF OHBr, with 49 AO's and 110 CSF's, requiring 55 Mb of
    disk. Vectorizable.
 8. GAMESS GVB-PP, SnC_5H_6, with 96 AO's and 6 CSF's. Uses 105 Mb of
    disk.
10. GAMESS ROHF gradient calculation on P_2 H_4+, with a 56 AO basis
    set, requiring 14 Mb of disk.
12. GAMESS RHF + gradient, SbC4H4NO2, with 110 AO's, requiring 111 Mb
    of disk. Mostly scalar.

-----
Fred Senese, MS 234  (804) 864-4777 | senese@schug.larc.nasa.gov (128.155.22.47)
Speaking from (but not for) NASA-LaRC, Hampton VA 23665-5225
Subliminal Message: 1. Anonymously ftp to schug.larc.nasa.gov
                    2. cd ~/resume ; get resume.[ps|tex|ascii].
                    3. Hire me.

From chemistry-request@ccl.net Mon Nov 18 10:54:28 1991
Date: Mon, 18 Nov 91 10:33:08 EST
From: bernhold@qtp.ufl.edu
To: senese@schug.larc.nasa.gov (Fred Senese), chemistry@ccl.net
Subject: Re: HP vs. IBM
Status: R

It would be nice to also know the hardware configurations of the
machines (amt. of memory, type of disks, etc.) since this can influence
the performance as well.
--
David Bernholdt                 bernhold@qtp.ufl.edu
Quantum Theory Project          bernhold@ufpine.bitnet
University of Florida
Gainesville, FL 32611           904/392 6365

From chemistry-request@ccl.net Mon Nov 18 13:43:30 1991
Date: Mon, 18 Nov 91 10:25:01 PST
From: ross@zeno.mmwb.ucsf.EDU (Bill Ross)
To: chemistry@ccl.net
Status: R

Here is my promised summary of responses to my question about SHAKE.

Another question was raised for implementors of molecular dynamics:
does your program have multiple time steps? If not, do you plan to
implement them? I'll summarize any answers here.

Bill Ross
UCSF

From: ross@zeno.mmwb.ucsf.edu (Bill Ross)
To: chemistry@ccl.net
Subject: SHAKE failure

I'm collecting interesting stories of SHAKE failure in molecular
dynamics runs - cases that were never figured out as well as ones that
were. References to anything written on this subject would be welcome
too. I'll summarize to the reflector.

Bill Ross
UCSF

[The inspiration for this question was occasional SHAKE failure in
Amber. Dave Pearlman diagnosed this as stemming from periodic boundary
conditions (constant pressure) where ions are treated as part of the
solute, all solute-solute interactions are included (no cutoff applied)
and so the solute is not imaged with itself: when an ion crosses the
edge of the box it is translated to the other side, and if another ion
is close by, they suddenly "see" each other and the resulting spike in
the virial causes an expansion of the box to equalize the "pressure,"
which puts some bond lengths beyond the ability of SHAKE to recover.
In my opinion, the best way to get around this is to either run with
constant volume or treat the ions as part of the solvent so they are
imaged. Neither way is completely satisfactory, since in constant
volume ions will still suddenly appear in proximity and when ions are
treated as part of the solvent cutoffs are applied and long-range
electrostatics are lost.
I have always run with enough water so that it hasn't happened to me,
but this is partially a matter of luck.]

>From STOUTEN@embl-heidelberg.de Wed Nov 13 01:44:46 1991
Subject: Re: SHAKE failure
To: ross@zeno.mmwb.ucsf.edu

Hi Bill,

I don't have any interesting stories. I wonder, however, why you are
doing this investigation, since SHAKE is being phased out and will be
replaced by multiple time step algorithms. Even Wilfred takes this
stand.

Cheers, Pieter Stouten

>From STOUTEN@embl-heidelberg.de Thu Nov 14 16:26:56 1991
Subject: Re: SHAKE failure
To: ross@zeno.mmwb.ucsf.edu

Hi Bill,

On Wed, 13 Nov 91 09:11:32 PST you wrote:
>why even ask? - sociology/folklore of science.
>
That sounds like a good reason. Still, I don't have exciting stories.
I often had problems when far from equilibrium, or when applying heavy
torsion constraints (this seems unrelated but is not). Then I just did
not use SHAKE.

>how soon do you expect multiple time steps to take over?
>
Hard to say. I am a bit away from the field now. I know that people
were already talking about implementing it 3 years ago. As for me, I
basically use GROMOS, and considering how busy Wilfred c.s. are I don't
know when the first official release after GROMOS 87 will see the
light.

Cheers, Pieter Stouten.

####  #   #  ###   #     European Molecular Biology Laboratory
#     ## ##  #  #  #     Biocomputing Programme
###   # # #  ###   #     Meyerhofstrasse 1, D-6900 Heidelberg, Germany
#     #   #  #  #  #     e-mail: stouten@embl-heidelberg.de
####  #   #  ###   ####  phone: +49-6221-387 472, fax: 387 517

>From balbes@osiris.rti.org Wed Nov 13 05:40:27 1991
To: ross@zeno.mmwb.ucsf.edu (Bill Ross)
Subject: Re: SHAKE failure

Okay, this won't be much help, but...

I had SHAKE fail after about 14 ps. I had Tom Darden look at it, and he
said that because I was saving the steps in binary form, any restart
would fail exactly the same way. He said (I think) that he saves things
in ascii, so that roundoff on restarting will get around any failures
of this type.
I have since been using Tom's fast amber 3a, and haven't had any more
problems of this type.

Of course the files have been long since purged. This was quite a while
ago, so details are fuzzy.

Lisa

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% standard disclaimer %%%%
Lisa M. Balbes, Ph.D.                        phone: 919-541-6563
Research Triangle Institute, PO Box 12194    vmail: 919-541-6767, xt 6563
Research Triangle Park, NC 27709-2194        email: balbes@osiris.rti.org
 - This came directly from a computer and should not be doubted or disbelieved.-

>From harris@athena.mit.edu Wed Nov 13 05:47:02 1991
To: ross@zeno.mmwb.ucsf.edu (Bill Ross)
Subject: Re: SHAKE failure

This example concerns SHAKE's cousin, RATTLE, applied to a simulation
of 50 hexane molecules at liquid density. It is more an example of the
effects of the timestep than a complete failure. Using a timestep of
.007 ps, if one compares the pressure and temperature computed with the
molecular virial with the similar quantities computed using the atomic
virial, discrepancies are observed. In a 1.120 nanosecond run (160 000
time steps), the average molecular temperature is 303.5 compared to an
atomic temperature of 300.5 (there is a thermostat attached to maintain
the average temperature at 300K by a weak coupling to an external bath,
i.e. Berendsen et al. JCP 81, p3684 1984). The pressure from the
molecular virial is also about 20 atm higher than that computed from
the atomic virial. Cutting the timestep in half reduces the difference
between the two methods of computing the temperature and pressure to
1 degree and 5 atm. It appears that the atomic versions of the
temperature and pressure are more accurate, but the statistics are too
poor to bury the question.

Jonathan G. Harris, H. P.
Meissner Assistant Professor, Department of Chemical Engineering,
MIT Rm 66-450
25 Ames Street, Cambridge, MA 02139
harris@athena.mit.edu  (617)253-5273  Fax 253-9695

-------

From: nobody@kodak.com
To: "amber@cgl.ucsf.edu"@kodak.com
Subject: RE: SHAKE nightmares.

>From: NAME: Adi M. Treasurywala
      FUNC: Biophys. & Compu. Chem.
      TEL: (518)445-7042
To:   NAME: Edward P. Jaeger ,
      "amber@cgl.ucsf.edu"@KODAKR@MRGATE@WPC

Bill,

We have completed a fairly detailed study in AMBER3.0A of this problem.
We ran into it because we were trying to do REAL classical dynamics
(i.e. not constant temperature!!!). What happened was that in the
initial runs on the molecule that we were interested in we found that
the total energy simply did not stabilize but decreased steadily over
the run at a fairly precipitous rate. There seemed to be no way to
stabilize it. We therefore took a small cyclic peptide WFGLMQ and
essentially banged the heck out of it by trying many different
conditions to narrow down the reason for this "bug". One of us at least
was convinced that it must be something that WE were doing wrong rather
than that AMBER could have a bug in it that was so glaring!

Anyway, we discovered a LOT of other interesting things along the way,
such as objective criteria (vs semi-religiously based recipes) for
knowing when your thermalization was done, and some real reasons to
start and finish the collection of MD data (incidentally it is rather
system size dependent...).

After much trial and error we found (thanks are due here to George
Seibel who REALLY put us on to this) that it was indeed the SHAKE
option that had caused us all this grief. If we turned SHAKE OFF on an
otherwise identical run that had failed with SHAKE ON, it went just
fine!! Everything stabilized and we were able to run real classical
dynamics (this is sort of an emotional issue for us here!!).

We are in the process of preparing a manuscript covering this work.
Let me just ask the gurus on this net...
Would that be a good thing to do? Would it help anyone? Would it insult
or offend anyone?

Incidentally, I learned that the real problem with SHAKE was the
integrator (leapfrog just can't handle classical MD with SHAKE turned
on, I was told by someone).

Hope to hear the other stories soon and to hear any discussions about
classical vs constant temp runs!

Adi T & Ed Jaeger.

From chemistry-request@ccl.net Mon Nov 18 18:33:58 1991
From: senese@schug.larc.nasa.gov (Fred Senese)
Date: Mon, 18 Nov 1991 14:09:48 -0500
To: chemistry@ccl.net
Subject: Re: HP vs. IBM
Status: R

bernhold@qtp.ufl.edu writes:
>It would be nice to also know the hardware configurations of the
>machines (amt. of memory, type of disks, etc.) since this can influence
>the performance as well.

SS1: 24 Mbytes of 80-ns DRAM memory.
     scratch on a tmpfs file system spanning a Wren VI CDC 94191-766
     and a Fujitsu M2263 drive.***
     20 MHz Sparc-based CPU with a 20 MHz Weitek 3170-based FPU.

SS2: 24 Mbytes of 80-ns DRAM memory.
     internal SUN 207 Mb 3.5'' SCSI drives
     scratch on a Wren VI CDC 94191-766
     40 MHz Sparc-based CPU with a 40 MHz TI TMS390C602A-based FPU.

Decstation 5000:
     24 Mbytes of DRAM memory.
     scratch on a Wren VI CDC 94191-766
     25 MHz MIPS-based CPU

HP/720:
     32 Mbytes of 80-ns DRAM memory
     two internal Quantum 210 Mb 3.5'' SCSI drives
     scratch on external Fujitsu 1.4 Gb drive.
     50 MHz PA RISC 1.1 CPU

I hope this helps. I don't have information on the HP/730 or IBM Model
530's disk/memory configurations; they were on evaluation here a few
months ago and aren't available to me now.

***I don't recommend this configuration for Sparcstations doing
ab-initio work; a bug in the 4.1.1 tmpfs file system sometimes put
large jobs into a noninterruptible disk wait for very long periods.
(The 100174-01 OS patches haven't helped.) I now use ordinary 4.2 file
systems tuned for very large scratch files.
Here are some posts (clipped from comp.arch a few months ago) that may
be of interest:

----- Begin Included Message -----

Article: 4642 of comp.arch
Xref: news.larc.nasa.gov comp.sys.hp:2596 comp.sys.apollo:2720 comp.arch:4642 comp.benchmarks:466
Path: news.larc.nasa.gov!uakari.primate.wisc.edu!zaphod.mps.ohio-state.edu!rpi!uupsi!sunic!kuling!irf
From: irf@kuling.UUCP (Bo Thide')
Newsgroups: comp.sys.hp,comp.sys.apollo,comp.arch,comp.benchmarks
Subject: Snakebytes (long -- and poisonous?).
Message-ID: <1998@kuling.UUCP>
Date: 27 Mar 91 00:48:19 GMT
Sender: news@kuling.UUCP
Reply-To: irf@kuling.DoCS.UU.SE (Bo Thide')
Organization: Dept. of Computer Systems, Uppsala University, Sweden
Lines: 95

Now that the Snakes (HP9000/700 series HP-PA 1.1 RISC workstations) are
let loose, the official HP info has become available. Some of this info
follows.

There are three models, the desktop (114mm*508mm*470mm) 720 (Cobra) and
730 (King Cobra) and the deskside (610mm*220mm*595mm) 750 (Coral). They
come initially with HP-UX 8.01 to be upgraded to HP-UX 8.05 in June.
Later OSF/1 will be available.

Clock:      50 MHz (720) or 66 MHz (730, 750)
Cache:      128 kB instr/256 kB data (720, 730), 256 kB instr/256 kB data.
Interfaces: SCSI-II, EISA, LAN, RS-232 (to 460.8 kbaud), HP-HIL,
            Centronics. HP-IB optional (via EISA!).
Monitors:   72 Hz, 19" 1280x1024 8-bit grayscale (GRX) or 8+8 color
            planes (CRX).
Software:   X11R4, OSF/Motif1.2 (not 1.1!), VUE, NCS, NFS, 4.3BSD
            TCP/IP, ARPA.
Languages:  C, C++, Pascal, FORTRAN, ANSI C, Assembler. FORTRAN
            compiler with "+800" option for series 800 compatibility.
            Series 800 binaries run on series 700 machines.
Performance (with HP-UX 8.05) and comparison with other workstations:

-------------------------------------------------------------------------------
                         SPEC             Khorner-        Linp2P   x11-    Dhry-
                         mark   int    fp   stones  MIPS MFLOPS   perf stone2.0
-------------------------------------------------------------------------------
HP9000/730,750 G/CRX     72.2  51.0  91.0   143974    76   22.9  10460   114680
HP9000/720 G/CRX         55.5  39.0  70.2   119213    57   17.2   8244    87000
IBM 6000/550             54.3  34.5  73.5      n/a    56     23    n/a      n/a
IBM 6000/320             24.6  16.3  32.4    54661  29.5    8.5   1520    45250
Sun SPARCstation 2GX     21.0  20.2  21.5    27142  28.5    4.2    n/a    35590
DECstation 5000/200PXGT  18.5  19.0  18.5    26456  24.2    3.7   3256    38760
DECstation 3100          11.3  11.8  10.9    15285  14.9    1.6   1702    23470
Sun SPARCstation IPC     11.8  12.4  11.4    13329  15.7    1.7    n/a    22830
-------------------------------------------------------------------------------
Linp2P  = Linpack Double precision, 100*100 FORTRAN BLAS, rolled.
x11perf = geometric mean of the x11perf1.2 component tests (excluding
          1 and 500 pixel tests).
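
The x11perf column above is described as a geometric mean of the
component test rates. A minimal sketch of that kind of summary, in
Python; the function name and the sample rates are illustrative
assumptions, not x11perf output, and this is not the code that
produced the figures in the table.

```python
import math


def geometric_mean(rates):
    """Return the geometric mean of a sequence of positive rates."""
    rates = list(rates)
    if not rates:
        raise ValueError("need at least one rate")
    if any(r <= 0 for r in rates):
        raise ValueError("rates must be positive")
    # Average the logarithms and exponentiate, rather than multiplying
    # the raw rates together, to avoid overflow with many large values.
    return math.exp(sum(math.log(r) for r in rates) / len(rates))


# Hypothetical per-test rates (operations per second) for one machine.
sample_rates = [1_000_000.0, 10_000.0, 100.0]

if __name__ == "__main__":
    print(f"geometric mean:  {geometric_mean(sample_rates):.1f}")
    print(f"arithmetic mean: {sum(sample_rates) / len(sample_rates):.1f}")
```

A geometric mean is less dominated by the few tests with very large
rates than an arithmetic mean would be, which is a plausible reason to
use it when the component tests span several orders of magnitude.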
Selected x11perf Tests:

-------------------------------------------------------------------------------
                                   10 pixel     10*10        TR      create & map
                             Dots     lines     rects      text subwins (50 kids)
-------------------------------------------------------------------------------
HP9000/730,750 G/CRX      1630000    911000    278000    273000              6000
HP9000/720 G/CRX          1260000    874000    272000    245000              4500
DECstation 5000/200PXGT    370000    455000    256000     90900              1750
Sun SPARCstation 2GX       101100    147000     83500     49000              1050
-------------------------------------------------------------------------------

Graphics Performance:

-------------------------------------------------------------------------------
                         2D floating pt    3D floating pt
                         vectors/s         vectors/s (peak)
-------------------------------------------------------------------------------
HP9000/730,750 G/CRX     1120000           1150000
HP9000/720 G/CRX         1120000           1150000
DECstation 5000/200PXGT   300000            300000
Sun SPARCstation 2GX      450000            240000
-------------------------------------------------------------------------------

Sequential Disk Access Rates:

-------------------------------------------------------------------------------
                                     Read (kB/s)    Write (kB/s)
-------------------------------------------------------------------------------
HP9000/700, 1*210MByte disk                 1120            1140
HP9000/700, 1*420MByte disk                 1520            1510
HP9000/700, 2*210MByte disk                 2070            1800
HP9000/700, 2*420MByte disk                 2460            2140
Sun SPARCstation 2, 207MByte disk            744             794
-------------------------------------------------------------------------------

ANSYS SP-3 results (smaller = better):

-------------------------------------------------------------------------------
                         CPU seconds
-------------------------------------------------------------------------------
Cray 2                            27
HP9000/730,750 G/CRX              49
DEC VAX9000                       65
HP9000/720 G/CRX                  66
IBM 6000/540                      68
DECstation 5000                  145
IBM 6000/320                     107
Sun SPARCstation 1+              311
Sun SPARCstation 2               225
-------------------------------------------------------------------------------
HP numbers were measured with series 800 compiler code.
No series 700 specific optimizations used.

>From news.larc.nasa.gov!elroy.jpl.nasa.gov!sdd.hp.com!hplabs!hpda!hpcuhb!hpcuhe!linley Wed Mar 27 10:03:36 EST 1991
Article: 4644 of comp.arch
Path: news.larc.nasa.gov!elroy.jpl.nasa.gov!sdd.hp.com!hplabs!hpda!hpcuhb!hpcuhe!linley
From: linley@hpcuhe.cup.hp.com (Linley Gwennap)
Newsgroups: comp.arch
Subject: Re: Snake
Message-ID: <32580006@hpcuhe.cup.hp.com>
Date: 26 Mar 91 22:35:14 GMT
References: <69465@brunix.UUCP>
Organization: PA-RISC Marketing Central
Lines: 104

Due to popular demand, here is an article comparing the new Snakes CPU
to IBM's "America" chip (used in the RS/6000 series). I have deleted
the section on America. I would be happy to post more info if this is
useful.

--Linley Gwennap
Hewlett-Packard

HP SNAKES CPU

HP's high-performance chip set consists of the "Snakes" CPU chip and a
floating point coprocessor ("FPC") jointly developed with Texas
Instruments[1]. These are the first chips to implement the PA-RISC 1.1
architecture. They use a traditional RISC approach to achieve
industry-leading performance of 72 SPECmarks with a 66 MHz clock.

PA-RISC 1.1, an extension to the original PA-RISC architecture,
includes several new instructions, many of which accelerate graphics
operations[2]. A multiply-and-add instruction (as in IBM's POWER) is
included. In addition, the page size was doubled to 4 KB to reduce the
TLB miss rate, and eight "shadow" registers were added to provide quick
context switching for the TLB miss handler.

The CPU contains all integer instruction processing, cache control and
memory management functions. All cache memory is included in external
SRAMs connected directly to the CPU. Snakes has a 64-bit path to the
D-cache, just like the R4000. Both the I- and D-caches can be accessed
simultaneously, resulting in a total cache bandwidth of 792 MB per
second (peak).

The FPC implements all floating point instructions.
It receives instructions and data from the caches at the same time as
the CPU, and duplicates parts of the CPU's instruction pipeline,
eliminating the penalties often incurred by separate CPU and FPC chips.
Snakes is designed to work with a variety of memory and I/O interfaces.

The CPU uses a five-stage pipeline to reduce cycle time. The penalties
in this pipeline have been minimized. For example, conditional branches
are executed with no delay if their outcome is predicted correctly, and
with only a single cycle penalty otherwise. The branch prediction
algorithm, more advanced than America's, predicts forward branches to
be untaken and backward branches taken, thus optimizing for loops. The
load penalty is a maximum of one cycle and the store penalty a maximum
of two; these penalties can usually be avoided by the compiler. All
other integer instructions (except a few rare system control functions)
are always executed in a single cycle. This uncomplicated design is
reflected by a simple, efficient compiler.

Although Snakes is not superscalar, PA-RISC instructions such as ADD
AND BRANCH, MOVE AND BRANCH and COMPARE AND BRANCH allow a similar
amount of parallelism as America for integer-only applications; in
fact, the ratio of Integer SPECmarks to MHz for Snakes (65/66) actually
exceeds America's (35/42).

FPC is a full 64-bit implementation. It contains two parallel execution
units: the ALU (addition, conversion) and the MPY unit (multiply,
divide, square root). Each unit can start a new operation on every
other cycle, so FPC can accept one floating point instruction per cycle
provided that ALU and MPY instructions are alternated.

The external caches are direct mapped and are protected by parity,
making them slightly less robust than America's ECC cache. Cache
coherency flags are included to facilitate multiprocessor operation.
A write-back protocol is used to reduce writes to main memory.
Although Snakes does not implement America's complex "critical word
first" algorithm on cache misses, it will begin processing as soon as
the critical word is obtained, reducing the miss penalty by as much as
seven cycles. Snakes supports a wide variety of off-the-shelf SRAMs and
can be configured with anywhere from 8 KB to 3 MB of external cache.
At its maximum operating frequency of 66 MHz, it requires 12 ns SRAMs.

The I- and D-TLBs are fully associative and contain 96 entries each.
In addition, each TLB implements four variable size "block" entries
capable of mapping up to 16 MB each, which can be used for large
portions of the operating system and/or graphics frame buffers. The
memory system supports 48 bits (256 terabytes) of virtual address space
and 32 bits (4 gigabytes) of real address space. (This is a subset of
the full 64-bit virtual space allowed by PA-RISC). Two addressing modes
support 1 GB or 4 GB data segments, significantly larger than America's
segments.

A separate bus provides access to memory, I/O and, if desired,
graphics. This bus is a synchronous, dedicated interface with a peak
transfer rate of 264 MB per second, about one-half the speed of
America's memory system. The bus bandwidth is limited by its width of
32 bits, but a wider bus would have required a larger, more expensive
package. Snakes's cache miss penalty, measured in cycles, is much
higher than America's, due to the shorter clock cycle time. Snakes
compensates for these penalties by allowing for large external caches
to reduce the miss rate; the performance numbers for Snakes assume a
128 KB instruction cache and 256 KB data cache.

The CPU is fabricated in HP's CMOS-26 process (a 1.0 micron, three
metal layer process) and packaged in a 408-pin PGA. FPC is fabricated
in TI's 0.8 micron CMOS process and placed in a 207-pin PGA. These PGAs
were custom-designed to allow high frequency operation with wide CMOS
buses.
The CPU contains about 577,000 transistors, while FPC uses 640,000.

For lower-cost systems, the chip set is designed to run at frequencies
below 66 MHz, allowing lower-speed SRAMs to be used. FPC can also be
eliminated to further reduce costs.

REFERENCES AND NOTES

[1] "CMOS PA-RISC Processor for a New Family of Workstations" by
    M. Forsyth, S. Mangelsdorf, E. DeLano, C. Gleason and J. Yetter,
    COMPCON Spring 91 Digest of Technical Papers, February 1991.

[2] "Architecture and Compiler Enhancements for PA-RISC Workstations"
    by D. Odnert, R. Hansen, M. Dadoo and M. Laventhal, COMPCON Spring
    91 Digest of Technical Papers, February 1991.

----- End Included Message -----

-----
Fred Senese, MS 234  (804) 864-4777 | senese@schug.larc.nasa.gov (128.155.22.47)
Speaking from (but not for) NASA-LaRC, Hampton VA 23665-5225
Subliminal Message: 1. Anonymously ftp to schug.larc.nasa.gov
                    2. cd ~/resume ; get resume.[ps|tex|ascii].
                    3. Hire me.