CCL: W:hardware for computational chemistry calculations
- From: "Perry E. Metzger" <perry:piermont.com>
- Subject: CCL: W:hardware for computational chemistry
- Date: Fri, 30 Sep 2005 11:49:50 -0400
Sent to CCL by: "Perry E. Metzger" [perry:piermont.com]
--Replace strange characters with the "at" sign to recover email
"Luis Manuel Simon" writes:
> Probably there have been many discussions about hardware on this
> list, but I need an update on a few questions:
> Which is the better performance/price choice for computational
> chemistry (quantum chemistry and molecular dynamics simulations)?
> Is a PC cluster better, or a workstation? Several PC nodes in a
> cluster, or several processors on one board?
> AMD 64, Opteron, Intel P4, Xeon, SPARC? Are dual processors worth it?
> How important is microprocessor cache size?
> How important is RAM size? And registered DDR modules?
> Is RAID worth it? Do SCSI hard drives really perform better than SATA?
> What about the operating system, Linux distribution, etc.?
> Is there any comparison, besides spec.org, of different configurations?
From what I can tell, at the moment (and this could change at any
time), the price/performance of AMD 64 equipment ("Opteron" is just a
model of AMD 64) is by far the best. I know people who do a lot of
number crunching who are buying AMD 64 whiteboxes by the fleet, and
putting them in clusters.
As to the rest, the Sparc is very, very far behind on the
price/performance curve at this point. I wouldn't even vaguely
consider it.
Cache size is of course quite important on modern machines, because
main memory continues to lag far behind the speed of the cache. Taking
a cache miss means your processor sits around twiddling its thumbs for
a very long time, so you want as few cache misses as possible. That
said, the precise size of cache that is important to you is totally
dependent on the specific problem you are working on. Some problems
fit in fairly small caches, some need very large caches. "It depends."
Of course, since you can't buy cache separately from the processor in
a modern machine, generally speaking this isn't something to do
separately from the processor decision -- if you've benchmarked a
particular processor and it works best, you know the cache it has is
the best for your problem.
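If you want to see the effect for yourself, here is a tiny Python/NumPy
timing sketch of my own (not from any chemistry code, and the exact
ratio will vary with your hardware). It sums the same number of doubles
twice: once from contiguous memory, once with a 512-byte stride so
nearly every element costs a fresh cache line.

    import time
    import numpy as np

    a = np.ones(1 << 26)                # 64M doubles, ~512 MB: far bigger than any
                                        # cache, and ones() touches every page up front

    def timed_sum(x):
        t0 = time.perf_counter()
        x.sum()
        return time.perf_counter() - t0

    t_contig = timed_sum(a[:1 << 20])   # 1M contiguous doubles: sequential cache lines
    t_strided = timed_sum(a[::64])      # 1M doubles spaced 512 bytes apart: roughly
                                        # one cache miss per element

    print(f"contiguous {t_contig * 1e3:.1f} ms   strided {t_strided * 1e3:.1f} ms")

The strided pass does exactly the same arithmetic but typically runs
several times slower, and the difference is essentially all cache misses.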
RAM size is also an issue. On a modern machine, you want to avoid ever
getting page faults because paging something in is Very Very Very
Slow. That means you want enough RAM that your entire working set fits
in memory nicely. RAM is also used these days as buffer cache for file
data, so the more RAM you have, the faster file i/o ends up being if
you're doing lots of it. So, again, how much is enough depends a lot
on your problem. Different-sized problems will eat different amounts
of RAM. Luckily, these days RAM is dirt cheap. The
biggest issue you will have is in dealing with problems that need very
large memory spaces -- if you need an address space for your problem
with more than a gig or two of memory, you need a 64 bit
processor. (Technically, a 32 bit processor can handle 4G of address
space, but remember that in most OSes, a big chunk (sometimes half) of
the address space (not memory, address space!) is used by the OS, and
mappings for the stack, shared libraries, etc., will make it
impossible to truly use all the rest.) The 32 bit x86 processors can
use more than 4G of RAM, but they can't devote it all to the address
space of one process -- it is only really useful if you have multiple
processes that can make use of it -- so again, if you have a problem
that needs a big address space, you want AMD 64 or the Intel stuff
with the same 64 bit extensions (but those processors aren't as fast.)
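To put numbers on that, here is a back-of-the-envelope check I might
run (a sketch of my own; the 20000 x 20000 matrix is just a made-up
problem size, and the physical-RAM query is POSIX/Linux only):

    import os
    import struct

    bits = struct.calcsize("P") * 8     # pointer size of this build: 32 or 64 bits
    print(f"{bits}-bit process")

    n = 20000                           # hypothetical dense N x N double-precision matrix
    need = n * n * 8                    # bytes of float64 (about 3 GiB for n = 20000)
    have = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")  # physical RAM

    print(f"problem wants ~{need / 2**30:.1f} GiB, machine has ~{have / 2**30:.1f} GiB")
    if bits == 32 and need > 2 * 2**30:
        print("won't fit the usable address space of a 32-bit process: go 64-bit")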
On the question of clusters versus MP in a single machine, again,
"that depends". Multiple processors or machines in general will only
help you if your problem is easily parallelized. If it is easily
parallelized but requires extremely tight coupling (i.e. you've got
some small places where you could vectorize but you'll lose if you
have to take a communications hit), even more than one processor won't
help. If you have somewhat looser communications constraints but need
shared RAM in order to operate effectively, you have no choice but to
use an MP machine. Here again, though, keep in mind that there are
strong limits to how far you will get that way -- the price/performance of
MP machines goes down sharply with the number of processors, and
affordable MP machines rarely have more than 4. Even in clusters,
though, sometimes dual processor machines make sense if you save
enough on things like power supplies, cases, etc. that you come out
ahead overall. If your problem is embarrassingly parallel and you can
put it over a cluster, by all means, use a cluster. Especially with
gig E networking cards as cheap as they are now, you'll win big.
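For the embarrassingly parallel case, the shape of the code is trivial;
here is a toy Python sketch of my own (the energy() function is just a
stand-in for an independent calculation, not any real chemistry):

    from multiprocessing import Pool

    def energy(job_id):
        # stand-in for one independent calculation (say, one conformer or geometry)
        return sum(i * i for i in range(1_000_000)) + job_id

    if __name__ == "__main__":
        with Pool(processes=4) as pool:             # one worker per processor on this box
            results = pool.map(energy, range(100))  # 100 independent jobs
        print(len(results), "jobs finished")

Across a cluster the same idea holds; you just hand the independent
jobs to a queueing system instead of a local process pool.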
On disk performance and controllers: in computational problems, it is
very rare that you are actually remotely I/O bound. Most computational
chemistry isn't moving a 200G data set in and out of RAM as fast as
possible; it is loading a problem into memory, and the problem then
stays in memory. If your problem is a factor of 2 too big for memory,
don't get a faster disk controller, get more RAM -- it will make far
more of a difference. In other applications -- if you were building a
database server, say, or you have a rare chem problem that actually is
disk bound -- I'd give a different answer. In general, though, these
days I go with SATA when I can -- the price/performance of SCSI
doesn't justify it any more except at the very high end of I/O
needs. RAID is nice on a server because it can save you when you have
trouble and such, but really, RAID or SCSI on machines in your compute
cluster would be silly, since they're not going to do much disk i/o if
you can help it! If your cluster machine isn't doing much disk i/o,
just use the SATA or IDE controller on the motherboard and be done with it.
On RAM, for large memories, you really *need* ECC. The odds are just
too high that you'll get a single bit error somewhere and ruin days of
number crunching with it. Skimping on fancy cases for your computers
is one thing, skimping on ECC is another. Incidentally, not all chip
sets actually pay attention to ECC! Make sure your motherboards do, or
you'll be spending extra money for nothing.
By the same token, incidentally, do not get crappy power supplies --
they are by far and away (in my experience) the biggest source of
"mysterious trouble" in modern machines. A nice ANTEC or other quality
supply properly rated for the power consumption of your cluster member
will make it a whole lot happier long run, which means less trouble
for you. Similarly, clean AC power going in, and a nice clean machine
room (google for "zinc whiskers" some time) will make you far happier.
As for operating systems, that's a matter of taste. So long as you
aren't running Windows, you'll be fine. And yes, I'm quite serious about "not
Windows". You would imagine that since we're talking about something
compute bound that barely cares about the kernel, all OSes would be
the same, but you would be wrong.
There are multiple reasons to stay away from Windows for compute
clusters. First, there are dumb architectural mistakes in Windows that
make it do too much I/O -- it pages too often, and doesn't use RAM as
buffer cache efficiently. Linux and BSD also need tuning to minimize
paging, but you at least *can* effectively do the tuning. On Windows,
you'll be fiddling with the registry forever, trying to figure out why
it is that you can't get the buffer cache large enough and can't keep
your executable pages in RAM long enough even though there are gigs of
free memory, and then you'll spend six hours on the phone with
Redmond, and then you'll give up. Give up in advance -- you will be
happier. Also, network I/O performance just can't be tuned up as far
as you want on Windows. Lastly, and perhaps most significantly, you can
set up Linux or NetBSD or FreeBSD so that you just shove another box
into the rack, let it PXE boot, and walk away from it, never having to
think about it again until the day your management script says you
need to replace it because it died. You just can't do that with
Windows -- it is simply not architected right to allow you to mass
manage that way. I've set up systems for mass managing hundreds of
machines in Unix clusters without human intervention, and I know of no
one who has ever come close on that level of automation with
Windows. Keep away. Even if you're managing one box, it is still too
much of a pain. By the way, all this is even ignoring the needless
expense of Windows licenses that could be better spent on faster
processors or more RAM. I know hedge funds where money is no object
and still none of them run Windows on compute clusters.
Things are a bit different if you're just buying a personal
workstation. In that case, I'd buy a Mac. :)
All that said, if you are building a compute cluster, if you use a
fairly late model Linux, NetBSD, FreeBSD, etc., you can make it work
well. I'm a NetBSD bigot myself, but that is not for reasons that
would impact computational clusters substantially. You can do fine
with any of them if you're doing what is largely number crunching. The
one to pick is the one that you or your admin staff feel most
comfortable about managing -- and most importantly, automating the
management so you don't ever have to log on to 10 or 50 or even 5
boxes to fix something.
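Just to give the flavor of what I mean by automating it, here is a
bare-bones sketch (mine, with made-up host names; real setups use PXE
booting plus a configuration system rather than a loop like this): push
one command to every node over ssh and report the ones that did not answer.

    import subprocess

    NODES = [f"node{i:03d}" for i in range(1, 51)]     # hypothetical host names

    def run_everywhere(cmd):
        """Run one command on every node; return the hosts that failed or timed out."""
        bad = []
        for host in NODES:
            try:
                r = subprocess.run(["ssh", "-o", "BatchMode=yes", host, cmd],
                                   capture_output=True, text=True, timeout=30)
                if r.returncode != 0:
                    bad.append(host)
            except subprocess.TimeoutExpired:
                bad.append(host)
        return bad

    if __name__ == "__main__":
        print("problem nodes:", run_everywhere("uptime") or "none")

The point is not this particular script; it is that the cluster should
never need a human typing at an individual box.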
A strong recommendation though that I'll bring up here because it is
vaguely OS related -- do NOT use more threads than processors in your
app if you know what is good for you. Thread context switching is NOT
instant, and you do not want to burn up good computation cycles on
useless thread switching. If you have single-processor machines in your
cluster, stick to one computing process/thread on each. If you have dual
processor boxes, one process with two threads or two processes with
one thread make fine sense, but 40 would not be a good use of your
cycles. If you need to talk to 60 TCP sockets to share data between
boxes, use event driven code, not 60 threads, if you want performance.
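By way of illustration (my sketch, not any particular chemistry code),
here is what the event-driven version looks like in Python with the
standard selectors module: one process, one loop, as many sockets as you
like.

    import selectors
    import socket

    sel = selectors.DefaultSelector()

    def accept(server):
        conn, _ = server.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, handle)

    def handle(conn):
        data = conn.recv(65536)
        if data:
            pass                    # hand the chunk to the number-crunching code here
        else:                       # peer closed the connection
            sel.unregister(conn)
            conn.close()

    server = socket.socket()
    server.bind(("", 9000))         # port picked arbitrarily for the example
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, accept)

    while True:                     # a single thread services every peer socket
        for key, _ in sel.select():
            key.data(key.fileobj)

No thread switching and no locks; the kernel tells you which sockets are
ready instead of you burning cycles finding out.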
A final recommendation about buying stuff.
If you are buying one workstation for yourself, buy a Mac.
If you are buying just a couple of compute servers to run an open
source Unix, buy Dell -- for whatever reason, their prices in low
quantities are now far below anything I can manage anywhere else --
but check whether the "small business" or the "academic" discount is
lower at the moment. Often the academic price is insanely high for no
obvious reason.
If you are buying 50 and especially 500 machines, either spec and buy
parts and assemble the things, or spec the parts and work with a
reputable white box manufacturer to get them built for you. For large
clusters, you want to get *exactly* what you want. You especially
don't want to find out months down the line that half the machines are
subtly different from the other half. Also, for large clusters, the
maintenance contracts you can get with things like Dells are useless
expenses -- just buy spare machines and spare parts, they're cheaper
-- and keep in mind that these days, the biggest single limiting
factor you'll get in the average machine room for a large cluster is
getting enough power in and enough heat out.
For truly huge clusters, you can start doing truly odd things to get
better price/performance. Google does a couple of rather radical
things -- for example they don't put their machines in real cases (it
would cost more money and would lower the rate at which their racks
can extract heat from the systems, not to mention that it slows down
getting at the machines). You don't have to go that far, but even in
smaller clusters thinking about mechanics can help. Nylon thumb screws
instead of metal screws make getting stuff in and out faster. Velcro
wraps instead of nylon tie wraps can make it far easier to pull things
in and out. Properly planning your cabling (and labeling it!) so you
can be sure that the connectors on the ethernet switch systematically
correspond to the machines in the rack yields surprising benefits. Oh,
and always remember, automate everything in the systems administration
you can possibly automate.