Lecture 20 - Internet/Grid Computing, Parallel Architectures
- Announcements
- Selection of times for final project presentations
- Midterm comments
- Internet/Grid Computing
- Parallel architectures
- There are templates for a paper and a talk in /home/faculty/terescoj/latexexample and /home/faculty/terescoj/latextalk, respectively, for those who would like to use LaTeX for their writeups or to create presentation slides.
- Upcoming Homework/Lab schedule.
See the Condor and Globus notes from the previous lecture.
We will hear about another approach to Internet computing, the actor
model, in one of the final project presentations.
As a start of our discussion of architecture-aware parallel
programming, let's consider the kinds of computers where we might want
to run parallel programs.
- a single processor
- a cluster of uniprocessor systems that are connected by a slow
network
- a cluster of uniprocessor systems that are connected by a fast
network
- a symmetric multiprocessor (SMP)
- a cluster consisting of identical SMP nodes connected by a network
- a heterogeneous cluster of processing nodes connected by a
network
- a "cluster of clusters" where there is a network hierarchy
The bullpen cluster falls into the "heterogeneous cluster" category. For the cluster-of-SMPs case, a minimal hybrid programming sketch follows below.
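One way to be architecture-aware on a cluster of SMP nodes is to combine two programming models: message passing (MPI) between nodes and shared-memory threads (OpenMP) within each node. The skeleton below is an illustrative sketch of this hybrid style, not code from the course materials; it assumes an MPI library and an OpenMP-capable C compiler (e.g., built with mpicc -fopenmp), and it simply reports where each rank and thread is running.

    /* hybrid.c: hypothetical hybrid MPI+OpenMP skeleton.
       Run one MPI rank per SMP node; OpenMP threads share
       memory within the node. */
    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char *argv[]) {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which process */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total processes */
        MPI_Get_processor_name(name, &len);    /* which host */

        /* within a node, threads share memory; no messages needed */
    #pragma omp parallel
        printf("rank %d of %d on %s: thread %d of %d\n",
               rank, size, name,
               omp_get_thread_num(), omp_get_num_threads());

        MPI_Finalize();
        return 0;
    }

Launched with one rank per node (e.g., mpirun -np <number of nodes>) and OMP_NUM_THREADS set to the number of CPUs per node, communication crosses the network only between nodes, while intra-node parallelism stays in shared memory.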
Some issues for the clustered systems:
Mid-to-late 1990s: a Department of Energy program (the Accelerated Strategic Computing Initiative, ASCI) pushed toward a teraflop system.
Coming Soon
The contract for the first 100+ teraflop system was recently awarded to IBM, which will build ASCI Purple.
Something Different
The Tera/Cray MTA. Only one MTA-1 is in production, at the San Diego Supercomputer Center (SDSC). Its OS is MTX, a fully-distributed Unix variant. The
system is thoroughly multithreaded, with each "processor" actually
consisting of a number of streams, each of which is fed the code and
data to do some part of the computation. A sufficiently multithreaded
application can keep most of these streams busy, meaning that the
penalty we usually pay for memory latency is gone. In fact, the
system has no traditional memory cache, since it can rely on this
multithreading to mask memory latency. The MTA-2 has since
arrived. Its architecture is similar to the MTA-1, but it's all CMOS
instead of GaAs.
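To make the latency-masking idea concrete, here is a rough software analogy in C with POSIX threads. This is illustrative only, not MTA code, and the table size, thread count, and step count are made-up parameters. Each thread chases pointers through a large table, so most loads miss the cache; with enough independent "streams," some thread is almost always ready to run while the others wait on memory.

    /* streams.c: software analogy for latency masking (illustrative,
       not MTA code). Compile with: cc -O2 streams.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define TABLE_SIZE (1 << 20)  /* large enough to defeat the cache */
    #define NSTREAMS   64         /* many independent "streams" */
    #define STEPS      1000000

    static int next_idx[TABLE_SIZE];  /* random pointer-chase table */

    /* one "stream": a chain of dependent loads, mostly cache misses */
    static void *stream(void *arg) {
        long i = (long)arg;
        long t;
        for (t = 0; t < STEPS; t++)
            i = next_idx[i];
        return (void *)i;  /* return result so the loop isn't optimized away */
    }

    int main(void) {
        pthread_t tid[NSTREAMS];
        long s;
        int j;

        srandom(42);
        for (j = 0; j < TABLE_SIZE; j++)
            next_idx[j] = random() % TABLE_SIZE;

        /* launch many streams; while one stalls on memory, others run */
        for (s = 0; s < NSTREAMS; s++)
            pthread_create(&tid[s], NULL, stream, (void *)s);
        for (s = 0; s < NSTREAMS; s++)
            pthread_join(tid[s], NULL);

        printf("all %d streams finished\n", NSTREAMS);
        return 0;
    }

On a conventional CPU the operating system time-slices these threads, so the analogy is loose; the MTA switches among its hardware streams every cycle with no context-switch cost, which is why it can do without a traditional cache.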
A little more from a Cray developer:
The MTA-2 is quite different. The instruction set is the same as in
the MTA-1. But the MTA-2 is built from all-CMOS parts. It also has a
completely different mechanical structure, and a completely different
network structure. It can also handle more memory, I believe.
Processor boards come in groups of 16 called "cages." Each processor
board (we call them system boards) has one CPU, two memory controllers,
and one IOP (I/O processor). Those four entities are called "resources." Each resource is
attached to a network node. The network nodes are connected to each
other in a ring on the system board, and to other boards with various
kinds of inter-board connections.
The MTA system is fully scannable, i.e., with few exceptions,
every flip-flop in the machine can be written and read by diagnostic
control programs. This is how we boot the machine - essentially we
write the state we want into the machine, and then say "go." When
we bring the machine down we read the state out and can get diagnostic
information that way. We use the scan system heavily - for booting,
part testing, development, all kinds of things.
HTMT (Hybrid Technology Multithreaded Architecture): a petaflop
machine, perhaps by 2007? NASA JPL is among the participants. Yes,
that's petaflop: 1 million billion (10^15) floating point operations
per second. Highlights:
- Superconducting 100 GHz processors, running at 4 K, with smart in-processor memories. Even some physicists think 4 K is kind of chilly.
- An SRAM section cooled by liquid nitrogen.
- Optical packet switching - the "data vortex."
- Massive storage - a high-density holographic storage system.