Connection
Machine, by Thinking Machines.
These are long, long gone. The CM-2 had 64K processors, connected in
a hypercube network, with 2 GB of memory and typically around 10 GB of
disk. It was massive for the time (1990). The CM-5 had some of the best
LED displays.
Cray,
vector supercomputing. Their systems often followed the SPMD model.
Cray was bought by SGI and later sold to Tera, which renamed itself
Cray.
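Since the SPMD (single program, multiple data) model comes up repeatedly
in these notes, here is a minimal reminder of what it looks like, written
in C with MPI purely as a familiar illustration (Cray's own systems had
their own programming environments; nothing below is Cray-specific).
Every process runs the same program, and its rank decides what it does:

/* spmd_hello.c: minimal SPMD example using MPI.
   Every process runs this same program; behavior differs only by rank. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which process am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many processes total? */

    if (rank == 0) {
        /* one process takes on a coordinator role */
        printf("Coordinator: %d processes running\n", size);
    } else {
        printf("Worker %d of %d reporting\n", rank, size);
    }

    MPI_Finalize();
    return 0;
}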
Intel
Paragon. The Paragon was a mesh-interconnected system. The processors
were partitioned
among jobs. Intel has since gotten out of the supercomputer business,
but its processors are used in many of the world's fastest computers
today.
ASCI-class supercomputers: Teraflop Computing
Mid-to-late 1990's - the Accelerated Strategic Computing Initiative
(ASCI) program from the Department of Energy made the push for a
teraflop (10^12 floating-point operations per second) system.
Intel Cluster. ASCI Red, Sandia National Labs
(SNL). World's
first teraflop system. A Linux cluster, but different from most: its
OS was based on Linux but customized significantly, stripping out
anything that was not needed so that as much physical memory as
possible was available for user jobs.
PC Cluster. Try this at home. Many fall into the category of
Beowulf Clusters. Our cluster
would be considered one of these.
IBM SP.
Basically a cluster with high-end RS6000/PowerPC processors and a
high-speed switch interconnect. The switch was the key to its
performance and the cause of its very high price tag. Many of the
world's top supercomputers in the early 2000s were IBM SP systems.
SGI Origin
3000, and its
predecessor, the SGI Origin 2000. The ccNUMA and NUMA-Flex systems
provided non-uniform shared memory access. These systems were
popular among those who preferred threads to MPI.
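For contrast with message passing, a minimal sketch of that threaded,
shared-memory style, here using OpenMP in C (standard OpenMP directives,
nothing SGI-specific; the array size is just an example):

/* omp_sum.c: shared-memory parallel sum using OpenMP threads.
   On a ccNUMA machine all threads see one address space; no messages needed. */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;
    int i;

    for (i = 0; i < N; i++)
        a[i] = 1.0;

    /* threads share the array; the reduction combines their partial sums */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}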
IBM was selected to build ASCI White, at
LLNL, the second-generation ASCI platform. A smaller version was Blue
Horizon at the San Diego Supercomputer Center
(SDSC), available to a wider
group. These were larger clusters built from larger SMP nodes. Single-node
processor utilization is important, but may be difficult to achieve.
Near the top of the Top 500 Supercomputers list for a while was
Terascale
at the Pittsburgh Supercomputing Center. It was nicknamed "LeMieux" for
obvious reasons. This system was built from Alpha processors.
Issues for Cluster-based Systems
Assignment of nodes to jobs - we want to keep as many nodes as
possible busy computing, but we don't want to let small jobs (few
processors) starve larger jobs (many processors).
Some jobs may require special resources available only on some
nodes - for example, in the bullpen cluster, some nodes have four
processors while others have two, some nodes have more memory than
others, and some nodes have a gigabit interconnect while others don't.
A scheduler might require job submissions to include a maximum
running time to be able to schedule more intelligently.
Scheduling is likely non-preemptive, but we can consider a
"shortest job first" approach (a small sketch of the idea follows these
issues).
Another issue: are compute nodes granted exclusively to one job,
or are they shared?
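To make the "shortest job first" idea concrete, here is a toy sketch in
C. The job data and node counts are made up for illustration, and this
is not any real batch scheduler's code: jobs carry a user-supplied
maximum running time, the waiting queue is sorted by that estimate, and
nodes are then handed out exclusively and non-preemptively.

/* sjf_sketch.c: toy non-preemptive "shortest job first" node assignment.
   Runtimes are the user-supplied maximums from the job submission. */
#include <stdio.h>
#include <stdlib.h>

struct job {
    const char *name;
    int nodes_needed;   /* nodes requested */
    int max_minutes;    /* user-supplied maximum running time */
};

static int by_runtime(const void *a, const void *b) {
    return ((const struct job *)a)->max_minutes -
           ((const struct job *)b)->max_minutes;
}

int main(void) {
    struct job queue[] = {        /* hypothetical waiting jobs */
        { "big-sim",    32, 600 },
        { "quick-test",  2,   5 },
        { "medium-run",  8,  60 },
    };
    int njobs = sizeof(queue) / sizeof(queue[0]);
    int free_nodes = 40;          /* nodes currently idle */
    int i;

    /* shortest (estimated) job first */
    qsort(queue, njobs, sizeof(struct job), by_runtime);

    for (i = 0; i < njobs; i++) {
        if (queue[i].nodes_needed <= free_nodes) {
            free_nodes -= queue[i].nodes_needed;   /* exclusive assignment */
            printf("start %-12s on %2d nodes (%d left idle)\n",
                   queue[i].name, queue[i].nodes_needed, free_nodes);
        } else {
            printf("hold  %-12s (needs %d nodes, only %d free)\n",
                   queue[i].name, queue[i].nodes_needed, free_nodes);
        }
    }
    return 0;
}

Note that sorting purely by estimated runtime is what allows short jobs
to jump ahead; a real scheduler would also have to guard against the
starvation of large jobs raised above.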
IBM Blue
Gene
systems are the leaders: high-density clusters with, essentially,
lots of cores!
Issues:
how to make use of nodes with many cores
how to lower power consumption and heat generation
how to program a machine with 100,000+ processors (one common approach
is sketched below)
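One widely used answer to the first and last of these issues is hybrid
programming: message passing (MPI) between nodes, threads (OpenMP)
across the cores within a node. This is a general technique, not
anything Blue Gene-specific; a minimal sketch in C:

/* hybrid.c: one MPI process per node, OpenMP threads across its cores. */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[]) {
    int rank, size, provided;

    /* ask for an MPI library that tolerates threads */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    #pragma omp parallel
    {
        printf("node-rank %d of %d, thread %d of %d\n",
               rank, size, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}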
Something Different: TeraGrid:
An Internet "cluster of clusters".
Many new issues arise here - it is no longer a resource housed in one
building in one place with one administrative entity. We need to worry
about scheduling on a larger scale, and about user authentication. Using
computers at multiple sites simultaneously will involve slow wide-area
network links. How do we get the data to the computers, and where do we
store the results?
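To see why the wide-area links matter, a back-of-the-envelope
calculation (the link speeds here are assumed round numbers, not
measured TeraGrid figures):

/* wan_time.c: rough data-movement times for an assumed 1 TB dataset. */
#include <stdio.h>

int main(void) {
    double dataset_bits = 1e12 * 8;   /* 1 TB dataset, in bits */
    double wan_bps = 100e6;           /* assumed 100 Mb/s wide-area link */
    double lan_bps = 10e9;            /* assumed 10 Gb/s machine-room network */

    printf("over the WAN:        %.1f hours\n",
           dataset_bits / wan_bps / 3600.0);
    printf("in the machine room: %.1f minutes\n",
           dataset_bits / lan_bps / 60.0);
    return 0;
}

With these assumed speeds, the same dataset moves in roughly 22 hours
over the wide-area link but about 13 minutes within a machine room,
which is why data placement becomes a first-class scheduling concern.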