Computer Science 335
Parallel Processing and High Performance Computing
Fall 2024, Siena College
In this lab, you will learn how to run interactive and batch jobs with MPI on the Stampede3 production nodes.
You must work individually on this lab, so everyone learns how to use Stampede3.
Learning goals:
1. to run MPI programs interactively on Stampede3's compute nodes
2. to submit and manage batch MPI jobs on Stampede3 using Slurm
Much of what we'll be doing is based on the Stampede3 User Guide.
Getting Set Up
In Canvas, you will find a link to follow to set up your GitHub repository for this lab, which will be named stampedempi-lab-yourgitname. Since this lab is to be completed individually, each student should follow the link to create their own repository.
You may answer the lab questions directly in the README.md file of your repository, or use the README.md to provide either a link to a Google document that has been shared with your instructor or the name of a PDF of your responses that you upload to your repository.
Using a Stampede3 Login Node
Using the procedure from the earlier lab, log into Stampede3 at the Texas Advanced Computing Center (TACC).
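If you need a reminder of that procedure, it is an ssh connection to one of Stampede3's login nodes, followed by TACC's multi-factor authentication prompts; the username below is a placeholder for your own TACC username.

ssh your-tacc-username@stampede3.tacc.utexas.edu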
You can now clone a copy of this lab's repository if you haven't already done so, and your repository from Programming Project 4: Collective Communication [HTML] [PDF] onto Stampede3.
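If you clone over HTTPS, the commands will look something like the following; the URLs here are placeholders, so copy the actual URLs from the green Code button on each repository's GitHub page.

git clone https://github.com/YOUR-ORG/stampedempi-lab-yourgitname.git
git clone https://github.com/YOUR-ORG/your-project4-repository.git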
The default compiler and MPI configurations should be sufficient for most or all of our purposes. Compile with mpicc and run programs with mpirun.
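For example, assuming the hello program's source file is named mpihello.c (check the actual file names in your repositories), compilation would look like:

mpicc -o mpihello mpihello.c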
We can compile MPI programs on the login nodes, but we should not run them there, since the login nodes are not intended for that purpose. But let's try anyway to see what happens.
Parallel computations must be run on Stampede3's compute nodes, not the login nodes. Access to the compute nodes is managed by a queueing system called Slurm, which grants subsets of the computer to individual users. When you are allocated a set of compute nodes, no other users can run jobs on them. However, that time counts against our class's service allocation, so you should only request compute nodes when you really need them.
You can see current summaries of the jobs in the queueing system with the commands sinfo and squeue.
You can see the types of nodes available through the queueing system with the command
sinfo -o "%18P %8a %16F"
This will show a compact summary of the status of the queues. See the Stampede3 User Guide to interpret the output.
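squeue with no arguments lists every job in the system; to see only your own jobs, filter by username:

squeue -u $USER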
Before moving on, also compile your programs from Programming Project 4: Collective Communication [HTML] [PDF] on your login node.
Using a Compute Node Interactively
A user can, and should, only log into compute nodes that are currently allocated to that user. We can gain exclusive access to one with the idev command.
Run idev. After a (hopefully) short time, you should get a command prompt on a compute node.
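With no arguments, idev requests a single node in a default queue for a default amount of time. It also accepts options to choose the queue, node count, task count, and time limit; for example, the following asks for one SKX node, 48 MPI tasks, and 30 minutes (the queue name here is an assumption, so check the names reported by sinfo):

idev -p skx -N 1 -n 48 -m 30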
Run the mpihello program and the mpirds-reduce program with N=1073741824 on 64 processes on a compute node, redirecting your output to files mpihello-out64.txt and mpirds-out64.txt, respectively.
Make sure both of these are in your repository for this lab and are committed and pushed to GitHub.
The proper way to run MPI programs on the Stampede3 compute nodes is with a different command: ibrun. Run mpihello on your compute node.
ibrun ./mpihello
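For example, the 64-process runs requested above, with output redirected to the required files, might look like the following. This sketch assumes that mpirds-reduce takes N as a command-line argument (adjust if your program reads N differently) and that ibrun's -n option is used to set the number of MPI tasks.

ibrun -n 64 ./mpihello > mpihello-out64.txt
ibrun -n 64 ./mpirds-reduce 1073741824 > mpirds-out64.txt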
Please log out from the compute node as soon as you are finished with the tasks above. The time the node is allocated to you is charged against our class allocation, regardless of whether you are actively using the CPUs.
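To end the interactive session and return to a login node, simply exit the shell:

exit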
Submitting Batch Jobs to the Compute Nodes
The most common way to use the compute nodes of this or any supercomputer is by submitting batch jobs. The idea is that you get your program ready to run, then submit it to be executed when the resources to do so become available.
To set up a batch job, you first need a batch script that can be configured to run your program. A script that will run the mpihello program with 32 processes on one of Stampede3's "SKX" (Skylake) production nodes is provided in your repository in the file hellotest.mpi.slurm. Examine this file and make sure you understand each of the lines that start with #SBATCH. These lines define the parameters of your batch submission. You can also find more information about Slurm batch files in the Stampede3 User Guide.
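For reference, a minimal sketch of what such a script typically contains is shown below; the option values, queue name, and email address here are placeholders, and the hellotest.mpi.slurm file in your repository is the authoritative version.

#!/bin/bash
#SBATCH -J mpihello                   # job name
#SBATCH -o mpihello.%j.out            # output file (%j expands to the job ID)
#SBATCH -p skx                        # queue (partition) to submit to
#SBATCH -N 1                          # number of nodes requested
#SBATCH -n 32                         # total number of MPI tasks
#SBATCH -t 00:05:00                   # maximum run time (hh:mm:ss)
#SBATCH --mail-user=you@example.com   # where to send job status email
#SBATCH --mail-type=all               # email at job begin, end, and failure

ibrun ./mpihello                      # launch the MPI program on the allocated resources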
Modify this file so that the --mail-user option specifies your own email address.
Submit the job to the queueing system with the command
sbatch hellotest.mpi.slurm
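sbatch prints the job ID when you submit. You can check on that job while it waits and runs, or cancel it if you submitted it by mistake (JOBID below is a placeholder for the printed ID):

squeue -j JOBID
scancel JOBID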
Next, let's run with more nodes and processes. Modify the Slurm script to request 4 nodes, and run 48 processes per node (so the -n value in your script should be 192).
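Only the resource-request lines need to change; a sketch of what they become (the other #SBATCH lines stay as they were):

#SBATCH -N 4        # four nodes
#SBATCH -n 192      # 48 MPI tasks per node times 4 nodes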
Include the last batch script (3 points) and the last output file (4 points) in your repository.
Now let's do the same for the mpirds-reduce program. Create another Slurm script with appropriate values to run mpirds-reduce for N=1536000000 on 8 nodes and a total of 384 processes. Include this batch script and the output file in your repository for this lab (not the previous one where you wrote mpirds-reduce). (5 points)
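As a starting point, the resource request and launch line for this job might look like the following, again under the assumption that mpirds-reduce takes N as its command-line argument:

#SBATCH -N 8                        # eight nodes
#SBATCH -n 384                      # 48 MPI tasks per node times 8 nodes

ibrun ./mpirds-reduce 1536000000    # N passed on the command line (assumption)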
Submission
Commit and push!
Grading
This assignment will be graded out of 35 points.
| Feature | Value | Score |
|---------|-------|-------|
| Question 1 | 2 | |
| Question 2 | 2 | |
| Question 3 | 2 | |
| mpihello-out64.txt | 2 | |
| mpirds-out64.txt | 2 | |
| Question 5 | 1 | |
| Question 6 | 1 | |
| Question 7 | 1 | |
| Question 8 | 2 | |
| Question 9 | 3 | |
| Question 10 | 1 | |
| Question 11 | 1 | |
| Question 12 | 3 | |
| mpihello batch script | 3 | |
| mpihello batch output file | 4 | |
| mpirds batch script and output file | 5 | |
| Total | 35 | |