Computer Science 400
Parallel Processing and High Performance Computing
Fall 2017, Siena College
For this lab, you will be learning how to compile and run jobs on the Stampede2 system. You will need to complete this lab individually so everyone has the experience of running MPI jobs on the supercomputer.
Much of what we'll be doing is based on the Stampede2 User Guide.
Getting Set Up
You will receive an email with the link to follow to set up your GitHub repository stampedempi-lab-yourgitname for this lab.
Using a Stampede2 Login Node
Using the procedure from the earlier lab, use the XSEDE single sign-on system to connect to the Stampede2 system at TACC.
Clone a copy of your repository from Lab 5 so you have some MPI programs available.
The default compiler and MPI configurations should be sufficient for most or all of our purposes. Compile with mpicc and run programs with mpirun.
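As a quick sanity check, you can rebuild and run one of your Lab 5 programs right on the login node. The session below is just a sketch: the source file name hello.c and the prompt are placeholders, so adjust them to match your own files.

```
login1$ mpicc hello.c -o hello
login1$ mpirun -np 4 ./hello
```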
We generally will not run MPI programs on the login nodes, which are not intended for that purpose. But since the programs we are trying out here are very short, we can make an exception.
Real computations will be done on Stampede2's compute nodes, not the login nodes. These are managed by a queueing system called Slurm, which grants access to subsets of the computer to various users. When you are allocated a set of compute nodes, no other users can run jobs on them.
You can see current summaries of the jobs in the queueing system with the commands sinfo and squeue.
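The full squeue listing can be long, so it is often more useful to restrict it to your own jobs. The command below is standard Slurm usage ($USER expands to your TACC username); scancel, shown with a placeholder job id, cancels a queued or running job.

```
login1$ squeue -u $USER
login1$ scancel 123456
```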
You can see the types of nodes available through the queueing system with the command
sinfo -o "%18P %8a %16F"
This will show a compact summary of the status of the queues. See here to interpret the output.
Using a Compute Node Interactively
You can, and should, log into only those compute nodes that are allocated to you at a given time. We can gain exclusive access to one with the idev command.
Run idev. When prompted, request the default of 68 tasks per node. After a (hopefully) short time, you should get a command prompt on a compute node.
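Once on the compute node, you can launch your MPI programs there. A sketch of such a session is below; the compute-node prompt and executable name are placeholders, and it assumes TACC's ibrun launcher, which starts one MPI task per slot in your allocation.

```
c123-456$ ibrun ./hello
```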
Please log out from the compute node as soon as you are finished with the tasks above. The time the node is allocated to you is charged against our class allocation.
Submitting Batch Jobs to the Compute Nodes
The most common way to use the compute nodes of this or any supercomputer is by submitting batch jobs. The idea is that you get your program ready to run, then submit it to be executed when the resources to do so become available.
To set up a batch job, you first need a batch script that can be configured to run your program.
Copy the file /share/doc/slurm/knl.mpi.slurm to the directory that contains your Hello, World executable. Edit it to use "hellotest" in place of "myjob", to request 1 node, 32 MPI tasks, and a run time of 5 minutes, and to replace the address in the --mail-user option with your own email address. You should also remove the "SBATCH -A myproject" line, since we have just one allocation to be charged for our runs. Finally, replace ./mycode.exe with the name of your executable.
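After those edits, your script might look roughly like the sketch below. This is based on TACC's template, so the exact directives in your copy of knl.mpi.slurm may differ slightly; the email address and executable name are placeholders you must replace.

```
#!/bin/bash
#SBATCH -J hellotest           # job name
#SBATCH -o hellotest.o%j       # output file (%j expands to the job id)
#SBATCH -e hellotest.e%j       # error file
#SBATCH -p normal              # queue (partition) to submit to
#SBATCH -N 1                   # number of nodes requested
#SBATCH -n 32                  # total number of MPI tasks
#SBATCH -t 00:05:00            # run time limit (hh:mm:ss)
#SBATCH --mail-user=you@siena.edu
#SBATCH --mail-type=all

ibrun ./hello                  # launch the MPI executable
```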
Submit the job to the queueing system with the command
sbatch knl.mpi.slurm
Next, let's run with more nodes and processes. We will request 4 nodes, and run 128 processes per node (so the -n value in your script should be 512).
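The value for -n is always the number of nodes times the number of tasks per node. A quick shell check of the arithmetic (variable names here are just for illustration):

```shell
NODES=4              # matches "#SBATCH -N 4"
TASKS_PER_NODE=128   # tasks to run on each node
TOTAL=$((NODES * TASKS_PER_NODE))
echo "$TOTAL"        # prints 512, the value for "#SBATCH -n"
```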
Include the last batch script (3 points) and the last output file (2 points) in your repository.
Submitting
Your submission requires that all required deliverables are committed and pushed to the master branch of your repository on GitHub.
Grading
This assignment is worth 30 points, which are distributed as indicated above.