Computer Science 335
Parallel Processing and High Performance Computing
Fall 2021, Siena College
In this lab, you will learn how to run interactive and batch jobs with MPI on the Stampede2 production nodes.
You must work individually on this lab, so everyone learns how to use Stampede2.
Learning goals:

1. To learn how to run interactive and batch MPI jobs on Stampede2's compute nodes.
2. To gain experience with Slurm, the queueing system that manages those nodes.
Much of what we'll be doing is based on the Stampede2 User Guide.
Getting Set Up
You can find the link to follow to set up your GitHub repository stampedempi-lab-yourgitname for this lab in Canvas. Since you are working individually, follow the link to create your repository on GitHub, then make a clone of it to begin work; you will commit and push your changes to the origin on GitHub as usual.
You can create your repository and clone on your computer right away if you'd like, but don't clone it on Stampede2 until instructed to do so.
You may choose to answer the lab questions in the README.md file in the top-level directory of your repository, or upload a document with your responses to your repository, or add a link to a shared document containing your responses to the README.md file.
Better GitHub Interaction on Stampede2
As we have seen, GitHub authentication on systems where you cannot authenticate in a browser is inconvenient at best. On noreaster, we used the gh tool to authenticate with an authentication token. We will do the same on Stampede2.
Following the procedure from the earlier lab, use the XSEDE single sign-on (SSO) system to connect to the Stampede2 system at TACC. Recall that this is a two-step process: first you connect to the XSEDE SSO with your username and password, which requires the Duo app for two-factor authentication; then you use gsissh to connect to stampede2.
On the Stampede2 login node, create a directory named bin within your home directory. Unix systems generally use directories named bin to hold executables, and these directories are added to your shell's search path. When you install executables within your own account (as opposed to in a system-wide location available to all users), it is a common convention to place them in the bin directory at the top level of your account. On many systems, this directory is already in your search path, but that is not the case on Stampede2, so we will add it.
To do this, edit the .bashrc file in your account's home directory using your favorite text editor. Scroll down to line 63 and uncomment the line that modifies the PATH environment variable to include the bin directory you just created.
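The line you are uncommenting should look something like the following. (This is a sketch; the exact wording and position in TACC's default .bashrc may differ slightly, so look for the line involving PATH and $HOME/bin.)

export PATH=$HOME/bin:$PATH    # put your own bin directory on the search path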
Now, that line will execute every time you log into Stampede2, and any programs in your bin directory will be available. There's nothing there yet. Let's put a shell script there to make sure things are set up as intended:
cat > binworks
#!/usr/bin/env bash
echo "my bin works!"

(press Ctrl-D on a line by itself to end the input to cat)

chmod 755 binworks
. .bashrc
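To check that everything is set up as intended, run the script by name from any directory; it should print "my bin works!":

binworks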
Now every time you log into Stampede2 anything you place in that bin directory will be in your search path.
The whole point of this was to be able to install a copy of the gh utility so you can use it to authenticate git to your GitHub account through your authentication token. You could download the cli repository from GitHub, configure, build, and install it in your Stampede2 account, but instead you can just copy in the executable I built.
cd ~/bin
wget --no-check-certificate https://courses.teresco.org/cs335_f21/gh
(you might get some warnings, but the file should transfer)
chmod 755 gh
Finally, you can authenticate to GitHub so you can clone, push to, and pull from repositories on Stampede2. You will likely need to repeat these steps periodically, as it seems the Stampede2 login nodes don't remain authenticated for long.
gh auth login
and choose responses that authenticate GitHub CLI using your token:
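The exact prompts depend on the version of gh, so take this as a rough sketch of the token-based workflow we used on noreaster rather than an exact transcript: choose GitHub.com as the account to log into, HTTPS as the preferred protocol for Git operations, agree to authenticate Git with your GitHub credentials, and select the option to paste an authentication token, then paste the token you generated for the earlier lab.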
You can now clone a copy of this lab's repository and your repository from Lab 4 onto Stampede2.
Using a Stampede2 Login Node
The default compiler and MPI configurations should be sufficient for most or all of our purposes. Compile with mpicc and run programs with mpirun.
We can compile MPI programs on the login nodes, but we are not supposed to run them there; that is not what the login nodes are for. But let's try anyway to see what happens.
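If you need an MPI program to experiment with, here is a minimal hello-world sketch (it is an illustration only, and not necessarily identical to the mpihello program from your earlier lab):

/* mpihello.c: each MPI process reports its rank and the size of MPI_COMM_WORLD */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

Compile it on the login node, then try to run it there:

mpicc -o mpihello mpihello.c
mpirun -np 4 ./mpihello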
Parallel computations will be done on Stampede2's compute nodes, not the login nodes. The compute nodes are managed by a queueing system called Slurm, which grants users access to subsets of the machine. When you are allocated a set of compute nodes, no other users can run jobs on them. But that time counts against our service allocation, so you should only request compute nodes when you really need them.
You can see current summaries of the jobs in the queueing system with the commands sinfo and squeue.
You can see the types of nodes available through the queueing system with the command
sinfo -o "%18P %8a %16F"
This will show a compact summary of the status of the queues. See the Stampede2 User Guide to interpret the output.
Using a Compute Node Interactively
Users can (and should) only log into compute nodes that are currently allocated to them. We can gain exclusive access to one with the idev command.
Run idev. When prompted, request the default of 68 tasks per node. After a (hopefully) short time, you should get a command prompt on a compute node.
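By default, idev allocates a single node for a short session in a development queue. It also accepts options for the partition, node count, task count, and time limit if you need something different; for example (these values are illustrative only, not required for this lab):

idev -p normal -N 1 -n 68 -m 30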
Please log out from the compute node as soon as you are finished with the tasks above. The time the node is allocated to you is charged against our class allocation, regardless of whether you are actively using the CPUs.
Submitting Batch Jobs to the Compute Nodes
The most common way to use the compute nodes of this or any supercomputer is to submit batch jobs. The idea is that you get your program ready to run, then submit it to be executed when the resources to do so become available.
To set up a batch job, you first need a batch script that can be configured to run your program.
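A Stampede2 MPI batch script looks roughly like the sketch below, modeled on the knl.mpi.slurm template from the Stampede2 User Guide; the job name, output file names, queue, node and task counts, time limit, and allocation are all placeholders you will adjust for each run:

#!/bin/bash
#SBATCH -J mpihello         # job name
#SBATCH -o mpihello.o%j     # stdout file name (%j expands to the job ID)
#SBATCH -e mpihello.e%j     # stderr file name
#SBATCH -p normal           # queue (partition) name
#SBATCH -N 1                # total number of nodes requested
#SBATCH -n 68               # total number of MPI tasks
#SBATCH -t 00:10:00         # run time limit (hh:mm:ss)
#SBATCH -A our-allocation   # allocation to charge (if you have more than one)

ibrun ./mpihello            # ibrun is TACC's MPI launcher for batch jobs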
Note that the Slurm script also says you should be running out of the $SCRATCH directory. This means you should copy your executable to the $SCRATCH directory in your account. This is a directory on a shared partition to which the compute nodes have faster access.
cp mpihello $SCRATCH
Submit the job to the queueing system with the command
sbatch knl.mpi.slurm
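While the job waits in the queue and runs, you can check on its status (substitute your own username):

squeue -u your_tacc_username

When the job completes, its output will appear in the file named on the -o line of your batch script.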
Next, let's run with more nodes and processes. We will request 8 nodes, and run 64 processes per node (so the -n value in your script should be 512).
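Assuming a script like the sketch above, that change amounts to adjusting the node and task counts:

#SBATCH -N 8        # 8 nodes
#SBATCH -n 512      # 64 MPI tasks per node x 8 nodes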
Include the last batch script (3 points) and the last output file (4 points) in your repository.
Now let's do the same for the mpirds program. Create another Slurm script with appropriate values to run mpirds on 4 nodes and a total of 256 processes. Include this batch script and the output file in your repository for this lab (not the previous one where you wrote mpirds). (5 points)
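Again assuming the same template, the lines that change would look something like this (remember to copy mpirds to $SCRATCH as well):

#SBATCH -N 4        # 4 nodes
#SBATCH -n 256      # 256 total MPI tasks (64 per node)

ibrun ./mpirds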
Submission
Commit and push!
Grading
This assignment will be graded out of 35 points.
| Feature | Value | Score |
|---------|-------|-------|
| Question 1 | 2 | |
| Question 2 | 2 | |
| Question 3 | 2 | |
| Question 4 | 2 | |
| Question 5 | 4 | |
| Question 6 | 1 | |
| Question 7 | 2 | |
| Question 8 | 3 | |
| Question 9 | 1 | |
| Question 10 | 1 | |
| Question 11 | 3 | |
| mpihello batch script | 3 | |
| mpihello batch output file | 4 | |
| mpirds batch script and output file | 5 | |
| Total | 35 | |