Computer Science 400
Parallel Processing and High Performance Computing

Fall 2017, Siena College

Lab 6: Running Jobs on Stampede2
Due: 11:59 PM, Monday, October 2, 2017

For this lab, you will be learning how to compile and run jobs on the Stampede2 system. You will need to complete this lab individually so everyone has the experience of running MPI jobs on the supercomputer.

Much of what we'll be doing is based on the Stampede2 User Guide.

Getting Set Up

You will receive an email with the link to follow to set up your GitHub repository stampedempi-lab-yourgitname for this Lab.

Using a Stampede2 Login Node

Using the procedure from the earlier lab, use the XSEDE single sign on system to connect to the Stampede2 system at TACC.

Question 1: What is the hostname of the node to which you are connected on Stampede2? (Hint: this is the output of the hostname command) (2 points)

Clone a copy of your repository from Lab 5 so you have some MPI programs available.

The default compiler and MPI configurations should be sufficient for most or all of our purposes. Compile with mpicc and run programs with mpirun.
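As a minimal sketch of the compile-and-run cycle described above: the file names hello.c and hello are assumptions for illustration, so substitute the names used in your own Lab 5 repository. The guard simply prints a reminder if the block is run somewhere without mpicc or the source file.

```shell
# Hypothetical file names; substitute the ones from your Lab 5 repository.
SRC=hello.c
EXE=hello
NP=2   # number of MPI processes to launch

if command -v mpicc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    mpicc -o "$EXE" "$SRC"       # compile and link against the MPI library
    mpirun -np "$NP" ./"$EXE"    # start NP copies of the program
else
    echo "mpicc or $SRC not found; run this inside your repository on Stampede2"
fi
```

Changing NP is all that is needed to try a different process count.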

We generally will not run MPI programs on the login nodes. They are not intended for that purpose. But since the programs we are trying out here are very short, we can do so.

Question 2: Compile the MPI hello world program on a Stampede2 login node and run it with 2 processes. What is your output? (2 points)

Question 3: Compile your mpirds program and run it with 2 processes. What is your output? (2 points)

Real computations will be done on Stampede2's compute nodes, not the login nodes. These are managed by a queueing system called Slurm, which grants access to subsets of the computer to various users. When you are allocated a set of compute nodes, no other users can run jobs on them.

You can see current summaries of the jobs in the queueing system with the commands sinfo and squeue.
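The full squeue listing on a busy system can be very long. A small sketch, assuming you only want to see your own jobs, uses squeue's -u option to filter by user name. The variable ME is introduced here only for illustration.

```shell
# Look up the current user name (ME is just a local helper variable).
ME=$(id -un)

if command -v squeue >/dev/null 2>&1; then
    squeue -u "$ME"    # list only the jobs owned by this user
else
    echo "squeue not found; run this on a Stampede2 login node"
fi
```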

You can see the types of nodes available through the queueing system with the command

sinfo -o "%18P %8a %16F"

This will show a compact summary of the status of the queues. See here to interpret the output.

Question 4: When you executed the above command, what was the output? How many nodes in the normal queue are in the allocated (A) state? (2 points)

Using a Compute Node Interactively

A user can and should only log into compute nodes allocated to that user at a given time. We can gain exclusive access to one with the idev command.

Run idev. When prompted, request the default of 68 tasks per node. After a (hopefully) short time, you should get a command prompt on a compute node.

Question 5: What is the host name of the compute node you were allocated? (2 points)

Question 6: What is the output of the Hello, World program and the mpirds program when executed with 64 processes on a compute node? (4 points)

Please log out from the compute node as soon as you are finished with the tasks above. The time the node is allocated to you is charged against our class allocation.

Submitting Batch Jobs to the Compute Nodes

The most common way to use the compute nodes of this or any supercomputer is by submitting batch jobs. The idea is that you get your program ready to run, then submit it to be executed when the resources it needs become available.

To set up a batch job, you first need a batch script that can be configured to run your program.

Copy the file /share/doc/slurm/knl.mpi.slurm to the directory that contains your Hello, World executable. Edit it to use "hellotest" in place of "myjob", request 1 node and 32 MPI tasks, set a run time of 5 minutes, and replace the address in the --mail-user option with your own email address. You should remove the "#SBATCH -A myproject" line, since we have just one allocation to be charged for our run. Finally, replace ./mycode.exe with the name of your executable.
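For orientation, the directives touched by these edits might end up looking roughly like the sketch below. It shows only the lines the paragraph above mentions and assumes the template uses Slurm's short option forms (-J, -N, -n, -t); the email address is a placeholder. Every other line of the copied template is left as it was.

```shell
#!/bin/bash
# Sketch of only the directives changed for this lab; the rest of the
# copied template is not shown here and stays unchanged.
#SBATCH -J hellotest        # job name (was myjob)
#SBATCH -N 1                # total number of nodes
#SBATCH -n 32               # total number of MPI tasks
#SBATCH -t 00:05:00         # run time limit (hh:mm:ss)
# Placeholder address below; replace it with your own.
#SBATCH --mail-user=you@example.edu
```

Slurm reads the #SBATCH lines as options, while the shell treats them as ordinary comments.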

Submit the job to the queueing system with the command

sbatch knl.mpi.slurm

Question 7: What output do you get at your terminal before you get your prompt back? (1 point)

Question 8: You should have received email when your program began executing, and again when it finished. What are the subject lines of those emails? (2 points)

Question 9: What file contains your program's output? How was it specified? Place this file in your repository for this lab submission (don't forget to add, commit and push it so it's on GitHub). (3 points)

Question 10: According to your program's output, what was the host name on which your program executed? (1 point)

Next, let's run with more nodes and processes. We will request 4 nodes, and run 128 processes per node (so the -n value in your script should be 512).
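The -n value is the total number of MPI tasks across all nodes, so it is the product of the node count and the tasks per node. A quick sketch of that arithmetic follows; the variable names are chosen here for illustration only.

```shell
NODES=4               # nodes requested with -N
TASKS_PER_NODE=128    # MPI processes to run on each node
TOTAL_TASKS=$((NODES * TASKS_PER_NODE))   # value to use for -n

echo "#SBATCH -N $NODES"
echo "#SBATCH -n $TOTAL_TASKS"
```

The two lines printed are the directive values to place in the batch script for this run.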

Question 11: What were the host names allocated to your processes on this run? (1 point)

Question 12: Which ranks were assigned to each node? Hint: the grep command might be helpful here. (3 points)

Include the last batch script (3 points) and the last output file (2 points) in your repository.

Submitting

Your submission requires that all required deliverables are committed and pushed to the master branch of your repository on GitHub.

Grading

This assignment is worth 30 points, which are distributed as indicated above.