Computer Science 341-02 |
Assignment 6: Jacobi Iteration on a Distributed Quadtree
Due: 2:30 PM, Thursday, November 29, 2007
This document will take us through the next couple of weeks worth of meetings. Here is our meeting schedule for the remainder of November:
In addition to the readings contained in this document, we will be looking at three articles:
So far, we have looked at distributed data structures that use only arrays. Arrays usually are the easiest to distribute; we simply assign a range of subscripts to each process. Quinn spends a lot of time discussing local vs. global indexing.
In your Jacobi solver, solution points were distributed evenly through the domain:
------------------------------------- | | | . . . . . . . . | | | | . . . . . . . . | | | | . . . . . . . . | | | | . . . . . . . . | | | | . . . . . . . . | | | | . . . . . . . . | | | | . . . . . . . . | | | | . . . . . . . . | | | -------------------------------------
If you wanted a more accurate solution, you could add more solution points. But since your program used a uniform distribution of the points, better accuracy requires adding points everywhere.
------------------------------------- | . . . . . . . . . . . . . . . . . | | . . . . . . . . . . . . . . . . . | | . . . . . . . . . . . . . . . . . | | . . . . . . . . . . . . . . . . . | | . . . . . . . . . . . . . . . . . | | . . . . . . . . . . . . . . . . . | | . . . . . . . . . . . . . . . . . | | . . . . . . . . . . . . . . . . . | | . . . . . . . . . . . . . . . . . | | . . . . . . . . . . . . . . . . . | | . . . . . . . . . . . . . . . . . | | . . . . . . . . . . . . . . . . . | | . . . . . . . . . . . . . . . . . | | . . . . . . . . . . . . . . . . . | | . . . . . . . . . . . . . . . . . | | . . . . . . . . . . . . . . . . . | -------------------------------------
While this is possible, it increases the total amount of work very quickly.
In some cases, we really only need more points in just parts of the domain. In a heat distribution problem, if there is a heat source in the northwest corner, we may only need more accuracy near that heat source. Fewer points may provide a sufficiently accurate solution further from the source.
------------------------------------- | ..... . . . . . . . . . . | | ..... . . . . . . . . . . | | ..... . . . . . . . . . | | . . . . . . . . . . . | | . . . . . . . . . . . | | . . . . . . . . . . | | . . . . . . . . . . | | . . . . . . . . . | | . . . . . . . . . | | . . . . . . . . | | . . . . . . . | | . . . . . . . | | . . . . . | | . . . . . . . | | . . . | | . . . . . . | -------------------------------------
This is significantly more efficient in terms of the amount of computation we need to do to obtain a solution of acceptable accuracy.
If we know ahead of time where the extra work is needed, we could assign extra points there at the start of the computation. However, we often do not know this information. After all, we probably wouldn't be solving problems for which we already have a solution handy, so an adaptive approach is taken. Periodically, the accuracy of the solution can be checked, and extra points added as needed. In the context of the Jacobi solver, we may have a threshold for the greatest allowable difference in temperature between adjacent points. If the difference exceeds the threshold, points are added in that vicinity and the solution is recomputed.
Adaptivity with arrays is difficult, as the completely regular structure is lost. In many cases, a mesh structure, often implemented as a linked data structure, is used instead of arrays.
Meshes come in a variety of types. Meshes consisting of quadrilaterals (hexahedra in three dimensions) are called structured meshes. Meshes constructed from triangles (tetrahedra in three dimensions) are called unstructured meshes. There are big differences in terms of what happens when you solve on them and how hard they are to generate, but for our purposes here, the issues are similar.
Some terminology: a typical mesh consists of three-dimensional regions, and their bounding faces, edges, and vertices. A data structure implementing the mesh will often allow queries such as "what faces bound this region" and "what edges are incident on this vertex" to be made efficiently.
The term "element" often is used to refer to the highest-dimension entity, and in many cases is the entity with which the solution is stored.
In fact, our Jacobi example is really just a computation on a uniform quadrilateral structured mesh. Here, the squares serve as the mesh elements and these contain the solution values.
Here are some examples of three-dimensional unstructured meshes:
These sample unstructured meshes are stored in memory using a hierarchical linked structure. These structures are implemented in a software library called the SCOREC Mesh Database (MDB). We could write programs to operate entirely on these meshes (and many have been written) but it is not yet parallel.
A second library, the Parallel Mesh Database (PMDB) is built on top of MDB and provides for distributed meshes.
We would like our distributed meshes to provide all the functionality and flexibility that is provided by the regular MDB. See the description and figures in the Applied Numerical Mathematics article. PMDB takes care of all of the message passing to migrate the mesh entities as appropriate and to update all of the partition boundary structures.
A variety of algorithms can be used to determine just how to distribute mesh elements among a set of cooperating processes and several of these mesh partitioning algorithms will be topics of study for us very soon.
Adaptive mesh refinement of PMDB meshes is provided by the (very scary) refdref library.
We will pick up with more on these topics later in the semester.
Lab TasksYour final major lab programming task is to develop a parallel adaptive version of the Jacobi solver. This is a complex undertaking, and we will tackle it as a group (that is, all three of us) over the next few weeks. We will use a Subversion repository to facilitate our cooperative development effort. We will need to make a steady effort - there are many stages to the development and we will need time to coordinate our efforts.
We could approach this problem using an unstructured mesh as described above, but that can get complicated very quickly, even more so than the approach we will take. Instead, we will develop a C or C++ program to solve Laplace's equation on a square domain using Jacobi iteration that operates on an adaptive quadtree structure. We will then parallelize this adaptive Jacobi solver, using the SPMD model with MPI for message passing. Finally, we will implement a redistribution procedure to rebalance the load after imbalance is introduced by adaptivity.
A quadtree is a spatial tree data structure. Each node in the tree is called a quadrant. Leaf nodes are called terminal quadrants. These terminal quadrants will serve as the elements on which we will perform computation.
The tree's root represents the entire domain, which, in our case, is a square.
The Jacobi iteration on a given quadtree is similar to the iteration you've done in the array-based versions. However, since there may be a deeper quadtree in some parts of the domain, some leaf quadrants have neighbors more or less refined than themselves.
One obstacle to overcome is to locate neighboring quadrants efficiently. It is possible to maintain direct neighbor links within your quadtree structure, but you may find it more useful to search based on coordinate information. Each quadrant knows its own bounding box coordinates and from this can easily compute the coordinates of the adjacent quadrants at the same level. A nice feature of the quadtree structure is that the quadrants that contain any point in space can easily be found with a simple traversal from the root, determining the correct child at each step by comparing coordinates. If the neighbor point is outside the root quadrant, you know to apply a boundary condition.
To make the program more interesting in the context of adaptivity, our program should allow a wider range of initial and boundary conditions than your previous implementations.
These functions can be hard-coded into our program, but we will implement a system where initial and boundary conditions can be specified through a configuration file, time permitting.
The program should take several parameters:
Here is the result of my program when run on the unit square with the x=0 boundary condition set to 2y, the other side boundaries fixed at 0, and a special boundary condition setting the point at (0.75,0.75) to 3. The entire domain is initialized to 0. The initial quadtree is refined 3 levels (64 leaf quadrants), a Jacobi tolerance of 0.0001 with a maximum number of iterations of 2000, and a refinement threshold of 0.05 and a maximum number of refinement levels of 3.
The structure of the program might look something like this:
create quadtree to initial refinement level set initial solution values of leaf quadrants to initial conditions do { do { foreach leaf quadrant { if (!special boundary condition applied) do Jacobi iteration step } } while (error > Jacobi tolerance && jac_steps < jac_max) refine quadrants based on refinement threshold } while (any quadrants were refined && ref_steps < ref_max)
Even if we choose to write our program in C instead of C++, we should follow good object-oriented design. Separate the functionality of the quadtree from the solution process.
There are a few C macros and functions may be useful in the shared area for this assignment.
As difficult as it was to print your solutions in a meaningful way in the uniform grid version, it will be even more so once we include adaptivity. We will print solutions to a file in a format that includes the coordinates of each leaf quadrant and its solution value. These values may then be plotted with gnuplot. A script that I used to generate some of the solution plots in this handout is available in the class shared area and here. This is a very simple gnuplot script and I am sure you can do better.
This is, after all, an upper-level elective and a parallel processing course, so our ultimate goal in the assignment is to parallelize this program.
There are many choices to make when parallelizing a program such as this. Which parts of the computation are to be performed by each processor? What is the granularity of the units of work to be divided among the processors? What information must be maintained and what communication is needed to support the computation once the work has been divided? Will the workloads need to be adjusted following adaptive steps? How can such a rebalancing be performed?
In order to keep the task manageable, we will follow the following phased development process:
I would like to encourage enhancements, and we can discuss them to make sure they will not lead to unexpected complications.
Using the FreeBSD nodes of our cluster, conduct a performance study that, for a given problem setup (tolerances and boundary conditions), considers
Repeat the above, but for a larger number of processes, using one or more of the computing environments available on the TeraGrid.
Describe the experiment, the computing environments in which you conduct it, and summarize your findings in a file study.txt.
Since we are all working together, there is no formal submission process for the program. However, I would like each of you to generate and submit your own study.txt file describing the study, and a file contributions.txt that describes your contributions to the design and development of the program. Place these in a directory named assignment6 that you submit using the turnin utility.
Your grade for the program will be determined by correctness, design, documentation, style, and efficiency both overall and specifically for your contributions to the program we produce.