Computer Science 338
Parallel Processing

Williams College
Spring 2006

Tutorial Assignment 6: Jacobi on a Quadtree
Due: Tuesday, March 14, 2006 at 9:00 AM

Readings

In addition to reading this handout, please read:

Adaptivity and Distributed Linked Structures

So far, we have looked at distributed data structures that use only arrays. Arrays usually are the easiest to distribute; we simply assign a range of subscripts to each process.

In your Jacobi solver, solution points were distributed evenly through the domain:

-------------------------------------
|                                   |
|   .   .   .   .   .   .   .   .   |
|                                   |
|   .   .   .   .   .   .   .   .   |
|                                   |
|   .   .   .   .   .   .   .   .   |
|                                   |
|   .   .   .   .   .   .   .   .   |
|                                   |
|   .   .   .   .   .   .   .   .   |
|                                   |
|   .   .   .   .   .   .   .   .   |
|                                   |
|   .   .   .   .   .   .   .   .   |
|                                   |
|   .   .   .   .   .   .   .   .   |
|                                   |
-------------------------------------

If you wanted a more accurate solution, you could add more solution points. But since your program used a uniform distribution of the points, better accuracy requires adding points everywhere.

-------------------------------------
| . . . . . . . . . . . . . . . . . |
| . . . . . . . . . . . . . . . . . |
| . . . . . . . . . . . . . . . . . |
| . . . . . . . . . . . . . . . . . |
| . . . . . . . . . . . . . . . . . |
| . . . . . . . . . . . . . . . . . |
| . . . . . . . . . . . . . . . . . |
| . . . . . . . . . . . . . . . . . |
| . . . . . . . . . . . . . . . . . |
| . . . . . . . . . . . . . . . . . |
| . . . . . . . . . . . . . . . . . |
| . . . . . . . . . . . . . . . . . |
| . . . . . . . . . . . . . . . . . |
| . . . . . . . . . . . . . . . . . |
| . . . . . . . . . . . . . . . . . |
| . . . . . . . . . . . . . . . . . |
-------------------------------------

While this is possible, it increases the total amount of work very quickly.

In some cases, we really only need more points in just parts of the domain. In a heat distribution problem, if there is a heat source in the northwest corner, we may only need more accuracy near that heat source. Fewer points may provide a sufficiently accurate solution further from the source.

-------------------------------------
| ..... . . . . . .   .   .   .   . |
| ..... . . . . . .   .   .   .   . |
| ..... . . . .   .   .   .   .   . |
| . . . . .   .   .   .   .   .   . |
| . . . . .   .   .   .   .   .   . |
| . . .   .   .   .   .   .   .   . |
| . . .   .   .   .   .   .   .   . |
| .   .   .   .   .   .   .   .   . |
| .   .   .   .   .   .   .   .   . |
| .   .   .   .   .   .   .       . |
| .   .   .   .   .   .   .         |
| .   .   .   .   .       .       . |
| .   .   .   .   .                 |
| .   .   .   .   .       .       . |
| .   .   .                         |
| .   .   .       .       .       . |
-------------------------------------

This is significantly more efficient in terms of the amount of computation we need to do to obtain a solution of acceptable accuracy.

If we know ahead of time where the extra work is needed, we could assign extra points there at the start of the computation. However, we often do not know this information. After all, we probably wouldn't be solving problems for which we already have a solution handy, so an adaptive approach is taken. Periodically, the accuracy of the solution can be checked, and extra points added as needed. In the context of the Jacobi solver, we may have a threshold for the greatest allowable difference in temperature between adjacent points. If the difference exceeds the threshold, points are added in that vicinity and the solution is recomputed.

Adaptivity with arrays is difficult, as the completely regular structure is lost. In many cases, a mesh structure, often implemented as a linked data structure, is used instead of arrays.

Meshes come in a variety of types. Meshes consisting of quadrilaterals (hexahedra in three dimensions) are called structured meshes. Meshes constructed from triangles (tetrahedra in three dimensions) are called unstructured meshes. There are big differences in terms of what happens when you solve on them and how hard they are to generate, but for our purposes here, the issues are similar.

Some terminology: a typical mesh consists of three-dimensional regions, and their bounding faces, edges, and vertices. A data structure implementing the mesh will often allow queries such as "what faces bound this region" and "what edges are incident on this vertex" to be made efficiently.

The term "element" often is used to refer to the highest-dimension entity, and in many cases is the entity with which the solution is stored.

In fact, our Jacobi example is really just a computation on a uniform quadrilateral structured mesh. Here, the squares serve as the mesh elements and these contain the solution values.

A (hypothetical) adaptive refinement of such a mesh might look like:

Here is a simple unstructured mesh:

The use of adaptivity (and some other considerations) necessitate linked data structures for the mesh. Here is the triangular mesh after a hypothetical refinement operation:

Such mesh structures may be stored in memory as arrays of entities or with a full topological hierarchy. In your lab tasks this week, you will implement an adaptive structured mesh using a quadtree data structure.

Here are some examples of three-dimensional unstructured meshes:

Mesh of the space about the wing of an Onera M6 airplane. Contains 85,567 tetrahedral elements.
Mesh of a very fake airplane.
Mesh of a simplified perforated shock tube (well, a half cylinder joined with a larger quarter cylinder). Contains about 35,000 tertrahedral elements.
Mesh of human arteries, created from MRI data. Contains 1,103,018 tetrahedral elements.

These sample unstructured meshes are stored in memory using a hierarchical linked structure. These structures are implemented in a software library called the SCOREC Mesh Database (MDB).

The most recent versions of MDB are written mostly in C++, but we will look at it from a C interface only.

A version of this software is installed on the bullpen cluster in /dists/bullpen-active/terescoj/develop/model/scorecMesh/4.5. The public interface is given by the file MSops.h in the include subdirectory. The source code is in scorecMesh and its subdirectories.

The full mesh entity hierarchy is represented. All vertices, edges, faces, and regions are stored in lists. Each entity also maintains lists of its bounding sub- or super-entities.

We will not program using any of the mesh operators directly , but it is worth looking at them to get an idea of what is possible.

Here are what some of the operators do (found in MSops.h):

General library functionality: MS_init(), MS_exit()
Meshes and geometric models (type pMesh)
- Create model: GM_createFromFile(const char *)
- Create mesh: MS_newMesh(pGModel), M_load(pMesh, const char *), M_delete(pMesh)
- Create entities: M_createR(), M_createF(), M_createE(), M_createVP(),
- Query entity counts: M_numRegions(pMesh), M_numFaces(pMesh), M_numEdges(pMesh), M_numVertices(pMesh)
- Entity traversals: M_nextRegion(), M_nextFace(), M_nextEdge(), M_nextVertex()
Mesh entities:
- Region can query bounding faces, edges, vertices.
- Face can query bounding edges, vertices; regions it bounds.
- Edge can query bounding vertices; faces or regions it bounds.
- Vertex can query edges, faces, regions it bounds, also coordinates.
Load and store meshes

We could write programs to operate entirely on these meshes (and many have been written) but it is not yet parallel.

A second library, the Parallel Mesh Database (PMDB) is built on top of MDB and provides for distributed meshes.

PMDB is in /dists/bullpen-active/terescoj/parallel/pmdb/5.5 on bullpen. Its documentation is online.

We would like our distributed meshes to provide all the functionality and flexibility that is provided by the regular MDB.

See the description and figures in the Applied Numerical Mathematics article.

PMDB takes care of all of the message passing to migrate the mesh entities as appropriate and to update all of the partition boundary structures.

A variety of algorithms can be used to determine just how to distribute mesh elements among a set of cooperating processes and several of these mesh partitioning algorithms will be topics of study for us very soon.

Adaptive mesh refinement of PMDB meshes is provided by the (very scary) refdref library.

Refinement of tetrahedra:
Coarsening of tetrahedra:

We will pick up with more on these topics next week.

Lab Tasks

You may work alone or in groups of two or three on this week's programs. Any changes from last week's groups must be confirmed by an e-mail to me no later than Friday, March 10.

This week's task is to develop an adaptive version of the Jacobi solver. We could do this using an unstructured mesh as described above, but that can get complicated quickly.

Instead, you are to write a C or C++ program to solve Laplace's equation on a square domain using Jacobi iteration that operates on an adaptive quadtree structure.

Quadtrees

A quadtree is a spatial tree data structure. Each node in the tree is called a quadrant. Leaf nodes are called terminal quadrants. These terminal quadrants will serve as the elements on which we will perform computation.

The tree's root represents the entire domain, which, in our case, is a square.

The four children of the root each represent one quarter of the space taken by the root.

These children can then be divided in four, continuing down as many levels as desired. Different parts of the domain may be refined to different levels.

Computing on the Quadtree

The Jacobi iteration on a given quadtree is similar to the iteration you've done in the array-based versions. However, since there may be a deeper quadtree in some parts of the domain, some leaf quadrants have neighbors more or less refined than themselves.

In the above figure, the element "E" should compute its value based on the values of the shaded neighbor quadrants. Note that E's north neighbor is actually refined one level further, so it has two immediate north neighbors. Ideally, the north neighbor value should be the average of the two immediate quadrant neighbors, but it is sufficient to use the value at the shaded quadrant (which is the average of the four leaf quadrants).

One obstacle to overcome is to locate neighboring quadrants efficiently. It is possible to maintain direct neighbor links within your quadtree structure, but you may find it more useful to search based on coordinate information. Each quadrant knows its own bounding box coordinates and from this can easily compute the coordinates of the adjacent quadrants at the same level. A nice feature of the quadtree structure is that the quadrants that contain any point in space can easily be found with a simple traversal from the root, determining the correct child at each step by comparing coordinates. If the neighbor point is outside the root quadrant, you know to apply a boundary condition.

Program Requirements

To make the program more interesting in the context of adaptivity, your program should allow a wider range of initial and boundary conditions than your previous implementations.

Initial conditions should be specified by a C function that provides values given (x,y) coordinates of a point in the domain. Each leaf quadrant's solution value is initialized to the value obtained by passing the quadrant's centroid to this function.
Boundary conditions are also specified through C functions. The left and right boundaries take the y coordinate as a parameter and the top and bottom boundaries take the x coordinate as a parameter, allowing boundary conditions to be functions in addition to simple constants. When a quadrant needs to query a boundary condition when computing its value, it should call the appropriate function.
Special internal boundary conditions are specified by one more C function. The function takes a leaf quadrant and sets its value if the quadrant contains any points that have boundary values. For example, if there is a point heat source at (0.25,0.25) keeping a constant temperature of 2, this function will set the solution value of any quadrant containing (0.25,0.25) to 2. This function should be called on each leaf quadrant during each Jacobi iteration step. If the function sets the quadrant's solution value, that value should be used instead of the solution computed based on its neighbors.

These functions can be hard-coded into your program, but you may consider implementing a system where initial and boundary conditions would be specified through a configuration file.

Your program should take several parameters:

an initial global refinement level, which in turn determines the initial number of leaf quadrants
a Jacobi iteration tolerance, similar to the tolerance from the previous implementations
a limit on the number of Jacobi iteration steps
a refinement threshold, specifying the maximum difference in temperature allowed between adjacent leaf quadrants
a limit on the total number of refinement steps

Here is the result of my program when run on the unit square with the x=0 boundary condition set to 2y, the other side boundaries fixed at 0, and a special boundary condition setting the point at (0.75,0.75) to 3. The entire domain is initialized to 0. The initial quadtree is refined 3 levels (64 leaf quadrants), a Jacobi tolerance of 0.0001 with a maximum number of iterations of 2000, and a refinement threshold of 0.05 and a maximum number of refinement levels of 3.

Program Design and Implemenation

The structure of your program might look something like this:

  create quadtree to initial refinement level
  set initial solution values of leaf quadrants to initial conditions
  do {
    do {
      foreach leaf quadrant {
        if (!special boundary condition applied)
          do Jacobi iteration step
      }
    } while (error > Jacobi tolerance && jac_steps < jac_max)
    refine quadrants based on refinement threshold
  } while (any quadrants were refined && ref_steps < ref_max)

Even if you choose to write your program in C instead of C++, I encourage you to follow good object-oriented design. Separate the functionality of the quadtree from the solution process. If you have any questions about how to do this, please ask.

There are a few C macros and functions that you may find useful in the shared area for this assignment.

As difficult as it was to print your solutions in a meaningful way in the uniform grid version, it will be even more so once you include adaptivity. I suggest printing solutions to a file in a format that includes the coordinates of each leaf quadrant and its solution value. You can then plot these values with gnuplot. A script that I used to generate some of the solution plots in this handout is available in the class shared area and here. This is a very simple gnuplot script and I am sure you can do better. A couple bonus points to the group that creates the best gnuplot graphs of your solutions.

Looking Ahead: Parallelization

Since this is a parallel processing course, you will be parallelizing this program for your next assignment. We will be using MPI and the message passing paradigm, but we will spend some time during tutorial meetings next week discussing parallelization issues for a problem such as this. Before your tutorial meeting, please think about how you might go about creating and managing a distributed quadtree structure, how you will do the computation on such a structure, and what you will need to do to maintain a balanced parallel computation after adaptive steps.

Submission and Grading Guidelines

There are several files to turn in for this assignment. They should all be included in a file named tut06.tar that you submit using the turnin utility. Please use the filenames specified and be sure to include your name (and your partner's name, if you are working in a group) in each file.

Your submitted tar file should include your Makefile, your C or C++ source code, a brief README file that expains how to run your program and contains the answers to the questions. Please do not include object files or your executable in your tar file.

Honor code guidelines: While the program is to be done only by you (meaning your group, if you choose to work in a group), along the lines of a laboratory program, I want to encourage you to ask questions and discuss the programs with me and with classmates outside your group as you develop it. However, no sharing of code between groups is permitted. If you have any doubts, please check first and avoid honor code problems later.

Grading guidelines: Your grade for the programs will be determined by correctness, design, documentation, style, and efficiency. In particular, I will run each of your programs and the fastest program in my test cases will receive full efficiency credit, while slower programs will earn only a relative fraction of the efficiency credit.

Computer Science 338 Parallel Processing

Williams College Spring 2006