Homework 6 Sample Solutions
CS338, Fall 2000, Jim Teresco
Graded out of 15 points

1. Andrews, Exercise 5.17 (a), p. 259 (10 points)

   monitor coaster {
     int seats_available = C;
     cond in_line;
     cond car_full;
     cond ride_over;

     procedure takeRide() {
       // wait in line as long as the car is full
       // when the car empties, we will be awakened and try to come
       // around the loop to get in again
       while (seats_available == 0) {
         wait(in_line);
       }
       // there's an open seat - take it
       seats_available--;
       // if we're the last one in, wake up the car
       if (seats_available == 0) {
         // we can do this and know we will stay awake until we
         // wait below since we're using signal-and-continue
         signal(car_full);
       }
       wait(ride_over);
     }

     procedure load() {
       // we just wait for the car to fill
       wait(car_full);
     }

     procedure unload() {
       // reset the seat counter, wake up all the riders, then wake up
       // anyone waiting in line
       seats_available = C;
       signal_all(ride_over);
       signal_all(in_line);
     }
   }

   Passenger process:

   while (1) {
     coaster.takeRide();
   }

   Car process:

   while (1) {
     coaster.load();
     // ride
     coaster.unload();
   }

2. How might a parallelizing compiler treat our three-dimensional cellular automata simulator from Project 2? Do you believe the parallelism could be detected easily? Do you think the resulting program would be efficient? (5 points)

Many variations on the answer are possible, but here are some key points. There is little data dependence in the straightforward three-dimensional loop, so a parallelizing compiler is likely to be able to break the grid up slice by slice, much as we did by hand. Any dynamic memory allocation (which I believe we all used) would make the job more difficult and might necessitate "hints" to the parallelizing compiler. Bit manipulations implemented as functions to compute offsets into our arrays would also introduce difficulties for automated parallelization, since the compiler could not tell at compile time exactly which array elements are being accessed.
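The independence the compiler must detect can be illustrated with a minimal sketch (Python here for brevity; the project code itself was in C, and the update rule below is a hypothetical stand-in, not the Project 2 rule). Each cell's new value is computed only from the previous generation, so the outer loop over slices carries no data dependence and its iterations can be handed to different processors:

```python
def step(old):
    """One generation of a toy 3-D automaton on an n*n*n grid:
    a cell is live next generation iff it has an odd number of
    live face neighbors.  (The rule is a placeholder; the loop
    structure is what a parallelizing compiler analyzes.)"""
    n = len(old)
    new = [[[0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):            # outer loop over slices: iterations
        for j in range(n):        # read only 'old' and write disjoint
            for k in range(n):    # parts of 'new', so they are independent
                live = 0
                for di, dj, dk in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                                   (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                    ni, nj, nk = i + di, j + dj, k + dk
                    if 0 <= ni < n and 0 <= nj < n and 0 <= nk < n:
                        live += old[ni][nj][nk]
                new[i][j][k] = live % 2
    return new
```

Because `new` is written and `old` is only read, any band of `i` values can be computed concurrently with the others, which is exactly the slice-by-slice decomposition described above.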
The efficiency would be determined by how intelligent the generated message-passing code is. We know the communication structure and can send just two large messages (a whole slice) from each processor, but the parallelizing compiler may not be able to determine this, and might instead send many smaller messages. If so, the generated code is likely to be less efficient. In a shared-memory environment, this may not matter.
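The gap between the two strategies can be quantified with a small sketch (assumptions: a hypothetical n*n*n grid split into p slabs along one axis, each boundary exchange moving one n*n slice in each direction; per-message overhead is why the counts matter):

```python
def exchange_cost(n, p, bundled):
    """Count (messages, cells moved) for one boundary exchange when an
    n*n*n grid is split into p slabs along one axis.  Each of the p-1
    slab boundaries is crossed once in each direction.

    bundled=True  -> one message per whole slice (hand-written code)
    bundled=False -> one message per cell (what naively generated
                     message-passing code might do)"""
    slice_sends = 2 * (p - 1)       # slices exchanged per step
    cells = slice_sends * n * n     # total cells moved either way
    msgs = slice_sends if bundled else cells
    return msgs, cells
```

For example, with n = 64 and p = 8 the data volume is identical (57,344 cells), but the bundled version sends 14 messages per step while the per-cell version sends 57,344; since each message carries fixed startup overhead, the difference dominates running time on a message-passing machine.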