Homework 6 Sample Solutions
CS338, Fall 2000, Jim Teresco
Graded out of 15 points

1. Andrews, Exercise 5.17 (a), p. 259 (10 points)

   monitor coaster {
     int seats_available = C;
     cond in_line;
     cond car_full;
     cond ride_over;

     procedure takeRide() {
       // wait in line as long as the car is full
       // when the car empties, we will be awakened and try to come
       // around the loop to get in again
       while (seats_available == 0) {
         wait(in_line);
       }
       // there's an open seat - take it
       seats_available--;
       // if we're the last one in, wake up the car
       if (seats_available == 0) {
         // we can do this and know we will stay awake until we
         // wait below since we're using signal-and-continue
         signal(car_full);
       }
       wait(ride_over);
     }

     procedure load() {
       // we just wait for the car to fill
       wait(car_full);
     }

     procedure unload() {
       // reset the seat counter, wake up all the riders, then wake up
       // anyone waiting in line
       seats_available = C;
       signal_all(ride_over);
       signal_all(in_line);
     }
   }

   Passenger process:

   while (1) {
     coaster.takeRide();
   }

   Car process:

   while (1) {
     coaster.load();
     // ride
     coaster.unload();
   }

2. How might a parallelizing compiler treat our three-dimensional cellular automata simulator from Project 2? Do you believe the parallelism could be detected easily? Do you think the resulting program would be efficient? (5 points)

Many variations on the answer are possible, but here are some key points. There is little data dependence in the straightforward three-dimensional loop, so a parallelizing compiler is likely to be able to break the grid up slice by slice, much as we did by hand. Any dynamic memory allocation (which I believe we all used) would make the job more difficult and might necessitate "hints" to the parallelizing compiler. Bit manipulations implemented as functions to compute offsets into our arrays would also introduce difficulties for automated parallelization, since the compiler could not tell at compile time exactly which array elements are being accessed.
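The independence the compiler must detect can be illustrated with a minimal sketch (Python here for brevity; the project code itself was in C, and the update rule below is a hypothetical stand-in, not the Project 2 rule). Each cell's new value is computed only from the previous generation, so the outer loop over slices carries no data dependence and its iterations can be handed to different processors:

```python
def step(old):
    """One generation of a toy 3-D automaton on an n*n*n grid:
    a cell is live next generation iff it has an odd number of
    live face neighbors.  (The rule is a placeholder; the loop
    structure is what a parallelizing compiler analyzes.)"""
    n = len(old)
    new = [[[0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):            # outer loop over slices: iterations
        for j in range(n):        # read only 'old' and write disjoint
            for k in range(n):    # parts of 'new', so they are independent
                live = 0
                for di, dj, dk in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                                   (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                    ni, nj, nk = i + di, j + dj, k + dk
                    if 0 <= ni < n and 0 <= nj < n and 0 <= nk < n:
                        live += old[ni][nj][nk]
                new[i][j][k] = live % 2
    return new
```

Because `new` is written and `old` is only read, any band of `i` values can be computed concurrently with the others, which is exactly the slice-by-slice decomposition described above.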
The efficiency would be determined by how intelligent the generated message-passing code is. We know the communication structure and can send just two large messages (a whole slice) from each processor, but the parallelizing compiler may not be able to determine this, and might instead send many smaller messages. If so, the generated code is likely to be less efficient. In a shared-memory environment, this may not matter.
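The gap between the two strategies can be quantified with a small sketch (assumptions: a hypothetical n*n*n grid split into p slabs along one axis, each boundary exchange moving one n*n slice in each direction; per-message overhead is why the counts matter):

```python
def exchange_cost(n, p, bundled):
    """Count (messages, cells moved) for one boundary exchange when an
    n*n*n grid is split into p slabs along one axis.  Each of the p-1
    slab boundaries is crossed once in each direction.

    bundled=True  -> one message per whole slice (hand-written code)
    bundled=False -> one message per cell (what naively generated
                     message-passing code might do)"""
    slice_sends = 2 * (p - 1)       # slices exchanged per step
    cells = slice_sends * n * n     # total cells moved either way
    msgs = slice_sends if bundled else cells
    return msgs, cells
```

For example, with n = 64 and p = 8 the data volume is identical (57,344 cells), but the bundled version sends 14 messages per step while the per-cell version sends 57,344; since each message carries fixed startup overhead, the difference dominates running time on a message-passing machine.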