TSP solves the traveling salesman problem using a branch and bound
algorithm.
The major data structures are a pool of
partially evaluated tours, a priority queue containing pointers to
tours in the pool, a stack of pointers to unused tour elements in the
pool, and the current shortest path.
The evaluation of a partial tour is composed mainly of two procedures,
get_tour
and recursive_solve
.
The subroutine get_tour
removes the most promising path from
the priority queue. If the path contains more than a threshold number of
cities, get_tour
returns this path. Otherwise, it extends the
path by one node, puts the promising paths generated by the
extension back on the priority queue, and calls itself recursively.
The subroutine get_tour
returns either when the most promising
path is longer than a threshold, or when the priority queue becomes
empty. The procedure recursive_solve
takes the path returned by
get_tour
, and tries all permutations of the remaining nodes
recursively. It updates the shortest tour if a complete tour is found that is shorter
than the current best tour.
In the TreadMarks version, all the major data structures are shared.
The subroutine get_tour
is guarded by a lock to guarantee
exclusive access to the tour pool, the priority queue, and the tour
stack. Updates to the shortest path are also protected by a lock.
The PVM version uses a master-slave arrangement.
With n processors, there are n
slave processes and 1 master process.
In other words, one processor runs both the master and one slave process,
while the remaining processors run only a slave process.
The master keeps all the major data structures in its private memory.
It executes get_tour
and keeps track of the optimal
solution.
The slaves execute recursive_solve
, and send messages
to the master either to request solvable tours, or to update the
shortest path.
We solved a 19 city problem, with a recursive_solve
threshold of
12. The speedups are shown in Figure 6.
The sequential program runs for 95 seconds. At 8 processors,
TreadMarks obtains a speedup of 6.21, which is 78%of the
speedup of 7.97 obtained by PVM.
At 8 processors, TreadMarks sends 11 times more messages and 150 times
more data than PVM.
The performance gap comes from the difference in programming styles.
In the PVM version of TSP, only the tours
directly solvable by recursive_solve
and the minimum
tour are exchanged between the slaves and the master.
These message exchanges take only 2 messages.
In contrast, in TreadMarks, all the major data
structures migrate among the processors.
In get_tour
, it takes at least 3 page faults to
obtain the tour pool, priority queue, and tour stack.
As for the amount of data, because of diff accumulation, on
average, a processor gets (n-1) diffs on each page fault, where n
is the number of processors in the system.
Furthermore, there is some contention for the lock protecting get_tour
.
On average, at 8 processors, each process spends 1.1 out of 15
seconds waiting at lock acquires.