Next: 9 Conclusions and Future Up: Using Parallel Program Characteristics Previous: 7 Considering the Knee

8 Combining Work and Efficiency

We now propose a new scheduling strategy that is designed to take into account both the efficiency of a job and its work. This algorithm has evolved from our observations that there are two main characteristics that contribute to the response time of a parallel application, the remaining work, , and the efficiency with which that work can be executed, , and that policies based on using either one of these characteristics in isolation are unable to significantly reduce mean response times over the wide range of potential workloads considered. While it may be desirable to give less processing power to jobs that execute large amounts of work, in order to prevent them from delaying jobs that execute smaller amounts of work, it is also necessary to consider the efficiency with which each job can utilize the allocated processors.

We use a two-phased algorithm to determine the number of processors, , to allocate to each job . Our strategy is to maintain a sorted list of jobs in the system (sorted by increasing ). This ordered list is used to prevent short jobs from being stuck behind large jobs. Jobs are considered for activation in this order and are assigned processors, where is determined by considering the efficiency with which they execute. During this first phase jobs are assigned processors until either all P processors have been assigned or until all jobs have been activated. If all jobs have been activated and all P processors have not been assigned a second phase is performed which assigns the remaining processors to activated jobs. A number of different assignment policies are possible for the first and second phases. In this paper we focus our investigation on policies for the first phase, since we believe that the number of unassigned processors left for the second phase will be relatively small (especially under workloads with relatively high loads -- assuming the average efficiency is not unreasonably low). The policy used for the second phase divides the remaining processors equally among the active jobs (since this is a safe approach).

An obvious strategy to use in the first phase is to assign processors to each job according to the knee of the execution time - efficiency profile ( ). Unfortunately, this approach has an important drawback: is greater than P for jobs with relatively low efficiency. In fact, for P = 100 processors, for all . This may be due to the execution rate function used. However, no matter what execution rate function is used there are likely to be some jobs whose communication and synchronization overhead is relatively small and therefore execute with high efficiency. In these cases their knee will be greater than the number of processors in the system. For such jobs the knee must be mapped into a number of processors . In future work we plan to consider alternative execution rate functions and to investigate the importance of appropriate mappings. One way to avoid this problem is to simply assign processors, rather than . However, this approach will allocate all P processors to any job for which . As expected, experiments showed that this approach did not perform well for workloads with high average efficiency, . For example, using this policy to execute a workload with , = 50-99%, and a load of 90% yields a mean response time that is 27% higher than obtained using equipartition. Therefore, we require a function that more evenly maps the range, , to the domain, .

We start by considering an alternate, simple and obvious mapping, . The advantage of this policy is that only jobs with high efficiency will be allocated a large number of processors. Also, as the average efficiency of the workload increases this algorithm asymptotically behaves optimally (i.e., if all jobs execute with perfect efficiency they would be executed in a LRWF fashion). We call the general form of our algorithm (Work and Efficiency). The form that uses we call , and the form that uses we call .

We have conducted a series of experiments to compare with equipartition. The results of these experiments are shown in Figure 6. These experiments were conducted using arrival rates that produced loads of approximately 30% and 90% respectively. An arrival rate was determined using and was then used for equipartition. Each bar shown in these graphs represents the mean response time of , normalized with respect to the mean response time of equipartition (represented by the dotted line, EQUI). The workload for each experiment is specified below the x-axis by displaying the parameters and the range .

As expected under low loads (30%), the reductions in mean response time obtained by using instead of equipartition are relatively small, but statistically significant. Again, this is because of the relatively low degree of multiprogramming, which reduces the allocation alternatives. Under this load equipartition is relatively robust since the maximum observed improvement is about 15%. As seen by the results obtained when a load of 90% is used, if the degree of multiprogramming is increased, the size of the reductions in mean response times increases.

Figure 6: Mean response time of for different workloads, normalized with respect to EQUI

In the next series of experiments (shown in Figure 7) we consider additional ranges of efficiency values by also considering 75-99% and 99-99%. These values are considered to emphasize that as the average efficiency of the jobs increases, the behaviour of our policy approaches that of an optimal algorithm. This can be seen by comparing the improvements obtained in these graphs with those obtained in Section 4. These ranges demonstrate the substantial reductions in mean response time that are possible when the average efficiency of the jobs is high and illustrate how the size of these reductions grows with increases in average efficiency.

Additionally we study the affect that the variation in job efficiencies, , might have on our results. This is done by fixing a mean efficiency and examining different efficiency ranges. We choose a fixed mean of 50%, since it offers the greatest potential variation for our distribution and compare the ranges 1-99%, 30-70%, and 50-50% (which produce of 0.57, 0.23, and 0.0 respectively). The results in Figure 7 show that as the variation in efficiencies decreases, the mean response time obtained using increases substantially when compared with equipartition.

Figure 7: Mean response time of for different workloads, normalized with respect to EQUI

Next we conducted a series of experiments in order to determine if this increase is because the general method, , is unable to handle workloads with low variation in efficiency or because the number of processors allocated during the first phase could be improved. We started by experimentally testing whether or not a value of could be found that improved the mean response time for the range 50-50% (under a load of 90%). We found that by using the value , we were able to produce mean response times that were lower than those obtained using equipartition. Further investigation found that for low and high efficiency ranges (e.g., 1-20% and 80-99%) allocating processors produced response times lower than for equipartition. Using these results we developed a simple assignment policy that we hoped would uniformly produce mean response times equal to or lower than equipartition.

This new policy (or mapping) was derived from our observations that performed reasonably well for and for and that performed well for the workload with the efficiency range 50-50%. Therefore, we (naively) chose a piecewise linear function as the basis for a new variation of our algorithm. This variation is called and is described below. The different forms of the algorithm are shown in Figure 8. Algorithmically this function is described (for P = 100) as follows:

displaymath334

As shown in Figure 8, when , there is a large difference between the number of processors allocated using (i.e., using the knee) and using either or . This seems to indicate that under these job and workload models, processor allocations in a dynamic scheduling environment might be better made at a point much lower than the knee.

Figure 8: Different allocation policies considered for

The results of using this policy, , for a wide variety of workloads are shown in Figure 9. These experiments show that this policy does produce mean response times that are at least as good as, and are in many cases substantially better than, those obtained with the equipartition policy. This policy therefore performs better across the variety of workloads considered than the , the and the equipartition policies. This suggests that in a dynamic scheduling environment, it may not be desirable to allocate processors according to the knee of the execution time - efficiency profile (especially if a wide variety of workloads are possible). (We plan to further investigate other factors, such as the load, the execution rate function, and the distribution used to generate the efficiency for each job.)

In the case when a load of 90% is used, the largest improvement seen is approximately 70%. A general trend that can be seen by examining the ranges 1-50%, 50-99%, 75-99% and 99-99% in this graph, is that the size of the reductions obtained by using increases as the average efficiency increases. It also seems that the size of the reduction is correlated with the coefficient of variation of work, , especially when the load and average efficiency of the jobs is relatively high. We also point out again that the reductions in mean response time are larger for the workload whose range of efficiency is 1-99% than for the range 30-70% even though the mean efficiency has not changed.

Figure 9: Mean response time of for different workloads, normalized with respect to EQUI

Although the algorithm can very likely be improved, since it was chosen naively, these experiments demonstrate that the strategy is an attractive approach to using job characteristics to make processor allocation decisions in a dynamic scheduling environment. If the number of processors to assign during the first phase of the algorithm is chosen properly, this method can produce mean response times that are significantly lower than those obtained by using equipartition across a wide variety of workloads. As well, the improvements are greater than those that can be obtained using only characteristics of work, , or efficiency, , in isolation. We believe that these results demonstrate the importance of using job characteristics correctly as well as the need for allocation policies that effectively use characteristics of both work and efficiency.

Next: 9 Conclusions and Future Up: Using Parallel Program Characteristics Previous: 7 Considering the Knee

Tim Brecht
Thu Jul 24 14:20:29 EDT 1997