In this paper, we consider the problem of scheduling a set of instructions on a single processor with multiple pipelined functional units. In a superscalar processor, the hardware can issue multiple instructions every cycle, providing a fine-grained parallelism for achieving order-ofmagnitude speed-ups. It is well known that the problem of scheduling a pipelined processor with uniform latencies, which is a subclass of the problem we consider here, belongs to the class of NP-Complete problems. We present an efficient lower bound algorithm that coniputes a tight lower bound on the length of an optimal schedule, and a new heuristic scheduling algorithm to provide a near optimal solution. The analysis of our lower bound computation reveals that if a task matches the hardware or the type of instructions is uniformly distributed, then issuing five instructions per cycle can achieve a speed-up; however, if the task is a bad match with the hardware, then issuing more than three instructions per cycle does not provide any speed-up. The simulation data shows that our lower bound is often very close to the solutioll obtained by our heuristic algorithm.


superscalar, pipeline scheduling, VLIW, lower bound

Date of this Version

September 1994