In this paper, we consider the problem of scheduling a set of instructions on a single processor with multiple pipelined functional units. In a superscalar processor, the hardware can issue multiple instructions every cycle, providing a fine-grained parallelism for achieving order-ofmagnitude speed-ups. It is well known that the problem of scheduling a pipelined processor with uniform latencies, which is a subclass of the problem we consider here, belongs to the class of NP-Complete problems. We present an efficient lower bound algorithm that coniputes a tight lower bound on the length of an optimal schedule, and a new heuristic scheduling algorithm to provide a near optimal solution. The analysis of our lower bound computation reveals that if a task matches the hardware or the type of instructions is uniformly distributed, then issuing five instructions per cycle can achieve a speed-up; however, if the task is a bad match with the hardware, then issuing more than three instructions per cycle does not provide any speed-up. The simulation data shows that our lower bound is often very close to the solutioll obtained by our heuristic algorithm.
superscalar, pipeline scheduling, VLIW, lower bound
Date of this Version