Pipelining the functional units and memory interface of processors can result in shorter cycle times and dramatic increases in performance, but only if the pipeline latency can be hidden by other useful operations. The portion of pipeline latency which is not hidden results in an extension of the total execution time, either implemented by hardware interlocks or by compile-time insertion of NOPs (Null Operations). By rearranging instructions, it is possible to minimize the total pipelined execution time, but the problem of finding this optimal code schedule is well known to be NP-complete. In this paper, we describe a code scheduler for multiple pipeline processors where each pipeline may have a different latency and enqueue time. Previous approaches simplify the search for a good schedule by arbitrarily imposing constraints which sacrifice optimality; the technique given in this paper uses a new set of pruning criteria which preserves optimality. Although, in the interest of reducing compile time, the new technique permits the search to be truncated, this truncation only rarely (in less than 2% of the cases examined) sacrifices optimality.


Pipelines, Code/instruction Scheduling, Optimizing Compilers, Pipeline Latency, Pipeline Enqueue Time

Date of this Version