Pipelining the functional units and memory interface of processors can result in shorter cycle times and dramatic increases in performance, but only if the pipeline delays can be hidden by other useful operations. The portion of pipeline delays which is not hidden results in an extension of the total execution time, either implemented by hardware interlocks or by compile-time insertion of NOPs (Null Operations). By rearranging instructions, it is possible to minimize the total pipelined execution time, but the problem of finding this optimal code schedule is well known to be NP-complete. In this thesis, we describe a code scheduler for multiple pipeline processors where each pipeline may have a different latency and enqueue time. Previous approaches simplify the search for a good schedule by arbitrarily imposing constraints which sacrifice optimality; the technique given in this paper uses a new set of pruning criteria which preserves optimality. Although, in the interest of reducing compile time, the new technique permits the search to be truncated, this truncation only rarely (in less than 2% of the cases examined) sacrifices optimality

Date of this Version