Techniques such as out-of-order issue and speculative execution aggressively exploit instruction-level parallelism in modern superscalar processor architectures. The front end of such pipelined machines is responsible for providing a stream of schedulable instructions at a bandwidth that meets or exceeds the rate at which instructions are issued and executed. As superscalar machines become increasingly wide, the large set of instructions to be fetched every cycle will inevitably span multiple noncontiguous basic blocks. The mechanism that fetches, aligns, and passes this set of instructions down the pipeline must do so as efficiently as possible, occupying a minimal number of pipeline cycles. The trace cache has emerged as the most promising technique for meeting this high-bandwidth, low-latency fetch requirement. This thesis presents the design, simulation, and analysis of a microarchitecture simulator extension that incorporates a trace cache. A new fill unit scheme, the Sliding Window Fill Mechanism, is proposed. This method exploits trace continuity and identifies probable start regions to improve the trace cache hit rate. A 7% hit rate increase was observed over the Rotenberg fill mechanism. Combined with branch promotion, trace cache hit rates showed a 19% average increase along with a 17% average rise in fetch bandwidth.
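To make the core idea concrete, the following is a minimal, hypothetical sketch of the basic trace-cache mechanism the abstract refers to: a trace line is identified by its starting PC together with the predicted outcomes of the branches it contains, and a fill unit installs completed traces after retirement. This is an illustration of the general concept only, not the thesis's Sliding Window Fill Mechanism; all names and the block/outcome encoding are invented for the example.

```python
# Simplified trace-cache sketch (illustrative only; not the thesis design).
# A trace is a sequence of basic blocks fetched in one cycle; it hits only
# when both the start PC and the predicted branch path match the cached line.

class TraceCache:
    def __init__(self):
        # (start_pc, branch_outcomes) -> list of basic blocks
        self.lines = {}

    def lookup(self, start_pc, predicted_outcomes):
        # Fetch-side probe: hit requires the same entry point and the same
        # taken/not-taken path through the trace's internal branches.
        return self.lines.get((start_pc, predicted_outcomes))

    def fill(self, blocks, outcomes):
        # Fill unit: after the blocks retire, install the assembled trace,
        # keyed by its entry PC and internal branch outcomes.
        start_pc = blocks[0][0]
        self.lines[(start_pc, outcomes)] = blocks

tc = TraceCache()
# Each block is (start_pc, instruction_count); outcomes are "T"/"NT".
tc.fill([(0x100, 4), (0x200, 3), (0x300, 5)], ("T", "NT"))
assert tc.lookup(0x100, ("T", "NT")) is not None  # hit: same path
assert tc.lookup(0x100, ("T", "T")) is None       # miss: path differs
```

A fill mechanism's policy decides which retired instruction sequences become trace lines; the thesis's contribution is choosing those trace boundaries (via trace continuity and probable start regions) so that future fetch requests hit more often.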

Library of Congress Subject Headings

Cache memory; Microprocessors--Design and construction; Computer architecture

Publication Date


Document Type


Department, Program, or Center

Computer Engineering (KGCOE)


Advisor
Czernikowski, Roy

Advisor/Committee Member

Hsu, Ken


Note: imported from RIT’s Digital Media Library running on DSpace to RIT Scholar Works. Physical copy available through RIT's Wallace Library at: TK7895.M4 M857 2002


RIT – Main Campus