Last time we covered 1.2 and 1.3.1, mainly the features and components of the SPARC Core (IFU, EXU, LSU, TLU, SPU, MU, FFU). This time we look at the overviews of the IFU and the EXU.
1.3.1.1 Instruction Fetch Unit #the IFU#
The thread selection policy is as follows – a switch between the available threads
every cycle giving priority to the least recently executed thread. The threads become
unavailable due to the long latency operations like loads, branch, MUL, and DIV, as
well as to the pipeline stalls like cache misses, traps, and resource conflicts. The
loads are speculated as cache hits, and the thread is switched-in with lower priority.
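#Thread selection policy: every cycle the core switches among the available threads, giving priority to the least recently executed one.#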
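The policy above can be sketched in a few lines. This is only a toy model, not the actual OpenSPARC logic: the `Thread` class, the `last_issued` field, and `select_thread` are names made up for illustration, and the lower-priority switch-in for speculated loads is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    tid: int
    available: bool = True   # False while waiting on a load, MUL/DIV, miss, trap...
    last_issued: int = -1    # cycle in which this thread last issued (hypothetical field)


def select_thread(threads, cycle):
    """Pick the available thread that executed least recently; None if all are stalled."""
    ready = [t for t in threads if t.available]
    if not ready:
        return None
    # The smallest last_issued is the least recently executed thread; tid breaks ties.
    chosen = min(ready, key=lambda t: (t.last_issued, t.tid))
    chosen.last_issued = cycle
    return chosen.tid


if __name__ == "__main__":
    threads = [Thread(tid) for tid in range(4)]
    print([select_thread(threads, c) for c in range(4)])
    threads[1].available = False          # e.g. thread 1 is waiting on a cache miss
    print([select_thread(threads, c) for c in range(4, 8)])
```

With all four threads available the model simply rotates through them; once a thread becomes unavailable it drops out of the rotation, and when it returns it is the least recently executed thread, so it is picked first.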
Instruction cache complex has a 16-Kbyte data, 4-way, 32-byte line size with a single
ported instruction tag. It also has dual ported (1R/1W) valid bit array to hold cache
line state of valid/invalid. Invalidates access the V-bit array, not the instruction tag.
A pseudo-random replacement algorithm is used to replace the cache line.
#The instruction cache has a valid-bit array that records whether each cache line is valid or invalid.#
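To make the numbers concrete: 16 Kbytes / (4 ways × 32 bytes) gives 128 sets, so an address splits into a 5-bit line offset and a 7-bit set index, with the remaining bits as the tag. The sketch below is a generic set-associative model under assumptions of my own (class and method names are hypothetical, a fill prefers an invalid way before picking pseudo-randomly, and Python's `random` stands in for whatever hardware generator is really used). It only illustrates that an invalidate touches the valid-bit array and leaves the tag array alone.

```python
import random

CAPACITY_BYTES = 16 * 1024
WAYS = 4
LINE_BYTES = 32
NUM_SETS = CAPACITY_BYTES // (WAYS * LINE_BYTES)   # 128 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1          # 5
INDEX_BITS = NUM_SETS.bit_length() - 1             # 7


def split_address(addr):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class ICacheModel:
    """Toy model: a tag array plus a separate valid-bit array."""

    def __init__(self, seed=0):
        self.tags = [[0] * WAYS for _ in range(NUM_SETS)]
        self.valid = [[False] * WAYS for _ in range(NUM_SETS)]
        self.rng = random.Random(seed)   # stand-in for the pseudo-random replacement

    def lookup(self, addr):
        """Return the hitting way, or None on a miss."""
        tag, index, _ = split_address(addr)
        for way in range(WAYS):
            if self.valid[index][way] and self.tags[index][way] == tag:
                return way
        return None

    def fill(self, addr):
        """Install a line; assumption: use an invalid way first, else a random one."""
        tag, index, _ = split_address(addr)
        for way in range(WAYS):
            if not self.valid[index][way]:
                break
        else:
            way = self.rng.randrange(WAYS)
        self.tags[index][way] = tag
        self.valid[index][way] = True
        return way

    def invalidate(self, index, way):
        """Clear only the valid bit; the tag array is not touched."""
        self.valid[index][way] = False
```

A real design decides for itself whether the index comes from virtual or physical address bits; the model above does not take a position on that.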
There is a fully associative instruction TLB with 64 entries. The buffer supports the
following page sizes: 8 Kbytes, 64 Kbytes, 4 Mbytes, and 256 Mbytes. The TLB uses
a pseudo least recently used (LRU) algorithm for replacement. Multiple hits in the
TLB are prevented by doing an autodemap on a fill.
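#The TLB supports pages of several different sizes; for background, see the TLB part of Chapter 3 of 《超标量处理器设计》 (Superscalar Processor Design).#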
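The "autodemap on a fill" idea is easier to see in code: before a new translation is written, any existing entry whose virtual range overlaps it is removed, so a later lookup can never hit two entries. The sketch below is a simplified model with hypothetical names. It ignores contexts and partitions, and its used-bit eviction is just one common pseudo-LRU approximation that I am assuming for illustration, not something the text above specifies.

```python
from dataclasses import dataclass

PAGE_SIZES = (8 << 10, 64 << 10, 4 << 20, 256 << 20)   # 8 KB, 64 KB, 4 MB, 256 MB
NUM_ENTRIES = 64


@dataclass
class TLBEntry:
    vbase: int
    pbase: int
    size: int
    used: bool = True


class ITLBModel:
    """Toy fully associative TLB with mixed page sizes and autodemap on fill."""

    def __init__(self):
        self.entries = []

    def translate(self, vaddr):
        """Return the physical address, or None on a TLB miss."""
        for e in self.entries:
            if e.vbase <= vaddr < e.vbase + e.size:
                e.used = True
                return e.pbase + (vaddr - e.vbase)
        return None

    def fill(self, vaddr, paddr, size):
        if size not in PAGE_SIZES:
            raise ValueError("unsupported page size")
        vbase = vaddr & ~(size - 1)
        pbase = paddr & ~(size - 1)
        # Autodemap: drop every entry whose virtual range overlaps the new page.
        self.entries = [
            e for e in self.entries
            if not (e.vbase < vbase + size and vbase < e.vbase + e.size)
        ]
        if len(self.entries) >= NUM_ENTRIES:
            self._evict()
        self.entries.append(TLBEntry(vbase, pbase, size))

    def _evict(self):
        """Assumed pseudo-LRU: evict the first entry whose used bit is clear."""
        for i, e in enumerate(self.entries):
            if not e.used:
                del self.entries[i]
                return
        # Every entry was recently used: clear all used bits and evict the first.
        for e in self.entries:
            e.used = False
        del self.entries[0]
```

For example, filling an 8-Kbyte page at 0x2000 and then a 64-Kbyte page at 0x0 leaves a single entry, because the larger page covers the smaller one and the fill demaps it.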
Two instructions are fetched each cycle, though only one instruction is issued per
clock, which reduces the instruction cache activity and allows for an opportunistic
line fill. There is only one outstanding miss per thread, and only four per core.
Duplicate misses do not issue requests to the L2-cache.
#Two instructions are fetched every cycle, although only one is issued per clock.#
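The miss-handling rules in this paragraph (one outstanding miss per thread, at most four per core, duplicates not sent to the L2-cache) can be sketched as a small bookkeeping structure. This is a simplified illustration and not the real design: the `MissBuffer` class and its methods are hypothetical, and four threads per core is inferred from "one per thread, four per core".

```python
LINE_BYTES = 32   # instruction cache line size from the text


class MissBuffer:
    """Toy tracker for outstanding instruction cache misses."""

    def __init__(self, num_threads=4):   # assumption: four threads per core
        self.num_threads = num_threads
        self.pending = {}                # thread id -> missing line address

    def request(self, tid, addr):
        """Record a miss; return True only if an L2 request must be sent."""
        if not 0 <= tid < self.num_threads:
            raise ValueError("unknown thread")
        if tid in self.pending:
            raise RuntimeError("thread already has an outstanding miss")
        line = addr & ~(LINE_BYTES - 1)
        duplicate = line in self.pending.values()
        self.pending[tid] = line
        return not duplicate             # duplicate misses piggyback on the first

    def fill(self, line):
        """The L2 returned a line: release every thread that was waiting on it."""
        woken = sorted(t for t, l in self.pending.items() if l == line)
        for t in woken:
            del self.pending[t]
        return woken
```

Because each thread holds at most one entry, the per-core limit of four follows automatically from the per-thread limit in this model.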
The integer register file (IRF) of the SPARC core has 5 Kbytes with 3 read/2 write/1
transport ports. There are 640 64-bit registers with error correction code (ECC). Only
32 registers from the current window are visible to the thread. Window changing in
background occurs under the thread switch. Other threads continue to access the IRF
(the IRF provides a single-cycle read/write access).
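#For now it is enough to get a rough idea of instruction fetch, the cache, miss/hit, and thread selection; see Chapters 2 and 3 of 《超标量处理器设计》 (Superscalar Processor Design). We will go through the details later alongside the code.#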
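The IRF numbers are consistent with each other, which a quick calculation shows. The first two results follow directly from the text. The per-thread breakdown is my own reading and is not stated in the paragraph: I assume four threads, and eight register windows plus four sets of globals per thread.

```python
REGISTERS = 640
BITS_PER_REGISTER = 64

# 640 registers x 64 bits = 5120 bytes = 5 Kbytes (ECC bits not counted).
irf_bytes = REGISTERS * BITS_PER_REGISTER // 8
irf_kbytes = irf_bytes // 1024

# A SPARC thread sees 32 registers at a time: 8 globals + 8 ins + 8 locals + 8 outs.
visible = 8 + 8 + 8 + 8

# Assumptions (not in the text above): 4 threads, 8 windows, 4 global sets.
THREADS = 4
WINDOWS = 8
GLOBAL_SETS = 4
# Adjacent windows overlap (one window's outs are the next window's ins),
# so each window adds 16 physical registers: 8 locals + 8 outs.
per_thread = WINDOWS * 16 + GLOBAL_SETS * 8

if __name__ == "__main__":
    print(irf_bytes, irf_kbytes, visible, per_thread, per_thread * THREADS)
```

Under those assumptions each thread owns 160 registers and 4 × 160 gives exactly the 640 registers quoted above.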
1.3.1.2 Execution Unit
The execution unit (EXU) has a single arithmetic logic unit (ALU) and shifter. The
ALU is reused for branch address and virtual address calculation. The integer
multiplier has a 5 clock latency, and a throughput of half-per-cycle for area saving.
One integer multiplication is allowed outstanding per core. The integer multiplier is
shared between the core pipe (EXU) and the modular arithmetic (SPU) unit on a
round-robin basis. There is a simple non-restoring divider, which allows for one
divide outstanding per SPARC core. A thread issuing a MUL/DIV will be rolled back
and switched out if another thread is occupying the MUL/DIV units.
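#The EXU (execution unit) has one arithmetic logic unit and one shifter; the integer multiplier has a latency of 5 clocks; there is a simple non-restoring divider.#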
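For readers who have not met the term, here is the textbook non-restoring division algorithm in software form. It is a sketch of the general technique for unsigned operands, not a description of the actual EXU divider. The function name and the `width` parameter are my own. The idea is that a negative partial remainder is not restored immediately; instead the next step adds the divisor rather than subtracting it, and a single correction is applied at the end.

```python
def nonrestoring_divide(dividend, divisor, width=64):
    """Unsigned non-restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if not 0 <= dividend < (1 << width) or not 0 < divisor < (1 << width):
        raise ValueError("operands must fit in 'width' bits")
    mask = (1 << width) - 1
    a = 0            # partial remainder, allowed to go negative
    q = dividend     # holds the dividend first, fills with quotient bits
    for _ in range(width):
        # Shift the (a, q) pair left by one bit.
        msb = (q >> (width - 1)) & 1
        a = 2 * a + msb
        q = (q << 1) & mask
        # Subtract if the remainder was non-negative, otherwise add back.
        if a >= 0:
            a -= divisor
        else:
            a += divisor
        # The quotient bit is 1 when the new partial remainder is non-negative.
        if a >= 0:
            q |= 1
    if a < 0:        # single final correction step
        a += divisor
    return q, a


if __name__ == "__main__":
    print(nonrestoring_divide(7, 2, width=4))
```

Each iteration performs exactly one add or subtract, which is what makes the scheme attractive for a small hardware divider compared with restoring division.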
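Summary: this is only an overview, and more detailed descriptions will come later. To keep following along, you really need to read the key chapters of 《计算机体系结构-量化研究方法》 (Computer Architecture: A Quantitative Approach) or 《超标量处理器设计》 (Superscalar Processor Design). I recommend 《超标量处理器设计》, since it makes the later material easier to understand: mainly the RISC processor stages Fetch, Thread Selection, Decode, Execute, Memory, and Write Back, plus the related topics of caches, page tables, branch prediction, and so on.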