## IBM

## **PowerPC® Core - Combined VMX and FPU Unit**



- Extensive Clock Gating for power efficiency
- Instruction Sequencing
  - 2 simultaneous threads for both VMX and FPU
  - Delayed execution issue queue reduces VMX/FPU load latency to 2 cycles
  - Separate Load target buffer for VMX and FPU loads
    - LSU may execute out-of-order with respect to VMX/FPU
  - Decoupled FPU stores
    - Allows stores to issue before store data is available
    - 16-entry store queue in the data cache
- Pipeline (11 FO4 per stage)
  - 10 cycle Scalar DP FPU latency
  - 2 cycle Load latency
  - 4 cycle VMX simple/permute latency
  - 14 cycle VMX dot product latency
- New VMX128 Vector ISA
  - Focus on accelerating 3D graphics and physics
  - Extends VMX from 32 to 128 Vector Registers
  - Floating-point dot-product instructions
  - Permute-class instructions for data management
  - Direct3D Pack and unpack instructions
  - Misaligned storage access instructions
  - Maintains binary compatibility with a subset of VMX
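
To illustrate why a floating-point dot-product instruction matters for 3D graphics and physics, here is a minimal scalar sketch in plain C of the arithmetic such an instruction would replace. It is not VMX128 code: the type `vec4f` and the functions `vec4f_dot` and `mat4f_transform` are hypothetical names invented for this example, and the actual instruction semantics (lane selection, rounding, result placement) are not described in the source.

```c
#include <stddef.h>

/* Hypothetical 4-lane float vector; stands in for one 128-bit vector register. */
typedef struct { float lane[4]; } vec4f;

/* Hypothetical row-major 4x4 matrix, stored as four row vectors. */
typedef struct { vec4f row[4]; } mat4f;

/* Scalar model of a 4-component dot product: four multiplies and three
 * dependent adds that a single dot-product instruction could fold together. */
static float vec4f_dot(vec4f a, vec4f b)
{
    float sum = 0.0f;
    for (size_t i = 0; i < 4; ++i) {
        sum += a.lane[i] * b.lane[i];
    }
    return sum;
}

/* Matrix-vector transform written as four dot products, one per row.
 * This is the common vertex-transform pattern in 3D pipelines. */
static vec4f mat4f_transform(const mat4f *m, vec4f v)
{
    vec4f out;
    for (size_t r = 0; r < 4; ++r) {
        out.lane[r] = vec4f_dot(m->row[r], v);
    }
    return out;
}
```

Written this way, transforming one vertex costs four dot products, which is why a dedicated instruction for that operation pays off even at the 14-cycle latency listed above, provided enough independent work is in flight to keep the pipeline busy.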