1/21/2007

SH4 Notes - 03

Memory Management Unit (MMU)

Overview
  • 29-bit physical (external memory) address space, with 8-bit address space identifiers (ASIDs)
  • 32-bit logical (virtual) address space
  • virtual address -> MMU -> physical address
  • 4 instruction TLB (ITLB) entries
  • 64 unified TLB (UTLB) entries
  • UTLB copies are stored in the ITLB by hardware
  • Support for 4 page sizes: 1 kbyte, 4 kbytes, 64 kbytes and 1 Mbyte.
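As a rough illustration of the virtual address -> MMU -> physical address step, the sketch below splits a 32-bit virtual address into a page part and an in-page offset for each of the four page sizes. The function names and the phys_page_base argument are made up for this note; on real hardware the physical page comes from a matching TLB entry, which is not modelled here.

```c
#include <stdint.h>

/* The four page sizes named in the notes, in bytes. */
enum page_size {
    PAGE_1K  = 1u << 10,
    PAGE_4K  = 1u << 12,
    PAGE_64K = 1u << 16,
    PAGE_1M  = 1u << 20
};

/* Offset within the page: the low bits, which pass through untranslated. */
static uint32_t page_offset(uint32_t vaddr, uint32_t page_bytes)
{
    return vaddr & (page_bytes - 1u);
}

/* Page part of the virtual address: the high bits a TLB entry is matched against. */
static uint32_t virt_page_base(uint32_t vaddr, uint32_t page_bytes)
{
    return vaddr & ~(page_bytes - 1u);
}

/* Combine a hypothetical physical page base (as if from a TLB hit) with the
 * offset, truncated to the 29-bit physical address space. */
static uint32_t translate(uint32_t vaddr, uint32_t phys_page_base,
                          uint32_t page_bytes)
{
    uint32_t base = phys_page_base & ~(page_bytes - 1u);
    return (base | page_offset(vaddr, page_bytes)) & 0x1FFFFFFFu;
}
```

The larger the page, the more low-order bits are copied straight from the virtual address and the fewer bits the TLB has to compare.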
Register descriptions
  • 6 MMU-related registers.
  • PTEH (Page table entry high register) : 32 bits
    - 0xFF00 0000 (P4)
    - 0x1F00 0000 (Area 7)
  • PTEL (Page table entry low register) : 32 bits
  • PTEA (Page table entry assistance register) : 32 bits
  • TTB (Translation table base register) : 32 bits
  • TEA (TLB exception address register) : 32 bits
  • MMUCR (MMU control register) : 32 bits

1/14/2007

Intel VT Notes - 01

  • VMX (Virtual Machine Extensions)
  • VMM (Virtual Machine Monitor)
  • Guest Software

[Basic VT Architecture]
Guest Software(VM)
---------
VMM(Virtual Machine Monitor)
---------
Hardware

  • VMX Operation
    - VMX root operation
    - VMX non-root operation
  • VMX Transition (VMX root operation <-> VMX non-root operation)
    - VM entries (VMX root operation ->VMX non-root operation)
    - VM exits (VMX non-root operation -> VMX root operation)
  • No software-visible bit whose setting indicates whether a logical processor is
    in VMX non-root operation.
  • Guest software can run at the privilege level for which it was originally designed.
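The two VMX modes and the two transitions between them can be pictured as a tiny state machine. This is only a model for these notes: the type and function names are invented, and the mode variable exists only in the model, since the notes point out that real guest software has no visible bit telling it which mode it is in.

```c
#include <stdbool.h>

enum vmx_mode { VMX_ROOT, VMX_NON_ROOT };

/* VM entry: VMX root operation -> VMX non-root operation.
 * Returns false if the transition does not apply in the current mode. */
static bool vm_entry(enum vmx_mode *mode)
{
    if (*mode != VMX_ROOT)
        return false;
    *mode = VMX_NON_ROOT;   /* guest software runs from here */
    return true;
}

/* VM exit: VMX non-root operation -> VMX root operation. */
static bool vm_exit(enum vmx_mode *mode)
{
    if (*mode != VMX_NON_ROOT)
        return false;
    *mode = VMX_ROOT;       /* control is back in the VMM */
    return true;
}
```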

1/13/2007

About Virtualization


[Type 1 Hypervisor]

//Without Host OS

Guest OS
----------
Hypervisor
----------
Hardware

[Type 2 Hypervisor]

//With Host OS

Guest OS
----------
Hypervisor
----------
Host OS
----------
Hardware


Thoughts:

PS3 Linux as it stands is presumably running on a Type 1 hypervisor.
But what if every SPE ran its own VM? I wonder how that would turn out. Heh.

-END-

xptcall for SH4

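Today I got the Invoke part of xptcall working;
the Stub part will have to wait until next Monday.
If everything checks out, it can be contributed back to Mozilla.

Writing down the key points first so I don't forget them later (I really am too forgetful).

1. SH4 Calling Convention
- R0 carries the return value.
- R1...R3 are free for any use.
- R4...R7 pass integer and pointer arguments, but XPCOM methods always need R4 set to that (this).
- FR4...FR11 pass float and double arguments.
- Arguments that do not fit in registers go on the stack. A 64-bit argument is never split with one half in a register and the other half on the stack.

2. jsr needs align 2

3. Requires GCC 3.1 or later

4. R14 is used as the base pointer (call frame pointer); R15 is the stack pointer.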

5. JavaScript Component -> xptcinvoke -> XPCOM Component

6. XPCOM Component -> xptcstubs -> JavaScript Component
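
The register rules in point 1 can be sketched as a small argument allocator, which is roughly the bookkeeping an Invoke routine has to do before the call. This is a simplified model, not the Mozilla code: the type and function names are invented, the even-numbered FR pair for doubles follows the DRn pairing of the SH4 register file, and what happens to a leftover register after a 64-bit value goes to the stack is an assumption of this sketch.

```c
#include <stddef.h>

enum arg_kind { ARG_INT32, ARG_INT64, ARG_FLOAT, ARG_DOUBLE };
enum arg_home { IN_GPR, IN_FPR, ON_STACK };

struct arg_slot {
    enum arg_home home;
    int reg;              /* first register number (Rn or FRn); -1 on the stack */
};

/* Assign each argument of an XPCOM method call to a register or the stack.
 * R4 already holds `this`, so integer arguments start at R5. */
static void assign_args(const enum arg_kind *kinds, size_t n,
                        struct arg_slot *out)
{
    int next_gpr = 5;     /* R5..R7 remain after `this` in R4 */
    int next_fpr = 4;     /* FR4..FR11 */

    for (size_t i = 0; i < n; i++) {
        out[i].home = ON_STACK;
        out[i].reg = -1;

        switch (kinds[i]) {
        case ARG_INT32:
            if (next_gpr <= 7) {
                out[i].home = IN_GPR;
                out[i].reg = next_gpr++;
            }
            break;
        case ARG_INT64:
            /* Needs two registers; never half in a register, half on the
             * stack. Assumption: a leftover single register stays free. */
            if (next_gpr + 1 <= 7) {
                out[i].home = IN_GPR;
                out[i].reg = next_gpr;
                next_gpr += 2;
            }
            break;
        case ARG_FLOAT:
            if (next_fpr <= 11) {
                out[i].home = IN_FPR;
                out[i].reg = next_fpr++;
            }
            break;
        case ARG_DOUBLE:
            /* Assumption: a double takes an even-numbered FR pair (DRn),
             * and a skipped odd register is not reused. */
            if (next_fpr & 1)
                next_fpr++;
            if (next_fpr + 1 <= 11) {
                out[i].home = IN_FPR;
                out[i].reg = next_fpr;
                next_fpr += 2;
            }
            break;
        }
    }
}
```

For the argument list (int, double, long long, int, float) this model yields R5, FR4/FR5, R6/R7, stack, FR6.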

1/05/2007

SH4 Note - 02

Programming model

  • 2 processor modes : user mode and privileged mode
  • 4 kinds of registers:
    - general registers (R0 - R15, where R0 - R7 are banked registers)
    - system registers : access to these registers does not depend on the processor mode.
    - control registers
    - floating-point registers
    (FR0–FR15 and XF0–XF15 = FPR0_BANK0–FPR15_BANK0 and FPR0_BANK1–FPR15_BANK1).
General registers
  • R0_BANK0–R7_BANK0:
    - In user mode (SR.MD = 0), R0–R7 are always assigned to R0_BANK0–R7_BANK0.
    - In privileged mode (SR.MD = 1), R0–R7 are assigned to R0_BANK0–R7_BANK0 only when SR.RB = 0.
    Notes: SR (=Status Register). MD(=Mode)
  • R0_BANK1–R7_BANK1:
    - In user mode, R0_BANK1–R7_BANK1 cannot be accessed.
    - In privileged mode, R0–R7 are assigned to R0_BANK1–R7_BANK1 only when
    SR.RB = 1.
    Notes: RB (=General Register Bank specifier in privileged mode)
  • Programming Note:
    As the user’s R0–R7 are assigned to R0_BANK0–R7_BANK0, and after an exception or interrupt R0–R7 are assigned to R0_BANK1–R7_BANK1, it is not necessary for the interrupt handler to save and restore the user’s R0–R7
    (R0_BANK0–R7_BANK0).
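The bank rules above reduce to one small decision on SR.MD and SR.RB. A minimal sketch, with an invented function name and the bit positions taken from the SR description in these notes:

```c
#include <stdint.h>

#define SR_RB (1u << 29)   /* general register bank specifier */
#define SR_MD (1u << 30)   /* processor mode: 0 = user, 1 = privileged */

/* Returns the bank (0 or 1) that R0-R7 refer to for a given SR value. */
static int active_gpr_bank(uint32_t sr)
{
    if (!(sr & SR_MD))
        return 0;                  /* user mode: always BANK0 */
    return (sr & SR_RB) ? 1 : 0;   /* privileged mode: selected by SR.RB */
}
```

Note that SR.RB is ignored in user mode, which is why a handler entered with RB = 1 can leave the user's R0-R7 untouched.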
System registers
  • MACH (32bit) : Multiply-and-accumulate register high
  • MACL (32bit) : Multiply-and-accumulate register low
  • PR (32bit) : Procedure register
    The return address is stored in PR when a subroutine is called with a BSR, BSRF or JSR instruction. PR is referenced by the subroutine return instruction (RTS).
  • PC (32bit) : Program Counter
  • FPSCR (32bit) : Floating-point status/control register
  • FPUL (32bit) : Floating-point communication register
    Data transfer between FPU registers and CPU registers is carried
    out via the FPUL register. The FPUL register is a system register, and is accessed from the CPU side by means of LDS and STS instructions. For example, to convert the integer stored in general register R1 to a single-precision floating-point number,
    the processing flow is as follows:
    R1 → (LDS instruction) → FPUL → (single-precision FLOAT instruction) → FR1
Control registers
  • SR (32bit) : Status Register
    - SR.T (bit[0]) : True/False condition or carry/borrow bit.
    - SR.S (bit[1]) : Specifies a saturation operation for a MAC instruction.
    - SR.IMASK (bit[4:7]) : Interrupt mask level.
    - SR.Q (bit[8]) : State for divide step.
    - SR.M (bit[9]) : State for divide step.
    - SR.FD (bit[15]) : FPU disable bit (cleared to 0 by a reset).
    - SR.BL (bit[28]) : Exception/interrupt block bit
    - SR.RB (bit[29]) : General register bank specifier in privileged mode
    - SR.MD (bit[30]) : Processor mode (MD=0 : User mode, MD=1 : Privileged mode)
    - SR.RES (bit[2:3], bit[10:14], bit[16:27], bit[31]) : Reserved
  • SSR (32bit) : Saved Status Register
    The current contents of SR are saved to SSR in the event of an exception or interrupt.
  • SPC (32bit) : Saved Program Counter
    The address of an instruction at which an interrupt or exception occurs is saved to SPC.
  • GBR (32bit) : Global Base Register
    GBR is referenced as the base address in a GBR-referencing MOV instruction.
  • VBR (32bit) : Vector Base Register
    VBR is referenced as the branch destination base address in the event of an exception or interrupt.
  • SGR (32bit) : Saved General Register
    The contents of R15 are saved to SGR in the event of an exception or interrupt.
    Notes: R15 is used as the stack pointer. (R14 is the base pointer.)
  • DBR (32bit) : Debug Base Register
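The SR bit layout listed above can be written down as C constants, which also gives a quick check that the defined and reserved bits together cover all 32 bits. The macro and function names are my own; only the bit positions come from the notes.

```c
#include <stdint.h>

/* Bit positions of the single-bit SR fields. */
#define SR_T_BIT   0
#define SR_S_BIT   1
#define SR_Q_BIT   8
#define SR_M_BIT   9
#define SR_FD_BIT  15
#define SR_BL_BIT  28
#define SR_RB_BIT  29
#define SR_MD_BIT  30

/* Reserved bits: 2-3, 10-14, 16-27 and 31. */
#define SR_RESERVED_MASK 0x8FFF7C0Cu

/* All defined bits: T, S, IMASK (4-7), Q, M, FD, BL, RB, MD. */
#define SR_DEFINED_MASK  0x700083F3u

/* Extract one single-bit field. */
static unsigned sr_bit(uint32_t sr, unsigned bit)
{
    return (sr >> bit) & 1u;
}

/* Extract the 4-bit interrupt mask level from bits 4-7. */
static unsigned sr_imask(uint32_t sr)
{
    return (sr >> 4) & 0xFu;
}
```

For example, an SR value of 0x700000F0 decodes as MD = 1, RB = 1, BL = 1 and an interrupt mask level of 15.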
Floating-point registers
  • Floating-point registers, FPRn_BANKi (32 registers)
  • Single-precision floating-point registers, FRi (16 registers)
    - FPSCR.FR = 0 : FR0–FR15 are assigned to FPR0_BANK0–FPR15_BANK0.
    - FPSCR.FR = 1 : FR0–FR15 are assigned to FPR0_BANK1–FPR15_BANK1.
  • Double-precision floating-point registers or single-precision floating-point
    register pairs, DRi (8 registers):
    DR0 = {FR0, FR1}, DR2 = {FR2, FR3}, DR4 = {FR4, FR5}, DR6 = {FR6, FR7},
    DR8 = {FR8, FR9}, DR10 = {FR10, FR11}, DR12 = {FR12, FR13}, DR14 = {FR14, FR15}
  • Single-precision floating-point vector registers, FVi (4 registers): An FV register
    comprises four FR registers:
    FV0 = {FR0, FR1, FR2, FR3}, FV4 = {FR4, FR5, FR6, FR7},
    FV8 = {FR8, FR9, FR10, FR11}, FV12 = {FR12, FR13, FR14, FR15}
  • Single-precision floating-point extended registers, XFi (16 registers)
    - FPSCR.FR = 0 : XF0-XF15 are assigned to FPR0_BANK1-FPR15_BANK1.
    - FPSCR.FR = 1 : XF0-XF15 are assigned to FPR0_BANK0-FPR15_BANK0.
  • Single-precision floating-point extended register pairs, XDi (8 registers): An XD
    register comprises two XF registers.
    XD0 = {XF0, XF1}, XD2 = {XF2, XF3}, XD4 = {XF4, XF5}, XD6 = {XF6, XF7},
    XD8 = {XF8, XF9}, XD10 = {XF10, XF11}, XD12 = {XF12, XF13}, XD14 = {XF14, XF15}
  • Single-precision floating-point extended register matrix, XMTRX: XMTRX
    comprises all 16 XF registers.

    Notes: So cool! The way the Japanese designers think is really fun.
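
The naming scheme above is mostly index arithmetic: FPSCR.FR picks which physical bank sits behind FRn and XFn, and the DR and FV names are just aligned groups of FR registers. A small sketch with invented helper names:

```c
/* Physical bank (0 or 1) behind FR0-FR15 for a given FPSCR.FR value. */
static int fr_bank(int fpscr_fr)
{
    return fpscr_fr ? 1 : 0;
}

/* XF0-XF15 always live in the other bank. */
static int xf_bank(int fpscr_fr)
{
    return fpscr_fr ? 0 : 1;
}

/* Number of the DR pair containing FRi: DRn = {FRn, FRn+1}, n even. */
static int dr_of_fr(int i)
{
    return i & ~1;
}

/* Number of the FV vector containing FRi: FVn = {FRn..FRn+3}, n = 0, 4, 8, 12. */
static int fv_of_fr(int i)
{
    return i & ~3;
}
```

So FR5 is the second half of DR4 and the second element of FV4, and toggling FPSCR.FR swaps which bank answers to the FR names and which to the XF names.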
Memory-mapped registers
  • The control registers are double-mapped to the following two memory areas.
    All registers have two addresses.
    - 0x1F00 0000-0x1FFF FFFF
    - 0xFF00 0000-0xFFFF FFFF
  • 0x1F00 0000–0x1FFF FFFF
    - This area must be accessed in address translation mode using the TLB.
  • 0xFF00 0000–0xFFFF FFFF
    - Access to area 0xFF00 0000-0xFFFF FFFF in user mode will cause an address error.
    - Memory-mapped registers can be referenced in user mode by means of access that involves address translation.
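The two windows listed above differ only in the top three address bits, so converting between them is a mask or an OR. A minimal sketch; the helper names are invented, and the sample address is the PTEH address from the later MMU note:

```c
#include <stdint.h>
#include <stdbool.h>

#define P4_REG_BASE     0xFF000000u   /* 0xFF00 0000 - 0xFFFF FFFF */
#define AREA7_REG_BASE  0x1F000000u   /* 0x1F00 0000 - 0x1FFF FFFF */
#define AREA7_REG_END   0x1FFFFFFFu

static bool in_p4_window(uint32_t addr)
{
    return addr >= P4_REG_BASE;
}

static bool in_area7_window(uint32_t addr)
{
    return addr >= AREA7_REG_BASE && addr <= AREA7_REG_END;
}

/* Drop the top three bits to get the Area 7 alias of a P4 register address. */
static uint32_t p4_to_area7(uint32_t addr)
{
    return addr & 0x1FFFFFFFu;
}

/* Set the top three bits to get the P4 alias of an Area 7 register address. */
static uint32_t area7_to_p4(uint32_t addr)
{
    return addr | 0xE0000000u;
}
```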
Data format in registers
  • Register operands are always longwords (32 bits). When a memory operand is only a
    byte (8 bits) or a word (16 bits), it is sign-extended into a longword when loaded into
    a register.
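In C terms, the sign extension described above looks like the following. The helper names are invented; the XOR-and-subtract form is used only because it stays portable for values with the top bit set.

```c
#include <stdint.h>

/* Sign-extend an 8-bit memory operand to a 32-bit register value. */
static int32_t sign_extend_byte(uint8_t v)
{
    return (int32_t)(v ^ 0x80) - 0x80;
}

/* Sign-extend a 16-bit memory operand to a 32-bit register value. */
static int32_t sign_extend_word(uint16_t v)
{
    return (int32_t)(v ^ 0x8000) - 0x8000;
}
```

A byte 0xFF in memory therefore lands in the register as 0xFFFFFFFF (-1), not 0x000000FF.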
Data formats in memory
  • Memory can be accessed in 8-bit byte, 16-bit word, or 32-bit longword form.
  • A word operand must be accessed starting from a word boundary (even address of a
    2-byte unit: address 2n).
  • A longword operand must be accessed starting from a longword boundary (even address of a 4-byte unit: address 4n).
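The boundary rules above amount to checking the low address bits. A one-function sketch with an invented name, where size is the access width in bytes (1, 2 or 4):

```c
#include <stdint.h>
#include <stdbool.h>

/* True if addr is a valid start address for an access of `size` bytes,
 * where size is a power of two (1 = byte, 2 = word, 4 = longword). */
static bool access_is_aligned(uint32_t addr, uint32_t size)
{
    return (addr & (size - 1u)) == 0u;
}
```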
Processor states
  • Reset state:
    - power-on reset will cause all system components to be reset,
    - manual reset may, for example, avoid resetting DRAM controllers so that
    memory contents are preserved.
  • Exception-handling state:
    - In the case of a reset, the CPU branches to address 0xA000 0000 and starts
    executing the user-coded exception handling program.
    - In the case of a general exception or interrupt, the program counter (PC) contents are saved in the saved program counter (SPC), the status register (SR) contents are saved in the saved status register (SSR), and the R15 contents are saved in saved general register 15 (SGR). The CPU branches to the start address of the user-coded exception service routine, found from the sum of the contents of the vector base address and the vector offset.
  • Program execution state:
    CPU executes program instructions in sequence.
  • Power-down state:
    The power-down state is entered by executing a SLEEP instruction.
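
The exception-entry sequence described above can be modelled in a few lines. This is a simulation for the notes, not hardware-accurate code: the struct and function names are invented, the vector offset is passed in by the caller rather than taken from a table, and forcing SR.BL to 1 is stated from the SH-4 manual rather than from these notes (MD and RB follow from the register-bank note earlier).

```c
#include <stdint.h>

#define SR_BL (1u << 28)
#define SR_RB (1u << 29)
#define SR_MD (1u << 30)

#define RESET_VECTOR 0xA0000000u

struct cpu {
    uint32_t pc, sr, r15;     /* live state */
    uint32_t spc, ssr, sgr;   /* saved copies */
    uint32_t vbr;             /* vector base register */
};

/* General exception or interrupt: save PC, SR and R15, then branch to
 * VBR + vector offset in privileged mode on register bank 1. */
static void enter_exception(struct cpu *c, uint32_t vector_offset)
{
    c->spc = c->pc;
    c->ssr = c->sr;
    c->sgr = c->r15;
    c->sr |= SR_MD | SR_RB | SR_BL;   /* BL: assumption, see lead-in */
    c->pc = c->vbr + vector_offset;
}

/* Reset: branch to the fixed reset address. */
static void enter_reset(struct cpu *c)
{
    c->pc = RESET_VECTOR;
}
```

A handler that wants to return only needs SPC and SSR, which is why they are saved before SR and PC are overwritten.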

SH4 Note - 01

Overview

  • 32-bit RISC microprocessor
  • 16-bit fixed-length instruction set
  • 1 instruction cache
  • 1 operand cache (copy-back or write-through)
  • MMU (memory management unit) with a 4-entry fully-associative instruction TLB and a 64-entry fully-associative shared TLB.

CPU
  • 32-bit internal data bus
  • 16 general registers (32-bit)
  • 8 shadow registers(32-bit)
  • 7 control registers(32-bit)
  • 4 system registers(32-bit)
  • RISC
  • Load-store architecture
  • Delayed branch instructions
  • Conditional execution
  • Superscalar architecture (providing simultaneous execution of two instructions), including the FPU
  • C-based instruction set
  • Instruction execution time: Maximum 2 instructions/cycle
  • Virtual address space: 4 Gbytes (448-Mbyte external memory space)
  • Space identifier ASIDs: 8 bits, 256 virtual address spaces
  • On-chip multiplier
  • Five-stage pipeline
FPU
  • On-chip floating-point coprocessor
  • Supports single-precision (32 bits) and double-precision (64 bits)
  • IEEE754-compliant
  • Two rounding modes: Round to Nearest and Round to Zero
  • Floating-point registers: 32 bits x 16 words x 2 banks
    (single-precision x 16 words or double-precision x 8 words) x 2 banks
  • 32-bit CPU-FPU floating-point communication register (FPUL)
  • Supports FMAC (multiply-and-accumulate), FDIV (divide) and FSQRT (square root) instructions.
  • Supports FLDI0/FLDI1 (load constant 0/1) instructions
  • Instruction execution times:
    - Latency (FMAC/FADD/FSUB/FMUL): 3 cycles (single-precision), 8 cycles (double-precision)
    - Pitch (FMAC/FADD/FSUB/FMUL): 1 cycle (single-precision), 6 cycles
    (double-precision)
    - Note: FMAC is supported for single-precision only.
  • 3-D graphics instructions (single-precision only):
    - 4-dimensional vector conversion and matrix operations (FTRV): 4 cycles
    (pitch), 7 cycles (latency)
    - 4-dimensional vector (FIPR) inner product: 1 cycle (pitch), 4 cycles (latency)
  • Five-stage pipeline
Power-down
  • Sleep mode
  • Standby mode
  • Module standby function
MMU
  • 4-Gbyte address space, 256 address space identifiers (8-bit ASIDs)
  • Single virtual mode and multiple virtual memory mode
  • Supports multiple page sizes: 1 kbyte, 4 kbytes, 64 kbytes, 1 Mbyte
  • 4-entry fully-associative TLB for instructions
  • 64-entry fully-associative TLB for instructions and operands
  • Supports software-controlled replacement and random-counter replacement algorithm
  • TLB contents can be accessed directly by address mapping

12/07/2006

SIXAXIS under Windows

The PS3 SIXAXIS controller can now be used under Windows.

http://forums.ps2dev.org/viewtopic.php?t=7099

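Got a PS3

I finally got my hands on a PS3 today, after wanting one for so long.
I bought it at the 印地安 (Indian) shop in Guanghua Market (光華商場).
So happy!

It had to be bundled with 2 games and a second controller, NT$23,230 in total.
But it finally puts an end to more than half a year of longing.

Game 1 - Gundam
Game 2 - RR7
The console is the 60 GB version. It is really quiet and really beautiful! The MSD Duo doesn't get stuck either.
Fedora Core 6 PPC is installing right now.

The only regret for now seems to be that the PS3 just outputs to my TV card, since I don't
own a TV! Running at 480i is such a waste! The upside is that one monitor handles everything,
and maybe a future TV card with 1080p support can "temporarily" stand in for that very expensive LCD TV.

SONY's support for PS3 Linux is quite substantial.
The CELL-Linux-CL_20061110-ADDON CD has fairly complete information. From what I can see,
the kernel always sits on top of the hypervisor; no wonder installing FC6 is so slow.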

12/05/2006

Open Source for Plugins

This is not just about how to write a plugin, but about how to give your own program an interface that can load plugins.
References
http://www.linux-ha.org/_cache/TechnicalPapers__pils.pdf

CVS Code:
http://cvs.linux-ha.org/viewcvs//viewcvs.cgi/linux-ha/lib/pils/

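Speeding Up FlashPlayer 2

1. OnTimer() -> UpdateScreen() -> DrawScreen()
2. SYSFONT slows things down, because every UpdateRect() calls TestDeviceFont().
3. DISABLE_FOCUS_RECT is in UpdateRect(); removing it gives a speedup, but probably not much, since it only saves one conditional.
4. All the mouse-cursor functions can be disabled to gain speed; there is no mouse at the moment anyway.

In addition, BlinkCursor() should be called to update the cursor.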

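FlashPlayer - Effect of Rasters on Speed

While building some Flash animation effects I noticed something interesting:
even when two changing regions (rects) have the same area, the wide region
is faster than the tall one. I think this is because FlashPlayer does its
computation raster by raster, so handling more rasters (the tall rect) is slower.

This can be added to the list of optimization considerations.

(Of course, on a fast machine this effect is hard to notice.)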
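
A toy cost model makes the wide-versus-tall observation concrete. Everything here is hypothetical: the struct, the function names and the idea of a fixed per-raster overhead are my own assumptions, not FlashPlayer internals or measured numbers.

```c
struct rect {
    int w;   /* width in pixels */
    int h;   /* height in pixels = number of rasters touched */
};

/* Total pixels to redraw; identical for a wide and a tall rect of equal area. */
static long pixels_touched(struct rect r)
{
    return (long)r.w * r.h;
}

/* Hypothetical cost: a fixed setup cost per raster plus a cost per pixel. */
static long redraw_cost(struct rect r, long per_raster, long per_pixel)
{
    return per_raster * r.h + per_pixel * pixels_touched(r);
}
```

With a per-raster cost of 10 and a per-pixel cost of 1, a 200x50 rect costs 10500 while a 50x200 rect costs 12000, even though both cover 10000 pixels.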

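Another point came to mind: re-aligning the data might speed up
individual instructions. So far I have not seen any alignment handling
in the FlashPlayer code, although it does have assembly-level
optimizations for machines with SIMD support.
For the alignment of glibc malloc, see here.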
http://www.delorie.com/gnu/docs/glibc/libc_31.html
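
One way to get explicitly aligned buffers in C is the C11 aligned_alloc function. A minimal sketch, assuming a C11 library; the wrapper names are invented, and rounding the size up is there because C11 expects the size to be a multiple of the alignment.

```c
#include <stdint.h>
#include <stdlib.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Allocate at least `bytes` bytes starting on an `align`-byte boundary.
 * The result is released with free(). */
static void *alloc_aligned(size_t bytes, size_t align)
{
    return aligned_alloc(align, round_up(bytes, align));
}

/* True if p sits on an `align`-byte boundary. */
static int is_aligned_ptr(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

A 16-byte boundary is the usual choice when the buffer is going to be fed to 128-bit SIMD loads and stores.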

12/02/2006

Track+ : Project Management System

Track and Manage Issues

  • Easy and pleasant to use
  • Clean and well organized user interface
  • Fully web based, no client installation
  • Highly configurable
  • Email reminders for due tasks
  • Overview diagrams
  • Budgets and expenses (new!)
  • Nice template based PDF reports
  • Easy to setup and administer
  • Powerful access control
  • Manages hundreds of projects
  • More features...

http://www.trackplus.com/

OProfile - Open Source Profiler

OProfile is a system-wide profiler for Linux systems, capable of profiling all running code at low overhead. OProfile is released under the GNU GPL.

It consists of a kernel driver and a daemon for collecting sample data, and several post-profiling tools for turning data into information.

OProfile leverages the hardware performance counters of the CPU to enable profiling of a wide variety of interesting statistics, which can also be used for basic time-spent profiling. All code is profiled: hardware and software interrupt handlers, kernel modules, the kernel, shared libraries, and applications.


http://oprofile.sourceforge.net/

IDEA - Some Ways to Speed Up FlashPlayer

  1. The IXP425 also has a CP0 with SIMD capability, which should be usable for assembly-level acceleration.
    (RGBI -> RGBIL)
  2. Functions that should be accelerated:
    - CompositeRGB()
    - CompositeRGBSolid()
    - DrawSolidSlab32()
    - DrawRGBSlab32A()
    - CompositeSolidSlab()
    - CompositeGradientSlab()
    - CompositeBitmapSlab()
    - GetBackground32()
    - GetBackgroundWhite()
    - Blt32toI()
    - Blt32to32()
  3. Perhaps SMOOTHBITS should be disabled
  4. Compile with -O2
  5. Intel XScale microarchitecture code optimization recommendations
  6. Data Access Performance Optimization on the Intel® 80321 I/O Processor
  7. Intel XScale IOP Linux
  8. GCC Assembler Instructions with C Expression Operands
  9. GCC-Inline-Assembly-HOWTO
  10. Intel® XScale™ Microarchitecture Assembly Language Quick Reference Card (pdf)

CBEA Note - 07

7. MFC Commands

  1. MFC commands can either be issued by code running on the SPU, or by code running on another processor or device, such as the PPE.
  2. SPU executes a series of channel instructions to issue an MFC command.
  3. Other processors or devices perform a series of memory-mapped I/O (MMIO) transfers to issue an MFC command to an SPE.
  4. The commands issued are queued to one of these command queues of the MFC:
    • MFC proxy command queue for any MMIO-initiated commands
    • MFC SPU command queue for any channel-initiated commands
  5. MFC commands that transfer data are referred to as MFC DMA commands.
  6. Commands that transfer data into an SPE (from main storage to local storage) are considered get commands.
  7. Commands that transfer data out of an SPE (from local storage to main storage) are considered put
    commands.
7.1 Command Classes

  1. Commands can be categorized into three classes, as follows:
    • Defined
    • Illegal
    • Reserved

7.1.1 Defined Commands
  1. Defined commands fall into one of three categories:
    • Data transfer commands
    – Data moved from local storage and placed in main storage (put commands)
    – Data moved into local storage from main storage (get commands)
    • SL1 cache-management commands
    • Synchronization commands
7.2 Command Exceptions

  1. Unaligned DMAs are not supported by the CBEA.
  2. If an unaligned DMA operation is encountered, the MFC command queue processing is suspended and a DMA alignment interrupt is generated.
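Since an unaligned DMA suspends the queue, it is worth checking the parameters before issuing a command. The sketch below is a hypothetical pre-check: the function name is invented, and the size and alignment rules it encodes (sizes of 1, 2, 4 or 8 bytes or a multiple of 16 up to 16 KB; natural alignment with matching low four address bits for the small sizes; 16-byte alignment otherwise) are my reading of the Cell documentation, not something stated in these notes.

```c
#include <stdint.h>
#include <stdbool.h>

/* Returns true if (local storage address, effective address, size) look
 * acceptable for a single DMA transfer under the assumed rules. Sizes the
 * sketch does not model, including 0, return false. */
static bool dma_params_ok(uint32_t lsa, uint64_t ea, uint32_t size)
{
    if (size == 1 || size == 2 || size == 4 || size == 8) {
        /* Small transfers: naturally aligned, same offset within a quadword. */
        return (lsa % size == 0) && (ea % size == 0) &&
               ((lsa & 0xFu) == (uint32_t)(ea & 0xFu));
    }
    if (size == 0 || size > 16384 || size % 16 != 0)
        return false;
    /* Larger transfers: both addresses on a quadword boundary. */
    return (lsa & 0xFu) == 0 && (ea & 0xFu) == 0;
}
```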
7.4 DMA List Elements

  1. Commands with a suffix of “l” use list elements located in the local storage pointed to by the DMA list local storage address (LA) parameter of a list command.
  2. The element contains the lower order word of the effective address (LEAL) and the transfer size (LTS).
  3. The DMA list commands use a list of effective addresses and transfer size pairs, or list elements, stored in local storage as the parameters for the DMA transfer. These parameters are used for SPU-initiated DMA list commands, which are not supported on the MFC proxy command queue.
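
A list element as described above can be pictured as a pair of 32-bit words. This is a simplified model: the struct and function names are invented, any flag bits the real element format packs next to the transfer size are ignored, and taking the upper half of the effective address from the list command itself is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* One DMA list element as it would sit in local storage (simplified). */
struct dma_list_element {
    uint32_t lts;    /* list transfer size in bytes */
    uint32_t leal;   /* low-order word of the effective address */
};

/* Full effective address of one element, given the high word supplied
 * with the list command (assumption). */
static uint64_t element_ea(uint32_t ea_high, const struct dma_list_element *e)
{
    return ((uint64_t)ea_high << 32) | e->leal;
}

/* Total number of bytes a list of n elements would move. */
static uint64_t list_total_bytes(const struct dma_list_element *list, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += list[i].lts;
    return total;
}
```

This is what makes list commands useful for scatter/gather: one command walks many (address, size) pairs that the SPU prepared in its own local storage.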

12/01/2006

CBEA Note - 06

6. Memory Flow Controller



  1. In a CBEA-compliant processor, the MFC serves as an interface to the system and to other elements for an SPU.
  2. It provides the primary mechanism for data transfer, protection, and synchronization between main storage and the local storage arrays.
  3. There is logically an MFC for each SPU in a processor.
  4. MFC has two interfaces to the SPU, two interfaces to the Bus Interface Unit (BIU), and two interfaces to an optional SL1 cache.
  5. SPU channel interface allows the SPU to access MFC facilities and to issue MFC commands.
  6. The SPU local storage interface is used by the MFC to access the local storage in the SPU.
  7. One interface to the BIU allows memory-mapped I/O (MMIO) access to the MFC facilities. This interface also allows other processors to issue MFC
    commands. Commands issued using MMIO are referred to as MFC proxy commands.
  8. The other interface to the BIU carries the real address.
  9. The interfaces to the SL1 cache are mainly for data transfers.
  10. One interface is used by the MFC for access to the address translation tables in main storage and the other interface of the SL1 cache is used for the transfer of data between main storage and local storage.
As shown in Figure 6-1, the following are the main units in a typical MFC:
• MMIO interface
• MFC registers
• DMA controller
  1. The MMIO interface maps the MFC facilities of the SPU into the real address space of the system. (allows access to the MFC facilities from any processor, or any device in the system.)
  2. MMIO interface can be configured to map the local storage of the SPU into the real address space. (map local storage to real address space)
    - allows direct access to the local storage from any processor or any device in the system
    - enabling local-store-to-local-store transfers
    - ability for I/O devices to directly access the local storage domain
  3. Coherency is not maintained between SPU and MMIO accesses of the local storage domain.


6.1 MFC Facilities

  1. Most of the MFC facilities are contained in the MFC Registers unit.
  2. Some facilities are contained in the Direct Memory Access Controller (DMAC).
  3. The facilities within the MFC:
    User mode environment facilities include:
    • Mailbox Facility
    • SPU Signal Notification Facility
    • Proxy Tag-Group Completion Facility
    • MFC Multisource Synchronization Facility
    • SPU Control and Status Facilities
    • SPU Isolation Facility

    Privileged mode environment facilities include:
    • MFC Privileged Facilities
    – MFC State Register One
    – MFC Logical Partition ID Register
    – MFC Storage Description Register
    – MFC Data Address Register
    – MFC Data Storage Interrupt Status Register
    – MFC Address Compare Control Register
    – MFC Local Storage Address Compare Facility
    – MFC Command Error Register
    – MFC Data Storage Interrupt Pointer Register
    – MFC Control Register
    – MFC Atomic Flush Register
    – SPU Outbound Interrupt Mailbox Register

    • SPU Privileged Facilities
    – SPU Privileged Control Register
    – SPU Local Storage Limit Register
    – SPU Configuration Register
    • SPE Context Save and Restore
  4. The SPEs and PPE instruct the MFC to perform these DMA operations by queuing DMA command requests to the MFC through one of the command queues:
    • Commands issued by an SPE are queued to the MFC SPU command queue
    • Commands issued by a PPE are queued to the MFC proxy command queue
  5. The MFC uses a MMU to perform all MFC address translations and MFC access protection checks required for the DMA transfers.
  6. The MMU handles MFC transfers in much the same way that the PPE storage addressing facility handles load-and-store operations.

CBEA Note - 05

5. Synergistic Processor Unit

  1. The intent of the SPU is to fill a void between general-purpose processors and special-purpose hardware.
  2. SPU aims to achieve leadership performance on critical workloads for game, media, and broadband systems.
  3. The intent of the SPU and the CBEA is to provide a high degree of control to expert (real-time) programmers while maintaining ease of programming.
  4. The SPU implements a new instruction set architecture (ISA).
  5. The main characteristics of this architecture are:
    • Load-and-store architecture with sequential semantics, using a set of 128 registers, each of which is 128 bits wide.
    • Single-instruction, multiple-data (SIMD) capability
    – Sixteen 8-bit integers
    – Eight 16-bit integers
    – Four 32-bit integers or four single-precision floating-point values
    – Two double-precision floating-point values
    • Load-and-store access to an associated local storage.
    • Channel input/output for MFC control (used for external data access).
  6. The SPU has the following restrictions:
    • No direct access to main storage (access to main storage using MFC facilities only)
    • No distinction between user mode and privileged state
    • No access to critical system control such as page-table entries (this restriction should be enforced by PPE privileged software).
    • No synchronization facilities for shared local storage access
  7. The intent of the SPU is to enable applications that require a high computational unit density.
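
The SIMD lane layouts listed above can be pictured as different views of one 128-bit register. A sketch with invented names; the element-wise add only imitates what a single SIMD instruction would do across four 32-bit lanes and is not SPU code.

```c
#include <stdint.h>

/* One 128-bit register seen through each of the lane layouts in the notes. */
union spu_reg {
    uint8_t  b[16];   /* sixteen 8-bit integers */
    uint16_t h[8];    /* eight 16-bit integers */
    uint32_t w[4];    /* four 32-bit integers */
    float    f[4];    /* four single-precision values */
    double   d[2];    /* two double-precision values */
};

/* Add two registers lane by lane as four 32-bit integers. */
static union spu_reg add_words(union spu_reg a, union spu_reg b)
{
    union spu_reg r;
    for (int i = 0; i < 4; i++)
        r.w[i] = a.w[i] + b.w[i];
    return r;
}
```

With 128 such registers, the whole register file holds 2 KB of data.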

CBEA Note - 04

4. PowerPC Processor Element

  1. The CBEA includes a PowerPC processor, which, with the MFC, is known as the PowerPC Processor Element (PPE).
  2. The PPE must be a 64-bit implementation: all effective addresses and registers, except some special-purpose and memory-mapped I/O (MMIO) registers, are 64 bits long.
  3. All implementations have two modes of operation: 64-bit mode and 32-bit mode.
  4. All instructions are available in both modes.
  5. The CBEA does not permit a PPE implementation that provides only the equivalent of 32-bit mode.

4.1 PowerPC Architecture Book I and Book II Compatibility



The PPE provides binary compatibility for PowerPC applications, except as described in Section 4.1.2 Incompatibilities with PowerPC Architecture, Book I.

4.1.1 Optional Features in PowerPC Architecture, Book I (Required for CBEA)
  1. The following facilities and instructions are considered optional in the PowerPC Architecture, but are required for the PPE by the CBEA user mode environment.
    • Floating reciprocal estimate single A-form (fres)
    • Floating reciprocal square-root estimate A-form (frsqrte)
    • Vector/SIMD multimedia extension
4.1.2 Incompatibilities with PowerPC Architecture, Book I

  1. Currently there are no incompatibilities with PowerPC Architecture, Book I.

4.1.3 Optional Features in PowerPC Architecture, Book II (Required for CBEA)

  1. The following facilities and instructions are considered optional in the PowerPC Architecture, but are required in the CBEA.

    • Data cache block touch X-form (dcbt)

    This is an optional version of dcbt that permits a program to provide a hint that a sequence of data cache blocks is likely to be needed soon.

4.1.4 Incompatibilities with PowerPC Architecture, Book II

  1. Currently there are no incompatibilities with PowerPC Architecture, Book II.

4.1.5 Extensions to the PowerPC Architecture

  1. For information on extensions in the CBEA to the PowerPC Architecture, see Appendix E.

11/29/2006

CBEA Note - 03

Chapter 3. Storage Models

The CBEA-compliant processor implements two concurrent storage models for an application program:
  1. Virtual storage model of the PPE (also used by MFCs for DMA operations). The PPE virtual storage model allows privileged software to provide different views of the real memory
    and I/O devices for the PPE and any MFC unit DMA transfers. It is possible for multiple virtual address spaces to exist.
  2. Local storage model of the SPU. The SPU local storage model is restricted to applications running on SPUs and data transfers handled by the MFC.
3.1 Virtual Storage Model

  1. Allows applications to exist within a virtual address space larger than either the effective address space or the real address space.
  2. In a typical CBEA-compliant processor system, the effective address space of each program is a subset of a larger virtual address space managed by privileged software.
  3. The privileged software manages the real storage resources of the system by setting up the tables and other information used by the hardware address translation facility.
  4. Access to the virtual pages can be read/write, read only, or no access.


Basically the memory management model is still segmentation plus paging; only the terminology differs from x86:

Under the CBEA architecture:

                  segmentation                    paging
effective address -----------> virtual address -----> real address

Under the x86 architecture:

                segmentation                   paging
logical address -----------> linear address -----> physical address


3.2 SPU Local Storage Model

  1. Each SPU has its own dedicated area of local storage.
  2. The individual local storage areas can be aliased to a real address within the main storage
    domain and any PPE can access these areas by using the appropriate effective address.
3.2.1 Local Storage Access

  1. The CBEA allows the local storage of an SPU to have an alias in the real address space in the main storage domain.
  2. This allows other processors in the main storage domain to access local storage through appropriately mapped effective address space.
3.2.1.1 Mapping Requirements
  1. Privileged software should access the aliased pages of local storage in the main storage domain as caching inhibited.
  2. If not accessed as caching inhibited, software must explicitly manage the coherency of local storage with other system caches.
    〔If local storage is not accessed as caching inhibited, the program itself must handle synchronization between the caches and the actual local storage.〕
3.2.1.2 Local Storage Access Exceptions
  1. MFC commands that access an effective address range mapping to the MFC's own local storage can produce an error or unpredictable results.
    〔When an MFC command accesses a range that maps to its own local storage, an error or unpredictable results occur, because the DMA source and destination blocks then overlap.〕
  2. Therefore, it is the programmer's and privileged software's responsibility to avoid an unintended overlap, which can result in the corruption of data.


3.3 Single-Copy Atomicity

  1. In the PowerPC Architecture, the following single register accesses are always atomic:
    • Byte accesses (all bytes are aligned on byte boundaries)
    • Halfword accesses aligned on halfword boundaries
    • Word accesses aligned on word boundaries
    • Doubleword accesses aligned on doubleword boundaries
    • Quadword accesses aligned on quadword boundaries
  2. Only quadword accesses of local storage are atomic.
    〔Unaligned accesses are not atomic either, and when accessing local storage only quadword accesses are atomic.〕
3.4 Cache Models
  1. Harvard-style cache of PPC.
  2. A location in the data cache is considered to be modified in that cache if the location has been modified (for example, by a store instruction) and the modified data has not been written to main storage.
  3. Cache management instructions allow programs to manage the caches when needed.
  4. The Cache Management Instructions allow programs to:
    • Invalidate the copy of storage in an instruction cache block (icbi)
    [invalidate an instruction cache block]
    • Provide a hint that the program will probably soon access a specified data cache block (dcbt, dcbtst)
    [set a cache hint]
    • Set the contents of a data cache block to zeros (dcbz)
    [clear cache]
    • Copy the contents of a modified data cache block to main storage (dcbst)
    [copy back]
    • Copy the contents of a modified data cache block to main storage and make the copy of the block in the data cache invalid (dcbf)
    [copy back and set invalid]
  5. The SL1 data cache commands allow programs to:
    • Bring a range of effective addresses into the SL1 (sdcrt)
    • Bring a range of effective addresses into the SL1 (sdcrtst)
    • Write zeros to the contents of a range of effective addresses (sdcrz)
    • Store the modified contents of a range of effective addresses (sdcrst)
    • Store the modified contents of a range of effective addresses and invalidate the block (sdcrf)
  6. The above commands are treated as no-operation (no-op) instructions in implementations without an SL1.
3.5 Memory Coherence

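  1. When a series of accesses hits the same memory address, there is no guarantee that every store is actually written to that address, because of the caches; software therefore has to manage memory coherence itself.
    (The original text says a lot, but this is the point it is making.)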

3.6 Storage Control Attributes

  1. Storage control attributes are associated with units of storage that are multiples of the page size.
    〔Storage control attributes apply in units of pages〕
  2. The storage control attributes are:
    • Write through Required
    〔Writes go through the cache to the actual memory location〕
    • Caching Inhibited
    〔The cache is not used〕
    • Memory Coherence Required
    〔Memory coherence is required〕
    • Guarded
3.7 Shared Storage

  1. The CBEA supports the sharing of storage between programs, between different instances of the same program, between SPUs, and between processors and other devices.
  2. It also supports access to a storage location by one or more programs using different effective addresses or DMA addresses.
    〔Probably segmentation + paging〕
  3. Storage is shared in blocks of an integral number of pages.
  4. When the same storage location has different effective addresses, the addresses are called aliases.
  5. Each application can be granted separate access privileges to aliased pages.

CBEA Note - 02

Part 2 User Mode Environment

Chapter 2

2.1 Instruction and Command Classes
  1. Both the PPE and the SPU components execute programs that consist of instructions that specify the type of actions they are to perform.
    〔PPE and SPU instructions specify the type of action to perform〕
  2. The MFCs execute commands that specify the type of data copying or movement
    they are to perform.
    〔MFC commands, in contrast, specify the type of data copy or movement〕
  3. These instructions and commands can be categorized into three classes, as follows:
    • Defined Class
    • Illegal Class
    • Reserved Class
  4. The class of an instruction or command is determined by examining the opcode.
    〔The class of an instruction is determined by its opcode〕
  5. If an instruction opcode, or a combination of opcode and extended opcode, is not that of a defined or reserved instruction, then the instruction is illegal.
    〔an instruction that is neither defined nor reserved is treated as illegal〕
  6. In future versions of the CBEA, instructions or commands that are currently illegal can become defined (by being added to the architecture), or reserved (by being assigned to a special-purpose operation). Similarly, some instructions or commands that are currently reserved can become defined in a subsequent architecture release.
    〔what counts as an illegal instruction can change in future versions of the CBEA〕
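
The classification rule in items 4 and 5 can be sketched as a table lookup. The opcode values below are invented placeholders (the real ones are in the architecture books), and `classify_opcode` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum insn_class { CLASS_DEFINED, CLASS_RESERVED, CLASS_ILLEGAL };

/* Hypothetical opcode tables, for illustration only. */
static const uint8_t defined_opcodes[]  = { 0x10, 0x11, 0x20 };
static const uint8_t reserved_opcodes[] = { 0x3E, 0x3F };

static int contains(const uint8_t *table, size_t n, uint8_t opcode)
{
    assert(table != NULL);
    for (size_t i = 0; i < n; i++)
        if (table[i] == opcode)
            return 1;
    return 0;
}

/* Anything that is neither defined nor reserved is illegal. */
enum insn_class classify_opcode(uint8_t opcode)
{
    if (contains(defined_opcodes, sizeof defined_opcodes, opcode))
        return CLASS_DEFINED;
    if (contains(reserved_opcodes, sizeof reserved_opcodes, opcode))
        return CLASS_RESERVED;
    return CLASS_ILLEGAL;
}
```

Moving an opcode from illegal to defined in a later architecture release then only means adding it to a table, which matches item 6.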

2.1.1 Defined Class

  1. Defined instructions and commands are guaranteed to be provided in all implementations. The only deviations permitted are instructions or commands specifically identified in their descriptions as optional.
    〔Instructions and commands in the Defined class are guaranteed to be provided by every CBEA implementation. Only the ones marked optional may vary.〕
  2. Defined instructions or commands can have preferred forms, or invalid forms, or both. These forms are also indicated in the relevant description.
    〔Defined instructions or commands can have a preferred form, an invalid form, or both. The forms are explained in the relevant description.〕

2.1.2 Illegal Class

  1. Any attempt to execute an illegal PPE instruction causes an exception interrupt, but has no other effect on the PPE operation.
    〔An illegal PPE instruction causes an interrupt but does not otherwise affect PPE operation.〕
  2. Any SPU that encounters an illegal instruction immediately halts program execution, records the event in its status register, and requests an external interrupt.
    〔Once an SPU hits an illegal instruction it halts the whole program, records the event in its status register, and finally requests an external interrupt.〕
  3. The illegal-instruction interrupt should be enabled and routed to a PPE. In either case, the exception interrupt should cause the illegal-instruction handler for the system to be invoked, which then takes appropriate action.
    〔Interrupts raised by illegal instructions should all be routed to a PPE and handled by the illegal-instruction handler.〕
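
The three things an SPU does on an illegal instruction can be written down as a tiny state model. This is a toy: `struct spu_model`, the status bit position and the function name are all hypothetical and do not correspond to real SPU register layouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SPU_STATUS_ILLEGAL_INSN (1u << 0) /* hypothetical status bit */

/* Toy model of the SPU state that matters for this rule. */
struct spu_model {
    bool     running;
    uint32_t status;
    bool     external_irq_requested;
};

/* Halt, record the event, request an external interrupt, in that order. */
void spu_hit_illegal_instruction(struct spu_model *spu)
{
    assert(spu != NULL);
    spu->running = false;
    spu->status |= SPU_STATUS_ILLEGAL_INSN;
    spu->external_irq_requested = true;
}
```

Whether the request actually reaches a PPE handler depends on the interrupt being enabled and routed, which is why item 3 says it should be.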

2.1.3 Reserved Class


  1. Reserved instructions are allocated to specific purposes outside the scope of the CBEA, or are intended for use in future extensions of the CBEA.
    〔Reserved instructions are either set aside for purposes outside the scope of the CBEA or kept for future CBEA extensions.〕
  2. These are the only commands that should be used by implementation-dependent applications.
    〔Instructions of this class should only be used by implementation-dependent applications.〕

2.2 Forms of Defined Instructions and Commands

In the defined set of instructions and commands, certain field or parameter settings can execute more efficiently, or can produce an error condition. The CBEA defines the field and parameter settings as preferred forms or invalid forms.


2.2.1 Preferred Forms
  1. Some defined instructions and commands have preferred forms. The preferred form of an instruction or command executes in an efficient manner; any other form can take significantly longer to execute.
    〔Some defined instructions and commands have a preferred form, which executes more efficiently.〕

2.2.2 Invalid Forms
  1. Some defined instructions and commands have invalid forms.
    〔the counterpart of preferred forms: some forms are simply invalid〕

2.2.3 Optional Forms
  1. Some of the defined instructions are optional. Any attempt to execute an optional instruction that is not provided by the implementation causes the system illegal-instruction interrupt handler to be invoked.
  2. Currently, there are no optional MFC commands or instructions, but there is an optional facility, the Isolation Facility.

2.2.4 Optional Fields
  1. Optional fields in the MFC commands are assumed to be zero if not explicitly set.
  2. Software does not have to set the optional fields if zeros achieve the desired results.
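
In C, the "zero unless explicitly set" rule maps naturally onto designated initializers, where any member left out is zero-initialized. The struct layout below is a hypothetical stand-in for an MFC command, and treating `tag` and `class_id` as the optional fields is an assumption made only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical command layout, not the real MFC parameter set. */
struct mfc_cmd_model {
    uint32_t opcode;
    uint32_t local_addr;
    uint64_t effective_addr;
    uint32_t size;
    uint32_t tag;      /* treated as optional in this sketch */
    uint32_t class_id; /* treated as optional in this sketch */
};

/* Build a command from the required fields only. Members that are not
 * named in the initializer (tag, class_id) are set to zero by the language. */
struct mfc_cmd_model make_cmd(uint32_t opcode, uint32_t local_addr,
                              uint64_t effective_addr, uint32_t size)
{
    assert(size != 0);
    struct mfc_cmd_model cmd = {
        .opcode = opcode,
        .local_addr = local_addr,
        .effective_addr = effective_addr,
        .size = size
    };
    return cmd;
}
```

A caller that needs a non-zero optional field simply assigns it after `make_cmd` returns.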

2.3 Exceptions

  1. Exceptions are the result of an operation that cannot be executed as requested. In the CBEA, there are four types of exceptions:
    • Exceptions caused directly by the execution of a PPE instruction
    • Exceptions caused by the execution of an SPU instruction
    • Exceptions caused by the execution of a MFC DMA command
    • System-caused, asynchronous, external-event exceptions
  2. An exception can set status information in a register, and can cause an interrupt handler of the system software in the PPE to be invoked.
  3. PPE exceptions are defined in PowerPC Architecture, Book I, and are of two kinds:
    - caused by the execution of a PPE instruction
    - caused by an asynchronous event
  4. Exceptions generated directly by the execution of an instruction include:
    • An attempt to execute an illegal instruction
    • An attempt to execute a privileged instruction from the user mode environment (PPE only)
    • The execution of a defined instruction using an invalid form
    • The execution of an optional instruction not supported by the implementation
    • An attempt to access storage with an effective address alignment that is invalid for the instruction (PPE only)
    • The execution of a system-call instruction (PPE only)
    • The execution of a trap instruction (PPE only)
    • The execution of a floating-point instruction that causes a floating-point exception that is enabled (PPE only)
    • The execution of a floating-point instruction that requires assistance from system software (PPE only)
    • The execution of an interrupt mailbox channel write instruction by the SPU
    • The execution of an SPU stop-and-signal instruction
  5. The exceptions generated by an MFC command include:
    • An attempt to execute an illegal MFC command
    • An attempt to execute a defined MFC command using an invalid form (that is, invalid parameters)
    • An attempt to execute a defined MFC command with an alignment error
    • The execution of an optional MFC command not supported by the implementation
    • An attempt to access storage not defined by the MFC-translation facility
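
The "(PPE only)" markers in item 4 can be captured in a small lookup. The enum names are made up, and reading the unmarked causes as applying to both the PPE and the SPU is my own interpretation of the list, not something the spec text above states outright.

```c
#include <assert.h>

enum unit_mask { UNIT_PPE = 1, UNIT_SPU = 2 };

/* One entry per instruction-caused exception in item 4 (hypothetical names). */
enum exc_cause {
    EXC_ILLEGAL_INSN,
    EXC_PRIVILEGED_INSN,
    EXC_INVALID_FORM,
    EXC_OPTIONAL_UNSUPPORTED,
    EXC_ALIGNMENT,
    EXC_SYSTEM_CALL,
    EXC_TRAP,
    EXC_FP_ENABLED,
    EXC_FP_ASSIST,
    EXC_SPU_INTR_MAILBOX_WRITE,
    EXC_SPU_STOP_AND_SIGNAL,
    EXC_CAUSE_COUNT
};

/* Returns a mask of the units that can raise the given exception. */
int exc_raising_units(enum exc_cause cause)
{
    assert((int)cause >= 0 && (int)cause < EXC_CAUSE_COUNT);
    switch (cause) {
    case EXC_PRIVILEGED_INSN:
    case EXC_ALIGNMENT:
    case EXC_SYSTEM_CALL:
    case EXC_TRAP:
    case EXC_FP_ENABLED:
    case EXC_FP_ASSIST:
        return UNIT_PPE;            /* marked "(PPE only)" in the list */
    case EXC_SPU_INTR_MAILBOX_WRITE:
    case EXC_SPU_STOP_AND_SIGNAL:
        return UNIT_SPU;            /* explicitly SPU instructions */
    default:
        return UNIT_PPE | UNIT_SPU; /* unmarked: assumed to apply to both */
    }
}
```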

2.4 SPU Events

The SPU supports an event facility that provides the capability to
  1. mask and to unmask events
  2. wait on events
  3. poll for events
  4. provide interrupts for specific events
If the SPU interrupts are enabled, an occurrence of an unmasked event results in an SPU interrupt handler being invoked, with the first instruction of the interrupt handler located at local storage address '0'.
〔I see, so the SPU interrupt handler sits at LS address '0'.〕
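
A rough model of mask, poll and interrupt, assuming events are simple bits in a pending word. The struct, the function names and the convention that a set mask bit means "unmasked" are all assumptions for this sketch. The real SPU does this through channel instructions, which are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SPU_IRQ_VECTOR 0u /* handler's first instruction: local storage address 0 */

/* Toy event facility: one bit per event. */
struct spu_events {
    uint32_t pending;     /* events that have occurred */
    uint32_t mask;        /* 1 = event is unmasked (enabled) */
    bool     irq_enabled; /* SPU interrupts enabled */
};

/* Record that one or more events occurred. */
void event_raise(struct spu_events *e, uint32_t bits)
{
    assert(e != NULL);
    e->pending |= bits;
}

/* Poll: report only the pending events that are unmasked. */
uint32_t event_poll(const struct spu_events *e)
{
    assert(e != NULL);
    return e->pending & e->mask;
}

/* An interrupt is taken only if interrupts are enabled and at least one
 * unmasked event is pending; execution would then continue at SPU_IRQ_VECTOR. */
bool event_should_interrupt(const struct spu_events *e)
{
    return e->irq_enabled && event_poll(e) != 0;
}
```

Waiting on events would amount to blocking until `event_poll` returns non-zero.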