11/29/2006

CBEA Note - 03

Chapter 3. Storage Models

The CBEA-compliant processor implements two concurrent storage models for an application program:
  1. Virtual storage model of the PPE (also used by MFCs for DMA operations). The PPE virtual storage model allows privileged software to provide different views of the real memory
    and I/O devices for the PPE and any MFC unit DMA transfers. It is possible for multiple virtual address spaces to exist.
  2. Local storage model of the SPU. The SPU local storage model is restricted to applications running on SPUs and data transfers handled by the MFC.
3.1 Virtual Storage Model

  1. Allows applications to exist within a virtual address space larger than either the effective address space or the real address space.
  2. In a typical CBEA-compliant processor system, the effective address space of each program is a subset of a larger virtual address space managed by privileged software.
  3. The privileged software manages the real storage resources of the system by setting up the tables and other information used by the hardware address translation facility.
  4. Access to the virtual pages can be read/write, read only, or no access.


The memory management model is basically still segmentation plus paging; only the terminology differs from x86:

Under CBEA:

                   segmentation                    paging
effective address -------------> virtual address --------> real address

Under x86:

                 segmentation                   paging
logical address -------------> linear address --------> physical address
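
The PowerPC-style chain (effective address to virtual address through a segment lookup, then virtual address to real address through a page lookup) can be sketched as a toy model. The segment size, page size, and table contents below are invented for illustration; they are not the real segment or page-table formats.

```python
# Toy two-step address translation: segment lookup, then page lookup.
# All sizes and table contents are assumptions made up for this sketch.
SEGMENT_BITS = 28   # assumed 256 MB segments
PAGE_BITS = 12      # assumed 4 KB pages

# Hypothetical segment table: effective segment ID -> virtual segment ID
segment_table = {0x0: 0x123456, 0x1: 0x0ABCDE}

# Hypothetical page table: virtual page number -> real page number
page_table = {(0x123456 << (SEGMENT_BITS - PAGE_BITS)) | 0x2: 0x42}


def effective_to_virtual(ea):
    """Segmentation step: replace the segment ID, keep the offset."""
    esid = ea >> SEGMENT_BITS
    offset = ea & ((1 << SEGMENT_BITS) - 1)
    vsid = segment_table[esid]      # a KeyError stands in for a segment fault
    return (vsid << SEGMENT_BITS) | offset


def virtual_to_real(va):
    """Paging step: replace the page number, keep the byte offset."""
    vpn = va >> PAGE_BITS
    offset = va & ((1 << PAGE_BITS) - 1)
    rpn = page_table[vpn]           # a KeyError stands in for a page fault
    return (rpn << PAGE_BITS) | offset


def translate(ea):
    return virtual_to_real(effective_to_virtual(ea))
```

With these made-up tables, translate(0x2ABC) walks both steps and keeps the low 12 bits (0xABC) unchanged.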


3.2 SPU Local Storage Model

  1. Each SPU has its own dedicated area of local storage.
  2. The individual local storage areas can be aliased to a real address within the main storage
    domain and any PPE can access these areas by using the appropriate effective address.
3.2.1 Local Storage Access

  1. The CBEA allows the local storage of an SPU to have an alias in the real address space in the main storage domain.
  2. This allows other processors in the main storage domain to access local storage through appropriately mapped effective address space.
3.2.1.1 Mapping Requirements
  1. Privileged software should access the aliased pages of local storage in the main storage domain as caching inhibited.
  2. If not accessed as caching inhibited, software must explicitly manage the coherency of local storage with other system caches.
    〔If local storage is not accessed as caching inhibited, the program itself has to keep the caches and the actual local storage in sync.〕
3.2.1.2 Local Storage Access Exceptions
  1. MFC commands that access an effective address range mapped to the MFC's own local storage can produce an error or unpredictable results.
    〔An MFC command that touches a range mapped to its own local storage produces an error or unpredictable results, because the DMA source and destination blocks then overlap.〕
  2. Therefore, it is the programmer's and privileged software's responsibility to avoid an unintended overlap, which can result in the corruption of data.
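
One way software could guard against the unintended overlap described above is a range check before issuing a transfer. This is only a sketch: the function names, the alias base address, and the local storage size used in the usage note are hypothetical, not values from the CBEA.

```python
def ranges_overlap(start_a, len_a, start_b, len_b):
    """True if [start_a, start_a+len_a) intersects [start_b, start_b+len_b)."""
    return start_a < start_b + len_b and start_b < start_a + len_a


def dma_targets_own_ls(ea, size, ls_alias_base, ls_size):
    """True if the effective-address range of a transfer falls inside the
    alias of the issuing SPU's own local storage (hypothetical helper)."""
    return size > 0 and ranges_overlap(ea, size, ls_alias_base, ls_size)
```

For example, with an assumed alias base of 0x40000000 and an assumed 256 KB local store, a 128-byte transfer at 0x40000100 would be flagged, while one starting at 0x40040000 (just past the end) would not.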


3.3 Single-Copy Atomicity

  1. In the PowerPC Architecture, the following single register accesses are always atomic:
    • Byte accesses (all bytes are aligned on byte boundaries)
    • Halfword accesses aligned on halfword boundaries
    • Word accesses aligned on word boundaries
    • Doubleword accesses aligned on doubleword boundaries
    • Quadword accesses aligned on quadword boundaries
  2. Only quadword accesses of local storage are atomic.
    〔Unaligned accesses are not atomic either, and for local storage only quadword accesses are atomic.〕
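
The alignment rules above reduce to a simple check: an access is naturally aligned when its address is a multiple of its size. A minimal sketch follows; the helper names are made up for illustration, while the access sizes are the standard PowerPC ones.

```python
# Access sizes in bytes (PowerPC naming).
ACCESS_SIZES = {
    "byte": 1,
    "halfword": 2,
    "word": 4,
    "doubleword": 8,
    "quadword": 16,
}


def is_naturally_aligned(address, kind):
    return address % ACCESS_SIZES[kind] == 0


def is_atomic_in_main_storage(address, kind):
    # Per the list above: any aligned single-register access is atomic.
    return is_naturally_aligned(address, kind)


def is_atomic_in_local_storage(address, kind):
    # Per the list above: only aligned quadword accesses are atomic.
    return kind == "quadword" and is_naturally_aligned(address, kind)
```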
3.4 Cache Models
  1. The PPE follows the PowerPC Harvard-style cache model (separate instruction and data caches).
  2. A location in the data cache is considered to be modified in that cache if the location has been modified (for example, by a store instruction) and the modified data has not been written to main storage.
  3. Cache management instructions allow programs to manage the caches when needed.
  4. The Cache Management Instructions allow programs to:
    • Invalidate the copy of storage in an instruction cache block (icbi)
    [invalidate an instruction cache block]
    • Provide a hint that the program will probably soon access a specified data cache block (dcbt, dcbtst)
    [set a cache hint]
    • Set the contents of a data cache block to zeros (dcbz)
    [clear cache]
    • Copy the contents of a modified data cache block to main storage (dcbst)
    [copy back]
    • Copy the contents of a modified data cache block to main storage and make the copy of the block in the data cache invalid (dcbf)
    [copy back and set invalid]
  5. The SL1 data cache commands allow programs to:
    • Bring a range of effective addresses into the SL1 (sdcrt)
    • Bring a range of effective addresses into the SL1 (sdcrtst)
    • Write zeros to the contents of a range of effective addresses (sdcrz)
    • Store the modified contents of a range of effective addresses (sdcrst)
    • Store the modified contents of a range of effective addresses and invalidate the block (sdcrf)
  6. The above commands are treated as no-operation (no-op) commands in implementations without an SL1.
3.5 Memory Coherence

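  1. When a series of accesses goes to the same memory address, there is no guarantee that every store actually reaches that address, because of caching. Software therefore has to manage memory coherence itself.
    (The original text goes on at length, but this is the point it is making.)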

3.6 Storage Control Attributes

  1. Storage control attributes are associated with units of storage that are multiples of the page size.
    〔Storage control attributes are applied per page〕
  2. The storage control attributes are:
    • Write through Required
    〔Stores are written through the cache to the actual memory location〕
    • Caching Inhibited
    〔The cache is not used〕
    • Memory Coherence Required
    〔Memory coherence is enforced〕
    • Guarded
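
The four attributes are conventionally abbreviated W, I, M, and G in PowerPC documentation. As a rough sketch, they can be modeled as per-page flags; the bit values and the describe helper below are assumptions for illustration, not the hardware encoding.

```python
from enum import IntFlag


class StorageAttr(IntFlag):
    # Bit values are arbitrary choices for this sketch.
    WRITE_THROUGH = 0b1000      # W: Write Through Required
    CACHING_INHIBITED = 0b0100  # I: Caching Inhibited
    MEMORY_COHERENCE = 0b0010   # M: Memory Coherence Required
    GUARDED = 0b0001            # G: Guarded


_LETTERS = [
    (StorageAttr.WRITE_THROUGH, "W"),
    (StorageAttr.CACHING_INHIBITED, "I"),
    (StorageAttr.MEMORY_COHERENCE, "M"),
    (StorageAttr.GUARDED, "G"),
]


def describe(attrs):
    """Return the attribute letters set for a page, in W-I-M-G order."""
    return "".join(letter for flag, letter in _LETTERS if attrs & flag)
```

A page holding an aliased local storage area, which the notes say should be accessed as caching inhibited, would carry at least StorageAttr.CACHING_INHIBITED in this model.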
3.7 Shared Storage

  1. The CBEA supports the sharing of storage between programs, between different instances of the same program, between SPUs, and between processors and other devices.
  2. It also supports access to a storage location by one or more programs using different effective addresses or DMA addresses.
    〔Probably through segmentation + paging〕
  3. Storage is shared in blocks of an integral number of pages.
  4. When the same storage location has different effective addresses, the addresses are called aliases.
  5. Each application can be granted separate access privileges to aliased pages.

CBEA Note - 02

Part 2 User Mode Environment

Chapter 2

2.1 Instruction and Command Classes
  1. Both the PPE and the SPU components execute programs that consist of instructions that specify the type of actions they are to perform.
    〔PPE and SPU instructions specify the type of action to perform〕
  2. The MFCs execute commands that specify the type of data copying or movement
    they are to perform.
    〔MFC commands specify the type of data copying or movement〕
  3. These instructions and commands can be categorized into three classes, as follows:
    • Defined Class (see 2.1.1)
    • Illegal Class (see 2.1.2)
    • Reserved Class (see 2.1.3)
  4. The class of an instruction or command is determined by examining the opcode.
    〔The class of an instruction is determined by its opcode〕
  5. If an instruction opcode, or a combination of opcode and extended opcode, is not that of a defined or reserved instruction, then the instruction is illegal.
    〔An instruction that is neither defined nor reserved is treated as illegal.〕
  6. In future versions of the CBEA, instructions or commands that are currently illegal can become defined (by being added to the architecture), or reserved (by being assigned to a special-purpose operation). Similarly, some instructions or commands that are currently reserved can become defined in a subsequent architecture release.
    〔The criteria for what counts as illegal can change in future versions of the CBEA.〕
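
The rule in item 5 (anything that is neither defined nor reserved is illegal) can be sketched as a lookup. The opcode values below are placeholders invented for this example; the real opcode assignments come from the PowerPC, SPU, and MFC specifications.

```python
# Hypothetical opcode sets, for illustration only.
DEFINED_OPCODES = {0x10, 0x11, 0x20}
RESERVED_OPCODES = {0x3F}


def classify(opcode):
    """Return the class of an instruction or command from its opcode."""
    if opcode in DEFINED_OPCODES:
        return "defined"
    if opcode in RESERVED_OPCODES:
        return "reserved"
    # Neither defined nor reserved: illegal by definition.
    return "illegal"
```

Because "illegal" is the fallback, moving an opcode into one of the two sets in a later architecture version changes its class without touching the rule itself, which matches item 6.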

2.1.1 Defined Class

  1. Defined instructions and commands are guaranteed to be provided in all implementations. The only deviations permitted are instructions or commands specifically identified in their descriptions as optional.
    〔Instructions and commands in the Defined class are guaranteed to remain available in every CBEA implementation. Only the optional ones can vary.〕
  2. Defined instructions or commands can have preferred forms, or invalid forms, or both. These forms are also indicated in the relevant description.
    〔Defined instructions or commands can have a preferred form, an invalid form, or both. The forms are explained in the relevant description.〕

2.1.2 Illegal Class

  1. Any attempt to execute an illegal PPE instruction causes an exception interrupt, but has no other effect on the PPE operation.
    〔An illegal PPE instruction causes an interrupt but has no other effect on PPE operation.〕
  2. Any SPU that encounters an illegal instruction immediately halts program execution, records the event in its status register, and requests an external interrupt.
    〔An SPU that executes an illegal instruction halts the whole program, records the event in its status register, and finally requests an external interrupt.〕
  3. The illegal-instruction interrupt should be enabled and routed to a PPE. In either case, the exception interrupt should cause the illegal-instruction handler for the system to be invoked, which then takes appropriate action.
    〔Interrupts raised by illegal instructions should all be routed to a PPE and handled by the illegal-instruction handler.〕

2.1.3 Reserved Class


  1. Reserved instructions are allocated to specific purposes outside the scope of the CBEA, or
    are intended for use in future extensions of the CBEA.
    〔Reserved instructions fall outside what the CBEA defines, but may be used by future extensions of the CBEA.〕
  2. These are the only commands that should be used by implementation-dependent applications.
    〔Instructions in this class should be used only by implementation-dependent applications.〕

2.2 Forms of Defined Instructions and Commands

In the defined set of instructions and commands, certain field or parameter settings can execute more efficiently, or can produce an error condition. The CBEA defines the field and parameter settings as preferred forms or invalid forms.


2.2.1 Preferred Forms
  1. Some defined instructions and commands have preferred forms. The preferred form of an instruction or command executes in an efficient manner; any other form can take significantly longer to execute.
    〔Some defined instructions have a preferred form and execute more efficiently when used in that form.〕

2.2.2 Invalid Forms
  1. Some defined instructions and commands have invalid forms.
    〔Other defined instructions and commands, by contrast, have forms that are invalid.〕

2.2.3 Optional Forms
  1. Some of the defined instructions are optional. Any attempt to execute an optional instruction that is not provided by the implementation causes the system illegal-instruction interrupt handler to be invoked.
  2. Currently, there are no optional MFC commands or instructions, but there is an optional facility, the Isolation Facility.

2.2.4 Optional Fields
  1. Optional fields in the MFC commands are assumed to be zero if not explicitly set.
  2. Software does not have to set the optional fields if zeros achieve the desired results.

2.3 Exceptions

  1. Exceptions are the result of an operation that cannot be executed as requested. In the CBEA, there are four types of exceptions:
    • Exceptions caused directly by the execution of a PPE instruction
    • Exceptions caused by the execution of an SPU instruction
    • Exceptions caused by the execution of a MFC DMA command
    • System-caused, asynchronous, external-event exceptions
  2. An exception can set status information in a register, and can cause an interrupt handler of the system software in the PPE to be invoked.
  3. PPE exceptions are defined in PowerPC Architecture, Book I. They are either:
    - caused by the execution of a PPE instruction, or
    - caused by an asynchronous event.
  4. Exceptions generated directly by the execution of an instruction include:
    • An attempt to execute an illegal instruction
    • An attempt to execute a privileged instruction from the user mode environment (PPE only)
    • The execution of a defined instruction using an invalid form
    • The execution of an optional instruction not supported by the implementation
    • An attempt to access storage with an effective address alignment that is invalid for the instruction (PPE only)
    • The execution of a system-call instruction (PPE only)
    • The execution of a trap instruction (PPE only)
    • The execution of a floating-point instruction that causes a floating-point exception that is enabled (PPE only)
    • The execution of a floating-point instruction that requires assistance from system software (PPE only)
    • The execution of an interrupt mailbox channel write instruction by the SPU
    • The execution of an SPU stop-and-signal instruction
  5. The exceptions generated by an MFC command include:
    • An attempt to execute an illegal MFC command
    • An attempt to execute a defined MFC command using an invalid form (that is, invalid parameters)
    • An attempt to execute a defined MFC command with an alignment error
    • The execution of an optional MFC command not supported by the implementation
    • An attempt to access storage not defined by the MFC-translation facility

2.4 SPU Events

The SPU supports an event facility that provides the capability to
  1. mask and to unmask events
  2. wait on events
  3. poll for events
  4. provide interrupts for specific events
If SPU interrupts are enabled, an occurrence of an unmasked event results in an SPU interrupt handler being invoked, with the first instruction of the interrupt handler located at local storage address ‘0’.
〔I see, so the SPU interrupt handler lives at LS address '0'.〕
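
The mask, poll, and acknowledge behavior listed above can be pictured with a small bitmask model. This is a toy sketch only: the class, its method names, and the event bit assignments are invented here and do not reflect the actual SPU channel interface.

```python
# Hypothetical event bits, for illustration only.
EVENT_MAILBOX = 0x1
EVENT_DMA_TAG = 0x2


class EventFacility:
    """Toy model: events become visible only when they are unmasked."""

    def __init__(self):
        self.pending = 0   # events that have occurred
        self.mask = 0      # events the program wants to see

    def set_mask(self, mask):
        self.mask = mask

    def raise_event(self, bits):
        self.pending |= bits

    def poll(self):
        # Only unmasked events are reported.
        return self.pending & self.mask

    def acknowledge(self, bits):
        self.pending &= ~bits
```

In this model a masked event stays pending and shows up in poll() as soon as the mask is widened to include it.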

11/28/2006

CBEA Note - 01

Preface

1. CBEA = Cell Broadband Engine Architecture

2. CBEA supports only big-endian byte ordering.
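a. PPEs are not required to support little endian.
b. SPUs do not support little endian.
〔An odd way to put it. My guess is that the PPE hardware supports it but the software side was never verified, hence the wording.〕

3. The DMA (Direct Memory Access) commands of the MFC (Memory Flow Controller) do not support little endian either.
〔This... is going to keep a lot of device drivers busy.〕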
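
To make the byte-ordering point concrete, here is a small sketch of what big-endian layout means and of the byte swap that code handling little-endian device data would need. The helper names are made up for this example.

```python
import struct


def to_big_endian_u32(value):
    """Lay out a 32-bit unsigned value most significant byte first."""
    return struct.pack(">I", value)


def little_endian_u32_to_int(raw):
    """Interpret 4 bytes that a little-endian device wrote."""
    return int.from_bytes(raw, byteorder="little")
```

For example, to_big_endian_u32(0x12345678) gives the bytes 12 34 56 78, whereas a little-endian device would store the same value as 78 56 34 12; little_endian_u32_to_int recovers the value from the latter layout.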

Ch1 Introduction to Cell Broadband Engine Architecture

1. CBEA defines a processor structure directed toward distributed processing (my own rough rendering of the phrase).

2. CBEA is divided into two environments:
a. User Mode Environment (UME) - for application programmers
b. Privileged Mode Environment (PME) - for software in privileged mode (e.g. the operating system)

3. The focus of this document is on the infrastructure around the computational, data movement, communication, synchronization, and resource management components.

* PowerPC Architecture, Books I - III define the PowerPC Processor Element (PPE) components.
* The Synergistic Processor Unit Instruction Set Architecture document defines the Synergistic Processor Unit (SPU) components.

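〔This document alone is 319 pages, not counting the items marked * above. Great, plenty to read.〕


4. The SPU can also operate outside the CBEA, but then you have to work through the SPU Instruction Set on your own.
〔That's nice, but doing everything yourself is probably a quick way to get burned!〕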

Ch 1.1 Broadband Processor Organization

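1. In hardware terms, a CBEA-compliant processor can be built in any of the following ways:
a. a single chip
b. a multi-chip module (or modules)
c. multiple single-chip modules on a motherboard
d. a second-level package
〔In a word: superb. Just what you'd expect from IBM, the supercomputer people.〕

2. In logical terms, CBEA defines the following four functional components: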
a. PowerPC Processor Element(PPE)
b. Synergistic Processor Unit (SPU)
c. Memory Flow Controller (MFC)
d. Internal Interrupt Controller (IIC).

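3. Each SPU needs its own local store and an MFC that provides its MMU and Replacement Management Table (RMT). This combination is called an SPU Element (= SPE).
〔So that is the actual definition of an SPE! Documents on the web are incomplete and misleading.〕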

4. CBEA-Compliant Processor
• One or more PowerPC Processor Elements (PPEs)
• One or more Synergistic Processor Elements (SPEs), each of which is the combination of a Synergistic Processor Unit (SPU), a local storage area, and a Memory Flow Controller (MFC)
• One Internal Interrupt Controller (IIC)
• One Element Interconnect Bus (EIB) for connecting units within the processor

5. A primary function of the PPEs is the management and allocation of tasks for the SPEs in a system.
〔The main job of the PPE is to manage the tasks of the SPEs〕

6. SPUs have a single instruction, multiple data (SIMD) capability and typically process data and initiate any required data transfers (subject to access properties set up by a PPE) in order to perform their allocated tasks.
〔SPUs have SIMD capability and can initiate data transfers to complete their assigned work.〕

7. The purpose of the SPU is to enable applications that require a higher computational unit density and can effectively use the provided instruction set.
〔The main purpose of the SPU is to let compute-intensive programs make effective use of its instruction set.〕

8. MFC components are essentially the data transfer engines. They provide the primary method for data transfer, protection, and synchronization between main storage and the local storage.
〔The MFC controls data transfer between main storage and local storage.〕

9. Each MFC can typically support multiple DMA transfers at the same time and can maintain and process multiple MFC commands.
〔Each MFC can support multiple DMA transfers at once, and can maintain and process multiple MFC commands.〕

10. Each MFC provides one queue for the associated SPU (MFC SPU command queue) and one queue for other processors and devices (MFC proxy command queue).
〔The MFC keeps a separate queue for each kind of requester, and maintains and processes the MFC commands in each queue.〕

11. Each MFC DMA data transfer command request involves both a local storage address (LSA) and an effective address (EA). The local storage address can directly address only the local storage area of its associated SPU.
〔Every MFC DMA data transfer involves a local storage address (LSA) and an effective address (EA). The LSA can only point within the local storage of its own SPU.〕

12. An MFC presents two types of interfaces: one to the SPUs and another to all other processors and devices in a processing group.
〔An MFC presents two interfaces: one to the SPU, the other to other processors and devices.〕

• SPU Channel: The SPUs use a channel interface to control the MFC. In this case, code running on an SPU can only access the MFC SPU command queue for that SPU.
〔SPU Channel: the SPU uses a channel interface to control the MFC. Code on an SPU can access only its own MFC SPU command queue.〕
• Memory-Mapped Register: Other processors and devices control the MFC by using memory-mapped registers. It is possible for any processor and device in the system to control an MFC and to issue MFC proxy command requests on behalf of the SPU.
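〔Memory-Mapped Register: processors and devices other than the SPU control the MFC through memory-mapped registers and issue MFC proxy command requests on behalf of the SPU.〕
(Really hard to paraphrase; probably only I can follow my version. If it's unclear, read the original.)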


13. The MFC also supports bandwidth reservation and data synchronization features.
〔Simple enough that it needs no comment.〕

14. The IIC component manages the priority of the interrupts presented to the PPEs. The main purpose of the IIC is to allow interrupts from the other components in the processor to be handled without using the main system interrupt controller.
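〔The IIC prioritizes the interrupts presented to the PPEs; its main job is to handle interrupts from components inside the processor so that the main system interrupt controller does not have to.〕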

15. In a CBEA-compliant system, software must first check the IIC to determine if the interrupt was sourced from an external system interrupt controller. The IIC is not intended to replace the main system interrupt controller for handling interrupts from all I/O devices.
〔In a CBEA-compliant system, software must first check the IIC to determine whether an interrupt came from an external system interrupt controller. The IIC is not meant to replace the main system interrupt controller.〕

16. Local storage consists of one or more separate areas of memory storage, each one associated with a specific SPU.
〔LS consists of one or more separate memory areas, each associated with one SPU.〕

17. Each SPU can only execute instructions (including data load and data store operations) from within its own associated local storage domain.
〔An SPU can execute only instructions held in its own local storage domain.〕

18. Therefore, any required data transfers to, or from, storage elsewhere in a system must always be performed by issuing an MFC DMA command to transfer data between the local storage domain (of the individual SPU) and the main storage domain, unless local storage aliasing is enabled.
〔So data outside the LS must be brought in from the main storage domain by issuing an MFC DMA command, unless local storage aliasing is enabled.〕

19. An SPU program references its local storage domain using a local address. However, privileged software can allow the local storage domain of the SPU to be aliased into main storage domain by setting the D bit of the MFC_SR1 to ‘1’. Each local storage area is assigned a real address within the main storage domain. (A real address is either the address of a byte in the system memory, or a byte on an I/O device.) This allows privileged software to map a local storage area into the effective address space of an application to allow DMA transfers between the local storage of one SPU and the local storage of another SPU.
〔Once local storage aliasing is enabled, each local storage area is assigned a real address in the main storage domain. Privileged software can then map a local storage area into a program's effective address space, which allows DMA transfers between the local storage of one SPU and that of another.〕

20. Data transfers that use the local storage area aliased in the main storage domain should do so as caching inhibited, since these accesses are not coherent with the SPU local storage accesses (that is, SPU load, store, instruction fetch) in its local storage domain. Aliasing the local storage areas into the real address space of the main storage domain allows any other processors or devices, which have access to the main storage area, direct access to local storage. However, since aliased local storage must be treated as noncacheable, transferring a large amount of data using the PPE load and store instructions can result in poor performance. Data transfers between the local storage domain and the main storage domain should use the MFC DMA commands to avoid stalls.
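〔The key point: when local storage aliasing is on, the aliased area must not be cached, because an external PPE or device can then access local storage directly. SPU accesses to local storage are uncached by design, so it is best to move data with MFC DMA rather than PPE load/store instructions, which avoids PPE stalls.〕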

21. Main Storage Addressing: you need PowerPC Architecture, Books I - III for the details. It presumably covers physical addresses, virtual addresses, segments, pages, and so on. Every CPU implements these a little differently.
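
Items 10 and 11 above (two command queues per MFC, and each DMA command carrying both an LSA and an EA) can be pictured with a toy model. Everything here is a sketch under stated assumptions: the class and field names are invented, and the 256 KB local storage size is an assumed value, not something the architecture notes above specify.

```python
from collections import deque
from dataclasses import dataclass

LS_SIZE = 256 * 1024  # assumed local storage size, for illustration only


@dataclass
class DmaCommand:
    lsa: int    # local storage address (within the associated SPU's LS)
    ea: int     # effective address in the main storage domain
    size: int   # transfer size in bytes


class ToyMfc:
    """Toy MFC with one queue for its SPU and one for everyone else."""

    def __init__(self):
        self.spu_queue = deque()     # stands in for the MFC SPU command queue
        self.proxy_queue = deque()   # stands in for the MFC proxy command queue

    def enqueue(self, cmd, from_spu):
        # The LSA can address only the local storage of the associated SPU.
        if cmd.lsa < 0 or cmd.lsa + cmd.size > LS_SIZE:
            raise ValueError("LSA range falls outside local storage")
        queue = self.spu_queue if from_spu else self.proxy_queue
        queue.append(cmd)
```

Commands from the SPU itself would arrive through the channel interface (from_spu=True in this model), while a PPE or device would go through memory-mapped registers and land in the proxy queue.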

Ch 1.2 Cache Replacement Management Facility


1. Tables for managing TLBs and caches are referred to as replacement management tables (RMTs).

2. An SPE group can also contain an optional cache hierarchy, the SL1 caches, which represent first level caches for DMA transfers.

3. The SL1 caches can also contain an optional RMT.


Ch 1.3 Instructions, Commands, and Facilities
  1. Instructions define an operation to be performed by a processing element.
  2. Commands define operations to be performed by the MFC.
  3. Facilities is a term used to describe functionality accessed through the main storage domain for processors or devices having such access, or by SPUs through the use of channel instructions.


Cell BE Resources for the PS3

Cell BE looks impressive, and the further things go, the more there should be to see.


Official Documents on Cell Broadband Engine™ (CBE) technology and software components

IBM Education Assistant - IBM Cell BE

IBM Developerworks - Cell

Linux on Cell BE-based Systems


Programming The Cell Processor

PS3.QJ

PS3Linux Forum on QJ.NET

I'll read through these slowly and write up some notes as I go!