11/28/2006

CBEA Note - 01

Preface

1. CBEA = Cell Broadband Engine Architecture

2. CBEA supports only big-endian byte ordering.
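a. PPEs are not required to support little-endian mode.
b. SPUs do not support little-endian mode.
〔An odd way to put it. My guess is that the PPE hardware supports little endian but the software side was never validated, hence the wording.〕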

3. The DMA (Direct Memory Access) commands of the MFC (Memory Flow Controller) do not support little endian either.
〔Well... a lot of device drivers are going to have fun with this.〕
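To make the big-endian-only rule concrete for myself, here is a minimal Python sketch (not Cell code) of the byte swap a driver would need when a little-endian device shares a buffer with big-endian-only hardware. The helper name `to_big_endian_words` and the sample value are made up for illustration.

```python
import struct


def to_big_endian_words(raw: bytes) -> bytes:
    """Reinterpret a buffer of little-endian 32-bit words as big-endian.

    Hypothetical helper: models the swap a driver would perform before
    handing little-endian device data to big-endian-only hardware.
    """
    if len(raw) % 4 != 0:
        raise ValueError("buffer length must be a multiple of 4 bytes")
    count = len(raw) // 4
    words = struct.unpack(f"<{count}I", raw)   # decode as little-endian
    return struct.pack(f">{count}I", *words)   # re-encode as big-endian


# The same 32-bit value has two different byte layouts.
value = 0x11223344
little = struct.pack("<I", value)   # bytes 44 33 22 11
big = struct.pack(">I", value)      # bytes 11 22 33 44
```

Calling `to_big_endian_words(little)` gives back the same bytes as `big`.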

Ch1 Introduction to Cell Broadband Engine Architecture

1. CBEA defines a processor structure directed toward distributed processing. (My own loose reading: a processor built around a distributed computing architecture.)

2. CBEA is divided into two environments:
a. User Mode Environment (UME) - for application programmers
b. Privileged Mode Environment (PME) - for software running in privileged mode (e.g., the operating system)

3. The focus of this document is on the infrastructure around the computational, data movement, communication, synchronization, and resource management components.

* PowerPC Architecture, Books I - III define the PowerPC Processor Element (PPE) components.
* The Synergistic Processor Unit Instruction Set Architecture document defines the Synergistic Processor Unit (SPU) components.

〔This document alone is 319 pages, not counting the ones marked * above. Great, plenty to read.〕


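4. An SPU can also operate outside the CBEA, but then you have to study the SPU instruction set yourself to use it.
〔Nice to have, but doing everything yourself is probably a quick way to get burned.〕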

Ch 1.1 Broadband Processor Organization

1. In hardware terms, a CBEA-compliant processor can be built in any of the following ways:
a. a single chip
b. a multi-chip module (or modules)
c. multiple single-chip modules on a motherboard
d. a second-level package
〔In a word: superb. That's IBM, the supercomputer people, for you.〕

2. In logical terms, CBEA defines the following four functional components:
a. PowerPC Processor Element (PPE)
b. Synergistic Processor Unit (SPU)
c. Memory Flow Controller (MFC)
d. Internal Interrupt Controller (IIC)

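3. Each SPU needs its own local store and an MFC that provides its MMU and Replacement Management Table (RMT). This combination is called an SPU Element (SPE).
〔So that is the actual definition of an SPE. The material on the web really is incomplete and misleading.〕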

4. CBEA-Compliant Processor
• One or more PowerPC Processor Elements (PPEs)
• One or more Synergistic Processor Elements (SPEs), each of which is the combination of a Synergistic Processor Unit (SPU), a local storage area, and a Memory Flow Controller (MFC)
• One Internal Interrupt Controller (IIC)
• One Element Interconnect Bus (EIB) for connecting units within the processor

5. A primary function of the PPEs is the management and allocation of tasks for the SPEs in a system.
〔The main job of the PPEs is to manage and allocate tasks for the SPEs.〕

6. SPUs have a single instruction, multiple data (SIMD) capability and typically process data and initiate any required data transfers (subject to access properties set up by a PPE) in order to perform their allocated tasks.
〔SPUs are SIMD-capable and can initiate the data transfers they need to complete their assigned work.〕

7. The purpose of the SPU is to enable applications that require a higher computational unit density and can effectively use the provided instruction set.
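〔The SPU exists for applications that need a higher density of computational units and can make effective use of its instruction set.〕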

8. MFC components are essentially the data transfer engines. They provide the primary method for data transfer, protection, and synchronization between main storage and the local storage.
〔The MFC handles data transfer between main storage and local storage.〕

9. Each MFC can typically support multiple DMA transfers at the same time and can maintain and process multiple MFC commands.
〔Each MFC can support multiple DMA transfers at once, and can maintain and process multiple MFC commands.〕

10. Each MFC provides one queue for the associated SPU (MFC SPU command queue) and one queue for other processors and devices (MFC proxy command queue).
〔Each MFC keeps one queue per kind of requester (its own SPU versus everything else) and maintains and processes the MFC commands in those queues.〕

11. Each MFC DMA data transfer command request involves both a local storage address (LSA) and an effective address (EA). The local storage address can directly address only the local storage area of its associated SPU.
〔Every MFC DMA transfer involves a local storage address (LSA) and an effective address (EA). The LSA can only point inside the local storage of the associated SPU.〕
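To picture the LSA/EA pairing, here is a toy Python model of one MFC moving bytes between main storage and the local store of its own SPU. The method names `get` and `put` follow the MFC command naming; the class `ToyMFC`, the 256 KB local store size (taken from the Cell BE chip, not from the architecture), and the treatment of an EA as a plain index with no address translation are all my own assumptions.

```python
from dataclasses import dataclass, field

LS_SIZE = 256 * 1024  # assumption: 256 KB local store, as in the Cell BE chip


@dataclass
class ToyMFC:
    """Toy model of one SPE's MFC. Not a real API."""
    main_storage: bytearray                          # shared, addressed by EA
    local_store: bytearray = field(
        default_factory=lambda: bytearray(LS_SIZE))  # private, addressed by LSA

    def _check(self, lsa: int, ea: int, size: int) -> None:
        # The LSA may only address this SPU's own local store.
        if lsa < 0 or size < 0 or lsa + size > len(self.local_store):
            raise ValueError("LSA range outside the local store")
        if ea < 0 or ea + size > len(self.main_storage):
            raise ValueError("EA range outside modeled main storage")

    def get(self, lsa: int, ea: int, size: int) -> None:
        """Copy from main storage into the local store."""
        self._check(lsa, ea, size)
        self.local_store[lsa:lsa + size] = self.main_storage[ea:ea + size]

    def put(self, lsa: int, ea: int, size: int) -> None:
        """Copy from the local store out to main storage."""
        self._check(lsa, ea, size)
        self.main_storage[ea:ea + size] = self.local_store[lsa:lsa + size]
```

Every transfer names both addresses, and a request whose LSA range leaves the local store is rejected rather than spilling into some other SPU's memory.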

12. An MFC presents two types of interfaces: one to the SPUs and another to all other processors and devices in a processing group.
〔An MFC exposes two interfaces: one to its SPU, the other to all other processors and devices.〕

• SPU Channel: The SPUs use a channel interface to control the MFC. In this case, code running on an SPU can only access the MFC SPU command queue for that SPU.
〔SPU channel: the SPU controls the MFC through a channel interface, and code running on that SPU can reach only its own MFC SPU command queue.〕
• Memory-Mapped Register: Other processors and devices control the MFC by using memory-mapped registers. It is possible for any processor and device in the system to control an MFC and to issue MFC proxy command requests on behalf of the SPU.
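〔Memory-mapped registers: processors and devices other than the SPU control the MFC through memory-mapped registers, and can issue MFC proxy command requests on behalf of the SPU.〕
(Hard to translate; probably only I can follow my own wording. If it is unclear, read the original text.)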
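The two interfaces and their queues can be sketched like this. It is a toy Python model under my own assumptions: the class and method names are invented, commands are plain strings, and queue depth limits are ignored.

```python
from collections import deque


class ToyMFCQueues:
    """Toy model of the two command queues in one MFC. Not a real API."""

    def __init__(self, spe_id: int) -> None:
        self.spe_id = spe_id
        self.spu_queue = deque()     # MFC SPU command queue
        self.proxy_queue = deque()   # MFC proxy command queue

    def channel_write(self, issuing_spe: int, command: str) -> None:
        """Channel interface: only the SPU paired with this MFC may use it."""
        if issuing_spe != self.spe_id:
            raise PermissionError("an SPU can reach only its own MFC's SPU queue")
        self.spu_queue.append(command)

    def mmio_write(self, issuer: str, command: str) -> None:
        """Memory-mapped interface: another processor or device issues a
        proxy command on behalf of the SPU."""
        self.proxy_queue.append((issuer, command))
```

For example, `ToyMFCQueues(spe_id=0).channel_write(1, "get")` raises `PermissionError`, while `mmio_write("PPE", "put")` is accepted from anyone and lands in the proxy queue.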


13. The MFC also supports bandwidth reservation and data synchronization features.
〔Simple enough; no notes needed.〕

14. The IIC component manages the priority of the interrupts presented to the PPEs. The main purpose of the IIC is to allow interrupts from the other components in the processor to be handled without using the main system interrupt controller.
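〔The IIC prioritizes the interrupts presented to the PPEs. Its main job is to handle interrupts from the other components inside the processor, so the main system interrupt controller does not have to.〕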

15. In a CBEA-compliant system, software must first check the IIC to determine if the interrupt was sourced from an external system interrupt controller. The IIC is not intended to replace the main system interrupt controller for handling interrupts from all I/O devices.
〔In a CBEA-compliant system, software must check the IIC first to find out whether an interrupt came from an external system interrupt controller. The IIC is not meant to replace the main system interrupt controller.〕
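The "check the IIC first" rule reads to me like the following dispatch order. This is only a Python sketch of the control flow: the two callables stand in for hardware register reads, and the `EXTERNAL` marker and source names are hypothetical.

```python
from typing import Callable, Optional

EXTERNAL = "external"  # hypothetical marker: interrupt came via the system controller


def dispatch_interrupt(
    read_iic: Callable[[], Optional[str]],
    read_external_controller: Callable[[], str],
) -> Optional[str]:
    """Return the name of the source to service, or None if nothing is pending.

    Both callables are placeholders for hardware register reads.
    """
    source = read_iic()                      # step 1: always ask the IIC first
    if source is None:
        return None                          # nothing pending
    if source == EXTERNAL:
        return read_external_controller()    # step 2: only now ask the system controller
    return source                            # internal source, handled without step 2
```

The external controller is consulted only when the IIC reports that the interrupt originated there, which matches the note that the IIC sits in front of the system controller rather than replacing it.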

16. Local storage consists of one or more separate areas of memory storage, each one associated with a specific SPU.
〔Local storage is made up of one or more separate memory areas, each tied to one SPU.〕

17. Each SPU can only execute instructions (including data load and data store operations) from within its own associated local storage domain.
〔An SPU can execute instructions, including loads and stores, only within its own local storage domain.〕

18. Therefore, any required data transfers to, or from, storage elsewhere in a system must always be performed by issuing an MFC DMA command to transfer data between the local storage domain (of the individual SPU) and the main storage domain, unless local storage aliasing is enabled.
〔So any data outside the local store has to be moved by issuing an MFC DMA command between the main storage domain and the local store, unless local storage aliasing is enabled.〕
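In practice this means an SPU program works through large data sets in chunks: bring a piece into local store, compute on it, repeat. A minimal Python sketch of that pattern follows. The function name is invented, the slice copy stands in for an MFC get command, and the 16 KB chunk size is my assumption for the largest single transfer.

```python
CHUNK = 16 * 1024  # assumption: one DMA transfer moves at most 16 KB


def sum_via_local_store(main_storage: bytes, ea: int, length: int) -> int:
    """Sum `length` bytes starting at `ea`, touching them only through a
    local buffer, the way an SPU program would have to."""
    if ea < 0 or length < 0 or ea + length > len(main_storage):
        raise ValueError("range outside modeled main storage")
    local_buffer = bytearray(CHUNK)   # stands in for a region of local store
    total = 0
    offset = 0
    while offset < length:
        size = min(CHUNK, length - offset)
        # Stand-in for an MFC "get": main storage -> local store.
        local_buffer[:size] = main_storage[ea + offset:ea + offset + size]
        # The SPU computes only on what is now in its local store.
        total += sum(local_buffer[:size])
        offset += size
    return total
```

The result equals `sum(main_storage[ea:ea + length])`; the point is only that the computation never reads main storage directly.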

19. An SPU program references its local storage domain using a local address. However, privileged software can allow the local storage domain of the SPU to be aliased into main storage domain by setting the D bit of the MFC_SR1 to ‘1’. Each local storage area is assigned a real address within the main storage domain. (A real address is either the address of a byte in the system memory, or a byte on an I/O device.) This allows privileged software to map a local storage area into the effective address space of an application to allow DMA transfers between the local storage of one SPU and the local storage of another SPU.
〔Once local storage aliasing is enabled, each local storage area is assigned a real address in the main storage domain. Privileged software can then map a local storage area into an application's effective address space, which allows DMA transfers between one SPU's local storage and another's.〕
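Here is how I picture an SPU-to-SPU transfer through the alias, as a Python sketch. Everything concrete in it is assumed: the base address `ALIAS_BASE`, the back-to-back layout of the aliased areas, the 256 KB local store size, and the simplification that the effective address equals the real address (no translation).

```python
LS_SIZE = 256 * 1024          # assumption: 256 KB per local store
ALIAS_BASE = 0x1000_0000      # hypothetical address chosen by privileged software


def alias_address(spe_index: int, lsa: int) -> int:
    """Address at which byte `lsa` of SPE `spe_index`'s local store appears
    in main storage, assuming the areas are laid out back to back."""
    if not 0 <= lsa < LS_SIZE:
        raise ValueError("LSA outside the local store")
    return ALIAS_BASE + spe_index * LS_SIZE + lsa


def dma_put_to_other_spe(local_stores, src_spe, src_lsa, dst_spe, dst_lsa, size):
    """Model an SPU-to-SPU transfer: the source names the destination by
    its aliased address, which resolves back to an (SPE, LSA) pair."""
    if size < 0 or src_lsa < 0 or src_lsa + size > LS_SIZE or dst_lsa + size > LS_SIZE:
        raise ValueError("transfer does not fit inside a local store")
    target = alias_address(dst_spe, dst_lsa)
    # Resolve the aliased address the way the memory system would.
    resolved_spe, resolved_lsa = divmod(target - ALIAS_BASE, LS_SIZE)
    data = local_stores[src_spe][src_lsa:src_lsa + size]
    local_stores[resolved_spe][resolved_lsa:resolved_lsa + size] = data
```

From the source SPU's point of view this is an ordinary put to a main storage address; only the mapping set up by privileged software makes it land in another SPU's local store.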

20. Data transfers that use the local storage area aliased in the main storage domain should do so as caching inhibited, since these accesses are not coherent with the SPU local storage accesses (that is, SPU load, store, instruction fetch) in its local storage domain. Aliasing the local storage areas into the real address space of the main storage domain allows any other processors or devices, which have access to the main storage area, direct access to local storage. However, since aliased local storage must be treated as noncacheable, transferring a large amount of data using the PPE load and store instructions can result in poor performance. Data transfers between the local storage domain and the main storage domain should use the MFC DMA commands to avoid stalls.
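〔The key point: when local storage aliasing is enabled, accesses through the alias must be caching inhibited, because they are not coherent with the SPU's own loads, stores, and instruction fetches. The alias does let the PPE and other devices reach local storage directly, but uncached PPE loads and stores are slow for large amounts of data, so use MFC DMA commands for bulk transfers to avoid stalls.〕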

21. Main storage addressing is covered properly in PowerPC Architecture, Books I - III. It is presumably the usual physical addresses, virtual addresses, segments, pages, and so on; the details differ between CPU implementations.

Ch 1.2 Cache Replacement Management Facility


1. Tables for managing TLBs and caches are referred to as replacement management tables (RMTs).

2. An SPE group can also contain an optional cache hierarchy, the SL1 caches, which represent first level caches for DMA transfers.

3. The SL1 caches can also contain optional RMTs.
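My understanding of how an RMT steers replacement, as a Python sketch: each access carries a class ID, the table maps that ID to the set of cache ways that may be evicted, and the replacement policy picks a victim only among those. The bit-mask layout, the function name, and the LRU ordering are my assumptions, not taken from the notes above.

```python
def pick_victim_way(rmt: dict, class_id: int, lru_order: list) -> int:
    """Choose the way to evict for an access tagged with `class_id`.

    rmt        maps a class ID to a bit mask of ways that may be replaced
               (bit i set means way i is eligible); the layout is assumed.
    lru_order  lists the ways from least to most recently used.
    """
    mask = rmt[class_id]
    for way in lru_order:
        if mask & (1 << way):
            return way
    raise ValueError("RMT entry allows no way in this set")
```

With a table such as `{0: 0b11111111, 1: 0b00000011}`, class 0 may evict any of eight ways, while class 1 is confined to ways 0 and 1, so streaming data tagged as class 1 cannot push everything else out of the cache.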


Ch 1.3 Instructions, Commands, and Facilities
  1. Instructions define an operation to be performed by a processing element.
  2. Commands define operations to be performed by the MFC.
  3. Facilities is a term used to describe functionality accessed through the main storage domain for processors or devices having such access, or by SPUs through the use of channel instructions.

