12/07/2006

SIXAXIS under Windows

The PS3 SIXAXIS controller can now be used under Windows.

http://forums.ps2dev.org/viewtopic.php?t=7099

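Got a PS3

I finally got a PS3 today, after wanting one for so long.
I bought it at the Indian (印地安) shop in Guanghua Market (光華商場),
and I'm really happy!

It had to be bundled with two games and a second controller, NT$23,230 in total.
Still, it finally pays off more than half a year of longing.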

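Game 1 - Gundam
Game 2 - RR7
The console is the 60 GB model. It is really quiet, and it looks beautiful! The MSD Duo (Memory Stick Duo) does not get stuck either.
I am currently installing Fedora Core 6 PPC on it.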

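The only drawback, it seems, is that for now the PS3 can only output to my TV card,
because I don't own a TV! Outputting at 480i is such a waste! The upside is that a
single monitor handles everything. Maybe a TV card that supports 1080p could
"temporarily" stand in for that very expensive LCD TV later on.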

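SONY's support for PS3 Linux is quite substantial.
The CELL-Linux-CL_20061110-ADDON CD has fairly complete information. As far as
I can tell, the kernel runs on top of a Hypervisor, so no wonder installing FC6 is this slow.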

12/05/2006

Open source for plugins

This is not just about how to write a plugin, but about how to give your own program an interface that can load plugins.
References:
http://www.linux-ha.org/_cache/TechnicalPapers__pils.pdf

CVS Code:
http://cvs.linux-ha.org/viewcvs//viewcvs.cgi/linux-ha/lib/pils/
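
To make the "host that can load plugins" idea concrete, here is a minimal sketch in Python. It is not the PILS API: `PluginHost`, `register`, `load`, and `UpperCaseFilter` are hypothetical names made up for illustration. The only idea borrowed from the text above is that the host, not the plugin, owns the loading interface.

```python
from typing import Callable, Dict, Tuple


class PluginHost:
    """Host side: a table of loadable plugins keyed by (interface, name)."""

    def __init__(self) -> None:
        self._factories: Dict[Tuple[str, str], Callable[[], object]] = {}
        self._loaded: Dict[Tuple[str, str], object] = {}

    def register(self, interface: str, name: str,
                 factory: Callable[[], object]) -> None:
        """Announce a plugin; nothing is instantiated yet."""
        key = (interface, name)
        if key in self._factories:
            raise ValueError(f"duplicate plugin {interface}/{name}")
        self._factories[key] = factory

    def load(self, interface: str, name: str) -> object:
        """Instantiate the plugin on first use and cache the instance."""
        key = (interface, name)
        if key not in self._loaded:
            self._loaded[key] = self._factories[key]()  # KeyError if unknown
        return self._loaded[key]


class UpperCaseFilter:
    """Hypothetical plugin implementing a 'filter' interface."""

    def apply(self, text: str) -> str:
        return text.upper()


host = PluginHost()
host.register("filter", "upper", UpperCaseFilter)
print(host.load("filter", "upper").apply("ps3"))
```

In a C program the factory would typically be a function pointer obtained from a shared object, but the host-side bookkeeping has the same shape.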

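Speeding up FlashPlayer, part 2

1. OnTimer() -> UpdateScreen() -> DrawScreen()
2. SYSFONT slows things down, because every UpdateRect() call runs TestDeviceFont().
3. DISABLE_FOCUS_RECT is checked in UpdateRect(). Removing it could speed things up, but probably not by much, since it only saves one conditional.
4. All the mouse-cursor-related functions can be removed to speed things up, since there is no mouse at the moment anyway.

In addition, BlinkCursor() should be called to update the cursor.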

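FlashPlayer - how rasters affect speed

While working on some Flash animation effects I noticed something interesting:
even when two changing regions (Rects) have the same area, the wide region
is faster than the tall one. I think this is because FlashPlayer does its
work raster by raster, so a region with more rasters (the tall one) is slower to process.

This is something that could be taken into account when optimizing.

(Of course, the effect is hard to notice on a fast machine.)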
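
The wide-versus-tall observation can be illustrated with a toy cost model. This is only a sketch under an assumption: each raster pays a fixed setup cost on top of its per-pixel cost. The function name and both constants are made up for illustration and are not measured from FlashPlayer.

```python
def raster_cost(width: int, height: int,
                per_row_overhead: float = 40.0,
                per_pixel_cost: float = 1.0) -> float:
    """Estimated cost of redrawing a width x height rect one raster at a time."""
    # Every raster (scanline) pays a fixed setup cost plus a per-pixel cost.
    return height * (per_row_overhead + width * per_pixel_cost)


wide = raster_cost(200, 50)   # 50 rasters of 200 pixels each
tall = raster_cost(50, 200)   # 200 rasters of 50 pixels each
print(wide, tall)             # same area, but the tall rect pays 4x the row overhead
```

With these made-up constants both rects cover 10,000 pixels, yet the tall one costs 18000 units against 12000 for the wide one, purely because of the extra per-raster overhead.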

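This also brought another point to mind: re-aligning the data might speed
up individual instructions. So far I have not seen any alignment handling
in the FlashPlayer code, although it does contain assembly-level
optimizations for machines that support SIMD.
For the alignment of glibc malloc, see here: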
http://www.delorie.com/gnu/docs/glibc/libc_31.html
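
As a small illustration of what "handling alignment" means in practice, the sketch below shows the usual round-up arithmetic in Python. The helper names and the sample address are hypothetical. The over-allocation trick in `padded_size` is a generic technique, not something taken from the FlashPlayer source; in C one would more likely call `posix_memalign` or `memalign` as described on the page linked above.

```python
def is_aligned(addr: int, alignment: int) -> bool:
    """True if addr is a multiple of alignment (alignment is a power of two)."""
    return (addr & (alignment - 1)) == 0


def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (addr + alignment - 1) & ~(alignment - 1)


def padded_size(nbytes: int, alignment: int) -> int:
    """Bytes to request so that an aligned block of nbytes always fits."""
    return nbytes + alignment - 1


raw = 0x1003                      # hypothetical address of an unaligned buffer
aligned = align_up(raw, 16)       # first 16-byte boundary at or after raw
print(hex(aligned), is_aligned(aligned, 16))
```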

12/02/2006

Track+: a project management system

Track and Manage Issues

  • Easy and pleasant to use
  • Clean and well organized user interface
  • Fully web based, no client installation
  • Highly configurable
  • Email reminders for due tasks
  • Overview diagrams
  • Budgets and expenses (new!)
  • Nice template based PDF reports
  • Easy to setup and administer
  • Powerful access control
  • Manages hundreds of projects
  • More features...

http://www.trackplus.com/

OProfile - Open Source Profiler

OProfile is a system-wide profiler for Linux systems, capable of profiling all running code at low overhead. OProfile is released under the GNU GPL.

It consists of a kernel driver and a daemon for collecting sample data, and several post-profiling tools for turning data into information.

OProfile leverages the hardware performance counters of the CPU to enable profiling of a wide variety of interesting statistics, which can also be used for basic time-spent profiling. All code is profiled: hardware and software interrupt handlers, kernel modules, the kernel, shared libraries, and applications.


http://oprofile.sourceforge.net/

IDEA - some ways to speed up FlashPlayer

  1. The IXP425 also has a CP0 with SIMD capability, which should be usable for assembly-level acceleration.
    (RGBI -> RGBIL)
  2. Functions that should be accelerated:
    - CompositeRGB()
    - CompositeRGBSolid()
    - DrawSolidSlab32()
    - DrawRGBSlab32A()
    - CompositeSolidSlab()
    - CompositeGradientSlab()
    - CompositeBitmapSlab()
    - GetBackground32()
    - GetBackgroundWhite()
    - Blt32toI()
    - Blt32to32()
  3. Maybe SMOOTHBITS should be disabled.
  4. Compile with -O2.
  5. Code optimization recommendations for the Intel XScale microarchitecture
  6. Data Access Performance Optimization on the Intel® 80321 I/O Processor
  7. Intel XScale IOP Linux
  8. GCC Assembler Instructions with C Expression Operands
  9. GCC-Inline-Assembly-HOWTO
  10. Intel® XScale™ Microarchitecture Assembly Language Quick Reference Card (pdf)

CBEA Note - 07

7. MFC Commands

  1. MFC commands can either be issued by code running on the SPU, or by code running on another processor or device, such as the PPE.
  2. The SPU executes a series of channel instructions to issue an MFC command.
  3. Other processors or devices perform a series of memory-mapped I/O (MMIO) transfers to issue an MFC command to an SPE.
  4. The commands issued are queued to one of these command queues of the MFC:
    • MFC proxy command queue for any MMIO-initiated commands
    • MFC SPU command queue for any channel-initiated commands
  5. MFC commands that transfer data are referred to as MFC DMA commands.
  6. Commands that transfer data into an SPE (from main storage to local storage) are considered get commands.
  7. Commands that transfer data out of an SPE (from local storage to main storage) are considered put commands.
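
The routing and naming rules in the list above can be summed up in a few lines of Python. This is just a toy model of the notes, not an API: `Issuer`, `command_queue`, and `dma_endpoints` are hypothetical names.

```python
from enum import Enum
from typing import Tuple


class Issuer(Enum):
    SPU_CHANNEL = "channel"   # issued by the SPU through channel instructions
    MMIO = "mmio"             # issued by the PPE or another device through MMIO


def command_queue(issuer: Issuer) -> str:
    """Which MFC queue receives a command, depending on how it was issued."""
    if issuer is Issuer.SPU_CHANNEL:
        return "MFC SPU command queue"
    return "MFC proxy command queue"


def dma_endpoints(kind: str) -> Tuple[str, str]:
    """(source, destination) of a DMA command, named from the SPE's viewpoint."""
    if kind == "get":
        return ("main storage", "local storage")
    if kind == "put":
        return ("local storage", "main storage")
    raise ValueError(f"not a DMA transfer kind: {kind}")


print(command_queue(Issuer.MMIO), dma_endpoints("get"))
```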
7.1 Command Classes

  1. Commands can be categorized into three classes, as follows:
    • Defined
    • Illegal
    • Reserved

7.1.1 Defined Commands
  1. Defined commands fall into one of three categories:
    • Data transfer commands
    – Data moved from local storage and placed in main storage (put commands)
    – Data moved into local storage from main storage (get commands)
    • SL1 cache-management commands
    • Synchronization commands
7.2 Command Exceptions

  1. Unaligned DMAs are not supported by the CBEA.
  2. If an unaligned DMA operation is encountered, the MFC command queue processing is suspended and a DMA alignment interrupt is generated.
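
A quick way to see what "unaligned" means here is a checker written in Python. The specific rules encoded below are not in these notes; they are my recollection of the Cell programming documentation and should be treated as an assumption: transfers of 1, 2, 4, or 8 bytes must be naturally aligned with matching low four address bits on both sides, and larger transfers must be a multiple of 16 bytes, at most 16 KB, with both addresses 16-byte aligned. The function name is hypothetical.

```python
MAX_DMA_BYTES = 16 * 1024   # assumed per-command upper bound


def dma_is_aligned(lsa: int, ea: int, size: int) -> bool:
    """Check a (local storage address, effective address, size) triple.

    Encodes the assumed alignment rules described above. Zero-length
    transfers are not modeled and are simply rejected.
    """
    if size in (1, 2, 4, 8):
        # Small transfers: natural alignment, same offset within a quadword.
        return (lsa % size == 0 and ea % size == 0
                and (lsa & 0xF) == (ea & 0xF))
    if size <= 0 or size % 16 != 0 or size > MAX_DMA_BYTES:
        return False
    # Quadword-multiple transfers: both ends on a 16-byte boundary.
    return lsa % 16 == 0 and ea % 16 == 0


print(dma_is_aligned(0x100, 0x20000, 128))   # both ends 16-byte aligned
print(dma_is_aligned(0x104, 0x20000, 128))   # local address off by 4
```

A command failing a check like this is the kind that would suspend the queue and raise the alignment interrupt described above.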
7.4 DMA List Elements

  1. Commands with a suffix of “l” use list elements located in the local storage pointed to by the DMA list local storage address (LA) parameter of a list command.
  2. Each list element contains the low-order word of the effective address (LEAL) and the transfer size (LTS).
  3. The DMA list commands use a list of effective addresses and transfer size pairs, or list elements, stored in local storage as the parameters for the DMA transfer. These parameters are used for SPU-initiated DMA list commands, which are not supported on the MFC proxy command queue.
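
To show what a list element might look like in local storage, here is a Python sketch that packs one. The notes above only name the two fields, LEAL and LTS. The exact layout used below is an assumption based on my recollection of the specification and SDK headers: 8 bytes, big-endian, a stall-and-notify flag in the top bit, a 15-bit transfer size in the low bits of the first word, and the 32-bit LEAL in the second word. The function names are hypothetical.

```python
import struct
from typing import Tuple


def pack_list_element(leal: int, lts: int,
                      stall_and_notify: bool = False) -> bytes:
    """Pack one DMA list element (assumed 8-byte big-endian layout)."""
    if not 0 <= leal <= 0xFFFFFFFF:
        raise ValueError("LEAL must fit in 32 bits")
    if not 0 <= lts <= 0x7FFF:
        raise ValueError("LTS must fit in 15 bits")
    word0 = (int(stall_and_notify) << 31) | lts   # flag bit + transfer size
    return struct.pack(">II", word0, leal)


def unpack_list_element(raw: bytes) -> Tuple[int, int, bool]:
    """Return (leal, lts, stall_and_notify) from an 8-byte element."""
    word0, leal = struct.unpack(">II", raw)
    return leal, word0 & 0x7FFF, bool(word0 >> 31)


element = pack_list_element(0x12345680, 0x4000)   # 16 KB from EA low word 0x12345680
print(element.hex())
```

A DMA list is then simply a sequence of such elements laid out back to back in local storage.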

12/01/2006

CBEA Note - 06

6. Memory Flow Controller



  1. In a CBEA-compliant processor, the MFC serves as an interface to the system and to other elements for an SPU.
  2. It provides the primary mechanism for data transfer, protection, and synchronization between main storage and the local storage arrays.
  3. There is logically an MFC for each SPU in a processor.
  4. The MFC has two interfaces to the SPU, two interfaces to the Bus Interface Unit (BIU), and two interfaces to an optional SL1 cache.
  5. The SPU channel interface allows the SPU to access MFC facilities and to issue MFC commands.
  6. The SPU local storage interface is used by the MFC to access the local storage in the SPU.
  7. One interface to the BIU allows memory-mapped I/O (MMIO) access to the MFC facilities. This interface also allows other processors to issue MFC commands. Commands issued using MMIO are referred to as MFC proxy commands.
  8. The other interface to the BIU carries the real address.
  9. The interfaces to the SL1 cache are mainly for data transfers.
  10. One interface is used by the MFC for access to the address translation tables in main storage and the other interface of the SL1 cache is used for the transfer of data between main storage and local storage.
As shown in Figure 6-1, the following are the main units in a typical MFC:
• MMIO interface
• MFC registers
• DMA controller
  1. The MMIO interface maps the MFC facilities of the SPU into the real address space of the system. (allows access to the MFC facilities from any processor, or any device in the system.)
  2. The MMIO interface can also be configured to map the local storage of the SPU into the real address space. This mapping:
    - allows direct access to the local storage from any processor or any device in the system
    - enables local-store-to-local-store transfers
    - ability for I/O devices to directly access the local storage domain
  3. Coherency is not maintained between SPU and MMIO accesses of the local storage domain.


6.1 MFC Facilities

  1. Most of the MFC facilities are contained in the MFC Registers unit.
  2. Some facilities are contained in the Direct Memory Access Controller (DMAC).
  3. The facilities within the MFC:
    User mode environment facilities include:
    • Mailbox Facility
    • SPU Signal Notification Facility
    • Proxy Tag-Group Completion Facility
    • MFC Multisource Synchronization Facility
    • SPU Control and Status Facilities
    • SPU Isolation Facility

    Privileged mode environment facilities include:
    • MFC Privileged Facilities
    – MFC State Register One
    – MFC Logical Partition ID Register
    – MFC Storage Description Register
    – MFC Data Address Register
    – MFC Data Storage Interrupt Status Register
    – MFC Address Compare Control Register
    – MFC Local Storage Address Compare Facility
    – MFC Command Error Register
    – MFC Data Storage Interrupt Pointer Register
    – MFC Control Register
    – MFC Atomic Flush Register
    – SPU Outbound Interrupt Mailbox Register

    • SPU Privileged Facilities
    – SPU Privileged Control Register
    – SPU Local Storage Limit Register
    – SPU Configuration Register
    • SPE Context Save and Restore
  4. The SPEs and PPE instruct the MFC to perform these DMA operations by queuing DMA command requests to the MFC through one of the command queues:
    • Commands issued by an SPE are queued to the MFC SPU command queue
    • Commands issued by a PPE are queued to the MFC proxy command queue
  5. The MFC uses an MMU to perform all MFC address translations and MFC access protection checks required for the DMA transfers.
  6. The MMU handles MFC transfers in much the same way that the PPE storage addressing facility handles load-and-store operations.

CBEA Note - 05

5. Synergistic Processor Unit

  1. The intent of the SPU is to fill a void between general-purpose processors and special-purpose hardware.
  2. SPU aims to achieve leadership performance on critical workloads for game, media, and broadband systems.
  3. The intent of the SPU and the CBEA is to provide a high degree of control to expert (real-time) programmers while maintaining ease of programming.
  4. The SPU implements a new instruction set architecture (ISA).
  5. The main characteristics of this architecture are:
    • Load-and-store architecture with sequential semantics, using a set of 128 registers, each of which is 128 bits wide.
    • Single-instruction, multiple-data (SIMD) capability
    – Sixteen 8-bit integers
    – Eight 16-bit integers
    – Four 32-bit integers or four single-precision floating-point values
    – Two double-precision floating-point values
    • Load-and-store access to an associated local storage.
    • Channel input/output for MFC control (used for external data access).
  6. The SPU has the following restrictions:
    • No direct access to main storage (access to main storage using MFC facilities only)
    • No distinction between user mode and privileged state
    • No access to critical system control such as page-table entries (this restriction should be enforced by PPE privileged software).
    • No synchronization facilities for shared local storage access
  7. The intent of the SPU is to enable applications that require a high computational unit density.
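
The SIMD element counts listed under item 5 all follow from the 128-bit register width, which a few lines of Python can confirm. The constant and function names are hypothetical. The register count and width come from the notes above.

```python
REGISTER_BITS = 128    # width of each SPU register (from the notes)
REGISTER_COUNT = 128   # number of registers (from the notes)


def lanes(element_bits: int) -> int:
    """How many elements of the given width fit in one SPU register."""
    if element_bits <= 0 or REGISTER_BITS % element_bits:
        raise ValueError("element width must divide the register width")
    return REGISTER_BITS // element_bits


# Element width in bits -> elements processed per SIMD instruction.
layout = {bits: lanes(bits) for bits in (8, 16, 32, 64)}
register_file_bytes = REGISTER_COUNT * REGISTER_BITS // 8

print(layout)
print(register_file_bytes)   # total size of the register file in bytes
```

The same arithmetic shows that the whole register file holds 2 KB of data.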

CBEA Note - 04

4. PowerPC Processor Element

  1. The CBEA includes a PowerPC processor, which, together with the MFC, is known as the PowerPC Processor Element (PPE).
  2. The PPE must be a 64-bit implementation: all effective addresses and registers, except some special-purpose and memory-mapped I/O (MMIO) registers, are 64 bits long.
  3. All implementations have two modes of operation: 64-bit mode and 32-bit mode.
  4. All instructions are available in both modes.
  5. The CBEA does not permit a PPE implementation that provides only the equivalent of 32-bit mode.

4.1 PowerPC Architecture Book I and Book II Compatibility



The PPE provides binary compatibility for PowerPC applications, except as described in Section 4.1.2 Incompatibilities with PowerPC Architecture, Book I.

4.1.1 Optional Features in PowerPC Architecture, Book I (Required for CBEA)
  1. The following facilities and instructions are considered optional in the PowerPC Architecture, but are required for the PPE by the CBEA user mode environment.
    • Floating reciprocal estimate single A-form (fres)
    • Floating reciprocal square-root estimate A-form (frsqrte)
    • Vector/SIMD multimedia extension
4.1.2 Incompatibilities with PowerPC Architecture, Book I

  1. Currently there are no incompatibilities with PowerPC Architecture, Book I.

4.1.3 Optional Features in PowerPC Architecture, Book II (Required for CBEA)

  1. The following facilities and instructions are considered optional in the PowerPC Architecture, but are required in the CBEA.

    • Data cache block touch X-form (dcbt)

    This is an optional version of dcbt that permits a program to provide a hint that a sequence of data cache blocks is likely to be needed soon.

4.1.4 Incompatibilities with PowerPC Architecture, Book II

  1. Currently there are no incompatibilities with PowerPC Architecture, Book II.

4.1.5 Extensions to the PowerPC Architecture

  1. For information on extensions in the CBEA to the PowerPC Architecture, see Appendix E.