Fermat's Library | Achieving Full Motion Video on the Nintendo 64 annotated/explained version.

Discussion

For one frame:

- Resolution: $320 \times 160$ pixels
- Color depth: 24-bit = 3 bytes per pixel
- Bytes per frame = $320 \times 160 \times 3 = 153,600$ bytes

For 15 minutes of video:

- Frames per second: 30 Hz
- Seconds: $15 \text{ minutes} \times 60 = 900$ seconds
- Total frames = $900 \text{ seconds} \times 30 \text{ fps} = 27,000$ frames
- Total size = $27,000 \text{ frames} \times 153,600 \text{ bytes} = 4,147,200,000$ bytes

To break it down:

1. Original Data Distribution:
   * Initially, Y, Cb, and Cr each take up 1/3 of the image data
   * Total = 100% of original size
2. Compression Method:
   * Y (brightness) is kept at full resolution (1/3 of data)
   * Cb and Cr (color) are compressed by averaging each 2×2 pixel block
   * This reduces color information to 25% of original (1/4)
   * So Cb and Cr each become 1/12 of original data (1/3 × 1/4 = 1/12)
3. Final Size Calculation:
   * Y: 1/3 (unchanged)
   * Cb: 1/12 (reduced to 25%)
   * Cr: 1/12 (reduced to 25%)
   * Total: 1/3 + 1/12 + 1/12 = 1/2

Full Motion Video (FMV) refers to pre-recorded video sequences used in video games, which are played back in their entirety as part of the game experience. These videos are typically used for cutscenes, intros, or narrative sequences to tell parts of the story or present cinematic moments, as opposed to being rendered in real-time by the game's engine.

### Intel 486

The Intel 486 was commonly used in personal computers (PCs) during the early-to-mid 1990s. It was a standard processor in both home and business desktop computers, appearing in systems from manufacturers like IBM, Compaq, and Dell. It was particularly notable for being the processor that could comfortably run Windows 3.1 and early versions of Windows 95, making it a crucial chip during the transition from DOS to graphical user interfaces in personal computing.
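The storage figures and the chroma-subsampling fractions worked through at the start of this discussion can be reproduced with a few lines of Python. This is only a sanity check of the arithmetic, and the function names (`raw_video_bytes`, `subsampled_fraction`) are illustrative rather than anything taken from the article.

```python
from fractions import Fraction


def raw_video_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Size in bytes of an uncompressed video stream."""
    bytes_per_frame = width * height * bytes_per_pixel
    return bytes_per_frame * fps * seconds


def subsampled_fraction(block_w=2, block_h=2):
    """Fraction of the original data left when Y stays at full resolution
    and Cb and Cr are each averaged over block_w x block_h pixel blocks."""
    plane = Fraction(1, 3)                 # Y, Cb and Cr each start at 1/3
    chroma = plane / (block_w * block_h)   # 2x2 averaging keeps 1/4 of a plane
    return plane + 2 * chroma


total = raw_video_bytes(320, 160, 3, fps=30, seconds=15 * 60)
print(total)                   # 4147200000
print(subsampled_fraction())   # 1/2
```

Using `Fraction` keeps the 1/3 and 1/12 terms exact, so the result is exactly one half rather than a rounded float.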
![](https://i.imgur.com/ZQkUsqH.jpeg)

### Shift instruction

A shift instruction moves bits left or right in a register. It is a fundamental CPU operation used for multiplying or dividing by powers of 2 efficiently and for manipulating binary data. The RSP's vector unit was built around fixed-point SIMD arithmetic for 3D graphics and audio, and it did not include this basic integer operation. Without a shift instruction, you typically have to use alternatives like:

1. Multiplication/division operations (to simulate bit shifts)
2. Bitwise operations and multiple steps to achieve the same result
3. Lookup tables

The coefficient matrix is the 3x3 matrix used to transform RGB to YCbCr:

$$\left[\matrix{0.2989 & 0.5866 & 0.1145 \cr -0.1687 & -0.3312 & 0.5 \cr 0.5 & -0.4183 & -0.0816}\right]$$

It's called a "coefficient matrix" because each number in it is a coefficient that determines how much each RGB value contributes to the final YCbCr values. For example:

- For luminance (Y), red contributes 0.2989, green 0.5866, and blue 0.1145.
- These specific coefficients were chosen based on human perception: green has the largest coefficient for luminance because our eyes are most sensitive to green light.
- The Cb and Cr rows each sum to approximately zero, so a neutral gray produces no color difference.

When the text talks about "inverting" this matrix, it means finding the matrix that will transform the values back to RGB, essentially undoing the original transformation for display.

### Expansion Pak

The N64 had an Expansion Pak (NUS-007), which was released in 1998 and added 4MB of RAM to the system, doubling its memory from 4MB to 8MB.

Notable games that REQUIRED the Expansion Pak:

- Donkey Kong 64
- Perfect Dark (multiplayer and most single-player features)
- The Legend of Zelda: Majora's Mask

Games that were ENHANCED by the Expansion Pak:

- Resident Evil 2 (higher resolution textures)
- StarCraft 64 (enabled split-screen mode)
- Tony Hawk's Pro Skater 2 (higher resolution mode)

![](https://i.imgur.com/DcnwlK0.jpeg)

Some context on some of the terms used by the author:

- **DMA (Direct Memory Access)**: A hardware feature that allows data to be moved between different types of memory without using the CPU.
- **DMEM (Data Memory)**: The RSP's dedicated 4KB memory space where it performs its calculations.
- **Cache Invalidation**: The process of marking cached data as invalid when it's been modified elsewhere, forcing a refresh from main memory.
- **SIMD (Single Instruction, Multiple Data)**: A type of parallel processing where one instruction can process multiple data points simultaneously.

### RSP

The Reality Signal Processor (RSP) was a programmable microprocessor in the N64 that was primarily designed for 3D graphics and audio processing. It was part of the Reality Co-Processor (RCP) chip, which was co-developed by Nintendo and Silicon Graphics.

The key difference from the PS1's architecture was that the RSP was programmable: developers could use it for tasks beyond its primary graphics purpose. This flexibility is what allowed the Resident Evil 2 team to use it for video decompression, running in parallel with the main CPU. They effectively repurposed this 3D graphics chip to help with FMV playback. The PS1 took a different approach: instead of a programmable chip, it had dedicated hardware for specific functions.

### PlayStation's MDEC (Motion Decoder)

The MDEC (Motion Decoder) chip was a dedicated hardware decompressor found in the original PlayStation (PS1).
Its specific purpose was to handle the decompression of video data in real-time, which was particularly important for playing full-motion video sequences from games.

Key points about the MDEC:

- It was a specialized hardware decoder that could decompress JPEG-like data for motion video
- Offloaded video decompression from the main CPU, making FMV playback much more efficient
- Could handle the mathematical operations needed for video decompression (like DCT - Discrete Cosine Transform) in hardware rather than software

![MDEC chip in the PlayStation 1](https://i.imgur.com/Lh5JXEt.png)
*MDEC in the PS1*

### Putting Curved Surfaces to Work on the Nintendo 64

Putting Curved Surfaces to Work on the Nintendo 64 (DeLoura, 1999) provides a detailed technical breakdown of implementing Bézier surface rendering on the N64's Reality Signal Processor (RSP). The article is particularly significant as it was one of the first public disclosures of RSP microcode programming techniques, which were previously kept under NDAs. DeLoura demonstrates how to achieve efficient curved surface tessellation by exploiting the RSP's vector processing capabilities, showing a performance improvement from 272,500 CPU cycles (using standard CPU floating-point calculations) to just 16,600 cycles using RSP microcode. The article also offers valuable insights into the N64's architecture and parallel processing capabilities.

For more details, you can find the full article [here](https://ultra64.ca/files/other/Game-Developer-Magazine/GDM_November_1999_Putting_Curved_Surfaces_to_Work_on_the_Nintendo_64.pdf).

### DCT

The Discrete Cosine Transform (DCT) is a mathematical technique that converts spatial image data into frequency components, serving as a crucial step in JPEG/MPEG compression.

#### What DCT Does:

The transform works by taking an 8×8 block of pixels in the spatial domain and converting it into 8×8 coefficients in the frequency domain.
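To make the 8×8 block transform concrete, here is a minimal sketch of a two-dimensional DCT-II in plain Python. It is a naive reference written for readability, assuming the common orthonormal scaling; it is not the fast fixed-point routine a console decoder would actually use, and the name `dct_2d` is illustrative.

```python
import math

N = 8  # block size used by JPEG/MPEG-style codecs


def dct_2d(block):
    """Naive orthonormal 2D DCT-II of an N x N block (list of N rows)."""
    def alpha(k):
        # Scaling factor that makes the transform orthonormal.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):          # frequency index along the first axis
        for v in range(N):      # frequency index along the second axis
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


# A perfectly flat block: all of its energy lands in the first coefficient.
flat = [[100] * N for _ in range(N)]
flat_coeffs = dct_2d(flat)
print(round(flat_coeffs[0][0], 6))   # approximately 800
print(max(abs(flat_coeffs[u][v])
          for u in range(N) for v in range(N) if (u, v) != (0, 0)))
```

For the flat block, every coefficient except the first comes out at essentially zero (up to floating-point rounding), which is the extreme case of an image with no detail at all.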
These coefficients represent how much each frequency pattern contributes to the block, with low frequencies representing gradual changes in the image and high frequencies capturing rapid changes and fine details.

#### Why Use DCT:

DCT is particularly useful because human eyes are more sensitive to low frequencies than high frequencies. This means we can throw away (or heavily quantize) high-frequency coefficients with minimal visible quality loss. Natural images typically have most of their energy concentrated in low frequencies, making this especially effective. The resulting DCT coefficients compress well because many high-frequency coefficients end up being near zero, allowing for significant data reduction with minimal perceptible image degradation.

The transformation effectively separates the image information into components of varying visual importance, allowing for more efficient compression by preserving what our eyes care about most while discarding what we're less likely to notice.

For a more detailed explanation, [here’s](https://www.youtube.com/watch?v=Q2aEzeMDHMA&list=PLzH6n4zXuckoAod3z31QEST1ZaizBuNHh&index=3) a great video series on JPEG compression from Computerphile.

## Overview of MPEG Compression

MPEG (Moving Picture Experts Group) compression is a method used to reduce the size of digital video files by eliminating redundant or less noticeable information while maintaining acceptable video quality. MPEG achieves this using both spatial and temporal compression techniques:

### 1. Spatial Compression (Intra-frame compression)

This is similar to JPEG image compression and works within a single video frame (picture). It reduces redundancy by compressing blocks of pixels that have similar colors or patterns. The technique primarily focuses on reducing data without significantly affecting visual quality in each individual frame.

### 2. Temporal Compression (Inter-frame compression)

This technique reduces data by exploiting similarities between consecutive frames in a video sequence. Instead of storing each frame in full, MPEG stores only the changes (motion or differences) between a reference frame (called an "I-frame") and subsequent frames (P-frames and B-frames):

- **I-frames (Intra-frames)**: These are full frames that contain all the data needed to reconstruct the image. They act as reference points.
- **P-frames (Predicted frames)**: These contain only the difference from the preceding I-frame or P-frame, relying on temporal redundancy.
- **B-frames (Bidirectional frames)**: These store the difference from both the previous and following frames, allowing for more efficient compression.

### How It Works

- **I-frames** are inserted periodically to act as reset points. Every few frames (e.g., every second), a full frame is stored.
- **P-frames** look at the closest preceding I-frame or P-frame and encode only the changes.
- **B-frames** use both the preceding and succeeding frames to encode more precise differences.

### Nintendo 64 Programming languages

The Nintendo 64's programming architecture offered developers different levels of hardware access, each with its own trade-offs between ease of use and performance. While most development could be done in C, achieving maximum performance often required diving into assembly or microcode programming.

**C Programming**: The primary language for N64 development, C provided a good balance between performance and programmer productivity. Most game logic, system management, and non-performance-critical code was written in C using Nintendo's official development kit. Its relative ease of use made it the default choice for most development tasks.

**MIPS Assembly**: The N64's R4300i CPU used the MIPS instruction set, and developers could write code directly in MIPS assembly language.
While this offered maximum control and potential performance, it required significant expertise and development time.

**RSP Microcode**: The Reality Signal Processor (RSP) could be programmed with custom microcode, offering direct access to its vector processing capabilities. This was typically used for graphics and audio processing, but creative developers could repurpose it for other tasks requiring parallel processing.

This layered approach to N64 programming reflected a common pattern in console development: using higher-level languages for general tasks while reserving low-level programming for situations where maximum performance was essential.

### RGB Color Space

RGB represents colors as combinations of Red, Green, and Blue components, similar to how a computer display works. Each pixel in the original video frames stored:

- R: Red intensity (0-255)
- G: Green intensity (0-255)
- B: Blue intensity (0-255)

### YCbCr Color Space

YCbCr represents the same colors but split into components that match human visual perception:

- Y: Luminance (brightness)
- Cb: Blue-difference
- Cr: Red-difference

The Y component (luminance) represents the overall brightness, essentially what you'd see on a black and white TV. The chrominance components (Cb and Cr) represent color differences: Cb measures how much the blue component differs from the luminance, while Cr does the same for red.

When Cb is positive, it means the color has more blue than you'd expect from its brightness alone; when negative, it has less blue than expected; and when zero, the blue perfectly matches what you'd expect from the brightness level.

![](https://i.imgur.com/1lXNggW.png)
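The conversion between the two color spaces can be sketched as a 3x3 matrix multiply and its inverse. The example below uses the coefficient values quoted in this discussion, with the red coefficient of the Cb row written as -0.1687 (the sign under which a neutral gray produces zero chroma). The chroma values are left signed and centered on zero, with no +128 offset, and the helper names are illustrative assumptions rather than code from the article.

```python
# RGB -> YCbCr coefficient matrix (rows: Y, Cb, Cr; columns: R, G, B).
RGB_TO_YCBCR = [
    [0.2989, 0.5866, 0.1145],
    [-0.1687, -0.3312, 0.5],
    [0.5, -0.4183, -0.0816],
]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a length-3 vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def invert_3x3(m):
    """Invert a 3x3 matrix using the adjugate and the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[value / det for value in row] for row in adj]


YCBCR_TO_RGB = invert_3x3(RGB_TO_YCBCR)


def rgb_to_ycbcr(rgb):
    """Return (Y, Cb, Cr) with signed chroma centered on zero."""
    return tuple(mat_vec(RGB_TO_YCBCR, list(rgb)))


def ycbcr_to_rgb(ycc):
    """Undo rgb_to_ycbcr by applying the inverted matrix."""
    return tuple(mat_vec(YCBCR_TO_RGB, list(ycc)))


orange = (200, 120, 40)
y, cb, cr = rgb_to_ycbcr(orange)
print(y, cb, cr)                     # Cb is negative (little blue), Cr positive
print(ycbcr_to_rgb((y, cb, cr)))     # recovers the original RGB up to rounding
```

Inverting the matrix once and reusing it for every pixel mirrors the idea of "undoing" the transform for display; a real decoder would typically fold these constants into fixed-point arithmetic instead of floats.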
