BOOT_8PSK Debugging Findings

Technical reference for the BCM4500 demodulator boot sequence on the Genpix SkyWalker-1 (Cypress FX2 CY7C68013A + Broadcom BCM4500), firmware v3.01.0. Documents the root cause analysis of a firmware hang during I2C initialization and the fixes applied.

Hardware: Genpix SkyWalker-1 USB 2.0 DVB-S receiver
MCU: Cypress CY7C68013A (FX2LP), 8051 core at 48MHz
Demodulator: Broadcom BCM4500
Firmware: Custom v3.01.0 (SDCC + fx2lib)
I2C bus speed: 400kHz


The Problem

Custom firmware v3.01.0 implements vendor command BOOT_8PSK (bRequest=0x89, wValue=1), which powers on the BCM4500 demodulator and initializes it via I2C. When first tested, this command caused the FX2 firmware to hang for over 10 seconds, making the USB device completely unresponsive -- no vendor command would return, and the host-side USB stack would report timeout errors.
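
For orientation, a host-side request of this shape could be issued with pyusb roughly as follows. This is a minimal sketch, not the repository's test tooling: the helper names, the device-to-host direction (0xC0), the 2-byte reply length, and the timeout are assumptions. Only bRequest=0x89, wValue=1, and the VID:PID 09C0:0203 come from this document.

```python
BOOT_8PSK = 0x89   # bRequest, from this document
VENDOR_IN = 0xC0   # assumption: vendor request, device-to-host


def boot_8psk_request(w_value=1, length=2):
    """Return keyword arguments for pyusb's Device.ctrl_transfer()."""
    return {
        "bmRequestType": VENDOR_IN,
        "bRequest": BOOT_8PSK,
        "wValue": w_value,
        "wIndex": 0,
        "data_or_wLength": length,  # assumption: short status reply
    }


def send_boot_8psk(w_value=1, timeout_ms=2000):
    """Send the request to the first matching device (needs pyusb and hardware)."""
    import usb.core  # imported lazily so the helper above works without pyusb

    dev = usb.core.find(idVendor=0x09C0, idProduct=0x0203)
    if dev is None:
        raise RuntimeError("SkyWalker-1 not found")
    reply = dev.ctrl_transfer(timeout=timeout_ms, **boot_8psk_request(w_value))
    return bytes(reply)
```

A short host-side timeout like the one assumed here is what surfaces a firmware hang as a USB timeout error rather than a blocked script.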

The initial suspicion was infinite I2C loops. The fx2lib I2C library uses bare while loops that poll hardware status bits with no timeout:

// fx2lib/lib/i2c.c -- original code
while ( !(I2CS & bmDONE) && !cancel_i2c_trans);

The cancel_i2c_trans variable is intended as an external abort mechanism, but nothing in the firmware sets it during normal operation. If the I2C controller never asserts bmDONE (for example, because a slave is holding SCL low), the firmware spins indefinitely in this loop.

Adding I2C timeout protection (described below) eliminated the infinite-hang symptom, but the boot sequence still failed: the BCM4500 probe read returned NACK, and all three register initialization blocks failed.

Root Cause: Spurious I2C STOP Condition

The boot function originally included a so-called "I2C bus reset" step before any I2C communication:

I2CS |= bmSTOP;
i2c_wait_stop();

This pattern appears in various FX2 example code and seems reasonable on its face -- send a STOP condition to ensure the I2C bus is in a known idle state before starting fresh. On the FX2's I2C controller, however, it is incorrect.

Incremental Debug Modes

The root cause was discovered through a series of incremental debug modes added to the BOOT_8PSK vendor command handler. Each mode executes a subset of the full boot sequence, isolating which step introduces the failure:

wValue  Action                                                Result
0x80    No-op: return config_status and boot_stage only       Works
0x81    GPIO + power + delays only (no I2C at all)            Works
0x82    GPIO + power + bmSTOP + I2C probe read                Fails
0x83    GPIO + power + bmSTOP + probe + init block 0          Fails (same root cause)
0x84    bcm_direct_read only (no GPIO, chip already powered)  Works
0x85    GPIO + power + reset, no bmSTOP, then probe           Works

Three observations clinch the diagnosis:

  1. Mode 0x82 fails but mode 0x85 succeeds. These two modes are identical except that 0x82 issues I2CS |= bmSTOP before the probe read and 0x85 does not. The bmSTOP is the only difference, and it is the only thing that breaks I2C.

  2. Mode 0x84 succeeds immediately after 0x82 fails. Mode 0x84 calls bcm_direct_read with no GPIO manipulation or bus reset -- just a plain I2C combined read. If called after a failed 0x82, it succeeds. This proves two things: the BCM4500 is alive and responding on I2C, and the i2c_combined_read function itself is correct. The failure in 0x82 is not a timing or power issue.

  3. Raw I2C reads via vendor command 0xB5 succeed after 0x82 fails. Command 0xB5 uses the same i2c_combined_read function as bcm_direct_read. Running it from the host side after a failed 0x82 returns valid data from the BCM4500. This confirms the chip was alive the whole time -- the FX2's I2C controller was in a bad state, not the bus or the slave.
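
The elimination argument can also be checked mechanically. The sketch below transcribes the debug-mode table into a dictionary (the step labels are shorthand introduced here, not firmware identifiers) and asks which step is present in exactly the failing modes.

```python
# Steps and observed results per debug mode, transcribed from the table above.
# Value layout: (steps executed, worked?)
DEBUG_MODES = {
    0x80: ((), True),
    0x81: (("gpio", "power"), True),
    0x82: (("gpio", "power", "bmSTOP", "probe"), False),
    0x83: (("gpio", "power", "bmSTOP", "probe", "init0"), False),
    0x84: (("probe",), True),
    0x85: (("gpio", "power", "reset", "probe"), True),
}


def steps_explaining_failure(modes):
    """Return the steps that occur in every failing mode and in no working mode."""
    all_steps = {step for steps, _ in modes.values() for step in steps}
    return sorted(
        step
        for step in all_steps
        if all((step in steps) == (not ok) for steps, ok in modes.values())
    )


print(steps_explaining_failure(DEBUG_MODES))
```

Only bmSTOP survives the filter: every other step appears in at least one working mode or is missing from at least one failing mode.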

The test scripts that drove this investigation are in the tools/ directory:

  • test_boot_debug.py -- sends debug modes 0x80 through 0x83 sequentially
  • test_i2c_debug.py -- powers on via 0x81, runs bus scans, tests probe timing
  • test_i2c_isolate.py -- tests whether re-reset or insufficient delay causes failure
  • test_i2c_pinpoint.py -- the definitive test: compares 0x84, 0x85, and 0x82

What Happens Inside the FX2 I2C Controller

The FX2's I2C master controller is a hardware peripheral accessed through the I2CS, I2DAT, and I2CTL SFRs. The controller implements an I2C state machine in silicon. Writing bmSTOP to I2CS instructs the hardware to generate a STOP condition (SDA rising while SCL is high).

When no I2C transaction is active -- no prior START has been issued, and the bus is idle -- writing bmSTOP puts the controller into an inconsistent internal state. The bmSTOP bit may not clear properly (it is supposed to self-clear when the STOP condition completes on the bus), and subsequent START conditions fail to generate proper clock sequences or detect ACK from slaves.

The Cypress TRM (EZ-USB Technical Reference Manual) does not explicitly warn against this, but the I2C chapter describes STOP as a step that follows a completed read or write transaction. It is not documented as a standalone bus-reset mechanism.

The correct way to ensure a clean I2C bus state on the FX2 is to simply proceed with a new START condition. If the bus is idle (which it will be after power-on or after the previous transaction completed normally), the START succeeds and the controller enters its normal operating state. The hardware handles bus arbitration automatically on START.

The Fix

The fix is a single deletion. Remove the spurious STOP from the boot sequence:

/* BEFORE (broken): */
I2CS |= bmSTOP;
i2c_wait_stop();

/* AFTER (correct): */
/* NOTE: Do NOT send I2CS bmSTOP here. Sending STOP when no transaction
 * is active corrupts the FX2 I2C controller state, causing subsequent
 * START+ACK detection to fail. The I2C bus will be in a clean state
 * when we reach the probe step -- any prior transaction ended with STOP. */

The corrected bcm4500_boot() function proceeds directly from GPIO/power setup to the I2C probe read without any bus-reset step:

static BOOL bcm4500_boot(void) {
    boot_stage = 1;
    cancel_i2c_trans = FALSE;

    /* P3.7, P3.6, P3.5 HIGH (idle state for control lines) */
    IOD |= 0xE0;

    /* Assert BCM4500 hardware RESET (P0.5 LOW) */
    OEA |= PIN_BCM_RESET;
    IOA &= ~PIN_BCM_RESET;

    /* No I2CS bmSTOP here -- see note above */

    /* Power on: P0.1 HIGH (enable), P0.2 LOW (disable off) */
    OEA |= (PIN_PWR_EN | PIN_PWR_DIS);
    IOA = (IOA & ~PIN_PWR_DIS) | PIN_PWR_EN;

    boot_stage = 2;
    delay(30);               /* power settle */

    IOA |= PIN_BCM_RESET;    /* release reset */
    delay(50);               /* BCM4500 POR + mask ROM boot */

    boot_stage = 3;
    /* I2C probe -- if this fails, the chip didn't come out of reset */
    if (!bcm_direct_read(BCM_REG_STATUS, &i2c_rd[0]))
        return FALSE;

    /* ... register init blocks follow ... */
}

I2C Timeout Protection

Even with the bmSTOP fix, timeout protection on all I2C operations is essential. The FX2's I2C controller has no hardware timeout -- if a slave device holds SCL low (clock stretching), or if an electrical fault prevents bmDONE from asserting, the firmware will spin forever in a polling loop.

The Problem with fx2lib

The fx2lib i2c_write() and i2c_read() functions poll bmDONE and bmSTOP with loops like:

while ( !(I2CS & bmDONE) && !cancel_i2c_trans);

The cancel_i2c_trans flag is declared as volatile __xdata BOOL and is set to FALSE at the start of each transaction. The library documentation says firmware can set it to TRUE from an interrupt to abort a stuck transaction. In practice, nothing in the firmware sets it, so these loops are effectively:

while (!(I2CS & bmDONE));  // infinite if bmDONE never asserts

Timeout-Protected Replacements

The custom firmware replaces all fx2lib I2C functions with timeout-protected wrappers:

#define I2C_TIMEOUT 6000

static BOOL i2c_wait_done(void) {
    WORD timeout = I2C_TIMEOUT;
    while (!(I2CS & bmDONE)) {
        if (--timeout == 0)
            return FALSE;
    }
    return TRUE;
}

static BOOL i2c_wait_stop(void) {
    WORD timeout = I2C_TIMEOUT;
    while (I2CS & bmSTOP) {
        if (--timeout == 0)
            return FALSE;
    }
    return TRUE;
}

A WORD counter of 6000, decremented in a tight SDCC-compiled loop at 48MHz (4 clocks per 8051 machine cycle, ~12 MIPS), gives approximately 5-10ms per wait. At 400kHz I2C, a single byte transfer (9 clock pulses) takes 22.5 microseconds, so this timeout provides well over 200x margin for normal operations while still bounding the worst case.
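
The arithmetic behind these figures can be reproduced with a few lines. The per-iteration cost of the polling loop is an assumption (10 to 20 machine cycles, chosen to match the 5-10ms estimate above); it has not been measured from the compiled SDCC output.

```python
import math

CPU_HZ = 48_000_000      # FX2 clock, from this document
CLOCKS_PER_CYCLE = 4     # clocks per 8051 machine cycle, from this document
I2C_HZ = 400_000         # I2C bus speed, from this document
I2C_TIMEOUT = 6000       # loop counter, from this document


def byte_time_s(i2c_hz=I2C_HZ, clocks_per_byte=9):
    """Time for one I2C byte: 8 data bits plus the ACK bit."""
    return clocks_per_byte / i2c_hz


def timeout_s(cycles_per_iteration, count=I2C_TIMEOUT):
    """Worst-case wait for a loop costing the given machine cycles per pass."""
    cycles_per_second = CPU_HZ / CLOCKS_PER_CYCLE
    return count * cycles_per_iteration / cycles_per_second


for cycles in (10, 20):  # assumed loop cost, not measured
    t = timeout_s(cycles)
    margin = t / byte_time_s()
    print(f"{cycles} cycles/iteration: {t * 1e3:.1f} ms, {margin:.0f}x one byte time")
```

With these assumed loop costs the bound works out to 5 ms and 10 ms, or roughly 222 and 444 byte times.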

All BCM4500 I2C operations -- i2c_combined_read, i2c_write_timeout, i2c_write_multi_timeout -- use these timeout-protected waits and return FALSE on timeout, allowing the caller to report failure rather than hanging the firmware.
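
The behaviour of the bounded wait can be illustrated off-target with a small model. This is a host-side sketch rather than firmware code: FakeI2CS is a hypothetical stand-in for the I2CS register, and the DONE bit value is used only inside the model.

```python
I2C_TIMEOUT = 6000
BM_DONE = 0x01  # model value for the DONE status bit


def wait_done(read_i2cs, budget=I2C_TIMEOUT):
    """Mirror i2c_wait_done(): poll until DONE is set or the budget runs out."""
    timeout = budget
    while not (read_i2cs() & BM_DONE):
        timeout -= 1
        if timeout == 0:
            return False
    return True


class FakeI2CS:
    """Stand-in status register: sets DONE after `ready_after` polls, or never."""

    def __init__(self, ready_after=None):
        self.polls = 0
        self.ready_after = ready_after

    def __call__(self):
        self.polls += 1
        if self.ready_after is not None and self.polls > self.ready_after:
            return BM_DONE
        return 0


stuck = FakeI2CS()                 # DONE never asserts
print(wait_done(stuck), stuck.polls)

normal = FakeI2CS(ready_after=3)   # DONE asserts on the fourth poll
print(wait_done(normal), normal.polls)
```

The stuck case returns False after a fixed number of polls instead of spinning, which is the property the firmware relies on to stay responsive.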

Kernel Driver Race Condition

The dvb_usb_gp8psk kernel module auto-loads via udev when VID:PID 09C0:0203 appears on the USB bus. This happens every time the FX2 re-enumerates after firmware load. The kernel driver races with the test tools and sends its own BOOT_8PSK command (along with other initialization), which interferes with debugging.

Symptoms of this race condition:

  • Test scripts report "resource busy" or "entity not found" errors
  • The BCM4500 enters an unexpected state because the kernel driver partially initialized it
  • The kernel driver detaches from the device mid-test

The fix is to blacklist the module:

# /etc/modprobe.d/blacklist-gp8psk.conf
blacklist dvb_usb_gp8psk
blacklist gp8psk_fe

After creating this file, run sudo modprobe -r dvb_usb_gp8psk gp8psk_fe to unload any currently-loaded instances. The blacklist prevents udev from auto-loading the module on device insertion, giving test tools exclusive access.
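
Before starting a test run it can be worth confirming that neither module is still loaded. A minimal sketch, assuming a Linux host where /proc/modules lists one loaded module per line with the module name in the first column; the function names are hypothetical.

```python
BLACKLISTED = ("dvb_usb_gp8psk", "gp8psk_fe")


def loaded_blacklisted(proc_modules_text, names=BLACKLISTED):
    """Return the named modules that appear in /proc/modules-style text."""
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in names if name in loaded]


def check_system(path="/proc/modules"):
    """Read the live module list; an empty result means the device is free."""
    with open(path) as f:
        return loaded_blacklisted(f.read())
```

Calling check_system() from a test script and aborting on a non-empty result avoids chasing "resource busy" errors that are really the kernel driver holding the device.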

I2C Bus Scan Results

Vendor command 0xB4 performs a full 7-bit I2C bus scan by attempting a START + address + WRITE to every address from 0x01 to 0x77 and checking for ACK. Three devices were found:

Address  Identity
0x08     BCM4500 demodulator. Status register 0xA2 returns valid data. This is the primary device for all demodulator operations.
0x10     Likely the tuner or LNB controller. The SkyWalker-1 uses a separate tuner IC (accessed through the BCM4500 in normal operation, but also directly addressable on the shared I2C bus).
0x51     Likely a configuration EEPROM. Many DVB-S receivers store tuner calibration data or device serial numbers in a small I2C EEPROM at addresses in the 0x50-0x57 range.

The BCM4500's 7-bit I2C address of 0x08 corresponds to 8-bit wire addresses of 0x10 (write) and 0x11 (read).
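
The 7-bit to 8-bit conversion is a one-bit shift with the read/write flag in bit 0. A short sketch (function and constant names are illustrative):

```python
def wire_addresses(addr7):
    """Return the (write, read) address bytes for a 7-bit I2C address."""
    if not 0 <= addr7 <= 0x7F:
        raise ValueError("not a 7-bit address")
    write = addr7 << 1   # R/W bit = 0
    read = write | 1     # R/W bit = 1
    return write, read


# Address range covered by the bus scan described above: 0x01..0x77 inclusive.
SCAN_RANGE = range(0x01, 0x78)

for addr in (0x08, 0x10, 0x51):
    w, r = wire_addresses(addr)
    print(f"7-bit 0x{addr:02X} -> write 0x{w:02X}, read 0x{r:02X}")
```

Note that the write byte for the BCM4500 (0x10) has the same numeric value as the 7-bit address of the second device found in the scan, which is an easy source of confusion when reading bus traces.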

BCM4500 Boot Results After Fix

With the bmSTOP removed, the full boot sequence completes reliably:

  • Boot time: ~90ms total (30ms power settle + 50ms post-reset delay + ~10ms I2C init)
  • config_status: 0x03 (STARTED | FW_LOADED)
  • boot_stage: 0xFF (COMPLETE)
  • Direct registers 0xA2-0xA8: All return 0x02 (powered, not locked -- expected without a satellite signal)
  • Signal lock: 0x00 (no lock -- dish not aimed at satellite)
  • Signal strength: All zeros (same reason)
  • USB responsiveness: No hang. The firmware remains fully responsive to vendor commands throughout boot and afterward.
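
A host-side tool might decode the two status bytes along these lines. This is a sketch under stated assumptions: the document only gives the combined value 0x03 for STARTED | FW_LOADED, so the individual bit assignment (0x01 and 0x02) is assumed here, and the stage descriptions paraphrase where bcm4500_boot() sets boot_stage.

```python
# Assumed bit assignment; only the combined value 0x03 is stated in this document.
CONFIG_BITS = {0x01: "STARTED", 0x02: "FW_LOADED"}

# Stage values as set in bcm4500_boot(); 0xFF is reported on completion.
BOOT_STAGES = {
    1: "GPIO setup, reset asserted",
    2: "power on, settling",
    3: "I2C probe",
    0xFF: "COMPLETE",
}


def decode_config_status(value):
    """Return the names of the known flags set in config_status."""
    return [name for bit, name in sorted(CONFIG_BITS.items()) if value & bit]


def decode_boot_stage(value):
    """Return a description of boot_stage, or flag an unknown value."""
    return BOOT_STAGES.get(value, f"unknown (0x{value:02X})")


print(decode_config_status(0x03), decode_boot_stage(0xFF))
```

A boot_stage that stops at a value below 0xFF identifies the last stage the firmware entered before the sequence failed.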

Firmware v3.01.0 Boot Sequence (Corrected)

The complete boot sequence as implemented in bcm4500_boot():

  1. Assert BCM4500 RESET -- Drive P0.5 LOW. This holds the BCM4500's digital logic in reset while power is applied.

  2. Power on -- Set P0.1 HIGH (power enable), P0.2 LOW (power disable off). The SkyWalker-1 has complementary power control pins.

  3. delay(30ms) -- Allow the power supply to settle and reach regulation. The stock firmware uses the same delay.

  4. Release RESET -- Drive P0.5 HIGH. The BCM4500 begins its internal power-on reset (POR) and mask ROM boot sequence.

  5. delay(50ms) -- Wait for the BCM4500's POR and internal initialization to complete. The chip needs time for its internal oscillator to stabilize and mask ROM to execute.

  6. I2C probe -- Read direct register 0xA2 (status) to verify the chip is alive and responding on I2C. If this fails, the boot aborts.

  7. Write init block 0 -- 7 bytes to BCM4500 indirect page 0, starting at register 0x06. Written via the 0xA6/0xA7/0xA8 indirect register protocol. Data: {0x06, 0x0b, 0x17, 0x38, 0x9f, 0xd9, 0x80}.

  8. Write init block 1 -- 8 bytes to page 0, starting at register 0x07. Data: {0x07, 0x09, 0x39, 0x4f, 0x00, 0x65, 0xb7, 0x10}.

  9. Write init block 2 -- 3 bytes to page 0, starting at register 0x0F. Data: {0x0f, 0x0c, 0x09}.

  10. Set config_status -- OR in BM_STARTED | BM_FW_LOADED (0x03). Subsequent vendor commands (tuning, signal strength readout, etc.) check this flag before operating.

The three initialization blocks were extracted from disassembly of the stock v2.06 firmware's FUN_CODE_0ddd routine, which performs the same indirect register writes.
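
When transcribing these blocks into a host-side tool, a small consistency check guards against copy errors. The sketch below only records the start registers, data bytes, and stated lengths from the list above; it does not attempt to model the 0xA6/0xA7/0xA8 indirect write protocol, and the names are illustrative.

```python
# (start register, data bytes) for init blocks 0-2, transcribed from the list above.
INIT_BLOCKS = [
    (0x06, bytes([0x06, 0x0B, 0x17, 0x38, 0x9F, 0xD9, 0x80])),
    (0x07, bytes([0x07, 0x09, 0x39, 0x4F, 0x00, 0x65, 0xB7, 0x10])),
    (0x0F, bytes([0x0F, 0x0C, 0x09])),
]

# Byte counts as stated in the boot sequence description.
STATED_LENGTHS = (7, 8, 3)


def check_blocks(blocks=INIT_BLOCKS, lengths=STATED_LENGTHS):
    """Return the indices of blocks whose data length differs from the stated count."""
    return [
        index
        for index, ((_, data), expected) in enumerate(zip(blocks, lengths))
        if len(data) != expected
    ]


print(check_blocks())
```

An empty result means every block has the byte count the description gives for it.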

FX2 Hardware Recovery Note

The FX2's CPUCS register at address 0xE600 controls the 8051 CPU's run/halt state. It is accessible via the standard vendor request bRequest=0xA0 (RAM read/write) even when the user firmware is completely hung in an infinite loop.

This works because bRequest=0xA0 is serviced by the FX2's USB core in hardware rather than by the 8051 firmware, so it remains available no matter what the 8051 is executing. Writing 0x01 to CPUCS holds the CPU in reset (halting it), new firmware can be loaded into RAM, and writing 0x00 releases it to run again.

This means fw_load.py can reload firmware over a hung device without requiring a physical USB unplug/replug or power cycle. For iterative firmware development, this is significant -- a failed boot attempt that hangs the firmware can be recovered from the host side in seconds:

sudo python3 tools/fw_load.py load firmware/build/skywalker1.ihx --wait 3

The load sequence halts the CPU (CPUCS=0x01), writes new code into RAM, then restarts the CPU (CPUCS=0x00). The device re-enumerates with the new firmware.
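
The halt, write, restart sequence can be expressed as an ordered list of control transfers. This is a sketch of the general pattern, not the implementation in fw_load.py: the function name, the 64-byte chunk size, and the (address, data) segment format are assumptions. The request number 0xA0 and the CPUCS address 0xE600 come from this document, with the target address carried in wValue.

```python
CPUCS = 0xE600      # CPU control/status register, from this document
FW_LOAD = 0xA0      # bRequest for RAM read/write, from this document
VENDOR_OUT = 0x40   # vendor request, host-to-device


def load_sequence(segments, chunk=64):
    """Build (bmRequestType, bRequest, wValue, wIndex, data) tuples for a reload.

    segments: iterable of (address, bytes) pairs, e.g. parsed from an .ihx file.
    chunk: assumed maximum payload per transfer.
    """
    transfers = [(VENDOR_OUT, FW_LOAD, CPUCS, 0, b"\x01")]  # halt the CPU
    for address, data in segments:
        for offset in range(0, len(data), chunk):
            payload = bytes(data[offset:offset + chunk])
            transfers.append((VENDOR_OUT, FW_LOAD, address + offset, 0, payload))
    transfers.append((VENDOR_OUT, FW_LOAD, CPUCS, 0, b"\x00"))  # restart the CPU
    return transfers


for xfer in load_sequence([(0x0000, bytes(100))]):
    print(f"wValue=0x{xfer[2]:04X} len={len(xfer[4])}")
```

Each tuple maps directly onto the positional arguments of pyusb's ctrl_transfer(), so the list can be replayed against a device or inspected offline.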