From bb235f51fa81d93a9a4cc8faeadb43a8df7fe6ef Mon Sep 17 00:00:00 2001 From: Ryan Malloy Date: Tue, 17 Feb 2026 19:53:42 -0700 Subject: [PATCH] Add SP-GiST orbital trie design spec for satellite pass prediction index Evolved from the original KTrie custom AM proposal (preserved as KTRIE-SPEC-ORIGINAL.md). Key design decisions: 2-level trie (SMA + inclination) instead of 5, SP-GiST framework instead of custom AM, query-time RAAN filter instead of trie level, propagation-aware cost estimation via traversalValue. --- docs/ktrie/KTRIE-SPEC-ORIGINAL.md | 481 +++++++++++++++++ docs/ktrie/SP-GIST-ORBITAL-TRIE.md | 799 +++++++++++++++++++++++++++++ docs/ktrie/ktrie-layout.jsx | 544 ++++++++++++++++++++ 3 files changed, 1824 insertions(+) create mode 100644 docs/ktrie/KTRIE-SPEC-ORIGINAL.md create mode 100644 docs/ktrie/SP-GIST-ORBITAL-TRIE.md create mode 100644 docs/ktrie/ktrie-layout.jsx diff --git a/docs/ktrie/KTRIE-SPEC-ORIGINAL.md b/docs/ktrie/KTRIE-SPEC-ORIGINAL.md new file mode 100644 index 0000000..b176ef0 --- /dev/null +++ b/docs/ktrie/KTRIE-SPEC-ORIGINAL.md @@ -0,0 +1,481 @@ +# KTrie: Keplerian Patricia Trie — PostgreSQL Index Access Method + +## Purpose + +KTrie is a custom PostgreSQL index access method designed for spatiotemporal satellite queries. It indexes Two-Line Element (TLE) sets by decomposing Keplerian orbital element space into a hierarchical trie with Patricia path compression and adaptive branching. The goal is to prune the satellite catalog analytically before invoking the expensive SGP4/SDP4 orbital propagator, which is the dominant cost in any satellite query. + +This extension is part of a larger spatiotemporal PostgreSQL extension that implements the SGP4 algorithm and related orbital mechanics directly inside the database. The KTrie index eliminates 90%+ of the catalog from propagation consideration using only the orbital elements stored in the TLE, before any numerical propagation occurs. + +--- + +## Architecture Overview + +### Design Principles + +1. **Fixed level semantics, adaptive branching.** Each trie level always represents the same orbital element (semi-major axis at level 0, inclination at level 1, etc.), but the number of children at each node adapts to the local population density. This lets the PostgreSQL query planner push down predicates intelligently (it knows level 1 is always inclination) while still getting fine granularity where the objects actually cluster. + +2. **Patricia path compression.** When a subtree path has single-child nodes (very common in LEO where eccentricity and argument of perigee are near-uniform), the path is compressed into a single node that stores the skipped levels' bounds in its header. This reduces page reads by up to 40% for typical LEO queries. + +3. **Page-aligned to PostgreSQL's 8kB pages.** Every trie node fits in one 8kB page. The node header, entry format, and capacity are designed around this constraint. Split and merge operations maintain page fill targets. + +4. **Population-aware for cost estimation.** Every internal node entry carries a `population` count of objects in its subtree. This feeds directly into PostgreSQL's query planner so it can estimate how many SGP4 propagations a query plan will require — the actual expensive operation. + +5. **WGS-72 internal, WGS-84 external.** All orbital elements stored in the index use WGS-72 constants (because TLEs are fitted with WGS-72). Observer-facing query functions transform through TEME → ITRF using WGS-84. The extension enforces this pipeline so users cannot accidentally mix datums. + +### Trie Level Hierarchy + +``` +Level 0: Semi-Major Axis (a) — km, primary discriminator +Level 1: Inclination (i) — radians, most stable element +Level 2: RAAN (Ω) — radians, precesses rapidly (J2) +Level 3: Eccentricity (e) — dimensionless, 0–1 +Level 4: Arg. of Perigee (ω) — radians, precesses due to J2 + ↓ + Leaf nodes — TLE references with cached elements +``` + +### SGP4 vs SDP4 Routing + +Satellites with orbital period ≥ 225 minutes (semi-major axis threshold corresponding to the `.15625` day fraction in the original FORTRAN) are classified as deep-space and routed to SDP4 instead of SGP4. The index flags these with `HAS_RESONANT` on internal entries and `DEEP_SPACE` on leaf entries so the propagator dispatch is transparent to the user — they call a single function and the extension routes internally. + +--- + +## Data Structures + +### Node Types + +```c +typedef enum KTrieNodeType { + KTRIE_INTERNAL = 0, /* splits orbital element space into child ranges */ + KTRIE_LEAF = 1, /* holds TLE references at bottom of trie */ + KTRIE_COMPRESSED = 2 /* Patricia path-compressed: skips single-child levels */ +} KTrieNodeType; +``` + +### Node Header (72 bytes) + +Every trie node (one per 8kB page) begins with this header: + +```c +typedef struct KTrieNodeHeader { + uint8 type; /* KTRIE_INTERNAL | KTRIE_LEAF | KTRIE_COMPRESSED */ + uint8 level; /* which orbital element this level splits (0-4) */ + uint16 num_entries; /* current number of entries in this node */ + uint16 flags; /* DIRTY | NEEDS_SPLIT | RESONANT */ + uint16 padding; /* alignment */ + float8 range_low; /* this node covers [range_low, range_high) */ + float8 range_high; /* in units of the current level's orbital element */ + uint8 compressed_depth; /* Patricia: number of levels skipped (0 = not compressed) */ + uint8 pad[7]; /* alignment to 8-byte boundary */ + float8 skip_bounds[5]; /* bounds for each compressed/skipped level (unused slots = NaN) */ +} KTrieNodeHeader; /* total: 72 bytes */ +``` + +The `compressed_depth` and `skip_bounds` fields are the Patricia compression mechanism. When a node compresses levels, `compressed_depth` indicates how many levels were skipped, and `skip_bounds` stores the [low, high) range for each skipped level so the query engine can still check predicates against compressed levels without decompressing the path. + +### Internal Node Entry (24 bytes) + +```c +typedef struct KTrieChildEntry { + float8 lower_bound; /* element range start for this child */ + float8 upper_bound; /* element range end for this child */ + BlockNumber child; /* PostgreSQL block number → child page */ + uint16 population; /* total objects in this subtree (for planner cost estimation) */ + uint16 flags; /* SPARSE | DENSE | HAS_RESONANT */ +} KTrieChildEntry; /* total: 24 bytes */ +``` + +**Capacity:** (8192 - 24 PG header - 72 node header) / 24 = **337 max children per internal node.** Adaptive branching uses anywhere from 4 to 337 children depending on population density. + +### Leaf Node Entry (68 bytes) + +```c +typedef struct KTrieLeafEntry { + int32 norad_id; /* NORAD catalog number */ + float8 epoch; /* TLE epoch as Julian date */ + float8 sma; /* semi-major axis in km */ + float8 inc; /* inclination in radians */ + float8 raan; /* right ascension of ascending node in radians */ + float8 ecc; /* eccentricity (dimensionless) */ + float8 argp; /* argument of perigee in radians */ + float8 mean_anomaly; /* mean anomaly in radians */ + ItemPointerData tle_tid; /* 6-byte pointer → heap tuple containing full TLE */ + uint16 flags; /* DECAYING | MANEUVERING | DEEP_SPACE */ +} KTrieLeafEntry; /* total: 68 bytes */ +``` + +**Capacity:** (8192 - 24 - 72) / 68 = **119 max TLE entries per leaf page.** + +The leaf caches all six Keplerian elements so that coarse spatial filtering can happen without touching the heap. The `tle_tid` is a standard PostgreSQL tuple pointer to the heap row containing the full TLE text (both lines), bstar drag term, and any metadata needed by the SGP4 propagator. + +### Leaf Entry Flags + +```c +#define KTRIE_FLAG_DECAYING 0x0001 /* object in orbital decay, shorter TLE validity */ +#define KTRIE_FLAG_MANEUVERING 0x0002 /* recent maneuver detected, TLE may be stale */ +#define KTRIE_FLAG_DEEP_SPACE 0x0004 /* period >= 225 min, route to SDP4 */ +``` + +--- + +## Page Layout + +All nodes are page-aligned to PostgreSQL's 8kB (8192 byte) page size. + +``` +┌─────────────────────────────────────────────┐ +│ PageHeaderData (PostgreSQL standard) 24B │ +├─────────────────────────────────────────────┤ +│ KTrieNodeHeader 72B │ +│ type, level, num_entries, flags │ +│ range_low, range_high │ +│ compressed_depth, skip_bounds[5] │ +├─────────────────────────────────────────────┤ +│ Entries[] (KTrieChildEntry or │ +│ KTrieLeafEntry × N) │ +│ │ +│ Internal: up to 337 × 24B = 8,088B │ +│ Leaf: up to 119 × 68B = 8,092B │ +│ │ +├─────────────────────────────────────────────┤ +│ Special: free_offset, checksum ~4B │ +└─────────────────────────────────────────────┘ +Total: 8,192 bytes +``` + +--- + +## Level Semantics + +### Level 0 — Semi-Major Axis (a) + +The primary discriminator. Orbital altitude is the strongest differentiator between satellite regimes. + +- **Regime boundaries:** sub-LEO (<6,678 km / <300 km alt), LEO (6,678–8,378 km), MEO (8,378–41,378 km), GEO (41,378–42,578 km), super-GEO (>42,578 km). Note: all values are geocentric distance, not altitude. Altitude = SMA - 6,378 km (Earth's equatorial radius under WGS-72). +- **Adaptive fan-out:** 5–20 bins typical. LEO subdivides heavily because ~75% of the tracked catalog lives there. +- **Split strategy:** Equal-population splits, not equal-range. The altitude band 300–600 km might get 8 bins while 600–2000 km gets 3. +- **Query predicate mapping:** Altitude/SMA range queries prune instantly at this level. + +### Level 1 — Inclination (i) + +The most stable orbital element — it rarely changes except under powered thrust. Second-best discriminator after SMA. + +- **Key population clusters:** 0° (equatorial/GEO), 28.5° (Cape Canaveral launches), 51.6° (ISS orbit), 55° (GPS constellation), 63.4° (Molniya critical inclination), 97–98° (sun-synchronous), retrograde orbits up to 180°. +- **Adaptive fan-out:** 4–32 bins. Fine-grained near sun-synchronous (huge population at 97–98°), coarse at high retrograde. +- **Special handling:** Nodes containing 63.4° ± 0.5° are flagged `HAS_RESONANT` because the critical inclination causes singularities in the Brouwer mean element theory that SGP4 is based on. + +### Level 2 — RAAN (Ω) — Right Ascension of the Ascending Node + +- **Range:** 0°–360°, wraps around (circular topology). +- **Caution:** RAAN precesses rapidly due to J2 perturbation (~0.5–7°/day depending on altitude and inclination). Fine subdivision is pointless because RAAN drifts significantly between TLE updates (which arrive every few hours to days). +- **Adaptive fan-out:** 4–8 coarse bins only. +- **Reindex trigger:** When mean RAAN drift since last index build exceeds the bin width, the level should be rebuilt. This can be estimated analytically from J2 precession rates. +- **Patricia compression:** This level is frequently compressed away for near-circular LEO orbits where RAAN discrimination adds little query value. + +### Level 3 — Eccentricity (e) + +- **Range:** 0.0 (perfectly circular) to ~0.95 (extreme HEO like Molniya). +- **Distribution:** Massively skewed. 90%+ of LEO objects have e < 0.02. The distribution is essentially a spike at zero with a long thin tail. +- **Adaptive fan-out:** 2–4 bins. Usually just "near-circular" (e < 0.02), "moderately eccentric" (0.02–0.3), and "highly elliptical" (> 0.3). +- **Patricia compression:** Almost always compressed away in LEO branches where everything is near-circular. Decompresses on split only when a GEO-transfer or HEO object enters the branch. +- **Query relevance:** Eccentricity matters for pass prediction because eccentric orbits have variable ground speed and altitude. + +### Level 4 — Argument of Perigee (ω) + +- **Range:** 0°–360° (circular topology, like RAAN). +- **Stability:** Precesses due to J2 perturbation. Rate depends on inclination and eccentricity. +- **Adaptive fan-out:** 2–6 bins. Most aggressively compressed level. +- **Patricia compression:** Compressed away in most branches. Only discriminates within dense clusters where all higher-level elements are similar (e.g., differentiating individual Starlink shells that share the same a, i, Ω, and near-zero e). +- **Primary use case:** Breaking ties in mega-constellation clusters. + +--- + +## Patricia Path Compression + +When a subtree path contains single-child nodes (one child at a given level because all objects in that branch fall in the same range), the path is compressed. + +### Before compression (5 page reads): + +``` +L0(a=6798) → L1(i=51.6°) → L2(Ω=134°) → L3(e=0.0001) → L4(ω=22°) → leaf + ↓ ↓ ↓ ↓ ↓ + page read page read page read page read page read +``` + +### After compression (3 page reads): + +``` +L0(a=6798) → L1(i=51.6°) → COMPRESSED[Ω∈(90°,180°), e∈(0,0.02), ω∈(0°,360°)] → leaf + ↓ ↓ ↓ + page read page read page read +``` + +The compressed node's `skip_bounds` array stores the bounds for levels 2, 3, and 4. The query engine checks incoming predicates against these bounds — if a query specifies `e > 0.5`, it can reject this compressed branch without decompressing. If the compressed branch needs to be split (because new objects with different characteristics arrive), the compressed node is expanded back into individual levels. + +### Compression triggers + +- On insert: if a new entry would create a single-child level, compress instead. +- On bulk load: after initial trie construction, a bottom-up compression pass identifies and compresses all single-child chains. + +### Decompression triggers + +- On insert: if a new entry falls outside the `skip_bounds` of a compressed node, decompress the path and insert normally. +- Decompression creates new intermediate pages as needed. + +--- + +## Split and Merge Operations + +### Split + +When a node exceeds 85% fill factor: + +1. Choose the split point along the current level's orbital element dimension. +2. **Split strategy:** Equal-population split, not equal-range. Find the element value that divides the entries into two roughly equal groups. This keeps leaf occupancy balanced and query latency predictable. +3. Allocate a new page for the right half. +4. Update the parent's child entries (replace one entry with two). +5. If the parent overflows, split it recursively. + +### Merge + +When a node drops below 25% fill factor: + +1. Check if the node can merge with a sibling (adjacent range at the same level under the same parent). +2. If combined population < 70% of capacity, merge into one page and free the other. +3. Update the parent's child entries (replace two entries with one). +4. If the parent underflows, check for merge recursively. + +### Fill factor targets + +| Node Type | Split Threshold | Merge Threshold | Target Fill | +|-----------|----------------|-----------------|-------------| +| Internal | 85% (286 children) | 25% (84 children) | ~60% | +| Leaf | 85% (101 entries) | 25% (30 entries) | ~60% | + +--- + +## Query Traversal + +### Example: Pass prediction from Eagle, Idaho + +```sql +SELECT s.norad_id, s.name, + sgp4_passes(s.tle, observer(43.6955, -116.3530, 760), now(), interval '2 hours') +FROM satellites s +WHERE ktrie_passes_possible(s.tle, observer(43.6955, -116.3530, 760), now(), interval '2 hours'); +``` + +### Traversal steps + +1. **L0 (Semi-Major Axis):** Observer at ~43.7° latitude can only see satellites up to a certain altitude based on minimum elevation angle. For a 10° minimum elevation, the maximum visible altitude is roughly 2,500 km for a directly-overhead pass. Prune all branches with SMA > ~8,878 km (LEO/low-MEO only). This eliminates MEO, GEO, and beyond. **~75% of catalog remains** (LEO is dense), but all non-LEO branches are gone. + +2. **L1 (Inclination):** A ground station at 43.7° latitude can only see satellites with inclination ≥ ~33.7° (latitude minus max off-track angle). Prune all equatorial and low-inclination branches. **~60% of LEO eliminated** (equatorial and sub-40° inclination objects gone). + +3. **L2 (RAAN):** Based on current sidereal time and the 2-hour query window, only certain RAAN ranges produce ground tracks passing over Idaho. Coarse prune. **~50% of remaining eliminated.** + +4. **L3–L4:** Usually compressed in LEO. If not compressed, minor additional pruning on eccentricity (very eccentric objects have different visibility windows). + +5. **Leaf scan:** Remaining ~2,000–3,000 entries. For each, run SGP4 propagation at coarse time steps (e.g., 60-second intervals over 2 hours = 120 propagations per satellite). Check topocentric elevation from observer. Return passes with elevation > threshold. + +**Net result:** ~92% of the catalog pruned before any SGP4 propagation. The propagator — which is O(1) per time step but has a large constant factor due to the perturbation model — only runs on the candidates that survive the trie traversal. + +--- + +## PostgreSQL Integration + +### Index Access Method Registration + +KTrie registers as a custom index access method using PostgreSQL's `IndexAmRoutine`: + +```c +IndexAmRoutine *ktrie_handler(void) { + IndexAmRoutine *amroutine = makeNode(IndexAmRoutine); + + amroutine->amstrategies = 6; /* see operator strategies below */ + amroutine->amsupport = 3; + amroutine->amcanorder = false; + amroutine->amcanbackward = false; + amroutine->amcanunique = false; + amroutine->amcanmulticol = true; /* indexes full TLE composite type */ + amroutine->amoptionalkey = true; + amroutine->amsearcharray = false; + amroutine->amsearchnulls = false; + amroutine->amstorage = false; + amroutine->amclusterable = false; + amroutine->ampredlocks = false; + amroutine->amcanparallel = true; /* parallel scan supported */ + amroutine->amcaninclude = false; + + amroutine->ambuild = ktrie_build; + amroutine->ambuildempty = ktrie_buildempty; + amroutine->aminsert = ktrie_insert; + amroutine->ambulkdelete = ktrie_bulkdelete; + amroutine->amvacuumcleanup = ktrie_vacuumcleanup; + amroutine->amcostestimate = ktrie_costestimate; + amroutine->amoptions = ktrie_options; + amroutine->amvalidate = ktrie_validate; + amroutine->ambeginscan = ktrie_beginscan; + amroutine->amrescan = ktrie_rescan; + amroutine->amgettuple = ktrie_gettuple; + amroutine->amendscan = ktrie_endscan; + + return amroutine; +} +``` + +### Operator Strategies + +```sql +-- Strategy 1: Orbital regime containment (SMA range) +CREATE OPERATOR @> (LEFTARG = orbital_regime, RIGHTARG = tle, PROCEDURE = ktrie_regime_contains); + +-- Strategy 2: Inclination band overlap +CREATE OPERATOR && (LEFTARG = inclination_band, RIGHTARG = tle, PROCEDURE = ktrie_incl_overlaps); + +-- Strategy 3: Visibility cone intersection (observer + time window → candidate TLEs) +CREATE OPERATOR &? (LEFTARG = observer_window, RIGHTARG = tle, PROCEDURE = ktrie_visibility_possible); + +-- Strategy 4: Proximity search (find objects near a given orbital state) +CREATE OPERATOR <-> (LEFTARG = orbital_state, RIGHTARG = tle, PROCEDURE = ktrie_orbital_distance); + +-- Strategy 5: Conjunction screening (two TLEs, check if orbits intersect) +CREATE OPERATOR &= (LEFTARG = tle, RIGHTARG = tle, PROCEDURE = ktrie_conjunction_possible); + +-- Strategy 6: Ground track intersection (does orbit cross a geographic region?) +CREATE OPERATOR &@ (LEFTARG = geographic_region, RIGHTARG = tle, PROCEDURE = ktrie_groundtrack_intersects); +``` + +### Cost Estimation + +The cost estimator uses the `population` fields in the trie to predict: + +1. **Index scan cost:** Number of pages traversed (tree depth × pages per level). Patricia compression reduces this. +2. **Propagation cost:** Number of leaf entries surviving the trie prune × cost per SGP4 propagation × number of time steps in the query window. This is the dominant cost and what makes the population counts critical for the planner. +3. **Heap fetch cost:** Number of TLEs that need full data (beyond what's cached in the leaf entry). + +```c +void ktrie_costestimate(PlannerInfo *root, IndexPath *path, + double loop_count, Cost *indexStartupCost, + Cost *indexTotalCost, Selectivity *indexSelectivity, + double *indexCorrelation, double *indexPages) { + + /* Estimate surviving population from trie structure */ + double surviving_pop = estimate_surviving_population(root, path); + + /* SGP4 propagation cost dominates */ + double propagation_steps = extract_time_window(path) / SGP4_STEP_INTERVAL; + double sgp4_cost_per_step = 0.05; /* calibrate empirically */ + + *indexTotalCost = (tree_depth * PAGE_READ_COST) + + (surviving_pop * propagation_steps * sgp4_cost_per_step) + + (surviving_pop * HEAP_FETCH_COST); + + *indexSelectivity = surviving_pop / total_catalog_size; +} +``` + +### Index Creation + +```sql +-- Create the KTrie index on a TLE table +CREATE INDEX idx_satellites_ktrie ON satellites USING ktrie (tle_data) + WITH (fill_factor = 60, compression_threshold = 1, reindex_raan_drift = 5.0); +``` + +**Storage parameters:** +- `fill_factor` (default 60): Target page fill percentage after splits. +- `compression_threshold` (default 1): Minimum single-child chain length before Patricia compression activates. +- `reindex_raan_drift` (default 5.0): Maximum mean RAAN drift in degrees before Level 2 triggers a rebuild. + +--- + +## Bulk Loading + +For initial index construction (e.g., loading the full Space-Track catalog of ~30,000+ tracked objects): + +1. **Sort** all TLEs by semi-major axis (primary), then inclination (secondary). +2. **Bottom-up construction:** Build leaf pages first, then construct internal nodes from the leaf level up. This avoids the overhead of top-down insertions and splits. +3. **Compression pass:** After construction, walk the tree bottom-up and compress all single-child chains. +4. **Population propagation:** Sum leaf counts upward through internal nodes to populate all `population` fields. + +This is analogous to how GiST and SP-GiST handle bulk loading, but the sort order is domain-specific (Keplerian element priority). + +--- + +## TLE Freshness and Index Maintenance + +TLEs have a limited validity window. A TLE for a LEO satellite is typically accurate for 1–3 days; deep-space objects may be valid for weeks. The index must handle TLE updates gracefully: + +1. **Update-in-place:** If the new TLE's orbital elements fall within the same leaf node's ranges, update the leaf entry and heap tuple without restructuring the trie. +2. **Move:** If the new TLE's elements have drifted enough to belong in a different branch (e.g., post-maneuver), delete from the old leaf and insert into the correct branch. +3. **Staleness flag:** If a TLE exceeds its expected validity window without an update, flag the leaf entry as `MANEUVERING` (possible unreported maneuver) so the propagator can apply wider uncertainty bounds. +4. **Decay handling:** Objects in orbital decay (decreasing SMA over successive TLEs) are flagged `DECAYING`. These may need to move between Level 0 bins as their altitude drops. + +--- + +## Constants + +All orbital mechanics constants used in the index and propagator must match the WGS-72 values that TLEs are fitted against: + +```c +#define KTRIE_GM 398600.8 /* km³/s², WGS-72 gravitational parameter */ +#define KTRIE_RE 6378.135 /* km, WGS-72 Earth equatorial radius */ +#define KTRIE_J2 0.001082616 /* WGS-72 second zonal harmonic */ +#define KTRIE_XKE 0.0743669161 /* sqrt(GM) in earth-radii³/min² units */ +#define KTRIE_DEEP_THRESHOLD 0.15625 /* 225/1440: orbital period threshold for SDP4 */ +#define KTRIE_MINUTES_PER_DAY 1440.0 +``` + +Never use WGS-84 values inside the propagator or index. WGS-84 is used only for the final TEME → ITRF → geodetic transformation when computing observer-relative positions. + +--- + +## File Organization + +``` +pg_ktrie/ +├── ktrie.h # Core data structures (this spec) +├── ktrie_handler.c # Index AM registration and routing +├── ktrie_build.c # Index construction and bulk loading +├── ktrie_insert.c # Single-tuple insertion, split logic +├── ktrie_delete.c # Deletion, merge logic, vacuum +├── ktrie_scan.c # Index scan (beginscan, gettuple, endscan) +├── ktrie_compress.c # Patricia path compression/decompression +├── ktrie_cost.c # Query planner cost estimation +├── ktrie_operators.c # SQL operator implementations +├── ktrie_utils.c # Keplerian element conversions, J2 precession rates +├── sgp4/ +│ ├── sgp4_propagator.c # SGP4 near-earth propagation (from STR#3 / Vallado Rev-1) +│ ├── sdp4_propagator.c # SDP4 deep-space propagation +│ ├── deep.c # DEEP subroutine (resonance integrator) +│ ├── tle_parser.c # TLE line 1 + line 2 parser +│ ├── coord_transforms.c # TEME → ITRF → geodetic/topocentric +│ └── sgp4.h # SGP4 constants, structs, WGS-72 values +├── sql/ +│ ├── ktrie--1.0.sql # Extension SQL: types, operators, index AM +│ └── ktrie.control # PostgreSQL extension control file +├── test/ +│ ├── test_str3_vectors.sql # Spacetrack Report #3 test cases (25 vectors) +│ ├── test_vallado.sql # Vallado Rev-1 test cases (518 vectors) +│ └── test_ktrie_ops.sql # Index operation tests +└── Makefile # PGXS build +``` + +--- + +## Validation + +The SGP4 implementation must pass both standard test vector sets before the index is considered operational: + +1. **Spacetrack Report #3, Chapter 13:** 25 test cases covering near-earth and deep-space objects. Sub-meter accuracy for near-earth. These test internal consistency — that the implementation matches the canonical FORTRAN. + +2. **Vallado Rev-1, Appendix D/E:** 518 verification test cases. Machine-epsilon agreement with the reference C++ implementation. These test cross-implementation correctness. + +3. **Kelso 2007 (optional but recommended):** SGP4 output compared against GPS precision ephemerides. ~1 km accuracy at epoch with 1–3 km/day growth. This validates that SGP4 itself (not just our implementation of it) is producing physically meaningful results. + +The index structure itself should be validated with: + +- Round-trip tests: insert TLEs, query them back, verify all orbital elements match. +- Population count invariant: sum of all leaf entries = sum of root's children's populations. +- Compression invariant: decompressing a compressed node and recompressing produces identical skip_bounds. +- Split/merge cycle: splitting a node and immediately merging produces the original node. diff --git a/docs/ktrie/SP-GIST-ORBITAL-TRIE.md b/docs/ktrie/SP-GIST-ORBITAL-TRIE.md new file mode 100644 index 0000000..07e7f05 --- /dev/null +++ b/docs/ktrie/SP-GIST-ORBITAL-TRIE.md @@ -0,0 +1,799 @@ +# SP-GiST Orbital Trie: Domain-Specific Index for Satellite Pass Prediction + +## 1. Purpose & Lineage + +This document specifies a domain-specific SP-GiST index for accelerating satellite pass prediction queries in PostgreSQL. The index decomposes TLE orbital element space into a 2-level hierarchical trie — semi-major axis and inclination — with a query-time RAAN filter and propagation-aware cost estimation. + +**Primary use case:** 1-to-N pass prediction. "Which of 30,000 satellites are visible from this observer in the next 2 hours?" The index prunes the catalog analytically before any SGP4/SDP4 propagation, which is the dominant query cost. + +**Lineage:** This design evolved from the original KTrie spec ([KTRIE-SPEC-ORIGINAL.md](KTRIE-SPEC-ORIGINAL.md)) through design review. The original proposed a fully custom PostgreSQL index access method with 5 Keplerian element levels. Analysis showed: + +1. Only 2 of 5 levels (SMA, inclination) are temporally stable enough to index +2. PostgreSQL's SP-GiST framework provides all necessary infrastructure (WAL, VACUUM, parallel scan, prefix compression) +3. RAAN filtering is better as a query-time analytical filter that adapts to the query window +4. Constellation-level pruning emerges naturally from the trie structure without explicit classification +5. The real innovation — propagation-aware cost estimation — works within SP-GiST via `traversalValue` + +The React visualization ([ktrie-layout.jsx](ktrie-layout.jsx)) reflects the original 5-level design and will be updated separately. + +--- + +## 2. The Visibility Decision Tree + +Pass prediction asks: "Can satellite S be seen from observer O during time window [t₁, t₂]?" There are exactly four questions, ordered by decreasing temporal stability: + +### Q1: Can the orbit reach observable altitude? + +Determined by perigee and apogee altitude, derived from SMA and eccentricity via Kepler's third law. + +- **Stability:** Very stable. Changes only with atmospheric drag (slow decay) or powered maneuvers (rare, discrete events). +- **Discrimination:** Eliminates entire orbital regimes. An observer with a 10° minimum elevation can see satellites up to ~2,500 km altitude. This removes all MEO, GEO, and beyond — roughly 25% of the catalog. +- **Verdict: INDEXABLE.** + +### Q2: Can the ground track reach my latitude? + +A satellite with inclination `i` has a ground track bounded by latitudes `[-i, +i]`. An observer at latitude `φ` can only see satellites where `i ≥ |φ|` (simplified; the actual constraint includes off-track visibility angle, which depends on altitude). + +- **Stability:** Very stable. Inclination barely changes without thrust. +- **Discrimination:** An observer at 43.7° (Eagle, Idaho) eliminates all equatorial and low-inclination objects — roughly 60% of LEO. +- **Verdict: INDEXABLE.** + +### Q3: Is the orbital plane aligned with my location? + +The Right Ascension of the Ascending Node (RAAN, Ω) determines where the orbit crosses the equator in inertial space. A satellite is potentially overhead when its RAAN is roughly aligned with the observer's current sidereal position. + +- **Stability: UNSTABLE.** RAAN precesses at 0.5–7°/day due to J2 oblateness perturbation. The precession rate is: + + ``` + Ω̇ = -1.5 · n · J₂ · (Rₑ/a)² · cos(i) + ``` + + where `n` is mean motion, `J₂ = 0.001082616`, `Rₑ = 6378.135 km` (WGS-72), and `a` is semi-major axis. + +- **Key insight:** Ω̇ is fully determined by SMA and inclination — the two elements already indexed at L0 and L1. The RAAN check requires only the TLE's stored RAAN₀ and epoch (available in the leaf) plus the elements already traversed. No additional index level is needed. +- **Verdict: QUERY-TIME FILTER.** Computed per surviving candidate after L0+L1 pruning. + +### Q4: Is the satellite at the right orbital phase? + +Mean anomaly changes at ~4°/second for LEO. The satellite's actual position along its orbit at any given moment is the irreducible question that requires SGP4/SDP4 numerical propagation. + +- **Stability: COMPLETELY UNSTABLE.** Changes continuously. +- **Verdict: IRREDUCIBLE. Requires SGP4 propagation.** + +### Conclusion + +Only two dimensions are worth indexing. Everything else is either computable from those two dimensions (RAAN) or requires full propagation (mean anomaly). This drives the 2-level trie design. + +--- + +## 3. Why SP-GiST + +### SP-GiST vs custom AM vs enhanced GiST + +| Requirement | SP-GiST | Custom AM | Enhanced GiST | +|-------------|---------|-----------|----------------| +| Page management + WAL | Free | Must implement | Free | +| VACUUM + dead tuple cleanup | Free | Must implement | Free | +| Crash recovery | Free | Must implement | Free | +| Parallel scan | Free | Must implement | Free | +| Prefix compression | Built-in (prefix/suffix) | Must implement | Not available | +| Non-balanced tree | Native | Must implement | Not natural | +| Fixed level decomposition | Via `level` counter | Must implement | Not available | +| Traversal state for cost est. | `traversalValue` | Must implement | Not available | +| Node labels for children | `nodeLabels` | Must implement | Not available | +| Support functions needed | 5 | 12+ | 7 (already exist) | +| Lines of domain-specific C | ~800 est. | ~3,000+ est. | ~200 (delta) | + +**SP-GiST was literally designed for this kind of data structure.** Its space-partitioning model matches the KTrie concept directly: fixed decomposition rules, non-balanced trees, prefix compression. The `traversalValue` mechanism carries state down the tree during scans — exactly what population-aware cost estimation needs. + +### The quad-tree precedent + +PostgreSQL's built-in SP-GiST quad-tree (`spgquadtreeproc.c`) demonstrates a key pattern: the `restDatum` (remaining datum after prefix extraction) can be the **same type** at every level. The quad-tree passes the full point unchanged: + +```c +out->result.matchNode.restDatum = in->datum; +``` + +The tree terminates by depth, not by value exhaustion. Our design does the same — the leaf stores the full `tle` type. No intermediate compressed struct needed. The leaf has everything for the RAAN query-time filter (RAAN₀, epoch, mean_motion, inclination, eccentricity). + +### What SP-GiST's prefix compression gives us + +The original KTrie spec implemented Patricia compression manually (compressed nodes, skip_bounds arrays, compression/decompression triggers). SP-GiST provides this natively: the `config` function declares a prefix type, and the `choose` function returns prefix/suffix decompositions. Single-child chains are compressed automatically by the framework. Same semantics, zero custom page management. + +--- + +## 4. Architecture + +### 2-level trie with query-time filters + +``` +SP-GiST L0: Semi-Major Axis (altitude regime) + ├── Equal-population splits, not equal-range + ├── Density-balanced: LEO (75% of catalog) gets finer bins + └── Prunes by perigee/apogee altitude reachability + +SP-GiST L1: Inclination + ├── Equal-population splits + ├── Natural clustering at launch-site latitudes (28.5°, 51.6°, 97°) + └── Prunes by ground-track latitude coverage + +Query-time RAAN filter (in leaf_consistent): + ├── Projects RAAN to query midpoint via J2 precession rate + ├── Checks alignment with observer's sidereal position + ├── Adapts automatically to any query window length + └── Cost: ~10ns per candidate, ~45μs for entire post-L0/L1 batch + +SGP4/SDP4 propagation (irreducible): + └── Full numerical propagation for surviving candidates +``` + +### Equal-population splits + +The current GiST uses equal-range splits (median midpoint of the geometric spread). Equal-population splits keep the tree balanced in **object density**, not geometric space. The altitude band 300–600 km (where Starlink, ISS, and most LEO debris live) gets finer subdivision than 600–2,000 km. This matters because: + +- Query cost is proportional to surviving population, not geometric volume +- Dense regions need finer discrimination; sparse regions don't benefit from it +- The cost estimator's population predictions are more accurate with balanced subtrees + +### Constellation-aware pruning + +Mega-constellations cluster tightly in L0+L1 space. Starlink shell 1: all ~2,000 satellites at 550 km / 53.0°. OneWeb: ~600 at 1,200 km / 87.9°. These clusters naturally fall into single L1 subtrees, giving the cost estimator a tight population count and a narrow J2 precession rate range. Constellation-level pruning emerges from the trie structure without explicit constellation classification. + +--- + +## 5. SP-GiST Support Functions + +Five functions implement the index. All use WGS-72 constants internally, consistent with pg_orrery's constant chain of custody. + +### 5.1 `tle_trie_config()` + +Declares the type system for the trie. + +```c +Datum tle_trie_config(PG_FUNCTION_ARGS) +{ + spgConfigOut *cfg = (spgConfigOut *) palloc0(sizeof(spgConfigOut)); + + cfg->prefixType = FLOAT8RANGEOID; /* orbital_bounds: SMA or inclination range */ + cfg->labelType = FLOAT8OID; /* bin boundary value */ + cfg->leafType = TLE_TYPE_OID; /* full TLE at leaves */ + cfg->canReturnData = true; /* enable index-only scans */ + cfg->longValuesOK = false; /* fixed-size types only */ + + PG_RETURN_VOID(); +} +``` + +The prefix type stores the range covered by each inner node. The label type stores bin boundary values for child selection. The leaf type is the existing `tle` type — no new type needed at the leaf level. + +### 5.2 `tle_trie_choose()` + +Decides which child to descend into during insertion. Level-aware: L0 routes by SMA, L1 routes by inclination. + +```c +Datum tle_trie_choose(PG_FUNCTION_ARGS) +{ + spgChooseIn *in = (spgChooseIn *) PG_GETARG_POINTER(0); + spgChooseOut *out = (spgChooseOut *) PG_GETARG_POINTER(1); + + tle_type *tle = DatumGetTleP(in->leafDatum); + int level = in->level; + double key; + + /* Extract the element for this level */ + if (level == 0) + key = tle_sma_km(tle); /* semi-major axis in km */ + else + key = tle_inclination_rad(tle); /* inclination in radians */ + + if (in->allTheSame) + { + /* All children equivalent — pick first */ + out->resultType = spgMatchNode; + out->result.matchNode.nodeN = 0; + out->result.matchNode.levelAdd = 1; + out->result.matchNode.restDatum = in->leafDatum; /* pass full TLE unchanged */ + PG_RETURN_VOID(); + } + + /* Find the child whose bin contains our key */ + for (int i = 0; i < in->nNodes; i++) + { + double boundary = DatumGetFloat8(in->nodeLabels[i]); + if (i == in->nNodes - 1 || key < DatumGetFloat8(in->nodeLabels[i + 1])) + { + out->resultType = spgMatchNode; + out->result.matchNode.nodeN = i; + out->result.matchNode.levelAdd = 1; + out->result.matchNode.restDatum = in->leafDatum; + PG_RETURN_VOID(); + } + } + + /* Fallback: add new node (should not happen with well-chosen splits) */ + out->resultType = spgAddNode; + out->result.addNode.nodeLabel = Float8GetDatum(key); + out->result.addNode.nodeN = in->nNodes; + PG_RETURN_VOID(); +} +``` + +**Key detail:** `restDatum = in->leafDatum` — the full TLE passes unchanged to every level, following the quad-tree precedent. The trie terminates at depth 2 (two levels), not by value exhaustion. + +### 5.3 `tle_trie_picksplit()` + +Splits a leaf page that has overflowed. Uses equal-population strategy: sort by the current level's element, split at the median entry (not the median value). + +```c +Datum tle_trie_picksplit(PG_FUNCTION_ARGS) +{ + spgPickSplitIn *in = (spgPickSplitIn *) PG_GETARG_POINTER(0); + spgPickSplitOut *out = (spgPickSplitOut *) PG_GETARG_POINTER(1); + + int level = in->level; + int nTuples = in->nTuples; + + /* Extract sort keys for this level */ + double *keys = palloc(nTuples * sizeof(double)); + int *order = palloc(nTuples * sizeof(int)); + + for (int i = 0; i < nTuples; i++) + { + tle_type *tle = DatumGetTleP(in->datums[i]); + keys[i] = (level == 0) ? tle_sma_km(tle) : tle_inclination_rad(tle); + order[i] = i; + } + + /* Sort by key value */ + qsort_with_keys(order, nTuples, keys); /* stable sort by keys[order[i]] */ + + /* Equal-population split: divide into N bins of ~equal count */ + int nBins = choose_bin_count(nTuples); /* heuristic: sqrt(n), clamped [2, 32] */ + + out->nNodes = nBins; + out->nodeLabels = palloc(nBins * sizeof(Datum)); + out->mapTuplesToNodes = palloc(nTuples * sizeof(int)); + out->leafTupleDatums = palloc(nTuples * sizeof(Datum)); + + int per_bin = nTuples / nBins; + for (int bin = 0; bin < nBins; bin++) + { + int start = bin * per_bin; + out->nodeLabels[bin] = Float8GetDatum(keys[order[start]]); + + int end = (bin == nBins - 1) ? nTuples : (bin + 1) * per_bin; + for (int j = start; j < end; j++) + { + int orig = order[j]; + out->mapTuplesToNodes[orig] = bin; + out->leafTupleDatums[orig] = in->datums[orig]; /* TLE unchanged */ + } + } + + pfree(keys); + pfree(order); + PG_RETURN_VOID(); +} +``` + +### 5.4 `tle_trie_inner_consistent()` + +The pruning engine. At each inner node, determines which children could contain matching satellites. Carries accumulated bounds in `traversalValue` for cost estimation. + +```c +/* + * Traversal state carried down the tree during index scans. + * Accumulates bounds for cost estimation and RAAN filter setup. + */ +typedef struct OrbitalTraversal { + double sma_low, sma_high; /* accumulated SMA range from L0 */ + double inc_low, inc_high; /* accumulated inclination range from L1 */ + int32 population; /* estimated objects in this subtree */ + double j2_rate; /* J2 precession rate (computed after L1) */ +} OrbitalTraversal; + +Datum tle_trie_inner_consistent(PG_FUNCTION_ARGS) +{ + spgInnerConsistentIn *in = (spgInnerConsistentIn *) PG_GETARG_POINTER(0); + spgInnerConsistentOut *out = (spgInnerConsistentOut *) PG_GETARG_POINTER(1); + + int level = in->level; + + /* Reconstruct or initialize traversal state */ + OrbitalTraversal *trav; + if (in->traversalValue) + trav = (OrbitalTraversal *) in->traversalValue; + else + { + trav = palloc0(sizeof(OrbitalTraversal)); + trav->sma_low = 0; + trav->sma_high = INFINITY; + trav->inc_low = 0; + trav->inc_high = M_PI; + trav->population = -1; /* unknown until leaf scan */ + } + + /* Extract query parameters from scankey */ + observer_window *qwin = extract_observer_window(in); + + out->nodeNumbers = palloc(in->nNodes * sizeof(int)); + out->traversalValues = palloc(in->nNodes * sizeof(void *)); + out->nNodes = 0; + + for (int i = 0; i < in->nNodes; i++) + { + double bin_low = DatumGetFloat8(in->nodeLabels[i]); + double bin_high = (i < in->nNodes - 1) + ? DatumGetFloat8(in->nodeLabels[i + 1]) + : INFINITY; + bool dominated = false; + + if (level == 0) + { + /* L0: SMA pruning — can a satellite at this altitude be visible? */ + double perigee_alt_km = bin_low - WGS72_AE; + if (perigee_alt_km > max_visible_altitude(qwin)) + dominated = true; /* too high to be visible */ + } + else if (level == 1) + { + /* L1: Inclination pruning — can this inclination reach observer latitude? */ + double observer_lat = fabs(qwin->observer.lat_rad); + if (bin_high < observer_lat) + dominated = true; /* ground track never reaches observer */ + } + + if (!dominated) + { + /* Propagate traversal state to child */ + OrbitalTraversal *child_trav = palloc(sizeof(OrbitalTraversal)); + memcpy(child_trav, trav, sizeof(OrbitalTraversal)); + + if (level == 0) { + child_trav->sma_low = bin_low; + child_trav->sma_high = bin_high; + } else if (level == 1) { + child_trav->inc_low = bin_low; + child_trav->inc_high = bin_high; + /* After L1, we can compute J2 precession rate */ + double a_mid = (child_trav->sma_low + child_trav->sma_high) / 2.0; + double i_mid = (bin_low + bin_high) / 2.0; + double n = sqrt(WGS72_MU / (a_mid * a_mid * a_mid)); + child_trav->j2_rate = -1.5 * n * WGS72_J2 + * (WGS72_AE / a_mid) * (WGS72_AE / a_mid) + * cos(i_mid); + } + + int idx = out->nNodes++; + out->nodeNumbers[idx] = i; + out->traversalValues[idx] = child_trav; + } + } + + PG_RETURN_VOID(); +} +``` + +### 5.5 `tle_trie_leaf_consistent()` + +Final check at the leaf level. Applies the RAAN query-time filter and eccentricity check. Always sets `recheck = true` because SGP4 propagation is still needed for the definitive answer. + +```c +Datum tle_trie_leaf_consistent(PG_FUNCTION_ARGS) +{ + spgLeafConsistentIn *in = (spgLeafConsistentIn *) PG_GETARG_POINTER(0); + spgLeafConsistentOut *out = (spgLeafConsistentOut *) PG_GETARG_POINTER(1); + + tle_type *tle = DatumGetTleP(in->leafDatum); + observer_window *qwin = extract_observer_window_from_scankey(in); + + /* RAAN query-time filter */ + double dt_days = (qwin->t_mid_jd - tle_epoch_jd(tle)); + double raan_now = tle_raan_rad(tle) + tle_j2_raan_rate(tle) * dt_days; + raan_now = fmod(raan_now, 2.0 * M_PI); + if (raan_now < 0) raan_now += 2.0 * M_PI; + + /* Observer sidereal position at query midpoint */ + double lst = local_sidereal_time(qwin->observer.lon_rad, qwin->t_mid_jd); + + /* RAAN visibility window: Earth rotation during query + ground footprint */ + double earth_rotation = (qwin->t_end_jd - qwin->t_start_jd) * 360.0; /* degrees */ + double footprint = ground_footprint_deg(tle_sma_km(tle), qwin->min_elevation); + double raan_window_half = (earth_rotation / 2.0 + footprint) * (M_PI / 180.0); + + double raan_diff = fabs(raan_now - lst); + if (raan_diff > M_PI) raan_diff = 2.0 * M_PI - raan_diff; + + if (raan_diff > raan_window_half) + { + out->leafValue = in->leafDatum; + out->recheck = true; + PG_RETURN_BOOL(false); /* RAAN not aligned — skip this candidate */ + } + + /* Eccentricity sanity check: highly eccentric orbits need wider altitude band */ + double ecc = tle_eccentricity(tle); + if (ecc > 0.1) + { + double perigee = tle_sma_km(tle) * (1.0 - ecc) - WGS72_AE; + double apogee = tle_sma_km(tle) * (1.0 + ecc) - WGS72_AE; + if (perigee > max_visible_altitude(qwin)) + { + out->leafValue = in->leafDatum; + out->recheck = true; + PG_RETURN_BOOL(false); + } + } + + out->leafValue = in->leafDatum; + out->recheck = true; /* always recheck — SGP4 propagation is the ground truth */ + PG_RETURN_BOOL(true); +} +``` + +--- + +## 6. The RAAN Query-Time Filter + +### Why not a trie level + +The effective RAAN visibility window depends entirely on the query time window. This table shows why static trie bins can't capture the physics: + +| Query window | Earth rotation | Effective RAAN window | Candidates eliminated | +|-------------|---------------|----------------------|----------------------| +| 30 min | 7.5° | ~52° (14% of sky) | ~85% | +| 2 hours | 30° | ~74° (21%) | ~79% | +| 6 hours | 90° | ~134° (37%) | ~63% | +| 12 hours | 180° | ~224° (62%) | ~38% | +| 24 hours | 360° | 360° (100%) | 0% | + +A static bin structure (4 bins of 90° each) could eliminate at most 75% (3 of 4 bins) for a 30-minute query, but eliminates nothing for queries longer than ~6 hours. The query-time filter automatically adapts to the actual window. + +### The J2 precession rate + +RAAN precession is the dominant secular perturbation for LEO orbits. The rate is: + +``` +Ω̇ = -1.5 · n · J₂ · (Rₑ/a)² · cos(i) +``` + +where: +- `n = √(μ/a³)` — mean motion (rad/s), with `μ = 398600.8 km³/s²` (WGS-72) +- `J₂ = 0.001082616` (WGS-72) +- `Rₑ = 6378.135 km` (WGS-72) +- `a` — semi-major axis (km) +- `i` — inclination (radians) + +For a 400 km LEO satellite at 51.6° inclination (ISS-like): +``` +a = 6778.135 km +n = 0.00114 rad/s +Ω̇ = -1.5 × 0.00114 × 0.001082616 × (6378.135/6778.135)² × cos(51.6°) + ≈ -1.07 × 10⁻⁶ rad/s ≈ -5.3°/day +``` + +This rate is fully determined by SMA and inclination — both already indexed at L0 and L1. + +### Per-candidate cost + +Projecting RAAN to query time requires: +1. One subtraction (epoch difference) +2. One multiply (rate × time) +3. One addition (RAAN₀ + drift) +4. One modulo (wrap to [0, 2π)) +5. One range check (within visibility window?) + +**Cost: ~10 nanoseconds per candidate.** + +After L0 and L1 pruning, a typical pass prediction query over 30,000 TLEs leaves ~4,500 candidates. The total RAAN filter cost: + +``` +4,500 × 10 ns = 45 μs +``` + +This is negligible compared to SGP4 propagation cost for the survivors (typically hundreds of milliseconds to seconds). The filter eliminates ~79% of those 4,500 candidates (for a 2-hour window), leaving ~945 for SGP4 propagation instead of 4,500. + +--- + +## 7. Propagation-Aware Cost Estimation + +### The core innovation + +Standard PostgreSQL index cost estimators model I/O cost: pages read, tuples fetched. KTrie's original insight — preserved in this SP-GiST design — is to model **downstream computation cost**. The expensive operation in satellite queries isn't reading data; it's SGP4/SDP4 propagation. + +The cost estimator tells the query planner: + +``` +estimated_cost = surviving_population × time_steps × sgp4_cost_per_step +``` + +This lets the planner compare "index scan + 945 SGP4 evaluations" vs "sequential scan + 30,000 SGP4 evaluations" — a decision no other PostgreSQL index type can make. + +### Population tracking via `traversalValue` + +SP-GiST's `traversalValue` mechanism carries the `OrbitalTraversal` struct down the tree during scans. At each inner node, the cost estimator accumulates: + +```c +typedef struct OrbitalTraversal { + double sma_low, sma_high; /* from L0 */ + double inc_low, inc_high; /* from L1 */ + int32 population; /* objects in this subtree */ + double j2_rate; /* derived from L0 + L1 midpoints */ +} OrbitalTraversal; +``` + +The `population` field counts objects in each subtree. After L1, `j2_rate` is computed from the accumulated SMA and inclination bounds. This enables the RAAN filter to predict how many candidates will survive: + +``` +expected_visible = population × (RAAN_window / 360°) +``` + +### Constellation detection as emergent property + +Mega-constellations cluster tightly in SMA × inclination space. Starlink shell 1: ~2,000 satellites at 550 km / 53.0°. All members share nearly the same J2 precession rate, and their RAANs are evenly distributed by constellation design (phased orbital planes). + +The trie naturally groups these into a single L1 subtree. The cost estimator sees a tight population with a uniform RAAN distribution and can predict: + +``` +Starlink shell 1 example (2-hour query from Eagle, Idaho): + population = 2,000 + RAAN_window = 74° → fraction = 20.6% + RAAN_survivors = 2,000 × 0.206 = ~412 + time_steps = 120 (2 hours at 60-second intervals) + sgp4_per_step = 0.05 ms + estimated_cost = 412 × 120 × 0.05 ms = 2.5 seconds +``` + +No explicit constellation classification is needed. The structure emerges from orbital mechanics. + +### Custom statistics function + +The cost estimator can be registered as a custom statistics function for the SP-GiST operator class, or integrated into the `inner_consistent` function via `traversalValue`. The planner sees accurate per-subtree cost predictions without any special configuration: + +```c +void spgist_tle_cost_estimate(PlannerInfo *root, IndexPath *path, + double loop_count, Cost *startup_cost, + Cost *total_cost, Selectivity *selectivity, + double *correlation, double *index_pages) +{ + double surviving_pop = estimate_from_traversal(root, path); + double time_steps = extract_query_window_steps(path); + double sgp4_cost = 0.05; /* ms per propagation step, calibratable */ + + double raan_fraction = estimate_raan_survival(path); + double propagation_candidates = surviving_pop * raan_fraction; + + *total_cost = (2 * PAGE_READ_COST) /* L0 + L1 traversal */ + + (surviving_pop * RAAN_FILTER_COST) /* query-time RAAN */ + + (propagation_candidates * time_steps * sgp4_cost) /* SGP4 */ + + (propagation_candidates * HEAP_FETCH_COST); /* fetch full TLE */ + + *selectivity = propagation_candidates / total_catalog_size; +} +``` + +--- + +## 8. Operators + +### Types + +```sql +-- Observation query parameters bundled as a composite type +CREATE TYPE observer_window AS ( + obs observer, -- existing pg_orrery observer type (lat, lon, alt_m) + t_start timestamptz, -- query window start + t_end timestamptz, -- query window end + min_el float8 -- minimum elevation angle in degrees +); +``` + +### Operator definitions + +Three operators, starting minimal. The flagship operator `&?` is the primary interface for pass prediction. + +```sql +-- Strategy 1: Altitude regime containment +-- "Is this TLE's orbit within this altitude range?" +CREATE OPERATOR @> ( + LEFTARG = float8range, -- altitude range in km (e.g., '[200,600]') + RIGHTARG = tle, + PROCEDURE = tle_regime_contains, + COMMUTATOR = <@ +); + +-- Strategy 2: Orbital envelope overlap (enhanced existing &&) +-- "Do these two TLEs share overlapping altitude + inclination space?" +CREATE OPERATOR && ( + LEFTARG = tle, + RIGHTARG = tle, + PROCEDURE = tle_envelope_overlaps, + COMMUTATOR = && +); + +-- Strategy 3: Visibility cone check (flagship operator) +-- "Could this satellite be visible from this observer during this time window?" +-- Combines SMA pruning (L0) + inclination pruning (L1) + RAAN query-time filter +CREATE OPERATOR &? ( + LEFTARG = observer_window, + RIGHTARG = tle, + PROCEDURE = tle_visibility_possible +); +``` + +### Example queries + +```sql +-- Primary use case: pass prediction +-- "Which satellites might be visible from Eagle, Idaho in the next 2 hours?" +SELECT s.norad_id, s.name +FROM satellites s +WHERE ROW( + observer('43.7N 116.4W 760m'), + now(), + now() + interval '2 hours', + 10.0 +)::observer_window &? s.tle; + +-- Altitude regime query +-- "Which satellites orbit between 400 and 600 km?" +SELECT s.norad_id, s.name +FROM satellites s +WHERE '[400,600]'::float8range @> s.tle; + +-- Combined: visible LEO passes with full SGP4 propagation +SELECT s.norad_id, s.name, + predict_passes(s.tle, observer('43.7N 116.4W 760m'), + now(), now() + interval '2 hours', 10.0) +FROM satellites s +WHERE ROW( + observer('43.7N 116.4W 760m'), + now(), + now() + interval '2 hours', + 10.0 +)::observer_window &? s.tle; +``` + +The `&?` operator returns `true` for satellites that **might** be visible (a superset of the actual answer). The `predict_passes()` function then runs SGP4 propagation on only those candidates. The index's job is to minimize the candidate set, not to produce the final answer. + +--- + +## 9. Data Structures + +### SP-GiST node structure + +SP-GiST manages its own page layout, inner tuples, and leaf tuples. We declare our types; the framework handles storage: + +- **Prefix type:** `float8range` — the SMA or inclination range covered by an inner node +- **Label type:** `float8` — bin boundary values for child dispatch +- **Leaf type:** `tle` — the existing pg_orrery TLE type (112 bytes, `STORAGE = plain`) + +No new page format. No custom WAL records. SP-GiST provides all of this. + +### `observer_window` composite type + +```c +/* Not a new C struct — uses SQL composite type mechanics */ +/* Fields: observer (24B) + t_start (8B) + t_end (8B) + min_el (8B) = 48 bytes */ +``` + +This is a standard SQL composite type, not a custom C type. PostgreSQL handles I/O, storage, and parameter passing. The operator functions extract fields via `GetAttributeByNum()`. + +### WGS-72 constants + +All internal computations use the same WGS-72 constants already defined in pg_orrery's `types.h`: + +```c +#define WGS72_MU 398600.8 /* km³/s² */ +#define WGS72_AE 6378.135 /* km */ +#define WGS72_J2 0.001082616 +``` + +The constant chain of custody is maintained: WGS-72 for orbital mechanics, WGS-84 for observer coordinate output. No new constants introduced. + +--- + +## 10. What Changed from the Original KTrie + +| Original KTrie | Evolved SP-GiST | Rationale | +|----------------|-----------------|-----------| +| Custom index AM (12+ functions) | SP-GiST (5 functions) | PostgreSQL handles WAL, VACUUM, parallel, recovery | +| 5-level trie (a, i, Ω, e, ω) | 2-level trie (a, i) | Only SMA + inclination are temporally stable | +| RAAN as L2 (4-8 static bins) | Query-time analytical filter | Time-dependent; adapts to window; no reindex | +| Eccentricity as L3 | Query-time check in `leaf_consistent` | Near-zero in LEO; unreliable for fine pruning | +| Arg. perigee as L4 | Dropped | Near-zero discrimination in LEO | +| Patricia compression (custom) | SP-GiST prefix compression (built-in) | Same semantics, zero custom page management | +| `uint16` population (max 65,535) | `int32` via `traversalValue` | No overflow risk; Starlink alone targets 12,000+ | +| 6 operators | 3 operators | Start minimal; `&?` is the flagship | +| 10+ C source files | ~3 C source files | SP-GiST handles infrastructure | +| Custom page layout (72B header) | SP-GiST page layout | No custom page management | +| Custom split/merge logic | SP-GiST `picksplit` | Framework handles page operations | +| Manual compression triggers | SP-GiST automatic prefix handling | Framework compresses single-child paths | +| `reindex_raan_drift` parameter | Eliminated | RAAN not in the trie; no reindex needed | + +### What survived intact + +Three ideas from the original spec are preserved without modification: + +1. **Population-aware cost estimation** — the core innovation. Now delivered via `traversalValue` instead of `uint16` per-entry fields. +2. **Equal-population splits** — density-balanced tree, not geometry-balanced. Now in `picksplit` instead of custom split logic. +3. **WGS-72 constant chain of custody** — unchanged, inherited from pg_orrery. + +--- + +## 11. Implementation Roadmap + +### Phase 1: SP-GiST prototype + +Create the core SP-GiST index with the 5 support functions and 3 operators. + +**Files:** +``` +src/spgist_tle.c — config, choose, picksplit, inner_consistent, leaf_consistent +src/visibility_ops.c — &? operator, RAAN filter, observer_window type support +``` + +**SQL:** +``` +sql/pg_orrery--0.6.0--0.7.0.sql — types, operators, SP-GiST operator class +``` + +**Tests:** +``` +test/sql/spgist_tle.sql — index creation, operator tests, pruning validation +test/expected/spgist_tle.out +``` + +**Goal:** Working index that correctly prunes by SMA + inclination + RAAN. Correctness over performance. + +### Phase 2: Cost estimator + +Add the propagation-aware cost estimation function. + +**Files:** +``` +src/spgist_cost.c — custom cost estimator with population tracking +``` + +**Goal:** Query planner correctly chooses index scan vs sequential scan based on predicted SGP4 cost. + +### Phase 3: Benchmark vs current GiST + +Load 30,000 TLEs from Space-Track. Compare: + +| Metric | Current GiST | SP-GiST orbital trie | +|--------|-------------|---------------------| +| Pruning rate (2h window, 43.7° lat) | Measure | Measure | +| Pruning rate (24h window, equator) | Measure | Measure | +| Query time (pass prediction) | Measure | Measure | +| False positive rate | Measure | Measure | +| Index size | Measure | Measure | +| Build time | Measure | Measure | + +### Phase 4: Evaluate + +If the SP-GiST trie achieves >85% pruning on the benchmark suite, ship as pg_orrery v0.7.0. If not, analyze where candidates survive and determine whether deeper trie levels or alternative strategies are justified. + +**Decision gate:** The GiST enhancement (adding eccentricity to the existing key) provides the baseline. The SP-GiST trie must demonstrably exceed it to justify the additional code. + +--- + +## 12. Open Questions + +1. **Should `observer_window` be a new custom type or a SQL composite?** + A custom C type (like `observer`) gives tighter control and avoids composite type overhead. A SQL composite is simpler to implement and extend. The composite is sufficient for the prototype; migrate to custom C type if profiling shows overhead. + +2. **Can we store population metadata in SP-GiST's prefix datum?** + The prefix is `float8range` (16 bytes). Population could be packed into a custom prefix struct instead, but this couples the prefix to the cost estimator. Using `traversalValue` keeps them separate. Needs testing to determine if `traversalValue` introduces measurable overhead. + +3. **What's the right default picksplit strategy?** + Options: median-of-element-value (equal-range) vs median-of-population (equal-count). Equal-count is theoretically better for balanced query cost, but equal-range is simpler and may perform similarly when the catalog distribution is stable. Implement equal-count, benchmark against equal-range. + +4. **Should the cost estimator calibrate `sgp4_cost_per_step` at index creation time?** + Running a small SGP4 benchmark during `CREATE INDEX` would give an accurate per-machine calibration. But it couples index creation to runtime performance, and the cost varies with TLE characteristics (near-earth vs deep-space). A configurable GUC (`pg_orrery.sgp4_cost_us`, default 50) may be more practical. + +5. **What about conjunction screening (N×N)?** + The original spec included `&=` (conjunction possible). This is a different query pattern: orbit-to-orbit, not observer-to-orbit. The SP-GiST trie can support it (L0+L1 pruning applies), but the RAAN filter doesn't (no observer). Defer to a future phase. + +6. **Index-only scans: what data is needed?** + `canReturnData = true` enables index-only scans, avoiding heap fetches. The leaf stores the full `tle` type. If the query only needs NORAD ID and basic orbital parameters (common for pass prediction pre-screening), the index can serve the query without touching the heap. Verify this works correctly with the `STORAGE = plain` TLE type. diff --git a/docs/ktrie/ktrie-layout.jsx b/docs/ktrie/ktrie-layout.jsx new file mode 100644 index 0000000..f51c2b9 --- /dev/null +++ b/docs/ktrie/ktrie-layout.jsx @@ -0,0 +1,544 @@ +import { useState } from "react"; + +const MONO = "'SF Mono', 'Cascadia Code', 'Fira Code', monospace"; +const SANS = "'Segoe UI', system-ui, sans-serif"; + +const colors = { + bg: "#0a0e17", + surface: "#111827", + surface2: "#1a2332", + border: "#1e3a5f", + borderHi: "#2563eb", + text: "#e2e8f0", + textDim: "#64748b", + textMuted: "#475569", + accent: "#3b82f6", + accentDim: "#1e40af", + green: "#10b981", + greenDim: "#064e3b", + amber: "#f59e0b", + amberDim: "#78350f", + red: "#ef4444", + redDim: "#7f1d1d", + purple: "#a78bfa", + purpleDim: "#4c1d95", + cyan: "#22d3ee", + cyanDim: "#164e63", + orange: "#fb923c", +}; + +const levelMeta = [ + { name: "Semi-Major Axis (a)", unit: "km", example: "6,798 km (ISS)", color: colors.accent }, + { name: "Inclination (i)", unit: "deg", example: "51.6° (ISS)", color: colors.green }, + { name: "RAAN (Ω)", unit: "deg", example: "Right Ascension", color: colors.amber }, + { name: "Eccentricity (e)", unit: "", example: "0.0001 (ISS)", color: colors.purple }, + { name: "Arg. Perigee (ω)", unit: "deg", example: "Argument of Perigee", color: colors.orange }, +]; + +const ByteBlock = ({ label, bytes, color, detail, dimColor }) => ( +
+
{label}
+
{bytes}B
+ {detail &&
{detail}
} +
+); + +const StructField = ({ type, name, size, comment, color }) => ( +
+ {type} + {name} + {size}B + {comment} +
+); + +const SectionHeader = ({ children, color = colors.accent }) => ( +
{children}
+); + +const Badge = ({ children, color }) => ( + {children} +); + +const PageDiagram = ({ type }) => { + const isInternal = type === "internal"; + const isLeaf = type === "leaf"; + const isCompressed = type === "compressed"; + + const headerColor = colors.cyan; + const entryColor = isInternal ? colors.accent : isLeaf ? colors.green : colors.purple; + const entryLabel = isInternal ? "KTrieChildEntry" : isLeaf ? "KTrieLeafEntry" : "CompressedPath + Entries"; + const entrySize = isInternal ? 24 : isLeaf ? 68 : "variable"; + const capacity = isInternal ? "~337 children" : isLeaf ? "~119 TLEs" : "path + leaves"; + + return ( +
+
+
+ {isInternal ? "Internal Node" : isLeaf ? "Leaf Node" : "Compressed Node"} +
+ {capacity} +
+ + {/* Page visualization */} +
+ {/* PG Header */} +
+ PageHeaderData + 24B +
+ + {/* KTrie Node Header */} +
+
+ KTrieNodeHeader + 72B +
+
+ + + + + + + + +
+
+ + {/* Entries section */} +
+
+ + {entryLabel} × N + + + {entrySize}B each → {capacity} + +
+ + {/* Show 3 sample entries */} + {[0, 1, 2].map(i => ( +
+ {isInternal ? ( + <> + + + + + + + ) : ( + <> + + + + + + + + + + + + )} +
+ ))} +
+
+ + {/* Page footer */} +
+ special: free_offset + 8,192 bytes total +
+
+
+ ); +}; + +const TreeViz = () => { + const nodeStyle = (color, label, sub, pop) => ( +
+
{label}
+
{sub}
+ {pop && {pop}} +
+ ); + + const connector = (color = colors.border) => ( +
+ ); + + return ( +
+ Adaptive Trie Traversal — ISS Query Path +
+ SELECT * FROM satellites WHERE sgp4_passes(tle, observer(43.70, -116.35), now(), '2h') — Eagle, ID +
+ +
+ {/* Level 0: Semi-major axis */} +
L0: Semi-Major Axis
+
+ {nodeStyle(colors.accent, "6,378–6,978", "sub-LEO → LEO", "28,441")} + {nodeStyle(colors.textMuted, "6,978–13,000", "MEO-low", "412")} + {nodeStyle(colors.textMuted, "13,000–30,000", "MEO-high", "89")} + {nodeStyle(colors.textMuted, "30,000–42,200", "GEO region", "1,247")} + {nodeStyle(colors.textMuted, "42,200+", "super-GEO", "18")} +
+ {connector(colors.accent)} +
+ ✓ ISS at 6,798km → first bin, prune 4 branches +
+ {connector(colors.green)} + + {/* Level 1: Inclination — adaptive! */} +
L1: Inclination (adaptive: 12 bins in LEO vs 3 in GEO)
+
+ {nodeStyle(colors.textMuted, "0°–28°", "equatorial", "2,104")} + {nodeStyle(colors.textMuted, "28°–45°", "mid-lat", "1,856")} + {nodeStyle(colors.green, "45°–55°", "ISS band", "8,912")} + {nodeStyle(colors.textMuted, "55°–70°", "GPS-ish", "3,201")} + {nodeStyle(colors.textMuted, "70°–82°", "polar-adj", "1,877")} + {nodeStyle(colors.textMuted, "82°–99°", "polar/SSO", "9,441")} + {nodeStyle(colors.textMuted, "99°+", "retro", "1,050")} +
+ {connector(colors.green)} +
+ ✓ 51.6° → ISS band, 8,912 candidates remain +
+ {connector(colors.amber)} + + {/* Level 2: RAAN */} +
L2: RAAN (coarse — changes rapidly due to J2 precession)
+
+ {nodeStyle(colors.textMuted, "0°–90°", "Q1", "2,156")} + {nodeStyle(colors.amber, "90°–180°", "Q2", "2,304")} + {nodeStyle(colors.textMuted, "180°–270°", "Q3", "2,211")} + {nodeStyle(colors.textMuted, "270°–360°", "Q4", "2,241")} +
+ {connector(colors.amber)} + + {/* Leaf level */} +
+
+
~2,304 leaf entries
+
+ → SGP4 propagate only these TLEs
+ → Check elevation from observer
+ → Return visible passes +
+
+ Pruned 92.3% of catalog before propagation +
+
+
+
+
+ ); +}; + +export default function KTrieLayout() { + const [activeTab, setActiveTab] = useState("pages"); + + const tabs = [ + { id: "pages", label: "Page Layout" }, + { id: "structs", label: "C Structs" }, + { id: "tree", label: "Query Traversal" }, + { id: "levels", label: "Level Semantics" }, + ]; + + return ( +
+ {/* Header */} +
+
+

+ KTrie +

+ + Keplerian Patricia Trie — PostgreSQL Index AM + +
+
+ Adaptive branching · Fixed level semantics · 8kB page-aligned · Patricia path compression +
+
+ + {/* Tabs */} +
+ {tabs.map(t => ( + + ))} +
+ + {/* Page Layout Tab */} + {activeTab === "pages" && ( +
+ 8kB Page Layouts — Internal vs Leaf +
+ + +
+ +
+
+ Capacity Math +
+
+ Page: 8,192B — PG header: 24B — Node header: 72B → 8,096B available
+ Internal: 8,096 / 24B per child = 337 max children (adaptive: use 4–337 based on density)
+ Leaf: 8,096 / 68B per entry = 119 max TLEs per page
+ Split threshold: 85% fill → split along next orbital element dimension
+ Merge threshold: 25% fill → merge with sibling if combined < 70% +
+
+
+ )} + + {/* C Structs Tab */} + {activeTab === "structs" && ( +
+ ktrie.h — Core Data Structures + +
+
+ /* Node type enum */ +
+ + + +
+ +
+
+ /* Level semantics — fixed regardless of branching */ +
+ {levelMeta.map((l, i) => ( + + ))} +
+ +
+
+ typedef struct KTrieNodeHeader {"{"} /* 72 bytes */ +
+ + + + + + + + + + +
{"}"}
+
+ +
+
+ typedef struct KTrieChildEntry {"{"} /* 24 bytes — internal node */ +
+ + + + + +
{"}"}
+
+ +
+
+ typedef struct KTrieLeafEntry {"{"} /* 68 bytes — leaf node */ +
+ + + + + + + + + + +
{"}"}
+
+
+ )} + + {/* Query Traversal Tab */} + {activeTab === "tree" && } + + {/* Level Semantics Tab */} + {activeTab === "levels" && ( +
+ Fixed Level Semantics with Adaptive Fan-Out + + {levelMeta.map((level, i) => ( +
+
+
+ + Level {i} + + {level.name} +
+ {level.unit || "dimensionless"} +
+ +
+ {i === 0 && ( + <> + Regime boundaries: sub-LEO (<300km alt), LEO (300–2,000), MEO (2,000–35,000), GEO (35,000–36,000), HEO/beyond
+ Adaptive range: 5–20 bins typical. LEO subdivides heavily (75% of catalog)
+ Split strategy: Equal-population splits, not equal-range. 6,378–6,578 might be one bin (sparse), 6,578–6,978 might be 8 bins (dense LEO)
+ Query predicate: Altitude range directly maps → instant prune + + )} + {i === 1 && ( + <> + Key clusters: 0° (equatorial), 28.5° (Cape Canaveral), 51.6° (ISS), 55° (GPS), 63.4° (Molniya critical), 97-98° (SSO), 180° (retrograde limit)
+ Adaptive range: 4–32 bins. Fine-grained near SSO (huge population), coarse near 180°
+ Note: Inclination is the most stable element — rarely changes except under thrust. Best discriminator after SMA
+ Special: Flag nodes containing 63.4° ± 0.5° as HAS_RESONANT (Molniya critical inclination) + + )} + {i === 2 && ( + <> + Range: 0°–360°, wraps around
+ Caution: RAAN precesses rapidly due to J2 (~0.5–7°/day depending on altitude/inclination)
+ Adaptive range: 4–8 coarse bins (fine subdivision pointless — RAAN drifts between TLE updates)
+ Reindex trigger: When mean RAAN drift since last reindex exceeds bin width
+ Patricia compression: This level frequently compressed away for near-circular orbits + + )} + {i === 3 && ( + <> + Range: 0.0 (circular) to ~0.95 (extreme HEO)
+ Distribution: Massively skewed — 90%+ of LEO objects have e < 0.02
+ Adaptive range: 2–4 bins. Usually just "near-circular" vs "eccentric" vs "highly elliptical"
+ Patricia compression: Almost always compressed in LEO branches (everything is near-circular)
+ Query relevance: Critical for pass prediction — eccentric orbits have variable ground speed + + )} + {i === 4 && ( + <> + Range: 0°–360°
+ Stability: Precesses due to J2, rate depends on inclination and eccentricity
+ Adaptive range: 2–6 bins, often compressed away entirely
+ Patricia compression: Most aggressively compressed level — rarely needed for spatial queries
+ Primary use: Discriminating within dense clusters (e.g., Starlink shells at same a/i/Ω) + + )} +
+
+ ))} + +
+
+ Patricia Path Compression +
+
+ When a subtree has a single child at levels 2–4 (common in LEO), compress the path:
+ Before: L0(a) → L1(i) → L2(Ω) → L3(e) → L4(ω) → leaf — 5 page reads
+ After: L0(a) → L1(i) → COMPRESSED[Ω,e,ω stored in header] → leaf — 3 page reads
+ Savings: 40% fewer I/O ops for typical LEO queries. Decompresses on split when population grows. +
+
+
+ )} +
+ ); +}