3 Commits

Author SHA1 Message Date
c995bc2712 route_plan: translation_chain includes Device DNs (cti-audit-prompts/007)
cucx-docs's 007 empirically proved that route_translation_chain's
candidate filter `WHERE np.tkpatternusage IN (3, 5, 7)` excluded
Device DNs (tkpatternusage=2), which caused false-positive HIGH
findings on CTI-RP-to-CTI-RP failsafe chains — the typical CER
deployment shape.

The Bingham canary: 911-CTI-RP CFNA → 912 (DN of 912-CTI-RP) under
911CER-CSS. Direct numplan query against 911CER-PT returns 26 rows;
translation_chain reported `candidates_evaluated: 23`. The 3-row
gap is exactly the 3 Device DNs, excluded by the pre-fix filter
regardless of input number.

CUCM's runtime CFNA matcher includes Device DNs (otherwise no one
could dial 912 and reach the device). My tool's exclusion diverged
from production routing semantics. Result: every cluster using a
CTI-RP-to-CTI-RP failsafe pattern got at least one false-positive
HIGH finding on its first cti_failsafe_reachability run, wasting
operator investigation time on a phantom defect.

This commit broadens the candidate filter:

  - WHERE np.tkpatternusage IN (3, 5, 7)
  + WHERE np.tkpatternusage IN (2, 3, 5, 7)
                                ^
                              Device DN
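
The 23-vs-26 gap can be reproduced on synthetic data. The sketch below
is only an illustration: it uses an in-memory SQLite table in place of
the real numplan query, the `partition_name` column and the
`count_candidates` helper are hypothetical, and the 23 non-DN rows are
all given usage 5 for simplicity (the real split across 3, 5, and 7 is
not stated here).

```python
import sqlite3

OLD_USAGES = (3, 5, 7)     # pre-fix filter
NEW_USAGES = (2, 3, 5, 7)  # post-fix filter; 2 = Device DN


def count_candidates(conn, partition, usages):
    """Count rows in `partition` whose tkpatternusage is in `usages`."""
    placeholders = ", ".join("?" for _ in usages)
    sql = (
        "SELECT COUNT(*) FROM numplan np "
        "WHERE np.partition_name = ? "
        f"AND np.tkpatternusage IN ({placeholders})"
    )
    return conn.execute(sql, (partition, *usages)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE numplan "
    "(dnorpattern TEXT, partition_name TEXT, tkpatternusage INTEGER)"
)
# Synthetic rows: 3 Device DNs plus 23 rows of another usage type.
rows = [(f"91{i}", "911CER-PT", 2) for i in range(1, 4)]
rows += [(f"7{i:03d}", "911CER-PT", 5) for i in range(23)]
conn.executemany("INSERT INTO numplan VALUES (?, ?, ?)", rows)

old = count_candidates(conn, "911CER-PT", OLD_USAGES)
new = count_candidates(conn, "911CER-PT", NEW_USAGES)
print(old, new)  # 23 26
```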

Side effect: route_translation_chain now also surfaces Device DNs as
matches when called directly, consistent with production routing
semantics. Existing callers benefit automatically.

The _note in the response now names the candidate set explicitly so
future readers don't have to dig into the SQL to know what's
included.

Updated comment block above the WHERE clause documents:
  - which tkpatternusage values are included and why
  - the empirical observation that motivated including Device DNs
  - cross-reference to cti-audit-prompts/007 for the smoking-gun
    candidates_evaluated:23-vs-26 evidence

Tests: +2 in TestDeviceDnInTranslationChainCandidates:

  - test_translation_chain_sql_includes_device_dn_usage: lock the
    SQL down so a future contributor can't re-narrow the filter to
    (3, 5, 7) and re-introduce the false-positive class
  - test_cti_rp_to_cti_rp_failsafe_does_not_false_positive: the
    Bingham canary scenario — 911-CTI-RP forwarding to a Device DN
    in a reachable partition correctly produces zero findings
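
A minimal sketch of what the SQL lock-down test might look like.
`TRANSLATION_CHAIN_SQL` and `usage_filter_values` are hypothetical
stand-ins; the real test inspects whatever query text
route_translation_chain actually builds.

```python
import re

# Hypothetical stand-in for the query text under test.
TRANSLATION_CHAIN_SQL = """
SELECT np.dnorpattern
FROM numplan np
WHERE np.tkpatternusage IN (2, 3, 5, 7)
"""


def usage_filter_values(sql):
    """Extract the integers inside the tkpatternusage IN (...) clause."""
    match = re.search(r"tkpatternusage\s+IN\s*\(([^)]*)\)", sql, re.IGNORECASE)
    if match is None:
        raise AssertionError("tkpatternusage filter not found in SQL")
    return {int(value) for value in match.group(1).split(",")}


def test_translation_chain_sql_includes_device_dn_usage():
    values = usage_filter_values(TRANSLATION_CHAIN_SQL)
    # Fails if the filter is ever narrowed back to (3, 5, 7).
    assert 2 in values, "Device DN (tkpatternusage=2) must stay in the filter"
    assert values == {2, 3, 5, 7}
```

Parsing the value set rather than comparing raw strings keeps the test
stable if whitespace inside the IN clause changes.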

The dispatch fake's SQL match-string was updated from "(3, 5, 7)" to
"(2, 3, 5, 7)" to keep the existing 31 cti tests green. Full mcaxl
suite: 269 → 271 passing (+2).

Live re-run pending — will ping the agent thread with post-patch
output once the MCP server reloads.

Re-run expectations (per cucx-docs's 007):
  - 911-CTI-RP / 912 finding (CFNA + CFUR): GONE — Device DN matches
  - 912-CTI-RP / 10911 finding: UNCHANGED — Route pattern still
    unreachable (CER911-PT not in 911CER-CSS)
  - 913-CTI-RP / 60003 finding: UNCHANGED — destination doesn't
    exist anywhere

Findings: 6 → 4 (the 4 that actually matter).
2026-05-09 04:37:41 -06:00
99986daa45 route_plan: cti_failsafe_reachability fix-suggestion handles dotted patterns
Limitation surfaced by the live Bingham smoke-test (cti-audit-prompts/004):
the canonical 912-CTI-RP finding got the broken-forward flag correct,
but the suggested-fix message couldn't name CER911-PT (where pattern
'10.911' lives) because the exact-literal lookup
`WHERE np.dnorpattern = '10911'` doesn't match the dot-form `10.911`.

The separator dot in a CUCM pattern is not a dialed digit; it only
marks the access-code boundary. A destination string `10911` should
therefore match a configured pattern `10.911`, since both represent
the same dialed digits.

Two-stage match in _suggest_failsafe_fix:

  1. Exact-literal: WHERE np.dnorpattern = '<dest>' (current behavior)
  2. Dot-stripped: pull all patterns with `.` in them, filter
     Python-side by `pattern.replace('.', '') == dest`

Stage 2 only runs when stage 1 returns no partitions, so the common
case (exact-literal hit) takes the fast path. Falls back to the
wildcard-investigation generic message only when neither stage finds
a match.
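
The two stages can be sketched roughly as follows. This is an
illustration, not the body of _suggest_failsafe_fix: the function name
`find_pattern_partitions`, the in-memory `(dnorpattern, partition)`
tuples standing in for query rows, and the `PSTN-PT` partition are all
assumptions.

```python
def find_pattern_partitions(dest, patterns):
    """Return (kind, matches) for a destination string.

    `patterns` is a list of (dnorpattern, partition) tuples.
    """
    # Stage 1: exact-literal match (the fast path).
    exact = [(p, pt) for p, pt in patterns if p == dest]
    if exact:
        return "exact", exact

    # Stage 2: only reached when stage 1 found nothing. Consider
    # patterns that contain a dot and compare their dot-stripped form.
    dotted = [
        (p, pt)
        for p, pt in patterns
        if "." in p and p.replace(".", "") == dest
    ]
    if dotted:
        return "dot-stripped", dotted

    # Neither stage matched: caller falls back to the generic message.
    return "none", []


rows = [
    ("10.911", "CER911-PT"),
    ("912", "911CER-PT"),
    ("9.911", "PSTN-PT"),  # hypothetical row, must not match "10911"
]
print(find_pattern_partitions("10911", rows))
# ('dot-stripped', [('10.911', 'CER911-PT')])
```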

The fix message also distinguishes the two cases:
  - Exact-literal hit → "Pattern '10911' lives in partition X..."
  - Dot-stripped hit → "Pattern '10.911' (matches destination '10911')
    lives in partition X..."

Naming both the pattern form and the destination keeps the operator
oriented when the dialed digits and the configured pattern look
different.

Tests: +5 in TestDotStrippedFixSuggestion exercising:
  - dot-stripped match cites the dotted pattern form
  - exact-literal takes precedence over dotted match
  - multi-partition dotted match
  - no-exact-no-dotted falls back to generic
  - irrelevant dot-positions correctly excluded from match

One existing assertion updated from "no exact-literal pattern" to
"no exact-literal or dot-stripped pattern" (more accurate after the
patch).

Full mcaxl suite: 264 → 269 passing (+5 dot-stripped tests).
The 1 unrelated test_wildcard.py timing flake is pre-existing
(regex-backtracking timing assertion fails by 36ms under load).

Cross-references:
  - Live smoke-test findings: agent-threads/cti-audit-prompts/004
  - Original tool: agent-threads/cti-audit-prompts/002, commit d33cd7c
2026-05-09 03:36:51 -06:00
d33cd7c809 route_plan: add cti_failsafe_reachability tool
Closes the bug class cucx-docs flagged at Bingham — a CTI Route
Point's CFNA destination points at a number that is structurally
unreachable from the configured CFNA-CSS, so the failsafe forward
fires but finds no matching pattern and the call dies. Invisible
from any single-record inspection (CTI RP record looks fine,
destination pattern exists in some partition, CSS is fine — defect
lives in the relationship between CFNA-CSS and destination's
partition).

The motivating Bingham finding (life-safety severity):

  912-CTI-RP (Secondary CER) CFNA + CFUR → "10911" via 911CER-CSS
  Pattern "10.911" exists in CER911-PT
  911CER-CSS does NOT contain CER911-PT
  → failsafe is structurally broken; both CER servers down would
    produce fast-busy on 911 calls instead of routing through ELIN-10
    to the PSAP
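
The structural check reduces to a set intersection between the
partitions a CSS searches and the partitions that hold the destination
pattern. The sketch below is illustrative only: `CSS_PARTITIONS`,
`PATTERN_PARTITIONS`, and `forward_is_reachable` are hypothetical
names, and the CSS membership shown is reduced to what these commit
messages state or imply.

```python
# Partitions searched by each CSS (simplified, inferred for illustration).
CSS_PARTITIONS = {"911CER-CSS": {"911CER-PT"}}

# Partitions in which each configured pattern lives.
PATTERN_PARTITIONS = {
    "10.911": {"CER911-PT"},
    "912": {"911CER-PT"},
}


def forward_is_reachable(css_name, pattern):
    """True if any partition holding `pattern` is searched by the CSS."""
    searched = CSS_PARTITIONS.get(css_name, set())
    holders = PATTERN_PARTITIONS.get(pattern, set())
    return bool(searched & holders)


print(forward_is_reachable("911CER-CSS", "10.911"))  # False
print(forward_is_reachable("911CER-CSS", "912"))     # True
```

Neither record looks wrong on its own; only the empty intersection
exposes the broken forward.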

Implementation per axl/agent-threads/cti-audit-prompts/002:

  - Tool, not prompt — output is structured + deterministic; same
    shape as route_patterns_targeting (Q1 confirmed as proposed)
  - Three-tier severity scale, LOW unused: HIGH for life-safety
    names or descriptions, MEDIUM for non-life-safety (Q2 refined
    from cucx-docs's binary proposal; every broken forward is a
    real bug, just not all are 911)
  - Scope: CFNA + CFUR only for v1; CFB excluded by design (Q3
    confirmed — CTI RPs rarely go busy)
  - Lives in route_plan.py alongside route_patterns_targeting +
    device_grep + translation_chain (Q5 — defer cti.py namespace
    until adjacent prompts land)
  - Named cti_failsafe_reachability not _audit (Q4 — drops the
    _audit suffix per the established tool-vs-prompt naming split;
    tools use direct-action names, prompts use _audit)

Life-safety token list (case-insensitive substring match against
name AND description):

  ("emergency", "911", "cer", "psap", "panic", "alert")
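
A minimal sketch of the token check and the resulting severity,
assuming plain substring matching as described. The function names
`is_life_safety` and `severity` are hypothetical, and the None handling
is an assumption about how empty fields arrive.

```python
LIFE_SAFETY_TOKENS = ("emergency", "911", "cer", "psap", "panic", "alert")


def is_life_safety(name, description):
    """Case-insensitive substring match against name and description."""
    # Treat missing fields as empty strings.
    haystack = f"{name or ''} {description or ''}".lower()
    return any(token in haystack for token in LIFE_SAFETY_TOKENS)


def severity(name, description):
    """HIGH for life-safety route points, MEDIUM for everything else."""
    return "HIGH" if is_life_safety(name, description) else "MEDIUM"


print(severity("912-CTI-RP", "Secondary CER"))  # HIGH
print(severity("IVR-RP", "Main menu"))          # MEDIUM
```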

Suggested-fix message names the partition where the destination's
pattern lives and proposes either "add partition X to CSS Y" or
"change CSS to a CSS containing partition X." Falls back to a
generic "manual investigation needed" message when the destination
matches no exact-literal pattern in any partition (often means a
wildcard pattern is the actual target).

Tests: 26 in TestLifeSafetyDetection + TestCtiFailsafeReachability:

  - 16 token-matching cases (10 positive, 4 negative, 2 sentinel)
  - 10 tool-level cases including the canonical Bingham bug
    reproduced verbatim (assertion compares the entire finding dict
    to the expected output from cucx-docs's 001 message)

Full mcaxl suite: 238 → 264 passing (+26 from this work).

Adjacent prompts cucx-docs flagged as lower-priority follow-ups
(cti_route_point_audit, cti_port_pool_audit,
cti_application_user_audit) deferred but tracked.
2026-05-09 03:28:49 -06:00