[FIX] reap orphan ns when CR deletion missed by operator #2
If the operator was down when an Environment CR was deleted (TTL janitor
reap or heph down), the DELETED watch event landed at an older RV than the
next `listAndWatch` seed and was lost: `handleDelete` never fired, leaving
the namespace `Active` with no CR. Symptom: ns annotation
`hephaestus.io/last-seen` hours past `ttl`, no `deletionTimestamp`, no
matching `Environment` CR. User reads as "TTL didn't enforce."

Janitor and EnvironmentController now both list heph-managed namespaces
and `deleteNamespace` any whose `hephaestus.io/slug` label has no live
CR. Janitor pass runs every 15s tick; controller pass runs once on
startup before the watch loop, closing the window immediately on boot.
Both gated by leader, both bail on CR-list error (empty-CR-set treated
as authoritative only when the list call succeeded).
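
For reviewers, the reap pass described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the `NamespaceInfo` and `ReapDeps` shapes, the `reapOrphanNamespaces` name, and the signature given to `findOrphanNamespaces` are all hypothetical; only the `hephaestus.io/slug` label, the leader gate, and the bail-on-list-error behavior come from the description.

```typescript
// Hypothetical minimal namespace shape; the operator's real types are not shown here.
interface NamespaceInfo {
  name: string;
  labels: Record<string, string>;
}

const SLUG_LABEL = "hephaestus.io/slug";

// Pure function (assumed signature): names of namespaces whose slug label
// matches no live Environment CR. Namespaces without the label are ignored.
export function findOrphanNamespaces(
  namespaces: NamespaceInfo[],
  liveSlugs: ReadonlySet<string>,
): string[] {
  return namespaces
    .filter((ns) => {
      const slug = ns.labels[SLUG_LABEL];
      return slug !== undefined && !liveSlugs.has(slug);
    })
    .map((ns) => ns.name);
}

// Hypothetical dependency bundle so the pass can run against fakes in tests.
interface ReapDeps {
  isLeader(): boolean;
  listEnvironmentSlugs(): Promise<string[]>;
  listManagedNamespaces(): Promise<NamespaceInfo[]>;
  deleteNamespace(name: string): Promise<void>;
}

// One reap pass: gated by leadership, and aborted if the CR list call fails,
// so an empty CR set is trusted only when the list actually succeeded.
export async function reapOrphanNamespaces(deps: ReapDeps): Promise<string[]> {
  if (!deps.isLeader()) return [];

  let slugs: string[];
  try {
    slugs = await deps.listEnvironmentSlugs();
  } catch {
    // A failed list must not be read as "no CRs exist"; skip this pass.
    return [];
  }

  const namespaces = await deps.listManagedNamespaces();
  const orphans = findOrphanNamespaces(namespaces, new Set(slugs));
  for (const name of orphans) {
    await deps.deleteNamespace(name);
  }
  return orphans;
}
```

In this sketch the janitor would call `reapOrphanNamespaces` on each tick and the controller would call it once before entering its watch loop. Keeping the orphan selection in a pure function is what lets it be unit-tested without a cluster.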

`findOrphanNamespaces` is pure-fn + exported, covered by 5 new unit tests in
`operator/test/janitor.test.ts`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>