[FIX] reap orphan ns when CR deletion missed by operator #2

Merged
dunemask merged 1 commit from ep/May11-2026/ReapOrphanNs into main 2026-05-11 20:11:54 +00:00
Owner

If the operator was down when an Environment CR was deleted (by a TTL
janitor reap or `heph down`), the DELETED watch event carried a
resourceVersion (RV) older than the seed of the next `listAndWatch` and
was never delivered. `handleDelete` never fired, so the namespace stayed
`Active` with no CR. Symptom: the namespace annotation
`hephaestus.io/last-seen` is hours past `ttl`, there is no
`deletionTimestamp`, and no matching `Environment` CR exists. Users read
this as "TTL didn't enforce."

The janitor and EnvironmentController now both list heph-managed
namespaces and call `deleteNamespace` on any whose `hephaestus.io/slug`
label has no live CR. The janitor pass runs on every 15s tick; the
controller pass runs once on startup, before the watch loop, so the
window closes immediately on boot. Both passes are leader-gated, and
both bail if the CR list call fails: an empty CR set is treated as
authoritative only when the list call succeeded.
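
The control flow described above can be sketched roughly as follows. This is an illustration only, not the operator's code: `ReapDeps`, `ManagedNamespace`, `reapOrphanNamespaces`, and every method on the dependency object are hypothetical names, and the real janitor and controller may structure the pass differently.

```typescript
// Sketch only: every name below is hypothetical, not the operator's real API.
interface ManagedNamespace {
  name: string;
  slug: string; // value of the hephaestus.io/slug label
}

interface ReapDeps {
  isLeader(): boolean;
  listEnvironmentSlugs(): Promise<string[]>; // assumed to reject if the CR list call fails
  listManagedNamespaces(): Promise<ManagedNamespace[]>;
  deleteNamespace(name: string): Promise<void>;
}

// Runs one reap pass and returns the names of the namespaces it deleted.
async function reapOrphanNamespaces(deps: ReapDeps): Promise<string[]> {
  // Only the leader may delete anything.
  if (!deps.isLeader()) return [];

  let liveSlugs: Set<string>;
  try {
    liveSlugs = new Set(await deps.listEnvironmentSlugs());
  } catch {
    // A failed list is not an empty list: bail rather than
    // treat every managed namespace as orphaned.
    return [];
  }

  const namespaces = await deps.listManagedNamespaces();
  const deleted: string[] = [];
  for (const ns of namespaces) {
    if (!liveSlugs.has(ns.slug)) {
      await deps.deleteNamespace(ns.name);
      deleted.push(ns.name);
    }
  }
  return deleted;
}
```

The key design point is the `catch` branch: without it, a transient API error would look like "no CRs exist" and the pass would delete every managed namespace.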

`findOrphanNamespaces` is a pure, exported function covered by 5 new
unit tests in `operator/test/janitor.test.ts`.
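
For readers without the diff at hand, a pure function of this kind might look like the sketch below. The signature, the `NamespaceInfo` shape, and the choice to skip namespaces that lack the slug label are assumptions made for illustration; the function actually merged in this PR may differ.

```typescript
// Hypothetical shapes: the real signature in the operator may differ.
interface NamespaceInfo {
  name: string;
  labels: Record<string, string>;
}

const SLUG_LABEL = "hephaestus.io/slug";

// Pure: no API calls, no clock, no mutation of its inputs.
function findOrphanNamespaces(
  namespaces: NamespaceInfo[],
  liveSlugs: Iterable<string>,
): string[] {
  const live = new Set(liveSlugs);
  return namespaces
    .filter((ns) => {
      const slug = ns.labels[SLUG_LABEL];
      // Namespaces without the slug label are treated as unmanaged
      // in this sketch and are never reported as orphans.
      return slug !== undefined && !live.has(slug);
    })
    .map((ns) => ns.name);
}

const orphans = findOrphanNamespaces(
  [
    { name: "env-alpha", labels: { [SLUG_LABEL]: "alpha" } },
    { name: "env-beta", labels: { [SLUG_LABEL]: "beta" } },
    { name: "kube-system", labels: {} },
  ],
  ["alpha"],
);
console.log(orphans); // ["env-beta"]
```

Keeping the function free of I/O is what makes it straightforward to unit test: the caller decides whether the CR list was trustworthy before passing it in.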

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

[FIX] reap orphan ns when CR deletion missed by operator
Some checks failed
ci / check (pull_request) Has been cancelled
integration / integration (pull_request) Has been cancelled
2af7791e2f
dunemask deleted branch ep/May11-2026/ReapOrphanNs 2026-05-11 20:12:01 +00:00
Reference
dunemask/hephaestus!2