Test integrity

Catch test edits that weaken assertions to turn a failing test green, as soon as the diff lands.

When a test fails, the path of least resistance is to weaken the test. An agent swaps an assertion for assert True, replaces an integration call with a Mock(), or sprinkles @pytest.mark.skip to make red go green. You want an LLM reviewer that reads the edit diff and blocks these patterns before they merge.

"""Block test edits that weaken assertions to make a failing test pass."""

from __future__ import annotations

from captain_hook import (
    BaseHookEvent,
    Event,
    HookResult,
    Prompt,
    SourceEdits,
    TestFile,
    Tool,
    on,
    prompt_check,
)

INTEGRITY_TEMPLATE = """
You are reviewing a test edit for signs the agent weakened tests to make them pass.

Block if you see any of:
- An assertion replaced by `assert True`, `pass`, or a no-op.
- A real call replaced by a `Mock()` that defeats the test's purpose.
- A bulk addition of `@pytest.mark.skip` or `pytest.skip(...)` without justification.
- An integration boundary (DB, HTTP, file I/O) swapped for a stub.

File: {fp}

--- old ---
{old}
--- new ---
{new}
"""


@on(Event.PostToolUse, only_if=[SourceEdits(lang="py", include_tests=True), TestFile(), Tool("Edit")])
def guard_test_edits(evt: BaseHookEvent) -> HookResult | None:
    if not (fp := evt.file) or not (old := evt.old) or not (new := evt.content):
        return None
    return prompt_check(
        evt,
        Prompt.from_template(INTEGRITY_TEMPLATE, fp=fp.path, old=old, new=new),
        prefix="TEST INTEGRITY",
        suffix=" If unsure whether the change weakens the test, allow.",
    )
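To make the reviewer's input concrete, here is a minimal sketch of the text the template produces for one edit. It assumes `Prompt.from_template` fills the `{fp}`, `{old}`, and `{new}` placeholders the way `str.format` does, which this page does not confirm. `SHORT_TEMPLATE` and `render` are hypothetical stand-ins and not part of captain_hook.

```python
# Shortened stand-in for INTEGRITY_TEMPLATE with the same three placeholders.
SHORT_TEMPLATE = """
File: {fp}

--- old ---
{old}
--- new ---
{new}
"""


def render(fp: str, old: str, new: str) -> str:
    """Fill the placeholders with plain str.format (an assumption about the real API)."""
    return SHORT_TEMPLATE.format(fp=fp, old=old, new=new)


rendered = render(
    fp="tests/test_billing.py",
    old="assert exercise_count == 3",
    new="assert True",
)
print(rendered)
```

The reviewer sees the file path and both versions of the edited span, which is the context it needs to judge whether the new text asserts less than the old text did.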

Verified by replay

The weakening verdict is an LLM judgment, so no inline test asserts it. Instead, the model's decision is exercised by replaying real test-edit sessions through the hook.

What it catches

# assert exercise_count == 3  ->  assert True             # assertion gutted to a no-op
# total = service.charge(card)  ->  total = Mock()         # real call swapped for a stub
# @pytest.mark.skip("flaky")                               # bulk skip with no justification
# db.execute(query)  ->  return FAKE_ROW                   # integration boundary stubbed out

What it allows

# Edit on src/billing/charge.py                            # not a test file — only_if skips it
# Write on tests/test_billing.py                           # not an Edit — Tool("Edit") skips it
# Edit on tests/test_billing.py tightening an assertion    # LLM allows; doesn't weaken the test

The catch / allow split mirrors the hook’s only_if narrowing and prompt, so it stays true as the hook evolves.
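The narrowing behind the first two "allows" rows can be pictured as a conjunction of plain predicates. The sketch below is an illustration, not captain_hook's implementation: `is_test_file` uses pytest-style naming as an assumption about what `TestFile()` matches, and `hook_would_run` is a hypothetical name.

```python
from pathlib import PurePosixPath


def is_test_file(path: str) -> bool:
    """Assumed heuristic: a .py file named like a pytest test or under tests/."""
    p = PurePosixPath(path)
    if p.suffix != ".py":
        return False
    return (
        p.name.startswith("test_")
        or p.name.endswith("_test.py")
        or "tests" in p.parts
    )


def hook_would_run(tool: str, path: str) -> bool:
    """Every condition must hold, matching the skip cases listed above."""
    return tool == "Edit" and is_test_file(path)


print(hook_would_run("Edit", "src/billing/charge.py"))
print(hook_would_run("Write", "tests/test_billing.py"))
print(hook_would_run("Edit", "tests/test_billing.py"))
```

Only the third case reaches the prompt, where the reviewer model then decides whether the edit weakens the test.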

Run it yourself

uvx capt-hook --hooks docs/examples test

See also