Scoring sentiment

Wrap an inference model in a composable ScoreSpec whose deterministic stages short-circuit or post-process its score.

Sentiment scoring rates how a user feels across a slice of a session — a 1 is hostile, a 5 is happy. The model that produces that number is an InferenceEngine, but you rarely run it bare. You wrap it in a ScoreSpec: an ordered tuple of small, deterministic stages that bracket the model.

Each stage acts in one of two phases, and the phase is intrinsic to the stage’s type — you don’t configure it:

Short-circuit stages run before inference. When one matches, it returns a score outright and the model never sees that bucket. FrustrationShortCircuit is the only short-circuit stage: an explicit “wtf, this is broken” needs no model to read it.
Post-process stages run after inference. They fold the model’s raw score through deterministic adjustments — clamping an over-eager 5, softening a harsh 1. PositiveClamp, MildIrritationDemote, and ResumeClamp are post-process stages.
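One way to make a phase intrinsic to a type is to fix it as a class-level attribute that the executor reads, not a constructor argument. The sketch below models that idea in plain Python. The class names echo the stages above, but these classes, the `Phase` enum, and the `partition` helper are hypothetical illustrations, not cc_transcript's definitions (the real stages execute in Rust).

```python
from dataclasses import dataclass
from enum import Enum
from typing import ClassVar


class Phase(Enum):
    SHORT_CIRCUIT = "short-circuit"
    POST_PROCESS = "post-process"


# Hypothetical stand-ins: the phase lives on the class (a ClassVar, so it is
# not a dataclass field), which means no instance can be configured into
# the other phase.
@dataclass(frozen=True)
class FrustrationShortCircuit:
    phase: ClassVar[Phase] = Phase.SHORT_CIRCUIT
    score: int = 1


@dataclass(frozen=True)
class PositiveClamp:
    phase: ClassVar[Phase] = Phase.POST_PROCESS


@dataclass(frozen=True)
class ResumeClamp:
    phase: ClassVar[Phase] = Phase.POST_PROCESS


def partition(spec):
    """Split an ordered spec tuple by phase, keeping list order within each phase."""
    short = tuple(s for s in spec if s.phase is Phase.SHORT_CIRCUIT)
    post = tuple(s for s in spec if s.phase is Phase.POST_PROCESS)
    return short, post


# PositiveClamp is listed first, yet the short-circuit stage still lands in its own group.
spec = (PositiveClamp(), FrustrationShortCircuit(), ResumeClamp())
short, post = partition(spec)
```

The caller only lists stages. Which group a stage falls into is read off its type.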

FilteredEngine is the wrapper that ties the two together: it asks the spec which buckets short-circuit, runs the inner engine only on the rest, then post-processes every score. Every deterministic stage executes in Rust; only model inference stays Python-side. See The Rust engine for the executor and its correctness story.
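The control flow just described fits in a few lines. The synchronous sketch below is hypothetical: the real FilteredEngine is async and delegates every deterministic stage to Rust. Here a stage is a plain function on a bucket's user text. A short-circuit returns a score or None, and a post-process maps (text, score) to a new score. Following the description above, the sketch post-processes every score, including short-circuited ones.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence

ShortCircuit = Callable[[str], Optional[int]]
PostProcess = Callable[[str, int], int]


def filtered_score(
    texts: Sequence[str],
    short_circuits: Sequence[ShortCircuit],
    post_processes: Sequence[PostProcess],
    infer: Callable[[list[str]], list[int]],
) -> list[int]:
    scores: list[Optional[int]] = [None] * len(texts)
    # Phase 1: the first short-circuit that matches decides the bucket outright.
    for i, text in enumerate(texts):
        for stage in short_circuits:
            hit = stage(text)
            if hit is not None:
                scores[i] = hit
                break
    # Phase 2: run inference once, only on buckets no short-circuit claimed.
    pending = [i for i, s in enumerate(scores) if s is None]
    for i, s in zip(pending, infer([texts[i] for i in pending])):
        scores[i] = s
    # Phase 3: fold every score through the post-process stages in list order.
    result = []
    for text, score in zip(texts, scores):
        for stage in post_processes:
            score = stage(text, score)
        result.append(score)
    return result


seen = []


def fake_infer(batch):
    seen.extend(batch)  # record exactly what the "model" was asked to score
    return [5 for _ in batch]


out = filtered_score(
    ["wtf", "fine", "go"],
    short_circuits=[lambda t: 1 if "wtf" in t else None],
    post_processes=[lambda t, s: min(s, 3) if t == "go" else s],
    infer=fake_infer,
)
```

`out` comes back as `[1, 5, 3]`, and `seen` holds only `"fine"` and `"go"`. The short-circuited bucket never reached the stand-in model.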

Setup

A ConversationBucket groups one session’s user and assistant events into a short time window (three minutes by default), and bucket_events lifts those windows straight off the parsed event stream. A real InferenceEngine runs a model over each bucket’s user text. To keep this guide hermetic (no model download, no GPU, no dependence on your ~/.claude), we parse an inline transcript that spans three bucket windows and substitute a ConstantEngine that gives every bucket it is handed a flat 5. That isolates the spec: any score other than 5 comes from a deterministic stage, not the model.

from cc_transcript import UserEvent, parse_events_from_bytes
from cc_transcript.sentiment import (
    FilteredEngine,
    SentimentScore,
    bucket_events,
    build_score_spec,
    clamp_resume,
    flag_frustration,
)

TRANSCRIPT = b"""\
{"type":"user","uuid":"u1","sessionId":"sess-1","timestamp":"2026-01-02T03:00:05.000Z","message":{"role":"user","content":"this is completely useless, wtf"}}
{"type":"assistant","uuid":"a1","sessionId":"sess-1","timestamp":"2026-01-02T03:00:10.000Z","message":{"role":"assistant","model":"claude-opus-4-7","stop_reason":"end_turn","content":[{"type":"text","text":"ack"}]}}
{"type":"user","uuid":"u2","sessionId":"sess-1","timestamp":"2026-01-02T03:03:05.000Z","message":{"role":"user","content":"looks reasonable, what about the edge case"}}
{"type":"assistant","uuid":"a2","sessionId":"sess-1","timestamp":"2026-01-02T03:03:10.000Z","message":{"role":"assistant","model":"claude-opus-4-7","stop_reason":"end_turn","content":[{"type":"text","text":"ack"}]}}
{"type":"user","uuid":"u3","sessionId":"sess-1","timestamp":"2026-01-02T03:06:05.000Z","message":{"role":"user","content":"continue"}}
{"type":"assistant","uuid":"a3","sessionId":"sess-1","timestamp":"2026-01-02T03:06:10.000Z","message":{"role":"assistant","model":"claude-opus-4-7","stop_reason":"end_turn","content":[{"type":"text","text":"ack"}]}}
"""

buckets = bucket_events(parse_events_from_bytes(TRANSCRIPT))
[(int(b.bucket_index), [e.text for e in b.events if isinstance(e, UserEvent)]) for b in buckets]
# -> [(0, ['this is completely useless, wtf']),
#     (1, ['looks reasonable, what about the edge case']),
#     (2, ['continue'])]

[(0, ['this is completely useless, wtf']),
 (1, ['looks reasonable, what about the edge case']),
 (2, ['continue'])]
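To see where those three windows come from, the sketch below re-derives each event's window index from the same timestamps using only the standard library. It assumes windows are fixed 180-second spans counted from the session's first event. The library's actual window anchoring may differ, so treat this as an illustration of the idea, not as bucket_events itself.

```python
import json
from datetime import datetime, timedelta

# A trimmed copy of the transcript above: only the fields this sketch reads.
LINES = [
    '{"type":"user","timestamp":"2026-01-02T03:00:05.000Z"}',
    '{"type":"assistant","timestamp":"2026-01-02T03:00:10.000Z"}',
    '{"type":"user","timestamp":"2026-01-02T03:03:05.000Z"}',
    '{"type":"assistant","timestamp":"2026-01-02T03:03:10.000Z"}',
    '{"type":"user","timestamp":"2026-01-02T03:06:05.000Z"}',
    '{"type":"assistant","timestamp":"2026-01-02T03:06:10.000Z"}',
]
WINDOW = timedelta(minutes=3)  # the three-minute default described above


def parse_ts(raw: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


events = [json.loads(line) for line in LINES]
start = min(parse_ts(e["timestamp"]) for e in events)
# Assumption: window index = whole windows elapsed since the first event
# (timedelta // timedelta yields an int).
indices = [(e["type"], (parse_ts(e["timestamp"]) - start) // WINDOW) for e in events]
```

Each user turn and its assistant reply share an index (0, 1, 2), which matches the three buckets above.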

Each event lands in the window its meta.timestamp falls into. Only UserEvent text is scored; the assistant turns ride along to mark each window as a real exchange. The stand-in engine:

class ConstantEngine:
    """A stand-in InferenceEngine: every inferred bucket scores 5 (a real one runs a model)."""

    async def score(self, buckets, on_progress=lambda _: None):
        return [SentimentScore(5) for _ in buckets]

    def peak_memory_gb(self):
        return 0.0

    async def close(self):
        return None
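ConstantEngine satisfies the engine contract by shape alone: it implements score, peak_memory_gb, and close. If you want a check on your own stand-ins, a typing.Protocol can spell that shape out. The protocol below is hypothetical and inferred only from ConstantEngine's three methods. The library's actual InferenceEngine definition may declare more. This copy also returns plain ints where the guide uses SentimentScore.

```python
from __future__ import annotations

from typing import Callable, Protocol, Sequence, runtime_checkable


@runtime_checkable
class EngineShape(Protocol):
    """Hypothetical protocol mirroring ConstantEngine's methods."""

    async def score(
        self,
        buckets: Sequence[object],
        on_progress: Callable[[object], None] = ...,
    ) -> list[int]: ...

    def peak_memory_gb(self) -> float: ...

    async def close(self) -> None: ...


class IntConstantEngine:
    """An int-scoring copy of ConstantEngine, renamed to avoid shadowing it."""

    async def score(self, buckets, on_progress=lambda _: None):
        return [5 for _ in buckets]

    def peak_memory_gb(self):
        return 0.0

    async def close(self):
        return None


class NoClose:
    """Missing close(), so it does not match the shape."""

    async def score(self, buckets, on_progress=lambda _: None):
        return []

    def peak_memory_gb(self):
        return 0.0


matches = isinstance(IntConstantEngine(), EngineShape)
missing = isinstance(NoClose(), EngineShape)
```

An isinstance check against a runtime_checkable protocol only confirms that the methods exist. It does not check their signatures, and it does not check that score is async. A static type checker gives a stricter check.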

Run the spec

Compose a spec from two pure stages — flag_frustration() (short-circuit) and clamp_resume() (post-process) — and wrap the engine. FilteredEngine.score is async; inside a Quarto cell you drive it with top-level await. Do not reach for anyio.run or asyncio.run — the build kernel already has a running event loop, so those raise RuntimeError: ... already running.
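The failure is easy to reproduce outside Quarto. The sketch below simulates a kernel cell by running a coroutine inside an already-running loop. There, await works and a nested asyncio.run is rejected. score_stub is a hypothetical stand-in for FilteredEngine.score.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def score_stub():
    """Hypothetical stand-in for FilteredEngine.score."""
    return [5, 5, 5]


async def kernel_cell():
    """Plays the part of a notebook cell: an event loop is already running here."""
    result = await score_stub()  # top-level-await style: works
    err = ""
    nested = score_stub()
    try:
        asyncio.run(nested)  # nested run inside a running loop: rejected
    except RuntimeError as exc:
        err = str(exc)
    finally:
        nested.close()  # avoid a "coroutine was never awaited" warning
    return result, err


# A fresh worker thread has no running loop, so asyncio.run is allowed there.
# That keeps this demo working even when pasted into a kernel cell.
with ThreadPoolExecutor(max_workers=1) as pool:
    result, err = pool.submit(asyncio.run, kernel_cell()).result()
```

`result` holds the awaited scores. `err` carries CPython's message that asyncio.run cannot be called from a running event loop.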

spec = build_score_spec(flag_frustration(), clamp_resume())
engine = FilteredEngine(inner=ConstantEngine(), spec=spec)

# The engine is async; in a Quarto cell drive it with top-level await.
# anyio.run / asyncio.run raise RuntimeError there: the build kernel's loop is already running.
scores = await engine.score(buckets)
[int(s) for s in scores]
# -> [1, 5, 3]   (frustration short-circuits to 1; bucket 1 keeps the model's 5;
#                  bucket 2's "continue" post-clamps 5 -> 3)

[1, 5, 3]

Read [1, 5, 3] bucket by bucket:

Bucket 0 → 1. "this is completely useless, wtf" matches the library’s frustration regex groups, so flag_frustration short-circuits to 1. The model never runs on this bucket: ConstantEngine is never asked to score it.
Bucket 1 → 5. "looks reasonable, what about the edge case" carries no special signal, so the model’s score stands. Here that’s the stand-in’s flat 5.
Bucket 2 → 3. "continue" is a bare resume phrase, so clamp_resume post-clamps the model’s 5 down to 3 — a one-word “keep going” is neutral, not delight.
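The same three outcomes can be traced with toy stand-ins for the two pure stages. The regex and the phrase set below are hypothetical placeholders. The library's real frustration groups and resume phrases are richer and are not shown in this guide. We also assume the resume clamp caps the score with min() and does not overwrite it.

```python
import re

# Hypothetical stand-ins for the library's frustration regex groups and resume phrases.
FRUSTRATION = re.compile(r"\b(wtf|useless|broken)\b", re.IGNORECASE)
RESUME_PHRASES = frozenset({"continue", "keep going", "go on"})


def flag_frustration_toy(text, score=1):
    """Short-circuit: return a score outright on a match, else None."""
    return score if FRUSTRATION.search(text) else None


def clamp_resume_toy(text, score):
    """Post-process: a bare resume phrase is neutral, so cap the score at 3."""
    return min(score, 3) if text.strip().lower() in RESUME_PHRASES else score


texts = [
    "this is completely useless, wtf",
    "looks reasonable, what about the edge case",
    "continue",
]
MODEL_SCORE = 5  # what ConstantEngine returns for every inferred bucket

scores = []
for text in texts:
    hit = flag_frustration_toy(text)
    score = hit if hit is not None else MODEL_SCORE
    scores.append(clamp_resume_toy(text, score))
```

`scores` comes out as `[1, 5, 3]`, the same as the run above.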

The four stage builders

You never construct stages directly. Four builders return them pre-wired with the library’s regex groups, phrase sets, and defaults. Two are pure (regex or set-membership only) and two are lexicon-gated (they consult the sentiment lexicon — see below):

Builder	Stage	Phase	Determinism
`flag_frustration(*, score=1)`	`FrustrationShortCircuit`	short-circuit	pure
`clamp_resume()`	`ResumeClamp`	post-process	pure
`clamp_positive(*, max_words=SHORT_MESSAGE_MAX_WORDS)`	`PositiveClamp`	post-process	lexicon-gated
`demote_mild_irritation()`	`MildIrritationDemote`	post-process	lexicon-gated
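The bare `*` in `flag_frustration(*, score=1)` and `clamp_positive(*, max_words=...)` makes every override keyword-only. The hypothetical builder below has the same shape and shows what that buys. The defaults are pre-wired, and a positional argument is rejected, so it cannot silently bind to the wrong parameter. The function name and the dict it returns are illustrative only, not the library's stage objects.

```python
def flag_frustration_sketch(*, score=1):
    """Hypothetical builder: returns a pre-wired stage description."""
    return {"stage": "FrustrationShortCircuit", "phase": "short-circuit", "score": score}


default = flag_frustration_sketch()         # library-style default: score=1
custom = flag_frustration_sketch(score=2)   # overrides must be named

try:
    flag_frustration_sketch(2)  # positional argument: rejected by the bare `*`
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```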

flag_frustration and clamp_resume are the two you just ran. The other two add sharper post-processing by consulting the lexicon — deterministic in-package data, so they run here like everything else:

from cc_transcript.sentiment import (
    build_score_spec,
    clamp_positive,
    clamp_resume,
    demote_mild_irritation,
    flag_frustration,
)

# A fuller spec. clamp_positive lowers an over-eager 5 to 3 on a terse message
# that lacks any positive word; demote_mild_irritation softens a frustration-1
# down to 2 for mild impatience ("...again", "for the third time") that is not hostile.
spec = build_score_spec(
    flag_frustration(),
    clamp_positive(),
    demote_mild_irritation(),
    clamp_resume(),
)

Stage order is significant. Short-circuit stages are consulted before inference regardless of where they sit in the spec, and a frustration hit means the model never scores that bucket. The post-process stages then fold in the order you list them, each receiving the previous stage’s output. clamp_positive only fires on a 5 and demote_mild_irritation only on a 1, because those extremes are the scores a model is most likely to overstate.
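“Fold in the order you list them” means each post-process stage sees the previous stage’s output. The sketch below models that with functools.reduce. The stage bodies are hypothetical simplifications. Tiny word sets replace the lexicon checks, and MAX_WORDS stands in for SHORT_MESSAGE_MAX_WORDS, whose real value this guide does not state.

```python
from functools import reduce

MAX_WORDS = 4  # hypothetical stand-in for SHORT_MESSAGE_MAX_WORDS
POSITIVE_TOY = {"perfect", "great", "thanks"}  # toy stand-in for a positive lexicon hit
HOSTILE_TOY = {"useless", "garbage"}  # toy stand-in for a hostile lexicon hit
IMPATIENCE_TOY = ("again", "for the third time")  # toy impatience markers


def words_of(text):
    return text.lower().replace(",", " ").split()


def clamp_positive_toy(text, score):
    words = words_of(text)
    # Only a 5 is eligible: a terse message with no positive word drops to 3.
    if score == 5 and len(words) <= MAX_WORDS and not POSITIVE_TOY & set(words):
        return 3
    return score


def demote_mild_irritation_toy(text, score):
    lowered = text.lower()
    impatient = any(marker in lowered for marker in IMPATIENCE_TOY)
    # Only a 1 is eligible: impatience without hostile words softens to 2.
    if score == 1 and impatient and not HOSTILE_TOY & set(words_of(text)):
        return 2
    return score


def post_process(text, score, stages):
    # Left fold: each stage receives the score the previous stage produced.
    return reduce(lambda s, stage: stage(text, s), stages, score)


stages = (clamp_positive_toy, demote_mild_irritation_toy)
```

In this toy the two stages gate on disjoint scores (5 versus 1), so swapping them gives the same results. Order only starts to matter once two stages can act on the same score.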

clamp_positive and demote_mild_irritation both call Lexicon.has_hit to decide whether a message actually carries positive or hostile sentiment. That lexicon is the subject of the next section.

The lexicon

Lexicon.has_hit(text, *, want_negative) tokenizes a message with the embedded UDPipe model and scores each token’s lowercased surface form for polarity. Domain overrides are checked first, then AFINN, and a negated token has its polarity sign flipped. has_hit reports whether any token reaches the polarity threshold: +3 or above, or -3 or below when want_negative=True. That is what lets clamp_positive distinguish a terse-but-warm “perfect, ship it” from a flat “ok”, and lets demote_mild_irritation tell genuine hostility from mere impatience.
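The has_hit decision can be sketched with a toy lexicon. This is not the library's implementation. A regex word split stands in for UDPipe tokenization, and negation is approximated by a preceding “not”, “never”, or “no”. The scores in both tables are illustrative, not taken from AFINN. The sketch does mirror three things from the description above: the lookup order (overrides, then the base table), the sign flip on negated tokens, and the ±3 threshold.

```python
import re

AFINN_TOY = {"perfect": 3, "good": 3, "useless": -2, "awful": -3, "kill": -3}  # illustrative values
OVERRIDES_TOY = {"kill": 0}  # hypothetical domain override: "kill the process" is not hostile
NEGATORS = {"not", "never", "no"}
THRESHOLD = 3


def polarity(token):
    # Domain overrides win; otherwise fall back to the base table; unknown -> 0.
    if token in OVERRIDES_TOY:
        return OVERRIDES_TOY[token]
    return AFINN_TOY.get(token, 0)


def has_hit(text, *, want_negative):
    tokens = re.findall(r"[a-z']+", text.lower())  # stand-in for UDPipe tokenization
    for prev, tok in zip([None] + tokens, tokens):
        score = polarity(tok)
        if prev in NEGATORS:
            score = -score  # negated token: flip the sign
        if want_negative and score <= -THRESHOLD:
            return True
        if not want_negative and score >= THRESHOLD:
            return True
    return False
```

With these toy values, “perfect, ship it” is a positive hit and “ok” is not. “not good” counts as a negative hit. “this is useless” stays below the negative threshold, which is the kind of gap a demotion stage can exploit.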

The lookup is surface-form. AFINN calibrates inflections as separate rows (“broken” carries a different strength than “break”), so each token is looked up by its lowercased surface form rather than its lemma. Both tables ship inside the package as TSV data, and the Rust engine compiles them in at build time. There is no optional extra to install and nothing to download.
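A surface-form lookup is an exact dictionary hit on the lowercased token, with no lemmatization step. The sketch below parses two small inline TSV tables with the standard csv module and merges them so that overrides take precedence. The table contents, including the different values for “break” and “broken”, are made up for illustration. The real files, their names, and their columns are not shown in this guide.

```python
import csv
import io

BASE_TSV = "break\t-1\nbroken\t-2\nperfect\t3\n"  # illustrative rows, not real AFINN values
OVERRIDES_TSV = "broken\t-3\n"  # hypothetical domain override


def load(tsv):
    """Parse word<TAB>score rows into a dict."""
    reader = csv.reader(io.StringIO(tsv), delimiter="\t")
    return {word: int(score) for word, score in reader}


# In a dict merge the later mapping wins, so overrides shadow the base table.
table = {**load(BASE_TSV), **load(OVERRIDES_TSV)}


def lookup(token):
    # Exact match on the lowercased surface form; no stemming or lemmatization.
    return table.get(token.lower(), 0)
```

In this sketch “Broken” resolves to the override, and “break” keeps its own row. “breaks” is not in the table, so it scores 0, because no inflection is folded into another.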

No preset spec

The library ships no default ScoreSpec. Frustration thresholds, which clamps to apply, and how hard to clamp are policy, and policy belongs to the consumer — a tool like cc-sentiment composes its own spec from these builders. To assemble and tune your own, see Compose your own policy.