Human-in-the-loop¶
Some investigation steps cannot be decided by automation alone — "should we escalate this user lockout?", "approve this remediation?", "which of these three IPs is actually the suspect?". Human-in-the-loop (HITL) is the mechanism that lets a workflow pause at exactly that point, post a question to Slack, and resume — with the human's answer threaded back into the script — when the analyst clicks a button.
The mechanism is built on the Cy language's pause/resume primitive: a tool registered with hi_latency: True causes the interpreter to suspend, returning a checkpoint the host stores. This page covers what Analysi adds — the Slack delivery layer, three-layer pause propagation, memoised replay, and timeouts.
Source: chat/skills/hitl.md, models/hitl_question.py, slack_listener/, alert_analysis/jobs/control_events.py.
The three Slack tools¶
The Slack integration declares three actions with hi_latency: true in its manifest.json. Calling any of them from a Cy script triggers the pause:
| Cy call | Required params | Effect |
|---|---|---|
app::slack::ask_question(destination, question) |
destination, question |
DM a Slack user |
app::slack::ask_question_channel(destination, question, responses?) |
destination, question (responses is comma-separated button labels, max 5) |
Post in a channel with optional buttons |
app::slack::get_response(question_id) |
question_id |
Wait for the response to a previously-asked question |
The hi-latency tool never executes. At pause time, pending_tool_result is None. The backend takes over delivery — see the flow below.
Pause and resume¶
flowchart LR
classDef cy fill:#dbeafe,stroke:#1d4ed8,color:#1e3a8a
classDef svc fill:#86efac,stroke:#15803d,color:#052e16
classDef ext fill:#fef3c7,stroke:#b45309,color:#78350f
Script["Cy script
calls ask_question_channel(...)"]:::cy
Pause["Cy interpreter pauses
returns ExecutionCheckpoint
(pending_tool_result = None)"]:::cy
Persist["Alerts Worker:
persist checkpoint
insert hitl_questions row (status=pending)
TaskRun.status = paused"]:::svc
Slack["Backend posts via send_hitl_question:
Block Kit message with buttons
update question_ref = message_ts"]:::svc
User(["Analyst clicks button"]):::ext
Match["Notifications Worker:
match (message_ts, channel)
update question row (answered)
emit human:responded"]:::svc
Resume["Resume handler:
inject answer as pending_tool_result
restart interpreter from checkpoint
memoised replay of completed steps"]:::svc
Script --> Pause --> Persist --> Slack --> User --> Match --> Resume --> Script
Two non-obvious details:
- The hi-latency tool entry is purely a marker. The Cy registration for
app::slack::ask_question_channelis the dict form{"fn": wrapper, "hi_latency": True}(task_execution.py:1314) — when the interpreter seeshi_latency: True, it captures the pending call args into a checkpoint and raisesExecutionPausedwithout invokingwrapper. The backend posts to Slack itself. - Resume is memoised. The
ExecutionCheckpointstoresnode_results,pending_tool_args, andvariables. On resume, completed steps replay from cache — the LLM is not re-invoked for steps that already succeeded. Resume is fast and deterministic.
Three-layer pause propagation¶
A pause never stops at the Cy interpreter. It propagates upward through every layer that's currently waiting on the script (chat/skills/hitl.md):
| Layer | Status while paused | Constant |
|---|---|---|
TaskRun |
paused |
TaskConstants.Status.PAUSED (constants.py:32) |
WorkflowNodeInstance (if Task is in a workflow) |
paused |
WorkflowConstants.Status.PAUSED (constants.py:63) |
AlertAnalysis (if workflow is in an alert pipeline) |
paused_human_review |
alert_analysis/jobs/control_events.py:636 |
This guarantees that no upstream process tries to finalise results while a question is outstanding. Resume reverses all three.
The human:responded control event¶
When the analyst clicks a button, the Notifications Worker (running Slack Socket Mode) matches the click to a pending question by (message_ts, channel_id) = (question_ref, channel), marks the question answered, and emits a control event:
The constant is HITLQuestionConstants.CHANNEL_HUMAN_RESPONDED (constants.py:78). The control-event consumer dispatches to the resume handler at alert_analysis/jobs/control_events.py:770, which loads the checkpoint, injects pending_tool_result = answer, and re-runs the Cy interpreter via TaskExecutionService.execute.
The hitl_questions table¶
Each pause creates a row tracked through its lifecycle. The table is monthly-partitioned by created_at. Key columns (models/hitl_question.py):
| Column | Purpose |
|---|---|
tenant_id |
Multi-tenant isolation |
question_ref |
Slack message_ts (set after the message is posted) |
channel |
Slack channel ID |
question_text, options |
What was asked, what buttons were offered |
status |
pending → answered | expired (HITLQuestionConstants.Status) |
answer, answered_by, answered_at |
The reply, the Slack user who clicked, the timestamp |
timeout_at |
Deadline — default now() + 4 hours (HITLQuestionConstants.DEFAULT_TIMEOUT_HOURS) |
task_run_id, workflow_run_id, node_instance_id, analysis_id |
Links back to the paused execution (no FKs because the target tables are partitioned) |
Timeout¶
A reconciliation cron runs every 10 seconds (alert_analysis/jobs/reconciliation.py). It finds pending questions whose timeout_at < now(), marks them expired, and fails the associated TaskRun / WorkflowNodeInstance / AlertAnalysis with an explanatory error message. The default 4-hour timeout is configurable per-question.
Audit trail¶
Every answer is logged as an hitl.question_answered activity event (HITLQuestionConstants.AUDIT_ACTION_ANSWERED — constants.py:81) capturing who answered, what they answered, and when. This feeds the same activity audit trail used for compliance and investigation review.
Where to go next¶
- Authoring: the
app::slack::*tools are documented above; for the upstream Cy pause primitive see the cy-language pause/resume docs. - Where pauses fit: a paused Task inside a paused Workflow inside a paused alert analysis.
- Future delivery channels: only Slack is supported today. Microsoft Teams support is mentioned as planned in the chat skill.