Phase 3 - Human-in-the-Loop (HITL)

Overview

In phase 3, we will introduce the capability to pause the execution of an agent task when certain "high risk" tools are identified for use, and to require human intervention before proceeding. This relies on the changes made in phases 1 and 2.

Phase 3 Resulting Flow

For tool calls identified as "high risk", instead of directly executing these calls, the application will need to:

  1. Save the relevant tool call details so that the call(s) can be resumed after human intervention (this should probably be in the task item, which might require extending its structure).
  2. Place the task into a paused state.
  3. Respond with a payload indicating the call or calls that triggered the HITL requirement, together with a URL, referencing the appropriate request ID(s), which can be invoked to "Approve" or "Reject" the call(s).
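The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the fields on `AgentTaskItem`, the `InterventionRequired` model, the `/resume/{request_id}` path, and the body of `check_for_intervention` are all hypothetical stand-ins, and the choice to persist every tool call from the turn (not only the flagged ones) is one possible reading of step 1.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class TaskStatus(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    CANCELED = "canceled"


@dataclass
class AgentTaskItem:
    # Stand-in for the real AgentTaskItem; these field names are assumptions.
    task_id: str
    user_id: str
    status: TaskStatus = TaskStatus.RUNNING
    request_id: Optional[str] = None
    pending_tool_calls: List[Dict[str, Any]] = field(default_factory=list)


@dataclass
class InterventionRequired:
    # Hypothetical new response model, kept separate from existing responses.
    request_id: str
    tool_calls: List[Dict[str, Any]]
    resume_url: str


def check_for_intervention(tool_call: Dict[str, Any]) -> bool:
    # Placeholder only: real risk determination is out of scope for this plan.
    return tool_call.get("name") in {"delete_records"}


def pause_for_approval(
    task: AgentTaskItem, tool_calls: List[Dict[str, Any]]
) -> Optional[InterventionRequired]:
    """Return an InterventionRequired payload if any call is high risk."""
    flagged = [call for call in tool_calls if check_for_intervention(call)]
    if not flagged:
        return None  # nothing to approve; the caller executes as normal

    # Step 1: save the tool call details on the task item.
    task.request_id = str(uuid.uuid4())
    task.pending_tool_calls = list(tool_calls)
    # Step 2: place the task into a paused state.
    task.status = TaskStatus.PAUSED
    # Step 3: respond with the triggering calls and a resume URL.
    return InterventionRequired(
        request_id=task.request_id,
        tool_calls=flagged,
        resume_url=f"/resume/{task.request_id}",  # hypothetical URL shape
    )
```

When `pause_for_approval` returns `None`, the normal execution path would continue unchanged; otherwise the returned object would be serialized back to the client in place of the usual response.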

Upon receipt of this new response, the client (not considered here) will be responsible for prompting the user for approval or rejection of the tool call(s).

If the tool call(s) are approved, the application will need to resume immediately from the point of the tool call(s) and proceed as normal (this might require an alternate logic path to reach the handler and/or agent invocation).

If the tool call(s) are rejected, the application should place the task into a canceled state and respond with a payload indicating this.

Approvals and rejections should be recorded somewhere in a persistence layer.
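A minimal sketch of the approve/reject handling described above, assuming an in-memory list stands in for the persistence layer. `PausedTask`, `DecisionRecord`, `resolve`, and the `execute_tool_calls` callback are hypothetical names used only for illustration; the real resume path would hand control back to the existing handler or agent invocation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Callable, Dict, List


class TaskStatus(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    CANCELED = "canceled"


@dataclass
class PausedTask:
    # Minimal stand-in for the task item; field names are assumptions.
    request_id: str
    status: TaskStatus = TaskStatus.PAUSED
    pending_tool_calls: List[Dict[str, Any]] = field(default_factory=list)


@dataclass
class DecisionRecord:
    # Hypothetical audit record for an approval or rejection.
    request_id: str
    approved: bool
    decided_by: str
    decided_at: datetime


def resolve(
    task: PausedTask,
    approved: bool,
    decided_by: str,
    audit_log: List[DecisionRecord],
    execute_tool_calls: Callable[[List[Dict[str, Any]]], List[Any]],
) -> Dict[str, Any]:
    """Apply a human decision to a paused task and record it."""
    if task.status is not TaskStatus.PAUSED:
        raise ValueError("task is not awaiting a decision")

    # Record the decision before acting on it.
    audit_log.append(
        DecisionRecord(task.request_id, approved, decided_by,
                       datetime.now(timezone.utc))
    )

    if not approved:
        task.status = TaskStatus.CANCELED
        task.pending_tool_calls = []
        return {"request_id": task.request_id, "status": "canceled"}

    # Approved: resume from the point of the saved tool call(s).
    task.status = TaskStatus.RUNNING
    calls, task.pending_tool_calls = task.pending_tool_calls, []
    results = execute_tool_calls(calls)
    return {"request_id": task.request_id, "status": "resumed",
            "results": results}
```

Writing the decision record before any tool call runs means a decision is still on file if execution fails partway through; whether that ordering is wanted is a design choice this plan leaves open.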

Additional Non-Functional Requirements

  • Keep the definition of the approval/rejection endpoint flexible. We will be introducing additional functionality in subsequent phases which will allow tasks to be paused in cases where the user needs to perform authentication or grant consent in order for tool calls to proceed. It would be ideal if this single "resume" endpoint could be used for these cases as well.
  • The "resume" endpoint should leverage the request_id in the URL to identify which task, and which tool call(s), is being resumed or canceled. There is no need to include session_id or task_id, as the request ID alone uniquely identifies the task/invocation which requires HITL.
  • User authorization will need to be included in the resume endpoint, and it should be validated that the user calling the endpoint is the same user who initiated the task.
  • The determination of a tool's "risk level" can be left out of scope for this planning. Just assume the check_for_intervention placeholder will handle this logic.
  • The payload to-be-sent indicating that tool use approvals are required can be a completely new model object. The existing response objects should not be extended to support this new functionality.
  • I believe we'll have to know the exact structure of the tool call responses from the LLMs in order to determine exactly how we'll need to extend AgentTaskItem to accommodate the persistence of the tool call details. For this reason, don't be specific about exactly how to persist this, just indicate that it will need to be persisted in AgentTaskItem.
  • The "resume" endpoint will either approve or reject ALL tool calls for a given request_id. We will not support partial approvals or rejections.
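One possible shape for the "resume" endpoint logic implied by the requirements above, written as a plain function so it is independent of any web framework. Everything named here is an assumption for illustration: the `PendingRequest` record, the `kind` and `decision` body fields, and the HTTP status codes. The `kind` field is one way to keep the endpoint open to the later authentication and consent cases.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class PendingRequest:
    # Hypothetical lookup record keyed by request_id alone.
    request_id: str
    owner_user_id: str
    status: str = "paused"


# Later phases might add kinds such as "authentication" or "consent".
SUPPORTED_KINDS = {"tool_approval"}


def handle_resume(
    store: Dict[str, PendingRequest],
    request_id: str,
    caller_user_id: str,
    body: Dict[str, Any],
) -> Tuple[int, Dict[str, Any]]:
    """Validate and apply a resume request; returns (status_code, payload)."""
    pending = store.get(request_id)
    if pending is None:
        return 404, {"error": "unknown request_id"}

    # Only the user who initiated the task may resume or cancel it.
    if pending.owner_user_id != caller_user_id:
        return 403, {"error": "caller did not initiate this task"}

    if pending.status != "paused":
        return 409, {"error": "request is not awaiting a decision"}

    if body.get("kind") not in SUPPORTED_KINDS:
        return 400, {"error": "unsupported resume kind"}

    decision = body.get("decision")
    if decision not in ("approve", "reject"):
        return 400, {"error": "decision must be 'approve' or 'reject'"}

    # One decision covers ALL tool calls for this request_id; no partials.
    pending.status = "running" if decision == "approve" else "canceled"
    return 200, {"request_id": request_id, "status": pending.status}
```

A call such as `handle_resume(store, request_id, user_id, {"kind": "tool_approval", "decision": "approve"})` would then be wired to whatever route the application exposes, with `caller_user_id` taken from the authenticated session and not from the request body.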