# Implementation Plan for Manual Tool Calling (with Code)

This document outlines the plan to refactor the agent invocation process to allow for manual interception of tool calls, as described in `context/manual-tool-call.md`. This change is a prerequisite for implementing Human-in-the-Loop (HITL) functionality.

The implementation will follow the "de-abstracted" pattern prototyped in `src/sk_agents/skagents/v1/sk_agent_v2.py` and will be integrated into the new stateful API version `tealagents/v1alpha1`.
## 1. High-Level Strategy

We will replace the high-level `agent.invoke()` and `agent.invoke_stream()` calls with a manual orchestration of the LLM interaction within our `Agent`. This provides a clear interception point to inspect tool calls before they are executed, as sketched below.
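Conceptually, the manual loop has the following shape. This is a minimal sketch only; `call_llm`, `extract_function_call_items`, `execute_tool_calls`, and `add_results_to_history` are hypothetical placeholders, and the concrete implementations appear in section 3.

```python
# Conceptual shape only -- the concrete code is in sections 3.2 and 3.3.
# All helper names here are hypothetical placeholders.
while True:
    responses = await call_llm(history)
    function_calls = extract_function_call_items(responses)
    if not function_calls:
        break  # no tool calls: this is the final answer
    for fc in function_calls:
        hitl_manager.check_for_intervention(fc)  # <-- interception point
    results = await execute_tool_calls(function_calls)
    add_results_to_history(history, results)  # loop back to the LLM with tool output
```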
## 2. New HITL Placeholder Module

A new placeholder module will be created to establish a clean interception point for future HITL logic.

File to Create: `src/sk_agents/tealagents/v1alpha1/hitl_manager.py`
```python
# src/sk_agents/tealagents/v1alpha1/hitl_manager.py
from semantic_kernel.contents.function_call_content import FunctionCallContent


def check_for_intervention(tool_call: FunctionCallContent) -> bool:
    """
    Placeholder for HITL logic. In the future, this will check
    if the tool call requires user consent based on configured policies.
    Returns False for now, allowing all calls to proceed without interruption.
    """
    # TODO: Implement actual policy checks for high-risk tools.
    print(
        f"HITL Check: Intercepted call to "
        f"{tool_call.plugin_name}.{tool_call.function_name}. Allowing to proceed."
    )
    return False
```
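For context, once real policies exist the check might evolve into something like the following sketch. `HIGH_RISK_TOOLS` and its contents are invented here for illustration and are not part of this plan's deliverable:

```python
# Hypothetical future version (illustration only, not part of this plan).
# HIGH_RISK_TOOLS is an assumed policy source; the real mechanism is TBD.
HIGH_RISK_TOOLS: set[tuple[str, str]] = {("email", "send_message")}


def check_for_intervention(tool_call: FunctionCallContent) -> bool:
    key = (tool_call.plugin_name, tool_call.function_name)
    return key in HIGH_RISK_TOOLS  # True would trigger the future pause flow
```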
## 3. Refactoring the Agent Handler

The core changes will be in the `Agent` class.

File to Modify: `src/sk_agents/tealagents/v1alpha1/agent.py`
### 3.1. Add Imports and Helper Method

We will add the necessary imports and the `_invoke_function` helper method, adapted directly from `sk_agent_v2.py`.
```python
# Add to imports at the top of src/sk_agents/tealagents/v1alpha1/agent.py
import asyncio
from functools import reduce

from semantic_kernel.connectors.ai.chat_completion_client_base import (
    ChatCompletionClientBase,
)
from semantic_kernel.contents.function_call_content import FunctionCallContent
from semantic_kernel.contents.function_result_content import FunctionResultContent
from semantic_kernel.contents.streaming_chat_message_content import (
    StreamingChatMessageContent,
)

# Import the new HITL placeholder (path matches the module created in section 2)
from sk_agents.tealagents.v1alpha1 import hitl_manager

# ... existing class definition for Agent ...


# Add this helper method inside the TealAgentsV1Alpha1Handler class
async def _invoke_function(
    self, kernel: "Kernel", fc_content: FunctionCallContent
) -> FunctionResultContent:
    """Helper to execute a single tool function call."""
    function = kernel.get_function(
        fc_content.plugin_name,
        fc_content.function_name,
    )
    function_result = await function(kernel, fc_content.to_kernel_arguments())
    return FunctionResultContent.from_function_call_content_and_result(
        fc_content, function_result
    )
```
### 3.2. Updated `invoke` Method (Non-Streaming)

The `invoke` method will be replaced with the following code, which manually handles the tool-calling loop.
```python
# Replace the existing 'invoke' method in TealAgentsV1Alpha1Handler
async def invoke(self, history: ChatHistory) -> AsyncIterable[ChatMessageContent]:
    kernel = self.agent.kernel
    arguments = self.agent.arguments
    chat_completion_service, settings = kernel.select_ai_service(
        arguments=arguments, type=ChatCompletionClientBase
    )
    assert isinstance(chat_completion_service, ChatCompletionClientBase)

    # Initial call to the LLM
    response_list = await chat_completion_service.get_chat_message_contents(
        chat_history=history,
        settings=settings,
        kernel=kernel,
        arguments=arguments,
    )

    function_calls = []
    # Separate content and tool calls
    for response in response_list:
        # A response may have multiple items, e.g., multiple tool calls
        fc_in_response = [
            item for item in response.items if isinstance(item, FunctionCallContent)
        ]
        if fc_in_response:
            history.add_message(response)  # Add assistant's message to history
            function_calls.extend(fc_in_response)
        else:
            # If no function calls, it's a direct answer
            yield response

    # If tool calls were returned, execute them
    if function_calls:
        # --- INTERCEPTION POINT ---
        for fc in function_calls:
            hitl_manager.check_for_intervention(fc)
            # In the future, a `True` return would trigger a pause flow.

        # Execute all functions in parallel
        results = await asyncio.gather(
            *[self._invoke_function(kernel, fc) for fc in function_calls]
        )

        # Add results to history
        for result in results:
            history.add_message(result.to_chat_message_content())

        # Make a recursive call to get the final response from the LLM
        async for final_response in self.invoke(history):
            yield final_response
```
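Note that the recursion terminates once the model returns a response containing no `FunctionCallContent` items, since that branch only yields and never recurses. A caller consumes the generator as usual; a sketch, assuming the surrounding handler construction:

```python
# Sketch of consuming the non-streaming path (handler setup assumed).
async for message in handler.invoke(history):
    print(message.content)
```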
### 3.3. Updated `invoke_stream` Method (Streaming)

The `invoke_stream` method will be replaced to handle streaming and tool calls correctly.
```python
# Replace the existing 'invoke_stream' method in TealAgentsV1Alpha1Handler
async def invoke_stream(
    self, history: ChatHistory
) -> AsyncIterable[StreamingChatMessageContent]:
    kernel = self.agent.kernel
    arguments = self.agent.arguments
    chat_completion_service, settings = kernel.select_ai_service(
        arguments=arguments, type=ChatCompletionClientBase
    )
    assert isinstance(chat_completion_service, ChatCompletionClientBase)

    all_responses = []
    # Stream the initial response from the LLM
    async for response_list in chat_completion_service.get_streaming_chat_message_contents(
        chat_history=history,
        settings=settings,
        kernel=kernel,
        arguments=arguments,
    ):
        for response in response_list:
            all_responses.append(response)
            if response.content:
                yield response  # Yield content chunks to the client immediately

    # Aggregate the full response to check for tool calls
    if not all_responses:
        return
    full_completion: StreamingChatMessageContent = reduce(
        lambda x, y: x + y, all_responses
    )
    function_calls = [
        item for item in full_completion.items if isinstance(item, FunctionCallContent)
    ]

    # If tool calls are present, execute them
    if function_calls:
        history.add_message(message=full_completion.to_chat_message_content())

        # --- INTERCEPTION POINT ---
        for fc in function_calls:
            hitl_manager.check_for_intervention(fc)

        # Execute functions in parallel
        results = await asyncio.gather(
            *[self._invoke_function(kernel, fc) for fc in function_calls]
        )

        # Add results to history
        for result in results:
            history.add_message(result.to_chat_message_content())

        # Make a recursive call to get the final streamed response
        async for final_response_chunk in self.invoke_stream(history):
            yield final_response_chunk
```
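As in the non-streaming path, the recursion bottoms out when the aggregated completion contains no `FunctionCallContent` items. Consumption differs only in that text arrives as incremental deltas; a sketch, assuming the handler setup:

```python
# Sketch of consuming the streaming path (handler setup assumed).
async for chunk in handler.invoke_stream(history):
    print(chunk.content, end="", flush=True)  # incremental text deltas
```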
## 4. Testing Strategy

The testing strategy remains the same as in the previous plan, but the tests in `tests/test_tealagents_handler.py` will now be written against this specific implementation, verifying that:

- Simple chat works without regression.
- Tool calls are correctly identified and executed.
- The `hitl_manager.check_for_intervention` function is called for each tool invocation (see the test sketch below).
- The final response after tool execution is correct for both streaming and non-streaming modes.
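As one illustration of the third point, a test might patch the placeholder and count invocations. This is a sketch assuming pytest with pytest-asyncio; the `handler` and `history_with_tool_call` fixtures are invented here for illustration and do not exist yet:

```python
# Sketch only: fixtures are hypothetical, pytest-asyncio is assumed.
from unittest.mock import patch

import pytest


@pytest.mark.asyncio
async def test_hitl_check_called_per_tool_call(handler, history_with_tool_call):
    target = "sk_agents.tealagents.v1alpha1.agent.hitl_manager.check_for_intervention"
    with patch(target, return_value=False) as mock_check:
        async for _ in handler.invoke(history_with_tool_call):
            pass
    # One intervention check per intercepted tool call
    assert mock_check.call_count >= 1
```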