Agents
In BALROG, agents are entities that typically wrap an LLM client. They are responsible for receiving observations and selecting actions. As such, agents are internally responsible for:
- **Maintaining observation/action histories:** agents keep a record of past observations and actions to support context-aware decision-making.
- **Querying LLMs:** agents send observations to the LLM, receive responses, and use those responses to decide on actions.
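In outline, that loop can be sketched as follows. Note that `MinimalAgent`, `EchoClient`, and `Message` are illustrative stand-ins for this sketch, not BALROG's actual classes:

```python
from collections import namedtuple

# Hypothetical message type for this sketch.
Message = namedtuple("Message", ["role", "content"])

class EchoClient:
    """Stand-in for an LLM client: always returns a fixed action."""
    def generate(self, messages):
        return "move north"

class MinimalAgent:
    """Sketch of the agent loop: record history, query the client, return an action."""
    def __init__(self, client):
        self.client = client
        self.history = []  # interleaved observations and actions

    def act(self, obs, prev_action=None):
        # Record the action that led to this observation, then the observation itself.
        if prev_action:
            self.history.append(("action", prev_action))
        self.history.append(("observation", obs))
        # Turn the history into messages and ask the LLM for the next action.
        messages = [Message("user", f"{kind}: {content}") for kind, content in self.history]
        return self.client.generate(messages)

agent = MinimalAgent(EchoClient())
print(agent.act("You see a corridor."))  # move north
```

The real agents delegate the history bookkeeping to a prompt builder rather than managing a raw list, but the shape of `act` is the same.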
Pre-built agents
BALROG ships with several pre-built agents:
| Agent Type | Description |
|---|---|
| `naive` | Outputs actions based on the current action/observation history without any additional reasoning. |
| `icl` | Learns from past interactions by updating its memory with observations and actions, then generates an action based on these experiences. |
| `chain_of_thought` | Generates actions through step-by-step reasoning, providing a final action output. |
| `self_refine` | Generates an action, then iteratively refines it based on feedback until no further improvements are needed. |
🤖 Creating Custom Agents
The simple zero-shot agent in `naive.py` outputs only a single action with no extra reasoning, which is often suboptimal. We may want the agent to analyze its situation, form and refine plans, interpret image observations, or handle history more effectively.
To build a custom agent, you’ll mainly work with:
- `balrog/agents/custom.py` -> your custom agent file.
- `balrog/prompt_builder/history.py` -> contains the history prompt builder, a helper class for handling observation/action history in prompts.
You’re free to modify or create additional files, as long as they don’t interfere with evaluation, logging, or environment processes.
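Agents interact with the history prompt builder through three methods: `update_observation`, `update_action`, and `get_prompt`. A minimal sketch of a builder exposing that interface, with a hypothetical `Message` type and a simple max-history cutoff (BALROG's actual implementation may differ), could look like this:

```python
from collections import namedtuple

# Hypothetical message type; the real prompt builder's messages may differ.
Message = namedtuple("Message", ["role", "content"])

class SimpleHistoryPromptBuilder:
    """Sketch of a history prompt builder: interleaves observations and actions."""
    def __init__(self, system_prompt, max_history=16):
        self.system_prompt = system_prompt
        self.max_history = max_history
        self.events = []

    def update_observation(self, obs):
        self.events.append(("observation", str(obs)))

    def update_action(self, action):
        self.events.append(("action", str(action)))

    def get_prompt(self):
        messages = [Message("system", self.system_prompt)]
        # Keep only the most recent events so the prompt stays within context limits.
        for kind, content in self.events[-self.max_history:]:
            role = "assistant" if kind == "action" else "user"
            messages.append(Message(role, content))
        return messages
```

Truncating to the most recent events is the simplest policy; summarizing older history is another common option.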
Simple Planning Agent
The following code demonstrates a custom planning agent that stores and follows a plan, updating it as needed. This agent uses the default history prompt builder.
custom.py
```python
import re

from balrog.agents.base import BaseAgent


class CustomAgent(BaseAgent):
    def __init__(self, client_factory, prompt_builder):
        super().__init__(client_factory, prompt_builder)
        self.client = client_factory()
        self.plan = None

    def act(self, obs, prev_action=None):
        if prev_action:
            self.prompt_builder.update_action(prev_action)
        self.prompt_builder.update_observation(obs)

        plan_text = f"Current Plan:\n{self.plan}\n" if self.plan else "You have no plan yet.\n"
        planning_instructions = """
Review the current plan above if present. Decide whether to continue with it or make changes.
If you make changes, provide the updated plan. Then, provide the next action to take.
You must output an action at every step.
Format your answer in the following way:
PLAN: <your updated plan if changed, or "No changes to the plan." if the current plan is good>
ACTION: <your next action>
""".strip()

        messages = self.prompt_builder.get_prompt()
        if messages and messages[-1].role == "user":
            messages[-1].content += "\n\n" + plan_text + "\n" + planning_instructions

        response = self.client.generate(messages)

        # Extract the plan and action from the LLM's response
        plan, action = self._extract_plan_and_action(response.completion)

        # Update the internal plan if it has changed
        if plan != "No changes to the plan.":
            self.plan = plan

        # Save the plan in the response.reasoning field and the action in response.completion
        response = response._replace(reasoning=plan, completion=action)

        return response

    def _extract_plan_and_action(self, response_text):
        plan_match = re.search(r"PLAN:\s*(.*?)(?=\nACTION:|\Z)", response_text, re.IGNORECASE | re.DOTALL)
        action_match = re.search(r"ACTION:\s*(.*)", response_text, re.IGNORECASE | re.DOTALL)

        plan = plan_match.group(1).strip() if plan_match else "No changes to the plan."
        action = action_match.group(1).strip() if action_match else None

        return plan, action
```
Experiment with this example, or explore additional reasoning templates from repositories such as LangGraph. Feel free to contribute by opening a PR with your own reasoning templates.