iclbench package
Subpackages
- iclbench.agents package
- iclbench.environments package
  - Subpackages
  - Submodules
    - iclbench.environments.env_wrapper module
  - Module contents
- iclbench.prompt_builder package
Submodules
iclbench.client module
- class iclbench.client.ClaudeWrapper(client_config)
Bases: LLMClientWrapper
- class iclbench.client.GoogleGenerativeAIWrapper(client_config)
Bases: LLMClientWrapper
- class iclbench.client.LLMResponse(model_id, completion, stop_reason, input_tokens, output_tokens, reasoning)
Bases: tuple
- completion
Alias for field number 1
- input_tokens
Alias for field number 3
- model_id
Alias for field number 0
- output_tokens
Alias for field number 4
- reasoning
Alias for field number 5
- stop_reason
Alias for field number 2
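LLMResponse is a named tuple, so each field can be read either by name or by the position listed above. The sketch below uses a local stand-in built with `collections.namedtuple` and the same field order; it does not import iclbench, and the field values are made up for illustration.

```python
from collections import namedtuple

# Local stand-in with the same field order as iclbench.client.LLMResponse:
# model_id=0, completion=1, stop_reason=2, input_tokens=3, output_tokens=4, reasoning=5.
LLMResponse = namedtuple(
    "LLMResponse",
    ["model_id", "completion", "stop_reason", "input_tokens", "output_tokens", "reasoning"],
)

# Illustrative values only.
resp = LLMResponse(
    model_id="example-model",
    completion="move north",
    stop_reason="stop",
    input_tokens=120,
    output_tokens=3,
    reasoning=None,
)

# Attribute access and positional indexing refer to the same slot.
assert resp.completion == resp[1]
assert resp.output_tokens == resp[4]

# As a tuple it unpacks positionally and is immutable;
# _replace returns a modified copy instead of mutating resp.
model_id, completion, *_ = resp
total_tokens = resp.input_tokens + resp.output_tokens
updated = resp._replace(reasoning="chose the shortest path")
```

Because the object is immutable, any post-processing (for example attaching reasoning text) produces a new tuple rather than changing the original response.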
- class iclbench.client.OpenAIWrapper(client_config)
Bases: LLMClientWrapper
- class iclbench.client.ReplicateWrapper(client_config)
Bases: LLMClientWrapper
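All four wrappers share the LLMClientWrapper base and the same `client_config` constructor argument, which makes it possible to select one by provider name. The sketch below illustrates that pattern with local stand-in classes; the provider keys and the `make_client` helper are hypothetical and not part of the documented iclbench API.

```python
from typing import Any, Dict, Type


class LLMClientWrapper:
    """Stand-in for iclbench.client.LLMClientWrapper; it only stores the config."""

    def __init__(self, client_config: Dict[str, Any]) -> None:
        self.client_config = client_config


# Local stand-ins named after the documented subclasses.
class ClaudeWrapper(LLMClientWrapper): ...
class GoogleGenerativeAIWrapper(LLMClientWrapper): ...
class OpenAIWrapper(LLMClientWrapper): ...
class ReplicateWrapper(LLMClientWrapper): ...


# Hypothetical provider keys; iclbench may use different names or none at all.
WRAPPERS: Dict[str, Type[LLMClientWrapper]] = {
    "claude": ClaudeWrapper,
    "gemini": GoogleGenerativeAIWrapper,
    "openai": OpenAIWrapper,
    "replicate": ReplicateWrapper,
}


def make_client(provider: str, client_config: Dict[str, Any]) -> LLMClientWrapper:
    """Hypothetical helper: pick a wrapper class by name and pass it client_config."""
    try:
        wrapper_cls = WRAPPERS[provider]
    except KeyError:
        raise ValueError(
            f"unknown provider {provider!r}; expected one of {sorted(WRAPPERS)}"
        ) from None
    return wrapper_cls(client_config)
```

Since every subclass accepts the same single argument, adding a provider only requires registering one more class in the mapping.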
iclbench.dataset module
iclbench.evaluator module
- class iclbench.evaluator.Evaluator(env_name, config, original_cwd='')
Bases: object
Class to evaluate an agent on a set of tasks in a given environment.
The Evaluator class is responsible for orchestrating the evaluation of agents across multiple tasks within a specified environment. It manages the setup of the environment, runs episodes, logs results, and can execute evaluations in parallel or sequentially.
- Variables:
env_name (str) – Name of the environment in which the agent operates.
config (Config) – Configuration object containing evaluation parameters.
tasks (list) – List of tasks for the specified environment.
num_episodes (int) – Number of episodes to run for each task.
num_workers (int) – Number of parallel worker processes to use.
max_steps_per_episode (int) – Maximum number of steps per episode.
dataset (InContextDataset) – Dataset object for managing in-context learning tasks.
- __init__(env_name, config, original_cwd='')
Initializes the Evaluator with the environment name and configuration.
- Parameters:
env_name (str) – Name of the environment.
config (Config) – Configuration object with evaluation parameters.
original_cwd (str, optional) – Original current working directory. Defaults to ''.
- load_in_context_learning_episode(i, task, agent, episode_log)
Loads and executes an in-context learning episode for the specified task.
- Parameters:
i (int) – Index of the in-context learning episode.
task (str) – Name of the task to be evaluated.
agent (BaseAgent) – The agent being evaluated.
episode_log (dict) – Log to record episode results.
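The documentation does not say how a demonstration reaches the agent. The sketch below shows one plausible shape under stated assumptions: demonstrations are stored per task as lists of (observation, action) steps, the agent keeps them in a prompt context, and the extra `demos` argument stands in for the Evaluator's `dataset` attribute. `ToyAgent`, `observe_demo_step`, and the log keys are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # (observation, action); a hypothetical demonstration format


@dataclass
class ToyAgent:
    """Hypothetical agent that keeps demonstrations in its prompt context."""

    context: List[str] = field(default_factory=list)

    def observe_demo_step(self, observation: str, action: str) -> None:
        self.context.append(f"Observation: {observation}\nAction: {action}")


def load_in_context_learning_episode(
    i: int,
    task: str,
    agent: ToyAgent,
    episode_log: dict,
    demos: Dict[str, List[List[Step]]],  # stands in for the Evaluator's dataset
) -> None:
    """Feed the i-th stored demonstration of `task` to the agent and log it."""
    demo = demos[task][i]
    for observation, action in demo:
        agent.observe_demo_step(observation, action)
    # Record which demonstrations were shown so the run can be reproduced.
    episode_log.setdefault("demo_indices", []).append(i)
    episode_log["demo_steps"] = episode_log.get("demo_steps", 0) + len(demo)
```

Calling it once per index `i` accumulates several demonstrations in the agent's context before the evaluated episode starts.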
- run(agent_factory)
Executes the evaluation process either sequentially or in parallel.
- Parameters:
agent_factory (AgentFactory) – Factory to create instances of the agent.
- Returns:
Summary of the results for all tasks.
- Return type:
dict
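`run` chooses between sequential and parallel execution, and `num_workers` sets the number of worker processes. The sketch below shows one way to structure that choice with `concurrent.futures.ProcessPoolExecutor`; it is not iclbench's implementation. `toy_episode` is a hypothetical stand-in for "create an agent with agent_factory, then call run_episode", and it is a top-level function because work sent to worker processes must be picklable. The summary format is also an assumption.

```python
from concurrent.futures import ProcessPoolExecutor
from statistics import mean
from typing import Callable, Dict, List, Tuple

Job = Tuple[str, int]  # (task, episode index)


def toy_episode(job: Job) -> Tuple[str, float]:
    """Hypothetical stand-in for building an agent and running one episode."""
    task, episode = job
    return task, float(episode % 2)  # fake per-episode score


def run(
    tasks: List[str],
    num_episodes: int,
    num_workers: int,
    episode_fn: Callable[[Job], Tuple[str, float]] = toy_episode,
) -> Dict[str, dict]:
    # One job per (task, episode) pair.
    jobs = [(task, ep) for task in tasks for ep in range(num_episodes)]
    if num_workers > 1:
        # episode_fn must be a picklable top-level function for worker processes.
        with ProcessPoolExecutor(max_workers=num_workers) as pool:
            results = list(pool.map(episode_fn, jobs))
    else:
        results = [episode_fn(job) for job in jobs]

    # Aggregate per-task results into a summary dict.
    summary: Dict[str, dict] = {}
    for task in tasks:
        scores = [score for t, score in results if t == task]
        summary[task] = {
            "episodes": len(scores),
            "mean_score": mean(scores) if scores else 0.0,
        }
    return summary
```

`pool.map` preserves input order, so the sequential and parallel branches produce the same `results` list and the aggregation step does not need to know which branch ran.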
- run_episode(task, agent, process_num=None, position=0)
Executes a single evaluation episode for the specified task.
- Parameters:
task (str) – Name of the task to be evaluated.
agent (BaseAgent) – The agent being evaluated.
process_num (int, optional) – Process number for logging purposes. Defaults to None.
position (int, optional) – Position for progress bar. Defaults to 0.
- Returns:
Log of the episode results, including the trajectory, action frequency, and performance metrics.
- Return type:
dict
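The returned log is described as containing a trajectory, action frequencies, and performance metrics, with episodes capped at `max_steps_per_episode`. The sketch below shows a generic step loop that produces a log of that shape; `ToyEnv`, `always_right`, the `(obs, reward, done)` step signature, and the log keys are hypothetical stand-ins rather than iclbench's environment or agent interfaces.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


class ToyEnv:
    """Hypothetical environment: reach position `goal` on a line by moving right."""

    def __init__(self, goal: int = 3) -> None:
        self.goal = goal
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: str) -> Tuple[int, float, bool]:
        self.pos = max(self.pos + (1 if action == "right" else -1), 0)
        done = self.pos >= self.goal
        return self.pos, (1.0 if done else 0.0), done


def always_right(observation: int) -> str:
    """Hypothetical agent policy."""
    return "right"


def run_episode(env: ToyEnv, act: Callable[[int], str], max_steps_per_episode: int) -> Dict[str, object]:
    obs = env.reset()
    trajectory: List[Tuple[int, str, float]] = []
    action_counts: Counter = Counter()
    total_reward, done = 0.0, False
    for _ in range(max_steps_per_episode):  # hard cap on episode length
        action = act(obs)
        action_counts[action] += 1
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        total_reward += reward
        obs = next_obs
        if done:
            break
    return {
        "trajectory": trajectory,
        "action_frequency": dict(action_counts),
        "num_steps": len(trajectory),
        "episode_return": total_reward,
        "done": done,
    }
```

Keeping `done` in the log separates episodes that reached a terminal state from those cut off by the step limit.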