to train on your agent’s behavior, you first pull its traces out of your observability platform. an adapter connects to your provider, fetches the traces, and normalizes them into a consistent format, so the rest of the pipeline works the same regardless of where the traces came from.
quickstart
connect to your provider and fetch a project’s traces:
from benchmax.traces.braintrust.adapter import BraintrustTraceAdapter
adapter = BraintrustTraceAdapter(api_key="bt_...")
adapter.connect() # validates credentials
projects = adapter.list_projects()
traces, cursor = adapter.fetch_traces(project_id=projects[0].id, limit=500)
braintrust is the only built-in provider today. to connect another source, implement the TraceAdapter protocol and register it in benchmax.traces.registry.
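as a rough illustration of what a custom adapter could look like, here is a minimal sketch. the real TraceAdapter protocol is not reproduced on this page, so the protocol below is an assumption inferred from the quickstart calls (connect, list_projects, fetch_traces). Project, Trace, and InMemoryTraceAdapter are hypothetical stand-ins, and registration is left out because the registry API is not shown here.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class Project:
    # Hypothetical project record; only `id` is used in the quickstart.
    id: str
    name: str


@dataclass
class Trace:
    # Stand-in for NormalizedTrace so the sketch runs without benchmax.
    id: str
    messages: list = field(default_factory=list)


class TraceAdapter(Protocol):
    # Assumed protocol shape, inferred from the quickstart calls.
    def connect(self) -> None: ...
    def list_projects(self) -> list[Project]: ...
    def fetch_traces(
        self, project_id: str, limit: int = 100, cursor: Optional[str] = None
    ) -> tuple[list[Trace], Optional[str]]: ...


class InMemoryTraceAdapter:
    """Toy adapter that serves traces from a dict instead of a remote API."""

    def __init__(self, traces_by_project: dict[str, list[Trace]]):
        self._data = traces_by_project
        self._connected = False

    def connect(self) -> None:
        # A real adapter would validate credentials here.
        self._connected = True

    def list_projects(self) -> list[Project]:
        return [Project(id=pid, name=pid) for pid in self._data]

    def fetch_traces(self, project_id, limit=100, cursor=None):
        if not self._connected:
            raise RuntimeError("call connect() first")
        start = int(cursor) if cursor is not None else 0
        traces = self._data[project_id]
        batch = traces[start : start + limit]
        next_start = start + limit
        # Return None as the cursor once the last page has been served.
        next_cursor = str(next_start) if next_start < len(traces) else None
        return batch, next_cursor


adapter: TraceAdapter = InMemoryTraceAdapter(
    {"demo": [Trace(id=f"t{i}") for i in range(5)]}
)
adapter.connect()
first_page, cursor = adapter.fetch_traces("demo", limit=2)
```

the toy adapter uses an offset encoded as a string for its cursor. a real provider would more likely hand back an opaque token, which is why callers should treat the cursor as a value to pass through unchanged.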
fetching all traces
fetch_traces is paginated: each call returns a batch of traces and a cursor for the next page. pass the cursor back in until it comes back as None to pull an entire project:
all_traces = []
cursor = None
while True:
    batch, cursor = adapter.fetch_traces(project_id=projects[0].id, limit=100, cursor=cursor)
    all_traces.extend(batch)
    if cursor is None:
        break
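if you would rather stream traces than hold a whole project in memory, the same loop can be wrapped in a generator. this is a sketch, not part of benchmax: iter_traces and FakeAdapter are hypothetical names, and FakeAdapter exists only so the example runs without credentials. it assumes fetch_traces returns None as the cursor on the last page, as the loop above does.

```python
from typing import Any, Iterator


def iter_traces(adapter: Any, project_id: str, page_size: int = 100) -> Iterator[Any]:
    """Yield every trace in a project, fetching one page at a time."""
    cursor = None
    while True:
        batch, cursor = adapter.fetch_traces(
            project_id=project_id, limit=page_size, cursor=cursor
        )
        yield from batch
        if cursor is None:
            break


class FakeAdapter:
    """Hypothetical stand-in that pages over an in-memory list."""

    def __init__(self, items):
        self._items = items

    def fetch_traces(self, project_id, limit, cursor=None):
        start = cursor or 0
        end = start + limit
        # None signals that there are no further pages.
        next_cursor = end if end < len(self._items) else None
        return self._items[start:end], next_cursor


traces = list(iter_traces(FakeAdapter(list(range(7))), "demo", page_size=3))
```

with a real adapter, the call would look like iter_traces(adapter, projects[0].id), and each page is fetched only when the consumer asks for the next trace.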
supported message formats
the adapter normalizes each message in a trace into the standard form below, auto-detecting the common provider shapes:
- openai: tool calls in the standard nested form (tool_calls[].function.{name, arguments}), the flat form (tool_calls[].{name, arguments}), and the legacy functioncall
- anthropic / openclaw: structured content blocks (text and toolCall)
- role and field aliases: toolResult → tool, toolCallId → tool_call_id, toolName → name
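to make the shapes above concrete, here is a small sketch of how the nested and flat tool-call forms and the aliases could be collapsed into one shape. it is an illustration only, not the adapter's actual implementation: normalize_tool_call and normalize_message are hypothetical helpers, the output dict layout is an assumption, and content blocks and the legacy form are not handled.

```python
import json
from typing import Any

# Alias tables taken from the list above.
ROLE_ALIASES = {"toolResult": "tool"}
FIELD_ALIASES = {"toolCallId": "tool_call_id", "toolName": "name"}


def normalize_tool_call(call: dict[str, Any]) -> dict[str, Any]:
    """Collapse the nested and flat tool-call shapes into one dict."""
    # Nested form keeps name/arguments under "function"; flat form has them at the top level.
    source = call.get("function", call)
    arguments = source.get("arguments", {})
    if isinstance(arguments, str):
        # OpenAI-style payloads serialize arguments as a JSON string.
        arguments = json.loads(arguments)
    return {"id": call.get("id"), "name": source["name"], "arguments": arguments}


def normalize_message(message: dict[str, Any]) -> dict[str, Any]:
    """Apply role and field aliases, then normalize any tool calls."""
    out = {FIELD_ALIASES.get(key, key): value for key, value in message.items()}
    out["role"] = ROLE_ALIASES.get(out["role"], out["role"])
    if "tool_calls" in out:
        out["tool_calls"] = [normalize_tool_call(c) for c in out["tool_calls"]]
    return out


nested = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "search", "arguments": '{"q": "cats"}'},
}
flat = {"id": "call_1", "name": "search", "arguments": {"q": "cats"}}
tool_result = {
    "role": "toolResult",
    "toolCallId": "call_1",
    "toolName": "search",
    "content": "ok",
}
```

both nested and flat come out of normalize_tool_call as the same dict, and tool_result comes out of normalize_message with role tool and the snake_case field names.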
NormalizedTrace format
all adapters return NormalizedTrace objects with a consistent structure:
| field | type | description |
|---|---|---|
| id | str | unique trace identifier |
| messages | list[TraceMessage] | the conversation (system, user, assistant, tool messages) |
| scores | dict[str, float] | provider-reported scores (e.g. task success, accuracy) |
| metadata | dict | provider-specific metadata (task ID, model, timestamp) |
| errors | list[str] | any extraction errors encountered |
each TraceMessage has role, content, and optionally tool_calls (a list of ToolCall objects, each with name, arguments, and id).
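the structure described above can be pictured as three small dataclasses. this is a sketch based only on the field table, not the library's actual class definitions: the defaults, the optional markers, the is_usable helper, and the "task_success" score key are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolCall:
    name: str
    arguments: dict[str, Any]
    id: Optional[str] = None


@dataclass
class TraceMessage:
    role: str  # system, user, assistant, or tool
    content: Optional[str]
    tool_calls: Optional[list[ToolCall]] = None


@dataclass
class NormalizedTrace:
    id: str
    messages: list[TraceMessage]
    scores: dict[str, float] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)


def is_usable(trace: NormalizedTrace, score_key: str, threshold: float) -> bool:
    """Hypothetical filter: keep traces that extracted cleanly and scored well."""
    return not trace.errors and trace.scores.get(score_key, 0.0) >= threshold


trace = NormalizedTrace(
    id="trace-1",
    messages=[
        TraceMessage(role="user", content="find cat pictures"),
        TraceMessage(
            role="assistant",
            content=None,
            tool_calls=[ToolCall(name="search", arguments={"q": "cats"}, id="call_1")],
        ),
        TraceMessage(role="tool", content="3 results"),
    ],
    scores={"task_success": 1.0},
)
keep = is_usable(trace, "task_success", threshold=0.5)
```

checking errors and scores before training is a natural first use of these fields, since a trace with extraction errors may be missing messages.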
next steps
once you have traces, process them into training examples.