Jonatan von Martens
AI Safety Engineering
ElevenLabs
Jonatan von Martens is an AI safety engineer at ElevenLabs, working at the intersection of model behavior, reliability, and responsible deployment. His work focuses on identifying and mitigating failure modes in production AI systems, helping ensure advanced generative models are safe, robust, and fit for real-world use.
15 April 2025 16:30 - 17:00
Panel | Evaluating autonomous agents: Closing the gap between tests and real-world behaviour
Evaluating autonomous agents is fundamentally harder than evaluating static models or prompt-based systems. Behavior unfolds over sequences of actions, interacts with tools and environments, and changes under real traffic in ways that are difficult to capture with offline tests alone. In this panel, engineers and system builders compare how they evaluate agent behavior in practice. The discussion will explore where traditional testing breaks down, how teams reason about trajectories rather than single outputs, and what signals matter most once agents are operating in dynamic, real-world environments. Expect candid perspectives on what works, what doesn’t, and where evaluation remains an open problem for agentic systems.