Red teaming generative AI at scale
"Red teaming generative AI at scale," presented by Giulio Zizzo (IBM Research), focuses on evaluating and improving the robustness of large language models using scalable red teaming methodologies and tooling. This session is part of the Open Community for Research at Open Community Experience 2026 in Brussels, Belgium. This session examines the challenges of evaluating large language models as they move from isolated chatbot use cases into business-critical systems. It outlines how attack surfaces expand when models interact with tools, retrieval systems, and multi-step workflows, increasing the risk of prompt injection, data leakage, and workflow manipulation. The talk introduces ARES (AI Robustness Evaluation System), an open source framework designed to scale red teaming activities. It addresses limitations of manual testing by enabling systematic exploration of attack objectives and strategies, including multi-turn interactions, optimisation-based attacks, and obfuscation techniques. The framework separates attack goals, strategies, and evaluation pipelines to support flexible and targeted testing. ARES uses a modular architecture with a lightweight core and plugin-based extensions, allowing users to customise attack scenarios and evaluation methods. Components include target interfaces for interacting with models, goal definitions for attacker intent, strategy modules implementing attack techniques, and evaluation layers ranging from rule-based checks to model-based analysis. The session also demonstrates how red teaming outputs are mapped to threat models, enabling prioritisation based on system-specific risks. Results include identification of successful attack paths, evaluation of model responses, and generation of structured reports to inform mitigation strategies and safeguard design. 
Key topics covered
- Red teaming large language models (LLMs)
- Attack surfaces in AI systems (tools, RAG, workflows)
- Prompt injection and jailbreak techniques
- Attack objectives vs attack strategies
- Scaling red teaming beyond manual testing
- ARES (AI Robustness Evaluation System) architecture
- Modular and plugin-based evaluation frameworks
- Multi-turn and optimisation-based attacks
- Evaluation pipelines for LLM outputs
- Mapping vulnerabilities to threat models
- Risk prioritisation and safeguard design
- Automation of adversarial testing workflows

Why this matters
As LLMs are integrated into critical systems, isolated testing is insufficient to identify risks. Scalable red teaming frameworks enable systematic evaluation of vulnerabilities, helping organisations design more robust safeguards aligned with their specific threat models.

About OCX26
Open Community Experience 2026 is the Eclipse Foundation’s flagship event, held in Brussels, Belgium. It brings together developers, architects, and industry leaders to explore open source technologies across domains including AI, automotive, tooling, and cloud systems, with a focus on practical implementation. Learn more at https://www.ocxconf.org/

Chapters
00:00 Introduction to LLM red teaming
00:31 Motivation for robustness evaluation
01:20 Scaling challenges in adversarial testing
02:01 Human vs automated attack discovery
03:21 Attack surfaces in AI systems
04:12 Attack objectives vs strategies
05:32 Scaling evaluation across attack classes
06:54 Mapping risks to threat models
07:39 Overview of LLM red teaming tools
08:57 Introduction to ARES framework
09:14 Modular architecture and core components
10:30 Attack goals and strategy design
12:09 Evaluation methods and pipelines
13:01 System interfaces and deployment options
13:31 Demo scenario and attack execution
14:41 Multi-turn attack example
15:29 Evaluation outputs and reporting
16:13 Roadmap and open source contributions