Real-world AI evaluation design and planning.

Briggs, M., Westling, C. and Skeadas, T., 2026. Real-world AI evaluation design and planning. In: ICSIS 2026, 9-12 June 2026, Valencia, Spain. (In Press)

Full text available as:

PDF: IEEE_Test_Design_Planning.pdf - Accepted Version
Restricted to Repository staff only until 12 June 2026.
Available under License Creative Commons Attribution Non-commercial.

563kB

Official URL: https://intelligent-systems.net/icsis2026/

Abstract

Understanding how AI systems behave in the real world is increasingly imperative as companies, organizations, and governments rapidly adopt and deploy the technology. Using CIRCLE [1], a novel framework for real-world AI evaluation, we present a set of activities for testing AI systems in deployment contexts, including field testing and red teaming. We demonstrate how these activities can produce specific outcomes of interest to stakeholders outside the AI stack. The CIRCLE framework is rooted in an understanding of the AI lifecycle that moves beyond traditional model-centric evaluation techniques. Through a hypothetical case study from an education setting, we show how evaluation approaches that are responsive to the views of stakeholders outside the traditional AI stack yield systems aligned with stakeholder objectives, support the aims of building more trustworthy and safer AI systems, and enable better decisions about their deployment.

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: evaluation models; artificial intelligence; sociotechnical systems; system testing
Group: Faculty of Media, Science and Technology
ID Code: 42011
Deposited By: Symplectic RT2
Deposited On: 11 May 2026 11:38
Last Modified: 11 May 2026 11:38
