Real-world AI evaluation design and planning

Briggs, M.; Westling, Carina; Skeadas, T.

Briggs, M., Westling, C. and Skeadas, T., 2026. Real-world AI evaluation design and planning. In: ICSIS 2026, 9-12 June 2026, Valencia, Spain. (In Press)

Full text available as:

PDF
IEEE_Test_Design_Planning.pdf - Accepted Version
Restricted to Repository staff only until 12 June 2026.
Available under License Creative Commons Attribution Non-commercial.
563kB

Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk.

Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.

Official URL: https://intelligent-systems.net/icsis2026/

Abstract

Understanding how AI systems behave in the real world is increasingly important as companies, organizations, and governments rapidly adopt and deploy this technology. Using CIRCLE [1], a novel framework for real-world AI evaluation, we present a set of activities for testing AI systems in deployment contexts, including field testing and red teaming. We demonstrate how these activities can produce specific outcomes of interest to stakeholders outside the AI stack. The CIRCLE framework is rooted in an understanding of the AI lifecycle that moves beyond traditional model-centric evaluation techniques. Through a hypothetical case study from an education setting, we show how evaluation approaches that respond to the views of stakeholders outside the traditional AI stack allow for systems that are aligned with stakeholder objectives, support the aim of building safer and more trustworthy AI systems, and enable better decisions about their deployment.

Item Type:	Conference or Workshop Item (Paper)
Uncontrolled Keywords:	evaluation models; artificial intelligence; sociotechnical systems; system testing
Group:	Faculty of Media, Science and Technology
ID Code:	42011
Deposited By:	Symplectic RT2
Deposited On:	11 May 2026 11:38
Last Modified:	11 May 2026 11:38
