Towards Trustworthy AI-Enabled Decision Support Systems: Validation of the Multisource AI Scorecard Table (MAST)
Abstract
The Multisource AI Scorecard Table (MAST) is a checklist tool for informing the design and evaluation of trustworthy AI systems, based on the U.S. Intelligence Community’s analytic tradecraft standards. In this study, we investigate whether MAST can be used to differentiate between high- and low-trustworthiness AI-enabled decision support systems (AI-DSSs). Evaluating trust in AI-DSSs poses challenges for researchers and practitioners, including identifying the components, capabilities, and potential of these systems, many of which rely on complex deep learning algorithms that drive DSS performance but preclude complete manual inspection. Using MAST, we developed two interactive AI-DSS testbeds: one emulating an identity-verification task in security screening, and another emulating a text-summarization system to aid an investigative task. Each testbed had one version designed to receive low MAST ratings and another designed to receive high MAST ratings. We hypothesized that MAST ratings would be positively related to trust ratings of these systems. A total of 177 subject-matter experts were recruited to interact with and evaluate the systems. Results generally show higher MAST ratings for the high-MAST than for the low-MAST groups, and measures of trust perception are highly correlated with MAST ratings. We conclude that MAST can be a useful tool for designing and evaluating systems that engender trust perceptions, including AI-DSSs that may be used to support visual screening or text-summarization tasks. However, higher MAST ratings may not translate to higher joint performance, and the connection between MAST and appropriate trust or trustworthiness remains an open question.