TY - JOUR
T1 - Reliability Assessment and Safety Arguments for Machine Learning Components in System Assurance
AU - Dong, Yi
AU - Huang, Wei
AU - Bharti, Vibhav
AU - Cox, Victoria
AU - Banks, Alec
AU - Wang, Sen
AU - Zhao, Xingyu
AU - Schewe, Sven
AU - Huang, Xiaowei
N1 - Funding Information:
This work is supported by the UK DSTL (through the project 'Safety Argument for Learning-enabled Autonomous Underwater Vehicles') and the UK EPSRC (through the projects 'Offshore Robotics for Certification of Assets' [EP/W001136/1] and 'End-to-End Conceptual Guarding of Neural Architectures' [EP/T026995/1]). Xingyu Zhao's and Alec Banks' contributions to the work are partially supported through Fellowships at the Assuring Autonomy International Programme. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 956123.
Publisher Copyright:
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2023/5
Y1 - 2023/5
N2 - The increasing use of Machine Learning (ML) components embedded in autonomous systems, so-called Learning-Enabled Systems (LESs), has resulted in a pressing need to assure their functional safety. As with traditional functional safety, the emerging consensus within both industry and academia is to use assurance cases for this purpose. Typically, assurance cases support claims of reliability in support of safety and can be viewed as a structured way of organising the arguments and evidence generated from safety analysis and reliability modelling activities. While such assurance activities are traditionally guided by consensus-based standards developed from vast engineering experience, LESs pose new challenges in safety-critical applications due to the characteristics and design of ML models. In this article, we first present an overall assurance framework for LESs with an emphasis on quantitative aspects, e.g., breaking down system-level safety targets into component-level requirements and supporting claims stated in reliability metrics. We then introduce a novel model-agnostic Reliability Assessment Model (RAM) for ML classifiers that utilises the operational profile and robustness verification evidence. We discuss the model assumptions and the inherent challenges of assessing ML reliability uncovered by our RAM and propose solutions for practical use. Probabilistic safety argument templates at the lower ML component level are also developed based on the RAM. Finally, to evaluate and demonstrate our methods, we not only conduct experiments on synthetic/benchmark datasets but also apply our methods in case studies on simulated Autonomous Underwater Vehicles and physical Unmanned Ground Vehicles.
AB - The increasing use of Machine Learning (ML) components embedded in autonomous systems, so-called Learning-Enabled Systems (LESs), has resulted in a pressing need to assure their functional safety. As with traditional functional safety, the emerging consensus within both industry and academia is to use assurance cases for this purpose. Typically, assurance cases support claims of reliability in support of safety and can be viewed as a structured way of organising the arguments and evidence generated from safety analysis and reliability modelling activities. While such assurance activities are traditionally guided by consensus-based standards developed from vast engineering experience, LESs pose new challenges in safety-critical applications due to the characteristics and design of ML models. In this article, we first present an overall assurance framework for LESs with an emphasis on quantitative aspects, e.g., breaking down system-level safety targets into component-level requirements and supporting claims stated in reliability metrics. We then introduce a novel model-agnostic Reliability Assessment Model (RAM) for ML classifiers that utilises the operational profile and robustness verification evidence. We discuss the model assumptions and the inherent challenges of assessing ML reliability uncovered by our RAM and propose solutions for practical use. Probabilistic safety argument templates at the lower ML component level are also developed based on the RAM. Finally, to evaluate and demonstrate our methods, we not only conduct experiments on synthetic/benchmark datasets but also apply our methods in case studies on simulated Autonomous Underwater Vehicles and physical Unmanned Ground Vehicles.
KW - Learning-Enabled Systems
KW - Robotics and Autonomous Systems
KW - Software reliability
KW - assurance cases
KW - operational profile
KW - probabilistic claims
KW - robustness verification
KW - safe AI
KW - safety arguments
KW - safety regulation
KW - safety-critical systems
KW - statistical testing
UR - http://www.scopus.com/inward/record.url?scp=85164282663&partnerID=8YFLogxK
U2 - 10.1145/3570918
DO - 10.1145/3570918
M3 - Article
SN - 1539-9087
VL - 22
JO - ACM Transactions on Embedded Computing Systems
JF - ACM Transactions on Embedded Computing Systems
IS - 3
M1 - 48
ER -