A Hybrid Framework Integrating Large Language Models and Active Learning for Synthetic Population Persona Labelling

Research output: Contribution to journal › Conference article › peer-review

Abstract

Transport decarbonisation is a complex multi-objective optimisation problem that involves social, technological, economic and environmental aspects. A key challenge lies in building high-fidelity models that capture realistic, heterogeneous behavioural factors at the individual level. Persona frameworks offer a promising way to represent complex socio-demographic backgrounds in agent-based modelling (ABM). However, transferring knowledge learnt from personas to other populations, such as large synthetic populations, is labour-intensive and often subjective, especially given the semantic information that persona frameworks contain. To overcome this, this paper leverages large language models (LLMs) and proposes an Active LLM Fusion (ALF) framework for autonomous training and universal persona assignment. An LLM ensemble acts as a high-quality but costly labelling oracle, while an active learning (AL) loop iteratively selects the most uncertain individuals from the unlabelled pool, queries the LLM ensemble only for those samples, and updates the local classifier. We demonstrate the framework by assigning the UK Department for Transport's (DfT) persona framework to a synthetic population of the West Midlands region. The results show that the framework trains classifiers efficiently and with high accuracy. For a soft-label classification task with 12 classes, the model converges on a dataset of nearly 1,000 instances, yielding a log-loss (cross-entropy) of only 0.1380 nats (a reduction of about 94.45% in uncertainty relative to random labelling). The approach significantly increases the applicability of existing persona frameworks and is transferable to arbitrary population datasets. It also enables ABM subgroups to exhibit richer behaviours. These findings demonstrate the value of the methodology for producing more realistic transport simulations and policy analyses.
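
The loop described in the abstract (select the most uncertain individuals, query the oracle only for them, update the classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of predictive entropy as the uncertainty measure, and the `oracle`, `fit` and `predict_proba` callables are all assumptions made for illustration. The last helper assumes that "uncertainty reduced compared to random labelling" means one minus the ratio of the log-loss to ln(12), the loss of a uniform guess over 12 classes; that reading reproduces the reported figure but is not confirmed by the abstract.

```python
import math
import random


def predictive_entropy(probs):
    """Shannon entropy in nats of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(prob_rows, batch_size):
    """Positions of the batch_size rows with the highest entropy.

    Entropy is an assumed uncertainty measure; the abstract only says
    "most uncertain".
    """
    ranked = sorted(
        range(len(prob_rows)),
        key=lambda i: predictive_entropy(prob_rows[i]),
        reverse=True,
    )
    return ranked[:batch_size]


def uncertainty_reduction(log_loss, n_classes):
    """Share of the uniform-guess loss ln(n_classes) that is removed (assumed definition)."""
    return 1.0 - log_loss / math.log(n_classes)


def active_learning_loop(pool, oracle, fit, predict_proba,
                         batch_size, rounds, seed=0):
    """Pool-based active learning with a costly labelling oracle.

    All callables are hypothetical placeholders:
      oracle(x)               -> soft label (e.g. LLM-ensemble output)
      fit(xs, soft_labels)    -> trained local classifier
      predict_proba(model, x) -> predicted class distribution for x
    """
    rng = random.Random(seed)
    n_seed = min(batch_size, len(pool))
    # Seed set: a small random batch labelled by the oracle.
    labels = {i: oracle(pool[i]) for i in rng.sample(range(len(pool)), n_seed)}
    for _ in range(rounds):
        model = fit([pool[i] for i in labels], [labels[i] for i in labels])
        unlabelled = [i for i in range(len(pool)) if i not in labels]
        if not unlabelled:
            break
        probs = [predict_proba(model, pool[i]) for i in unlabelled]
        # Query the oracle only for the most uncertain individuals.
        for j in select_most_uncertain(probs, batch_size):
            i = unlabelled[j]
            labels[i] = oracle(pool[i])
    # Final update of the local classifier on everything labelled so far.
    model = fit([pool[i] for i in labels], [labels[i] for i in labels])
    return model, labels


print(round(uncertainty_reduction(0.1380, 12), 4))  # 0.9445
```

With `batch_size` queries per round, the oracle is called at most `batch_size * (rounds + 1)` times, which is the point of the design: the expensive LLM ensemble labels only a small, informative subset of the pool, and the cheap local classifier labels the rest.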
Original language: English
Pages (from-to): 259-268
Number of pages: 10
Journal: Procedia Computer Science
Volume: 280
Publication status: Published - 2 Jun 2026
Event: 17th International Conference on Ambient Systems, Networks and Technologies 2026 | 9th International Conference on Emerging Data and Industry 2026 - Istanbul, Turkey
Duration: 14 Apr 2026 to 16 Apr 2026
https://cs-conferences.acadiau.ca/ant-26/

Keywords

  • Agent-based Modelling
  • Large Language Model
  • Active Learning
  • Persona
  • Transport Decarbonisation
