Emergent Structures and Training Dynamics in Large Language Models

Ryan Teehan*, Miruna Clinciu, Oleg Serikov, Eliza Szczechla, Natasha Seelam, Shachar Mirkin, Aaron Gokaslan

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Large language models have achieved success on a number of downstream tasks, particularly in few-shot and zero-shot settings. Consequently, researchers have been investigating both the kinds of information these networks learn and how such information is encoded in the model's parameters. We survey the literature on changes in the network during training, drawing on work outside of NLP where necessary, and on learned representations of linguistic features in large language models. We note in particular the lack of sufficient research on the emergence of functional units (subsections of the network where related functions are grouped or organized) within large language models, and we motivate future work that grounds the study of language models in an analysis of their changing internal structure during training.

Original language: English
Title of host publication: Proceedings of BigScience Episode #5
Subtitle of host publication: Workshop on Challenges & Perspectives in Creating Large Language Models
Publisher: Association for Computational Linguistics
Pages: 146-159
Number of pages: 14
ISBN (Electronic): 9781955917261
DOIs
Publication status: Published - May 2022
Event: Workshop on Challenges & Perspectives in Creating Large Language Models - Virtual, Dublin, Ireland
Duration: 27 May 2022 → …

Conference

Conference: Workshop on Challenges & Perspectives in Creating Large Language Models
Country/Territory: Ireland
City: Virtual, Dublin
Period: 27/05/22 → …

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science Applications
  • Software
  • Linguistics and Language
