A New Non-Convex Framework to Improve Asymptotical Knowledge on Generic Stochastic Gradient Descent

Jean-Baptiste Fest, Audrey Repetti, Emilie Chouzenoux

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Stochastic gradient optimization methods are broadly used to minimize non-convex smooth objective functions, for instance when training deep neural networks. However, theoretical guarantees on the asymptotic behaviour of these methods remain scarce. In particular, ensuring almost-sure convergence of the iterates to a stationary point is quite challenging. In this work, we introduce a new Kurdyka-Łojasiewicz theoretical framework to analyze the asymptotic behaviour of stochastic gradient descent (SGD) schemes when minimizing non-convex smooth objectives. In particular, our framework provides new almost-sure convergence results on iterates generated by any SGD method satisfying mild conditional descent conditions. We illustrate the proposed framework on several toy simulation examples, highlighting the role of the considered theoretical assumptions and investigating how the SGD iterates behave when these assumptions are either fully or only partially satisfied.
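
For intuition, the generic scheme the abstract refers to can be sketched in a few lines. The snippet below is a minimal illustration only, not the authors' framework or their conditional descent conditions: the toy objective f(x) = x^4 - 3x^2 + x, the Gaussian noise model, and the step-size schedule (chosen to satisfy the classical Robbins-Monro conditions: the steps sum to infinity while their squares sum to a finite value) are assumptions made here for demonstration and are not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):
    # Exact gradient of the non-convex toy objective f(x) = x**4 - 3*x**2 + x.
    return 4.0 * x**3 - 6.0 * x + 1.0

def stochastic_grad(x):
    # Unbiased stochastic gradient oracle: exact gradient plus zero-mean noise.
    return grad_f(x) + rng.normal(scale=1.0)

x = 2.0  # arbitrary initial point
for t in range(1, 20001):
    # Decreasing steps a_t = a0 / t**p with 1/2 < p <= 1, so that
    # sum(a_t) diverges while sum(a_t**2) converges (Robbins-Monro).
    step = 0.1 / t**0.6
    x -= step * stochastic_grad(x)

print(f"final iterate x = {x:.4f}, |grad f(x)| = {abs(grad_f(x)):.4e}")

On such toy examples one can observe empirically whether the iterates settle at a stationary point (here, a root of 4x^3 - 6x + 1 = 0); almost-sure convergence of the iterates themselves is the behaviour the paper's Kurdyka-Łojasiewicz framework aims to guarantee.
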
Original language: English
Title of host publication: 33rd IEEE International Workshop on Machine Learning for Signal Processing (MLSP)
Editors: Danilo Comminiello, Michele Scarpiniti
Publisher: IEEE
ISBN (Electronic): 9798350324112
DOIs
Publication status: Published - 23 Oct 2023
Event: 33rd IEEE International Workshop on Machine Learning for Signal Processing 2023 - Rome, Italy
Duration: 17 Sept 2023 – 20 Sept 2023

Conference

Conference: 33rd IEEE International Workshop on Machine Learning for Signal Processing 2023
Abbreviated title: MLSP 2023
Country/Territory: Italy
City: Rome
Period: 17/09/23 – 20/09/23

Keywords

  • Kurdyka-Łojasiewicz
  • Stochastic gradient descent
  • convergence analysis
  • non-convex optimization

ASJC Scopus subject areas

  • Signal Processing
  • Human-Computer Interaction
