Visual-Enhanced Multimodal Framework for Flexible Job Shop Scheduling Problem

  • Peng Zhao
  • Zhiguang Cao
  • Di Wang
  • Wen Song
  • Wei Pang
  • You Zhou*
  • Yuan Jiang*

* Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Multimodal models leverage complementary information across modalities to enrich feature representations. While visual information shows potential in representing structure for some combinatorial optimization problems (COPs), its application to complex scheduling problems such as the Flexible Job Shop Scheduling Problem (FJSP) remains underexplored. Current learning-based FJSP solvers predominantly rely on handcrafted state features. This dependence can lead to inconsistencies and may not fully capture the problem's intricate dynamics. Crucially, these methods overlook visual modalities. Visual representations offer a distinct advantage by inherently capturing the global topological structure and complex resource interactions within the FJSP state. Unlike localized handcrafted features, this holistic, structural view provides a richer foundation for understanding scheduling complexity and making informed decisions. To overcome these limitations by leveraging visual information, which is known for representing topological structures and providing richer state representations, we introduce the AO-framework. This multimodal feature fusion approach enhances handcrafted state features by integrating insights from visual data. Our core contribution is a novel fusion mechanism utilizing orthogonal projection and local attention. Unlike traditional methods that often rely on simple concatenation of visual data, our method reduces redundancy by projecting global image-derived features onto local handcrafted features. This process extracts distinct information inherent to the visual modality, significantly improving the quality and complementarity of the resulting state features and enabling more informed scheduling decisions. To our knowledge, the AO-framework represents the first multimodal framework applied to scheduling problems, demonstrating the significant potential of visual information in this domain.
Extensive experiments across various FJSP solvers and datasets confirm that our framework yields substantial enhancements in solution quality, decision-making capabilities, and generalization.
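
The redundancy-reduction idea in the abstract (projecting global image-derived features onto local handcrafted features and keeping what is distinct to the visual modality) can be illustrated with a small NumPy sketch. This is only an assumption-laden illustration, not the AO-framework's actual implementation: the function names `orthogonal_residual` and `fuse`, the feature shapes, and the final concatenation step are all hypothetical, and the local attention component mentioned in the abstract is omitted entirely.

```python
import numpy as np


def orthogonal_residual(visual, handcrafted, eps=1e-8):
    """Return the part of `visual` that is orthogonal to each handcrafted vector.

    Hypothetical shapes (assumed, not from the paper):
      visual:      (d,)   one global image-derived feature vector
      handcrafted: (n, d) one local handcrafted feature vector per node
    """
    # Inner product of the global visual vector with every local vector.
    dot = np.sum(visual * handcrafted, axis=-1, keepdims=True)
    # Squared norm of every local vector; eps guards against division by zero.
    norm_sq = np.sum(handcrafted * handcrafted, axis=-1, keepdims=True)
    # Component of the visual vector that lies along each local vector.
    parallel = (dot / (norm_sq + eps)) * handcrafted
    # What remains is the information not already expressed by the local vector.
    return visual - parallel


def fuse(visual, handcrafted):
    """Assumed fusion step: append the orthogonal residual to each local vector."""
    residual = orthogonal_residual(visual, handcrafted)
    return np.concatenate([handcrafted, residual], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    local_features = rng.normal(size=(5, 8))   # 5 nodes, 8-dim handcrafted features
    global_visual = rng.normal(size=8)         # 8-dim image-derived feature
    fused = fuse(global_visual, local_features)
    print(fused.shape)
```

In this sketch the residual for each node has (numerically) zero inner product with that node's handcrafted vector, which is one simple way to read "reducing redundancy" between the two modalities; the paper's mechanism may differ in where the projection is applied and how the result is combined.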

Original language: English
Title of host publication: MM '25: Proceedings of the 33rd ACM International Conference on Multimedia
Publisher: Association for Computing Machinery
Pages: 2496-2505
Number of pages: 10
ISBN (Electronic): 9798400720352
DOIs
Publication status: Published - 27 Oct 2025
Event: 33rd ACM International Conference on Multimedia 2025 - Dublin, Ireland
Duration: 27 Oct 2025 to 31 Oct 2025

Conference

Conference: 33rd ACM International Conference on Multimedia 2025
Abbreviated title: MM 2025
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 to 31/10/25

Keywords

  • combinatorial optimization
  • flexible job-shop scheduling problem
  • multimodal fusion
  • reinforcement learning

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Software
  • Artificial Intelligence
  • Computer Graphics and Computer-Aided Design
