Explicit semantic guided bi-incomplete multi-modal hashing with label co-occurrence and label graph constraints

Haoran Zhu, Xu Lu, Liang Zhang, Li Liu, Huaxiang Zhang

Abstract

Multi-modal hashing is well suited to large-scale multimedia retrieval because it integrates multi-modal features and generates compact binary codes for efficient computation. However, existing methods often assume complete modalities and labels, overlooking scenarios in which both features and labels are incomplete, especially under high missing rates. To address this challenge, we propose LaDiff-BIMH, an explicit semantic-guided bi-incomplete multi-modal hashing framework with label co-occurrence and label graph constraints. Unlike prior supervised, unsupervised, or semi-supervised methods, LaDiff-BIMH learns hash codes for multi-modal data that is bi-incomplete, that is, missing both modal features and labels, within a unified framework. LaDiff-BIMH consists of three stages: 1) Label Graph Constrained Autoencoder based Modal Reconstruction exploits the similarity and co-occurrence of available labels to guide feature reconstruction, enhance the semantic consistency of latent features, and improve computational efficiency. This process also guides pseudo label generation and completes missing category information. 2) Conditional DDPM based Incomplete Modal Completion combines pseudo labels and complete modal features to achieve high-quality completion of incomplete modal features, enhancing the intrinsic connections between heterogeneous modalities. 3) Explicit Semantic guided Multi-modal Hash Learning generates a fused representation through adaptive weighted multi-modal fusion and designs a discriminative hash center and semantic supervision mechanism to enhance the semantic consistency and discriminability of the fused hash code. Experiments demonstrate the superiority of LaDiff-BIMH over state-of-the-art methods.
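
To make the label co-occurrence idea in stage 1 concrete, the following is a minimal sketch, not the authors' implementation. It assumes labels are stored as 1 (present), 0 (absent), or None (missing annotation), estimates a conditional co-occurrence matrix from the observed entries only, and fills missing entries with a simple thresholded rule. The function names `cooccurrence` and `fill_missing`, the max-over-positives scoring rule, and the 0.5 threshold are all illustrative assumptions; the paper's pseudo labels come from a label graph constrained autoencoder rather than from this rule.

```python
from typing import List, Optional

Label = Optional[int]  # 1 = present, 0 = absent, None = missing annotation


def cooccurrence(labels: List[List[Label]]) -> List[List[float]]:
    """Estimate co[i][j] ~ P(label j = 1 | label i = 1) from observed entries."""
    n_labels = len(labels[0])
    co = [[0.0] * n_labels for _ in range(n_labels)]
    for i in range(n_labels):
        for j in range(n_labels):
            # Use only samples where label i is positive and label j is observed.
            support = [row for row in labels if row[i] == 1 and row[j] is not None]
            if support:
                co[i][j] = sum(row[j] for row in support) / len(support)
    return co


def fill_missing(row: List[Label], co: List[List[float]],
                 threshold: float = 0.5) -> List[int]:
    """Hypothetical pseudo-labelling rule: a missing label j is set to 1 when
    some known-positive label i of the sample has co[i][j] >= threshold."""
    positives = [i for i, v in enumerate(row) if v == 1]
    filled = []
    for j, v in enumerate(row):
        if v is not None:
            filled.append(v)  # keep observed annotations unchanged
        else:
            score = max((co[i][j] for i in positives), default=0.0)
            filled.append(1 if score >= threshold else 0)
    return filled


if __name__ == "__main__":
    # Toy label matrix: 4 samples, 3 categories, two missing annotations.
    toy = [
        [1, 1, 0],
        [1, 1, None],
        [0, 0, 1],
        [1, None, 0],
    ]
    co = cooccurrence(toy)
    print([fill_missing(row, co) for row in toy])
```

In this toy setting, categories 0 and 1 always appear together in the observed data, so a sample annotated with category 0 but missing category 1 receives a positive pseudo label for category 1. A learned model would replace this counting rule, but the input and output shapes are the same.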

Paper link: https://doi.org/10.1016/j.neunet.2025.108198
