
Multimodal hierarchical classification using cascade-of-thought.

Hou, J., Tan, Z., Hu, Q., Wang, P. and Gong, Y., 2025. Multimodal hierarchical classification using cascade-of-thought. Information Processing & Management, 63 (3), 104555.

Full text available as:

Multimodal hierarchical classification using cascade-of-thought.pdf - Accepted Version
Restricted to Repository staff only until 21 December 2027.
Available under License Creative Commons Attribution Non-commercial No Derivatives.

926kB

DOI: 10.1016/j.ipm.2025.104555

Abstract

We propose Cascade-of-Thought (CSOT), a novel prompt-based method for multimodal hierarchical classification (MHC) that requires no training or labeled exemplars. Inspired by the LLM-as-a-Judge (LaaJ) paradigm, CSOT decomposes classification into rationale generation, confidence scoring, and decision ranking—each implemented via structured prompts to a vision–language model (VLM). Experiments on two public MHC benchmarks demonstrate that CSOT yields substantial performance gains, particularly for weaker VLMs, while also enhancing the output quality of near-ceiling models. CSOT offers a flexible, generalizable solution for real-world MHC tasks.
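The three stages named in the abstract (rationale generation, confidence scoring, decision ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the `vlm` callable interface, the prompt wording, the `TAXONOMY` labels, and the `fake_vlm` stub are all hypothetical, and the ranking stage is simplified here to a plain sort rather than a further prompt to the model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a VLM is any callable taking (image, prompt) and returning text.
VLM = Callable[[str, str], str]

# Illustrative two-level label hierarchy (an assumption, not from the paper).
TAXONOMY: Dict[str, List[str]] = {
    "animal": ["cat", "dog"],
    "vehicle": ["car", "bicycle"],
}


def classify_level(
    vlm: VLM, image: str, text: str, candidates: List[str]
) -> Tuple[str, List[Tuple[str, float]]]:
    """Run the three stages over the candidate labels of one hierarchy level."""
    scored: List[Tuple[str, float]] = []
    for label in candidates:
        # Stage 1: rationale generation (prompt wording is illustrative).
        rationale = vlm(
            image, f"Text: {text}\nExplain why this item may or may not be '{label}'."
        )
        # Stage 2: confidence scoring of that rationale.
        reply = vlm(
            image,
            f"Rationale: {rationale}\nRate from 0 to 1 how well '{label}' fits. "
            "Answer with a number only.",
        )
        try:
            score = float(reply.strip())
        except ValueError:
            score = 0.0  # an unparseable reply is treated as zero confidence
        scored.append((label, min(max(score, 0.0), 1.0)))
    # Stage 3: decision ranking (simplified to a sort by confidence).
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked[0][0], ranked


def classify(vlm: VLM, image: str, text: str) -> List[str]:
    """Walk the hierarchy top-down, choosing one label per level."""
    parent, _ = classify_level(vlm, image, text, list(TAXONOMY))
    child, _ = classify_level(vlm, image, text, TAXONOMY[parent])
    return [parent, child]


def fake_vlm(image: str, prompt: str) -> str:
    """Deterministic stand-in for a real model, used only to exercise the flow."""
    if "Answer with a number only" in prompt:
        return "0.9" if ("'animal'" in prompt or "'cat'" in prompt) else "0.2"
    return "stub rationale"


path = classify(fake_vlm, "img.png", "a kitten on a sofa")
```

Because no training or labeled exemplars are involved, swapping in a different model only means passing a different callable in place of `fake_vlm`.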

Item Type: Article
ISSN: 0306-4573
Uncontrolled Keywords: Multimodal hierarchical classification; Vision language model; Multimodal reasoning; Zero-shot inference; LLM-as-a-Judge
Group: Faculty of Media, Science and Technology
ID Code: 41866
Deposited By: Symplectic RT2
Deposited On: 30 Mar 2026 15:13
Last Modified: 30 Mar 2026 15:13
