Zhao, J., Wang, Y., Liang, H. and Rusnachenko, N., 2024. NCL_NLP at SemEval-2024 task 7: CoT-NumHG: A CoT-based SFT training strategy with large language models for number-focused headline generation. In: SemEval 2024, 20-21 June 2024, Mexico City, 261-269.
Full text available as: PDF, 2024.semeval-1.40.pdf - Published Version. Available under License Creative Commons Attribution. 558kB
Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk. Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.
Official URL: https://semeval.github.io/SemEval2024/
DOI: 10.18653/v1/2024.semeval-1.40
Abstract
Headline generation is an essential task in Natural Language Processing (NLP), yet models often show limited ability to interpret numerals accurately, which leads to errors in generated headlines. This paper introduces CoT-NumHG, a training strategy that applies the Chain of Thought (CoT) paradigm to Supervised Fine-Tuning (SFT) of large language models. The approach aims to improve numeral perception, interpretability, accuracy, and the generation of structured outputs. It was developed for SemEval-2024 Task 7 (task 3): Numeral-Aware Headline Generation (English), which is divided into two subtasks. The first subtask focuses on numerical reasoning and requires models to calculate and fill in the missing numbers in news headlines; the second targets the generation of complete headlines. The same training strategy is used for both subtasks, but this study primarily explores the first as a demonstration. In the competition, our CoT-NumHG-Mistral-7B model attained an accuracy of 94%, underscoring the effectiveness of the proposed strategy, which is detailed in our project repository.
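As a rough illustration of the first subtask, the sketch below shows one way a CoT-style SFT sample for a masked headline could be assembled and how fill-in accuracy could be scored by exact match. This is a minimal sketch under stated assumptions, not the authors' implementation: the prompt wording, the `Answer:` marker, the `____` blank, the example article, and all names (`NumHGExample`, `build_sft_sample`, `extract_answer`, `accuracy`) are hypothetical and do not come from the paper.

```python
import re
from dataclasses import dataclass


@dataclass
class NumHGExample:
    """Hypothetical container for one number-fill-in instance."""
    news: str             # article text
    masked_headline: str  # headline with the number replaced by "____"
    rationale: str        # step-by-step reasoning text (assumed format)
    answer: str           # gold number, kept as a string


def build_sft_sample(ex: NumHGExample) -> dict:
    """Pair a prompt with a reasoning-then-answer completion (assumed layout)."""
    prompt = (
        f"News: {ex.news}\n"
        f"Headline: {ex.masked_headline}\n"
        "Fill in the blank with the correct number. Think step by step."
    )
    completion = f"{ex.rationale.strip()}\nAnswer: {ex.answer}"
    return {"prompt": prompt, "completion": completion}


# Assumed output convention: the final number follows an "Answer:" marker.
ANSWER_RE = re.compile(r"Answer:\s*(-?[\d,]*\.?\d+)")


def extract_answer(text: str):
    """Return the last number after 'Answer:', with commas removed, or None."""
    matches = ANSWER_RE.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def accuracy(predictions, golds) -> float:
    """Exact-match accuracy between parsed predictions and gold strings."""
    if not golds:
        return 0.0
    correct = sum(extract_answer(p) == g for p, g in zip(predictions, golds))
    return correct / len(golds)


# Made-up example, for illustration only.
example = NumHGExample(
    news="A storm left 3 people injured and 2 missing.",
    masked_headline="Storm Leaves ____ Injured or Missing",
    rationale="The article reports 3 injured and 2 missing. 3 + 2 = 5.",
    answer="5",
)
sample = build_sft_sample(example)
```

Keeping the reasoning and the final answer in one completion, with a fixed marker before the number, lets the output be parsed mechanically, which is one plausible reading of the "structured outputs" mentioned in the abstract.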
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Group: | Faculty of Media, Science and Technology |
| ID Code: | 41501 |
| Deposited By: | Symplectic RT2 |
| Deposited On: | 10 Mar 2026 15:44 |
| Last Modified: | 10 Mar 2026 15:44 |