A practical generalization metric for deep networks benchmarking

Huang, M.; Yu, Hongchuan; Zhang, Jianjun

Huang, M., Yu, H. and Zhang, J., 2025. A practical generalization metric for deep networks benchmarking. Scientific Reports, 15, 9747.

Full text available as:

PDF (OPEN ACCESS ARTICLE)
s41598-025-93005-5.pdf - Published Version
Available under License Creative Commons Attribution.
4MB

Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk.

Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.

DOI: 10.1038/s41598-025-93005-5

Abstract

There is an ongoing and dedicated effort to estimate bounds on the generalization error of deep learning models, coupled with an increasing interest in practical metrics that can be used to experimentally evaluate a model’s ability to generalize. This interest is driven not only by practical considerations but is also vital for theoretical research, as theoretical estimations require practical validation. However, there is currently a lack of research on benchmarking the generalization capacity of various deep networks and verifying these theoretical estimations. This paper introduces a practical generalization metric for benchmarking different deep networks and proposes a novel testbed for verifying theoretical estimations. Our findings indicate that a deep network’s generalization capacity in classification tasks depends on both classification accuracy and the diversity of unseen data. The proposed metric system quantifies both the accuracy of deep learning models and the diversity of data, and provides an intuitive, quantitative evaluation method: a trade-off point. Furthermore, we compare our practical metric with existing theoretical generalization estimations using our benchmarking testbed. Discouragingly, most of the available generalization estimations do not correlate with the practical measurements obtained using our testbed. Nevertheless, this finding is significant, as it exposes the shortcomings of theoretical estimations and inspires new exploration.

Item Type:	Article
ISSN:	2045-2322
Group:	Faculty of Media & Communication
ID Code:	40916
Deposited By:	Symplectic RT2
Deposited On:	03 Apr 2025 10:38
Last Modified:	03 Apr 2025 10:38
