Using Sentence Embedding Techniques for Enhancing Terms-of-Service Text Summarization.

Peach, H., Rusnachenko, N., Baraskar, M. and Liang, H., 2025. Using Sentence Embedding Techniques for Enhancing Terms-of-Service Text Summarization. In: Bajaj, A., Sreedhar, S. and Abraham, A., eds. Bio-Inspired Computing. IBICA 2023. Cham: Springer, 55-64.

Full text available as:

IBICA2023_Legal_Text_Simplification.pdf - Accepted Version
Restricted to Repository staff only until 4 June 2026.
Available under License Creative Commons Attribution Non-commercial.

494kB

DOI: 10.1007/978-3-031-78943-4_7

Abstract

Summarization is useful for extracting salient information from linguistically complex texts. This is especially relevant in the legal domain, where it can make content more accessible to lay readers. A simplified representation can help foster transparency and trust between an organization and individuals. We review recent advances in extractive and abstractive summarization approaches. The recent appearance of the transformer architecture with a self-attention mechanism has had a large impact on abstractive summarization performance. However, a major limitation of abstractive summarization is the constraint on input size. To address this limitation, we propose a target-oriented sentence embedding classification (SEC) architecture. It is designed specifically for Terms-of-Service (ToS) document summarization and is intended to serve as a preliminary text-processing step for abstractive summarization. Experiments on a collection of ToS documents from the service TOS;DR show that the SEC model yields an average 11% increase across all ROUGE metrics (F-measure) compared with other extractive summarizers for significantly short summaries. Applying SEC in general-purpose abstractive summarizers yields models that show an 11-12% increase in ROUGE-2 and equal or better ROUGE-L. We accompany the proposed architecture with an annotation service and complex word simplification modules, combined into a publicly available system (https://github.com/HarryPeach/simplifying-legal-content).
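
For readers unfamiliar with the ROUGE F-measure cited in the abstract, the following is a minimal sketch of how a ROUGE-N score (ROUGE-2 by default) can be computed from n-gram overlap. It is an illustration only, not the evaluation code used in the paper: the function names `ngrams` and `rouge_n_f1`, the whitespace tokenization, and the lowercasing are all assumptions made here, and published results are normally produced with an established ROUGE implementation that may apply stemming and other preprocessing.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=2):
    """Illustrative ROUGE-N F-measure between two strings.

    Assumption: naive lowercase whitespace tokenization, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the ToS dataset.
    summary = "the service may share your data"
    gold = "the service may sell your data"
    print(round(rouge_n_f1(summary, gold, n=2), 3))
```

In this sketch, precision is the share of candidate n-grams found in the reference and recall is the share of reference n-grams found in the candidate; the F-measure is their harmonic mean.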

Item Type: Book Section
ISBN: 9783031789427
Series Name: Lecture Notes in Networks and Systems
Volume: 1230
ISSN: 2367-3370
Group: Faculty of Media & Communication
ID Code: 41158
Deposited By: Symplectic RT2
Deposited On: 10 Jul 2025 10:52
Last Modified: 10 Jul 2025 10:52
