Diagnosis for Patient and GP: Dialogue-based Self-Diagnosis with Disease-Symptoms Graph and Referral Letter Classification

Wang, Ruibin

Health systems worldwide face substantial operational challenges, aggravated by the demands of the COVID-19 pandemic. Central to these challenges is a pervasive shortage of medical resources at every level of the health system. This study explores two strategies for relieving this pressure by harnessing artificial intelligence (AI) and big data. The first strategy is an automated self-diagnosis tool for patients, which can reduce the initial diagnostic burden on medical systems. The second targets General Practitioners (GPs) and aims to strengthen their diagnostic capabilities with AI-powered assistants.

Regarding the first strategy, current self-diagnosis tools are limited in their user interfaces and in the range of diseases they cover. In response, this study developed a dialogue-based self-diagnosis system. Using data from the National Health Service (NHS) website, this thesis established a method for identifying symptoms and generated a mapping from diseases to these symptoms, represented as a disease-symptoms graph. Based on this graph, a statistically grounded two-choice diagnostic policy was formulated using dialogue diagnosis strategies, together with solutions tailored to each of its modules. A demonstration web application was then built to showcase the dialogue-based diagnosis process.

Concerning the second strategy, referral letters from GPs often contain rich diagnostic information, including medical histories, clinical observations, and initial diagnoses. Despite this, these letters have rarely been used to aid diagnosis, primarily because they are difficult to label and anonymize. This study pioneers the use of referral letters as training data for disease diagnosis. However, labelling and anonymization are resource-intensive, so the resulting dataset is small. As existing methods proved inadequate for classifying diseases from the collected referral letters, this research proposes a hybrid architecture that jointly optimizes pre-trained encoder-based models and traditional deep learning models to fuse their different representation spaces. It also introduces two novel data augmentation methods that emphasize the importance of symptoms in the diagnostic process and improve feature representation. Our experiments showed that this approach significantly improved disease classification accuracy.

In addition, recent advances in Large Language Models (LLMs) prompted us to explore their potential for analysing referral letters and supporting decision-making. Specifically, the in-context learning performance of ChatGPT and GPT-4 in disease prediction was investigated. The results indicated that using these models directly was suboptimal. Therefore, two fine-tuning solutions for disease classification are proposed: supervised classification with encoder-based pre-trained language models (PLMs), and multiple-choice question answering with LLMs. To address the limited size of the training dataset, this thesis used ChatGPT's text-generation capabilities to augment the data. The findings revealed that encoder-based models markedly outperformed decoder-based LLMs in disease classification on the augmented referral letters. Moreover, fine-tuning LLMs proved more effective than GPT-4 few-shot learning. The experiments demonstrated that the best solution for assisting GPs in clinical settings combines LLMs for data augmentation with an encoder-based PLM classifier, which achieves satisfactory performance in disease diagnosis.
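The dialogue-based self-diagnosis idea can be illustrated with a toy disease-symptoms graph. The sketch below is a minimal, hypothetical example and not the thesis's implementation: the diseases, symptoms, and function names (`next_question`, `diagnose`) are invented for illustration and are not derived from NHS data. It also assumes that "two-choice" means yes/no symptom questions, with each question chosen greedily so that the answer splits the remaining candidate diseases as evenly as possible. The statistically grounded policy described in the thesis may work differently.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical toy disease-symptoms graph (illustrative only, NOT NHS data):
# each disease maps to the set of symptoms linked to it.
GRAPH: Dict[str, Set[str]] = {
    "common cold": {"cough", "sore throat", "runny nose"},
    "flu": {"cough", "fever", "muscle ache"},
    "migraine": {"headache", "nausea"},
    "food poisoning": {"nausea", "diarrhoea", "fever"},
}


def next_question(candidates: List[str], asked: Set[str]) -> Optional[str]:
    """Return the unasked symptom whose yes/no answer splits the candidate
    diseases most evenly (an assumed greedy heuristic), or None if no
    remaining symptom distinguishes between the candidates."""
    symptoms = {s for d in candidates for s in GRAPH[d]} - asked
    best: Optional[str] = None
    best_gap = len(candidates)
    for s in sorted(symptoms):  # sorted for deterministic tie-breaking
        with_s = sum(1 for d in candidates if s in GRAPH[d])
        gap = abs(2 * with_s - len(candidates))  # 0 means a perfect 50/50 split
        # A symptom shared by every candidate carries no information.
        if with_s < len(candidates) and gap < best_gap:
            best, best_gap = s, gap
    return best


def diagnose(answer: Callable[[str], bool]) -> List[str]:
    """Ask yes/no symptom questions until at most one candidate remains
    or no question can split the candidates any further."""
    candidates = sorted(GRAPH)
    asked: Set[str] = set()
    while len(candidates) > 1:
        symptom = next_question(candidates, asked)
        if symptom is None:
            break
        asked.add(symptom)
        has_it = answer(symptom)
        # Toy rule: keep only diseases whose symptom set agrees with the answer.
        candidates = [d for d in candidates if (symptom in GRAPH[d]) == has_it]
    return candidates


# Simulated patient who answers "yes" only for the symptoms they have.
patient = {"cough", "fever", "muscle ache"}
print(diagnose(lambda s: s in patient))  # → ['flu']
```

The hard filtering step assumes every disease always presents with all of its linked symptoms; a realistic policy would instead weight diseases by symptom probabilities and stop once one candidate is sufficiently likely.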

Item Type:	Thesis (Doctoral)
Additional Information:	If you feel that this work infringes your copyright, please contact the BURO Manager.
Uncontrolled Keywords:	assisting disease diagnosis; disease-symptoms graph; encoder-based transformer model; GPs; large language model; referral letter classification
Group:	Faculty of Media & Communication (Until 31/07/2025)
ID Code:	41660
Deposited By:	Symplectic RT2
Deposited On:	19 Dec 2025 13:06
Last Modified:	19 Dec 2025 13:06