Danilatou, V., 2021. Risk assessment and mortality prediction in patients with venous thromboembolism using big data and machine learning. Masters Thesis (Masters). Bournemouth University.
Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk. Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material. |
Abstract
Venous thromboembolism (VTE) is the third most common cardiovascular condition. It mainly affects hospitalized and cancer patients and is associated with high morbidity and mortality. Some patients need immediate treatment and monitoring in intensive care units (ICU). Moreover, cancer patients are at increased risk of developing VTE, especially in the immediate period after ICU hospitalization. It is crucial to predict which cancer patients will develop VTE, to predict early and late mortality in these high-risk patients, and to recognize possible treatable factors in order to improve survival. Several scoring and predictive models have been developed for these purposes, but they have limited generalizability and are mostly effective in predicting in-hospital mortality. They also have several limitations; for example, they use data recorded only on the first day of admission. Moreover, no score exists so far to predict late mortality in ICU patients. With the growing use of electronic health records, open-source big-data medical databases and machine learning, predictive modelling could become a powerful tool to guide clinical decisions. The aim of the study was to explore the use and performance of various machine learning (ML) algorithms in addressing two research questions: (i) predicting VTE risk in ICU-hospitalized cancer patients after discharge, and (ii) predicting early and late mortality in VTE patients hospitalized in the ICU. For this purpose, the freely accessible MIMIC-III database was used. It contains a vast amount of time-series healthcare data from thousands of patients, making it well suited to ML-based forecasting. Since it provides information even after discharge from the ICU, it gives an opportunity to predict late mortality.
Two datasets were extracted from the database. D1 consisted of 4,699 patients with cancer who were admitted to the ICU, stratified into two groups based on whether or not they were readmitted to the ICU within 90 days with a diagnosis of VTE. The ML classification task was to predict which of the cancer patients originally admitted to the ICU would be readmitted with VTE within 90 days. D2 consisted of 2,468 patients who were admitted to the ICU with a VTE diagnosis, stratified into three groups based on their outcome: those who died during their first ICU admission (early mortality group), those who died after discharge from the ICU or in a later admission (late mortality group), and those who remained alive for months after their ICU admission. In this case, two ML classification tasks were constructed: first, a model that distinguishes early mortality, and second, a model that distinguishes late mortality. A very wide range of features was selected, including demographic information, clinical and laboratory data, prescriptions, procedures, well-established comorbidity and severity scores, and information from written notes. Clinically relevant entities were extracted from free-text medical notes using the sequence annotator SABER and then fitted to a Latent Dirichlet Allocation (LDA) model with 50 topics. In total, 1,471 features were extracted for each patient, grouped into 8 categories, each representing a different type of medical assessment. Two approaches were run in parallel: an automated ML platform that easily handles high-dimensional, noisy and missing data, and Monte Carlo simulations based on Random Forests with hyperparameter tuning and class balancing with the Synthetic Minority Oversampling Technique (SMOTE). Due to the highly imbalanced nature of the first dataset (“cancer patients with thrombosis”), neither ML approach was able to predict DVT in cancer patients, even after the use of SMOTE.
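The class-balancing step mentioned above can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is not the thesis's implementation; the function name `smote`, the toy data and the parameter values are assumptions made purely for illustration, and only the Python standard library is used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; needs at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (made up): 8 majority-class and 3 minority-class samples, 2 features each.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 10.0], [11.0, 10.0], [10.0, 12.0]]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

Because every synthetic point is an interpolation of existing minority samples, oversampling adds no information beyond what the few real minority cases already contain, which is one plausible reason balancing alone may not rescue a severely imbalanced task.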
For early mortality in ICU patients with VTE, the best-performing ML model was Random Forests (AUC = 0.92). For late mortality, the best model was again Random Forests. Nevertheless, late mortality was predicted less well, even with the holistic approach (AUC = 0.82). Significant clinically relevant predictive features of early and late mortality were cancer, age, treatment with warfarin, and red cell transfusions, whereas known severity scores performed well only in the prediction of early mortality. The contribution of this study to current knowledge was multi-leveled: it explored the performance of various ML approaches in a big-data-driven research setting, using multiple formats of data from structured records to unstructured medical notes; it examined the effect of balancing techniques on highly imbalanced datasets, as medical datasets often are; and it identified possibly new biomarkers. Early mortality in critically ill patients with VTE can be readily predicted by ML techniques, whereas for late mortality, which is a more difficult task and for which medical scores are still lacking, ML could probably outperform classic statistical methods. There is a need for more precise and reliable tools to overcome the highly imbalanced nature of medical datasets such as the “cancer patients with thrombosis” dataset. This study showed that automated ML approaches perform similarly to manual selection and parametrization of ML models, which is highly promising in the setting of healthcare “big-data” medical databases.
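For readers unfamiliar with the AUC figures reported above, the following is a small sketch of how the area under the ROC curve can be computed from predicted scores: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the labels and scores below are invented for illustration and are not data from the thesis.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = died, 0 = survived, with model-predicted risk scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values such as 0.92 and 0.82 indicate how often the model ranks a patient who died above one who survived.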
Item Type: | Thesis (Masters) |
---|---|
Additional Information: | If you feel that this work infringes your copyright please contact the BURO Manager. |
Uncontrolled Keywords: | venous thromboembolism; cancer; mortality; ICU; machine learning; big-data |
Group: | Faculty of Science & Technology |
ID Code: | 35112 |
Deposited By: | Symplectic RT2 |
Deposited On: | 27 Jan 2021 11:00 |
Last Modified: | 14 Mar 2022 14:26 |