
Impact of Data Quality and Target Representation on Predictions for Urban Bus Networks.

Reich, T., Budka, M. and Hulbert, D., 2021. Impact of Data Quality and Target Representation on Predictions for Urban Bus Networks. In: IEEE Symposium Series on Computational Intelligence, 1-4 December 2020, Canberra, ACT, Australia, 2843-2852.

Full text available as:

_Thilo__IEEE_SSCI_2020.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial.


Passengers of urban bus networks often rely on forecasts of Estimated Times of Arrival (ETA) and live vehicle movements to plan their journeys. ETA predictions are unreliable due to the lack of good-quality historical data, while ‘live’ positions in mobile apps suffer from delays in data transmission. This study uses deep neural networks to predict the next position of a bus under various vehicle-location data-quality regimes. Additionally, we assess the effect of the target representation in the prediction problem by encoding it as unconstrained geographical coordinates, as progress along a known trajectory, or as ETA at the next two stops. We demonstrate that, without data cleaning, model predictions give false confidence if only mean errors are used, highlighting the importance of a holistic assessment of the results. We show that target representation affects prediction accuracy by constraining the prediction space. The literature is vague about quality issues in public transport data. Here we show that noisy data is a problem and discuss simple but effective approaches to address it. Research generally focuses on only a single method of target representation, so comparing several methods is a useful addition to the literature. Our results give insight into the value of addressing data-quality issues in urban transport data to enable better predictions and improve the passenger experience. We show that ‘rephrasing’ the prediction problem by changing the target representation can yield massively improved predictions. Our findings enable researchers using deep learning approaches in public transport to make more informed decisions about essential data cleaning steps and problem representation for improved results.
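
As an illustration of what a "progress along a known trajectory" target could look like, the following minimal sketch maps a vehicle position to a fraction of the route length, which constrains the prediction space to a single value between 0 and 1. It is not taken from the paper: the function name `progress_along_route`, the polyline route format, and the assumption of planar coordinates (for example projected metres rather than latitude and longitude) are all hypothetical choices made for this example.

```python
import math


def progress_along_route(route, point):
    """Return the fraction (0.0 to 1.0) of the route length at the
    point on the polyline `route` that lies closest to `point`.

    Assumption: `route` is a list of (x, y) tuples in a planar
    coordinate system, and `point` is an (x, y) tuple in the same system.
    """
    segments = list(zip(route, route[1:]))
    seg_lengths = [math.dist(a, b) for a, b in segments]
    total = sum(seg_lengths)
    if total == 0:
        raise ValueError("route must have non-zero length")

    best_dist = math.inf
    best_progress = 0.0
    travelled = 0.0  # route length covered before the current segment

    for (a, b), seg_len in zip(segments, seg_lengths):
        if seg_len == 0:
            continue  # skip duplicate consecutive vertices
        # Parameter of the orthogonal projection onto the segment,
        # clamped so the projected point stays on the segment.
        t = ((point[0] - a[0]) * (b[0] - a[0])
             + (point[1] - a[1]) * (b[1] - a[1])) / seg_len ** 2
        t = max(0.0, min(1.0, t))
        proj = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        d = math.dist(point, proj)
        if d < best_dist:
            best_dist = d
            best_progress = (travelled + t * seg_len) / total
        travelled += seg_len

    return best_progress


# Hypothetical L-shaped route and a slightly noisy position near it.
route = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
print(progress_along_route(route, (5.0, 1.0)))
```

A model trained on such a target can only output positions on the route, whereas a model trained on unconstrained coordinates may place the vehicle anywhere on the map.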

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Public transport; ETA prediction; Traffic analysis; Modeling and prediction; Machine learning; Deep learning
Group: Faculty of Science & Technology
ID Code: 36200
Deposited By: Symplectic RT2
Deposited On: 06 Nov 2021 12:37
Last Modified: 14 Mar 2022 14:30

