Project title

Providing data analysis insights into real to-the-second timing patterns of passenger rail services using Machine Learning techniques (COF-INP-04)

Project Summary

This project used Machine Learning techniques to analyse data and to build accurate models and estimates of station dwell times and of travel times over between-station track sections, including real-time predictions, variation, and correlation with temporal, geographical and external factors.

Project Abstract

This project aimed to integrate and model train operation information from the various data sources available to a train operating company. Instead of using aggregate delay data (in minutes), as is the current norm, this project analysed data at the second-by-second level. It made use of the RCM system, which takes telemetry from train engines and is primarily designed to let engineers monitor engine operation and detect faults. This allowed train operations to be analysed using information such as position, speed, braking, gear/notch selection and door opening/closing, giving a more detailed understanding of travel times over segments of a train service as well as dwell times at stations. This information was analysed using Machine Learning and Artificial Intelligence algorithms to discover patterns and associations between events, as well as to establish causality and build models for the early detection of possible delays.

The main challenge with this approach was the need to integrate vast amounts of raw data (typically measured in terabytes) with other operational data, including service operation data and the timetable. The complex nature of the data, its quality, and the fact that it was at times contradictory meant that extensive data cleaning was needed, using state-of-the-art Artificial Intelligence techniques.
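
As an illustration of how second-level telemetry can yield station dwell times, the following minimal sketch pairs door-opening and door-closing events at the same stop. It is not the project's method: the `DoorEvent` record, its field names, the station codes and the sample timestamps are all assumptions made for this example, and real RCM data would need the cleaning described above before such a calculation is meaningful.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class DoorEvent:
    """Hypothetical telemetry record; the real RCM schema is not given in the source."""
    timestamp: datetime
    station: str  # hypothetical station code
    kind: str     # "open" or "close"


def dwell_times(events: Iterable[DoorEvent]) -> List[Tuple[str, float]]:
    """Return (station, dwell in seconds) for each open/close pair at one stop."""
    results: List[Tuple[str, float]] = []
    opened: Optional[DoorEvent] = None
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind == "open":
            # Keep the first opening at a stop; an opening at a different
            # station replaces a pending one that never saw a close.
            if opened is None or opened.station != ev.station:
                opened = ev
        elif ev.kind == "close" and opened is not None and ev.station == opened.station:
            dwell = (ev.timestamp - opened.timestamp).total_seconds()
            results.append((ev.station, dwell))
            opened = None
    return results


# Made-up sample data, for illustration only.
t0 = datetime(2020, 1, 1, 8, 0, 0)
sample = [
    DoorEvent(t0, "STN_A", "open"),
    DoorEvent(t0 + timedelta(seconds=47), "STN_A", "close"),
    DoorEvent(t0 + timedelta(seconds=300), "STN_B", "open"),
    DoorEvent(t0 + timedelta(seconds=338), "STN_B", "close"),
]
result = dwell_times(sample)
```

Working in seconds rather than whole minutes keeps variations that an aggregate delay figure would round away, which is the level of detail the abstract describes.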

The project team also used the experience they gained from a case study on Southeastern services to propose an Open Framework for Datasets that incorporates reusable elements and could potentially be used to support cross-industry data storage, collection and analysis.

Project Reports

Project Reports are hosted on SPARK
