To ensure the reliability of real-time traffic data used in transportation operations and planning, we developed a multi-stage screening algorithm that identifies detection errors in large-scale traffic detector databases. The work targets erroneous data from malfunctioning or poorly calibrated traffic detectors, such as loop and radar sensors, that monitor speed, flow, and occupancy. The algorithm systematically detects missing data, out-of-range values, and inconsistencies by combining threshold checks, statistical distribution comparisons based on Average Effective Vehicle Length (AEVL), and statistical procedures such as the Kolmogorov–Smirnov (K-S) test and Multiple Comparison with the Best (MCB). Applied to the PeMS database, the algorithm identified faulty detectors and enabled validation of traffic speed profiles and identification of high-speed, crash-prone locations.
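The screening stages described above can be illustrated with a minimal sketch. It is not the algorithm applied to PeMS: the function names, the threshold values (`max_speed`, `max_flow`, `aevl_range`), and the units (mph, vehicles per hour per lane, occupancy as a fraction) are all assumptions made for illustration. AEVL is computed with the common relation speed × occupancy / flow, and the two-sample K-S statistic is written out by hand so the example needs only the standard library. The MCB step is not shown.

```python
from bisect import bisect_right

FEET_PER_MILE = 5280.0


def aevl_feet(speed_mph, flow_vph, occupancy):
    """Average effective vehicle length in feet.

    Returns None when any input is missing or flow is not positive,
    because the ratio is undefined in those cases.
    """
    if speed_mph is None or flow_vph is None or occupancy is None:
        return None
    if flow_vph <= 0:
        return None
    return speed_mph * occupancy / flow_vph * FEET_PER_MILE


def screen_record(speed_mph, flow_vph, occupancy,
                  max_speed=100.0, max_flow=3000.0,
                  aevl_range=(9.0, 60.0)):
    """Label one detector record; all thresholds are illustrative assumptions."""
    # Stage 1: missing data.
    if speed_mph is None or flow_vph is None or occupancy is None:
        return "missing"
    # Stage 2: out-of-range values.
    in_range = (0 <= speed_mph <= max_speed
                and 0 <= flow_vph <= max_flow
                and 0 <= occupancy <= 1)
    if not in_range:
        return "out_of_range"
    # Stage 3: speed, flow, and occupancy must imply a plausible vehicle length.
    length = aevl_feet(speed_mph, flow_vph, occupancy)
    if length is not None and not (aevl_range[0] <= length <= aevl_range[1]):
        return "aevl_inconsistent"
    return "ok"


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```

In a setup like this, `ks_statistic` could compare one detector's AEVL sample against a reference sample from neighboring detectors. A large statistic would suggest the detector's distribution differs, though the decision threshold would have to come from the K-S critical values for the sample sizes involved.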
This research addresses a key limitation of existing traffic data platforms: sensor-based systems provide accurate but localized data, whereas probe-based systems offer broader coverage but often suffer from low penetration rates and potential bias. We developed a two-stage machine learning framework that combines the strengths of both data sources. In the first stage, we evaluate several regression-based machine learning algorithms on both probe and detector data and identify models such as Random Forest as top performers. The second stage introduces a hybrid learning approach that incorporates a traffic flow model to improve estimation accuracy, particularly where probe data are sparse or unreliable. The integrated framework estimates traffic speed and flow across entire freeway segments, offering a scalable way to support real-time operations at Traffic Operations Centers.
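To make the idea of pairing data with a traffic flow model concrete, here is a small sketch under strong assumptions. The source does not say which traffic flow model is used or how it is coupled to the learned model. The Greenshields relation, the count-based weighting, and every name and parameter below (`greenshields_flow`, `blended_speed`, `full_trust_count`) are hypothetical stand-ins, not the framework itself.

```python
def greenshields_flow(speed, free_flow_speed, jam_density):
    """Flow (veh/h/lane) implied by a speed under the Greenshields model.

    Greenshields assumes speed falls linearly with density:
        speed = free_flow_speed * (1 - density / jam_density)
    so density can be recovered from speed, and flow = density * speed.
    """
    # Clamp so that noisy speeds cannot produce a negative density.
    speed = max(0.0, min(speed, free_flow_speed))
    density = jam_density * (1.0 - speed / free_flow_speed)
    return density * speed


def blended_speed(probe_speed, probe_count, fallback_speed, full_trust_count=10):
    """Weight the probe speed by how many probe vehicles support it.

    With no probes the fallback (for example a detector or model-based
    estimate) is returned unchanged; at or above full_trust_count the
    probe speed is used as-is. The linear weighting is an assumption.
    """
    weight = min(probe_count / full_trust_count, 1.0)
    return weight * probe_speed + (1.0 - weight) * fallback_speed
```

For example, `blended_speed(50, 5, 60)` gives 55.0, halfway between the two sources, and `greenshields_flow(30, 60, 120)` gives 1800.0 vehicles per hour per lane, the maximum flow for those parameters.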
We use high-resolution connected vehicle (CV) data from Wejo to analyze mobility patterns in Baltimore during night-shift working hours, with the goal of improving transportation access, safety, and equity for overnight workers. Covering the window between 9 PM and 6 AM, the dataset provides granular, time-stamped vehicle trajectories that capture real-world travel behavior during a period often overlooked by traditional data sources. By examining traffic volumes, route choices, travel speeds, and dwell times, we can identify key corridors used by night-shift commuters, detect spatial gaps in transit coverage, and assess safety risks related to high-speed road segments, limited lighting, and pedestrian exposure. These insights clarify the transportation challenges that essential workers face when traveling during off-peak hours. They also support targeted interventions and planning strategies for a safer, more equitable, and more efficient nighttime mobility system in urban areas like Baltimore.
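Because the 9 PM to 6 AM window wraps past midnight, a simple "start <= t < end" filter does not select it. The sketch below shows one way to isolate night-shift observations and average their speeds by road segment. The record layout `(segment_id, timestamp, speed)` and the function names are hypothetical and do not reflect the Wejo schema. Treating 9 PM as inclusive and 6 AM as exclusive is also an assumption.

```python
from collections import defaultdict
from datetime import datetime, time

NIGHT_START = time(21, 0)  # 9 PM, inclusive (assumed)
NIGHT_END = time(6, 0)     # 6 AM, exclusive (assumed)


def is_night_shift(ts):
    """True when a timestamp falls in the window that wraps midnight."""
    t = ts.time()
    # The window spans two calendar days, so the two halves are OR-ed.
    return t >= NIGHT_START or t < NIGHT_END


def mean_night_speed_by_segment(records):
    """Average speed per segment using only night-window observations.

    records: iterable of (segment_id, timestamp, speed) tuples.
    Segments with no night observations are omitted from the result.
    """
    totals = defaultdict(lambda: [0.0, 0])  # segment_id -> [speed sum, count]
    for segment_id, ts, speed in records:
        if is_night_shift(ts):
            totals[segment_id][0] += speed
            totals[segment_id][1] += 1
    return {seg: total / count for seg, (total, count) in totals.items()}
```

The same predicate could be reused to count trips or dwell events per corridor, so that the night window is defined in a single place.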