


AVL, GPS, outlier, filtering, performance, ground-truth


Agencies use a variety of technologies and data providers to obtain travel time information. The best-quality data come from second-by-second tracking of vehicles, but such data present many challenges in terms of privacy, storage requirements, and analysis. More frequently, agencies collect or purchase segment travel times derived by matching vehicles between two spatially separated points. Typical collection methods include license plate re-identification, Bluetooth, Wi-Fi, or some type of rolling DSRC identifier. A challenge common to all of these sampling techniques is filtering out outliers associated with trip chaining without removing important features in the data associated with incidents or traffic congestion. This paper describes a curated data set that was developed from high-fidelity GPS trajectory data. The curated data set contained 31,621 vehicle observations spanning 42 days; 2,550 observations had travel times more than 3 minutes longer than normal. Within this baseline data set, outliers were identified by using GPS waypoints to determine whether a vehicle left the route. Two performance measures were defined for evaluating three outlier-filtering algorithms: the proportion of true samples rejected and the proportion of outliers correctly identified. The effectiveness of the three methods over 10-minute sampling windows was also evaluated. The curated data set has been archived in a digital repository and is available online for others to test outlier-filtering algorithms.
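As an illustration of the two performance measures named above, the following minimal sketch computes them from ground-truth labels and a filter's decisions. It is not the paper's implementation: the function name `filter_performance`, the boolean-list inputs, and the convention of returning 0.0 when a class is empty are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def filter_performance(
    is_outlier: Sequence[bool], rejected: Sequence[bool]
) -> Tuple[float, float]:
    """Return (proportion of true samples rejected,
               proportion of outliers correctly identified).

    is_outlier: ground-truth label per observation (hypothetical input;
                True if the vehicle left the route).
    rejected:   the filter's decision per observation (True if discarded).
    """
    if len(is_outlier) != len(rejected):
        raise ValueError("inputs must have the same length")

    pairs = list(zip(is_outlier, rejected))
    true_total = sum(1 for o, _ in pairs if not o)
    outlier_total = sum(1 for o, _ in pairs if o)
    true_rejected = sum(1 for o, r in pairs if not o and r)
    outliers_caught = sum(1 for o, r in pairs if o and r)

    # Assumed convention: report 0.0 when a class has no observations.
    p_true_rejected = true_rejected / true_total if true_total else 0.0
    p_outliers_found = outliers_caught / outlier_total if outlier_total else 0.0
    return p_true_rejected, p_outliers_found


# Six observations: four true samples and two ground-truth outliers.
labels = [False, False, False, False, True, True]
decisions = [False, True, False, False, True, False]
p_rejected, p_found = filter_performance(labels, decisions)
```

A good filter keeps the first value low and the second high. The same function could be applied separately to the observations in each 10-minute window to compare filters at that resolution.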


This report was presented at the 2018 TRB conference but has not been previously published.