Preprocessing Methods
Outlier detection
Provides functions for detecting and correcting anomalies.
- detect_anomalies_in_trajectories(traj_data, tolerance=3.0, quantile=0.99, percentage_invalid=20, deleting=False, max_length=8, critical_length_traj=None, displacements_only=False)
Detects and corrects outliers in a trajectory.
The function splits the trajectory into multiple dataframes, one per person, and calculates the distance between each pair of consecutive points. The expected distance is defined as the quantile (by default the 99 % quantile) of the distances between consecutive points, multiplied by the tolerance.
If the calculated distance is larger than the expected one, the frame is considered an outlier.
If an outlier is detected, the function checks whether further outliers follow directly. Since the distance between consecutive points can no longer be used as an indicator, the function searches for the next frame within a realistic range r. Every subsequent frame that is not within this range is also considered an outlier, and the factor n is increased by one. In this case the tolerance is much smaller, so that points are not accidentally considered valid again.
\[r = n \cdot t \cdot q_{0.99}\]
where n is the factor described above, t is the tolerance, and q_{0.99} is the quantile of the distances between consecutive points.
In the following, the term trajectory means the trajectory data of a single person.
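The per-person check described above can be sketched roughly as follows. This is a simplified, hypothetical illustration rather than PedPy's implementation: quantile and flag_outliers are illustrative names, positions are plain (x, y) tuples instead of a pedpy.TrajectoryData, the first frame is assumed to be valid, and the range r is assumed to be measured from the last valid frame. The sketch does not reproduce the smaller tolerance used for consecutive outliers, max_length, or the displacement handling.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between the two closest ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def flag_outliers(points: Sequence[Point], tolerance: float = 3.0,
                  q: float = 0.99) -> List[int]:
    """Hypothetical sketch: frame indices flagged as outliers for one person."""
    if len(points) < 2:
        return []
    # Distances between consecutive points of this person's trajectory
    steps = [math.dist(points[i], points[i + 1]) for i in range(len(points) - 1)]
    q_dist = quantile(steps, q)

    outliers: List[int] = []
    last_valid = 0  # assumption: the first frame is valid
    n = 1           # increased by one for every consecutive outlier
    for i in range(1, len(points)):
        r = n * tolerance * q_dist  # r = n * t * q
        # Assumption: the range is measured from the last valid frame
        if math.dist(points[last_valid], points[i]) > r:
            outliers.append(i)
            n += 1
        else:
            last_valid = i
            n = 1
    return outliers


# 200 frames walking 0.1 m per frame along x; frames 100-102 jump 5 m sideways
walk = [(0.1 * i, 5.0 if 100 <= i <= 102 else 0.0) for i in range(200)]
print(flag_outliers(walk))  # [100, 101, 102]
```

With n = 1 the check reduces to comparing the step against tolerance times the quantile distance; each further outlier widens the search range, so frame 103 is accepted again once it lies within r of frame 99.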
- Parameters:
traj_data (pedpy.TrajectoryData) – The trajectory data that has to be checked and corrected.
tolerance (float) – The factor by which the quantile of the distances is multiplied. A low value means a low tolerance for potential outliers, which can be useful for trajectories in which the pedestrians' speed stays in a similar range. If the pedestrians' speed varies, for example in bottleneck experiments, a larger tolerance should be chosen.
quantile (float) – The quantile of all distances between consecutive points in the whole trajectory, used as the guideline for the expected distance between two points. A high quantile is recommended.
percentage_invalid (int) – If more than percentage_invalid % of the trajectory is detected as outliers, the trajectory cannot be corrected properly and is considered completely invalid.
deleting (bool) – Whether completely invalid trajectories should be deleted.
max_length (int) – The maximum length of a run of consecutive outliers. Occasionally, several outliers occur directly one after another without a jump back to the correct trajectory. max_length defines how many frames such a run may span before the function checks whether it indicates a vertical displacement in the trajectory. The default value is 8.
critical_length_traj (int) – The minimum length a trajectory has to have. This value is only relevant when the trajectory appears to contain a displacement; the function then decides how to handle the anomaly based on its position and length. The default value is 10% of the trajectory's length.
displacements_only (bool) – Whether the function should only search for and correct major jumps within the trajectory that have no jump back. This includes outlier groups that contain the very first or the very last frame of a person id, as well as displacements in the middle of the trajectory caused by tracker problems.
- Returns:
The corrected and modified copy of the original trajectory as pedpy.TrajectoryData; a list of all personIDs that were changed, in the original trajectory indexing; and a list of all personIDs that were changed, in the modified trajectory indexing.
- Return type:
pedpy.TrajectoryData
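How runs of consecutive outliers might be compared against max_length and percentage_invalid is sketched below. The helpers group_runs and summarize are hypothetical; the sketch assumes that a run longer than max_length triggers the displacement check and that the share of invalid frames is measured over all frames of one person.

```python
from typing import Dict, List, Sequence


def group_runs(outliers: Sequence[int]) -> List[List[int]]:
    """Split sorted outlier frame indices into runs of consecutive frames."""
    runs: List[List[int]] = []
    for idx in outliers:
        if runs and idx == runs[-1][-1] + 1:
            runs[-1].append(idx)  # continues the current run
        else:
            runs.append([idx])  # starts a new run
    return runs


def summarize(outliers: Sequence[int], n_frames: int, max_length: int = 8,
              percentage_invalid: int = 20) -> Dict[str, object]:
    """Hypothetical summary of one person's outliers."""
    runs = group_runs(outliers)
    return {
        # Assumption: runs longer than max_length are checked for a displacement
        "long_runs": [run for run in runs if len(run) > max_length],
        # More than percentage_invalid % outliers: trajectory counts as invalid
        "invalid": 100 * len(outliers) / n_frames > percentage_invalid,
    }


# Frames 20-29 form a run of 10 consecutive outliers
result = summarize([3, 4, 5, 10] + list(range(20, 30)), n_frames=100)
print(result["long_runs"])  # [[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]]
print(result["invalid"])    # False (14 % of frames)
```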
Correcting invalid trajectories
Provides functions for projecting trajectories.
- correct_invalid_trajectories(trajectory_data, walkable_area, back_distance_wall=-1, min_distance_wall=0.01, max_distance_wall=0.05, back_distance_obst=-1, min_distance_obst=0.01, max_distance_obst=0.05)
Corrects invalid trajectories.
When dealing with head trajectories, participants may lean over obstacles. Their trajectory then leaves the walkable area in some frames, and such data cannot be processed with PedPy.
The function locates invalid points, as well as points that are outside a wall but too close to it, and corrects them by pushing them slightly away. Depending on the geometry and the parameter values, it can be beneficial to buffer the geometry beforehand to create thicker walls.
At the beginning, it checks whether the trajectory is already valid, so that the correction is not run unnecessarily.
It returns a corrected version of the trajectory input (which is also a pedpy.TrajectoryData).
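Locating the points that need correction can be sketched with plain geometry. Assumptions: the walkable area is a single simple polygon without holes, given as a list of (x, y) vertices; obstacles are ignored; and the helper names are hypothetical. A real implementation would more likely rely on a geometry library such as Shapely.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _edges(poly: Sequence[Point]) -> List[Tuple[Point, Point]]:
    """Consecutive vertex pairs, closing the ring back to the first vertex."""
    return [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]


def point_in_polygon(p: Point, poly: Sequence[Point]) -> bool:
    """Even-odd ray casting test for a simple polygon without holes."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in _edges(poly):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0
    if length_sq > 0:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def needs_correction(p: Point, area: Sequence[Point], max_distance: float) -> bool:
    """True if p lies outside the area or closer than max_distance to its boundary."""
    if not point_in_polygon(p, area):
        return True
    return min(dist_to_segment(p, a, b) for a, b in _edges(area)) < max_distance


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(needs_correction((5.0, 5.0), square, 0.05))    # False
print(needs_correction((5.0, -0.02), square, 0.05))  # True: outside the area
print(needs_correction((5.0, 0.03), square, 0.05))   # True: too close to the wall
```

If needs_correction is False for every point, the trajectory is already valid and the correction can be skipped.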
If a point lies inside the geometry or close to it, it is moved out of the geometry. The new distance is calculated by linear interpolation: points that lie further inside an obstacle/wall receive a smaller new distance than points near the end of the range. The range is between back_distance and max_distance. All values are in meters.
max_distance is the maximum distance from the wall to which points can be moved. Points that lie between the wall and max_distance are also pushed slightly away. Furthermore, a value > 0 is necessary so that the linear interpolation works reasonably accurately for all points that need to be moved.
The new distance of a moved point is calculated as:
\[x' = (x - a) \cdot \frac{c - b}{c - a} + b\]
where x' is the new distance to the wall, x is the old distance to the wall (either from the inside or not far enough away), a is back_distance, b is min_distance, and c is max_distance. A point at x = a is therefore moved to b, and a point at x = c stays at c.
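The interpolation formula can be written directly as a small function. corrected_distance is a hypothetical helper; it assumes signed distances (negative inside the wall) and leaves untouched the points that lie deeper than back_distance or at least max_distance away from the wall.

```python
def corrected_distance(x: float, a: float, b: float, c: float) -> float:
    """Map an old signed distance x to the new distance x'.

    a = back_distance (< 0), b = min_distance (> 0), c = max_distance (>= b).
    Assumed sign convention: negative distances lie inside the wall.
    """
    if x < a or x >= c:
        # Deeper than back_distance (ignored) or already far enough: unchanged
        return x
    # x' = (x - a) * (c - b) / (c - a) + b maps [a, c) linearly onto [b, c)
    return (x - a) * (c - b) / (c - a) + b


# Default wall parameters: back -1 m, min 0.01 m, max 0.05 m
for old in (-1.0, -0.5, 0.0, 0.03, 0.2):
    print(old, round(corrected_distance(old, -1.0, 0.01, 0.05), 4))
```

Because the mapping is increasing and approaches c as x approaches c, points keep their order and there is no jump at max_distance.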
- Parameters:
trajectory_data (pedpy.TrajectoryData) – The trajectory data to be tested and corrected
walkable_area (pedpy.WalkableArea) – The corresponding walkable area.
back_distance_wall (float) – in meters, has to be < 0. The distance behind the wall up to which points inside the wall are corrected. Points that lie further inside the wall are ignored. The parameter is needed for the interpolation used in the correction.
min_distance_wall (float) – in meters, has to be > 0. The minimum distance from the wall to which points are moved out.
max_distance_wall (float) – in meters, has to be > 0 and >= min_distance_wall. Points that lie closer to the wall than max_distance_wall are also moved away. A value > 0 is needed for the linear interpolation; otherwise the calculation does not work.
back_distance_obst (float) – in meters, has to be < 0. Equivalent to back_distance_wall, but for obstacles.
min_distance_obst (float) – in meters, has to be > 0. Equivalent to min_distance_wall, but for obstacles.
max_distance_obst (float) – in meters, has to be > 0 and >= min_distance_obst. Equivalent to max_distance_wall, but for obstacles.
- Returns:
pedpy.TrajectoryData, either the corrected version of the trajectory or, if the original trajectory was already valid, the original trajectory; and a list of all personIDs that were changed.
- Return type: