Preprocessing

PedPy provides functions for two preprocessing tasks:

  1. Outlier detection

  2. Correcting invalid trajectories

Outlier detection

PedPy provides a function that detects and corrects outliers. It also detects vertical displacements within a trajectory. These occur when the tracking of a person is interrupted and the tracker starts following something else instead.

To detect outliers, the algorithm splits the trajectory data into one dataframe per person and calculates the distance between each pair of consecutive points. The expected distance d is the 99% quantile of the distances between all consecutive points, multiplied by the tolerance t:

\[ d = t \cdot q_{0.99} \]
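The computation of d can be sketched in plain NumPy. This is a minimal illustration, not PedPy's implementation. The helper names `step_lengths` and `expected_distance` and the input format (a dict that maps each person ID to x and y coordinates ordered by frame) are hypothetical. The sketch also assumes that the quantile is taken over the steps of all persons pooled together, which is how "all consecutive points" is read here.

```python
import numpy as np


def step_lengths(x, y):
    """Distance between each pair of consecutive positions of one person."""
    return np.hypot(np.diff(x), np.diff(y))


def expected_distance(trajectories, tolerance, quantile=0.99):
    """Hypothetical helper: d = t * q, where q is the given quantile of all step lengths.

    `trajectories` maps a person ID to (x, y) coordinate sequences ordered by frame.
    Steps are computed per person, so the gap between the last point of one
    person and the first point of the next person never enters the quantile.
    """
    steps = np.concatenate(
        [
            step_lengths(np.asarray(x, dtype=float), np.asarray(y, dtype=float))
            for x, y in trajectories.values()
        ]
    )
    return tolerance * float(np.quantile(steps, quantile))


# Two toy persons: steps of length 1 (person 1) and length 2 (person 2)
trajectories = {1: ([0, 1, 2, 3], [0, 0, 0, 0]), 2: ([10, 12], [0, 0])}
print(expected_distance(trajectories, tolerance=3, quantile=1.0))  # 6.0
```

Splitting by person first matters. Without it, the jump of 7 m from the last point of person 1 to the first point of person 2 would be counted as a step and would inflate the quantile.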

tolerance:

The tolerance parameter can be chosen manually. A low value for this parameter means a low tolerance for potential outliers, which can be useful in trajectories where pedestrians’ speed stays within a similar range. If pedestrian speed varies, for example in bottleneck experiments, the tolerance should be chosen higher. A value between 2 and 10 should cover most cases.

If an outlier is detected, the function checks whether the following frames are outliers as well. The distance to the previous frame can no longer serve as an indicator, because that frame is itself an outlier. Instead, the function searches for the next frame that lies within a realistic range r of the last valid point. Every subsequent frame outside this range is also considered an outlier, and the factor n is increased by one. To prevent points from being accepted as valid again by accident, a much smaller tolerance t’ is used in this case:

\[ r = n \cdot t' \cdot q_{0.99} \]
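The search for consecutive outliers can be sketched for a single person as follows. This sketch reflects one reading of the description above. It is not PedPy's code. The function `find_outliers` is hypothetical, `low_tolerance` stands for t’ (PedPy's internal name and value for it are not given here), the quantile q is passed in directly, and the first point is assumed to be valid.

```python
import numpy as np


def find_outliers(x, y, q, tolerance, low_tolerance):
    """Hypothetical sketch of the outlier search for one person.

    A step longer than d = tolerance * q marks an outlier. While outliers follow,
    each new point is compared with the last valid point using the growing range
    r = n * low_tolerance * q, where n counts the frames since that valid point.
    Returns the indices of the outlier points.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    outliers = []
    last_valid = 0  # assumption: the first point is valid
    for i in range(1, len(x)):
        n = i - last_valid
        distance = np.hypot(x[i] - x[last_valid], y[i] - y[last_valid])
        # first suspicious step: d = t * q; consecutive outliers: r = n * t' * q
        limit = tolerance * q if n == 1 else n * low_tolerance * q
        if distance > limit:
            outliers.append(i)
        else:
            last_valid = i
    return outliers


# Straight walk with steps of 0.1 m, one single outlier and one run of three
x = np.arange(20) * 0.1
y = np.zeros(20)
y[5] = 3.0
y[12:15] = 3.0
print(find_outliers(x, y, q=0.1, tolerance=6, low_tolerance=1.5))  # [5, 12, 13, 14]
```

Because r grows with n, a point that lies two or three frames ahead of the last valid point can still be accepted, even though a small t’ keeps the allowed distance per frame tight.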

quantile:

Like the tolerance, the quantile used for the expected distance can be chosen manually. Because d is the product of the tolerance and this quantile, changing the quantile also changes the effective tolerance.

For every trajectory in which anomalies were detected, the person ID and the frames at which outliers occurred are written to the log output.

Outliers in the middle of a trajectory are corrected by linear interpolation. The invalid points are replaced by points on the straight line between the last valid point before the outliers and the first valid point after them. Outliers at the beginning or at the end of a trajectory are extrapolated in the average direction of the trajectory.
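Both correction rules can be sketched with NumPy for a single person. This is an assumption-laden illustration, not PedPy's implementation. The function `correct_outliers` is hypothetical. "Average direction" is read here as the mean displacement per frame between the first and the last valid point.

```python
import numpy as np


def correct_outliers(frames, x, y, is_outlier):
    """Hypothetical sketch: repair the outlier positions of one person.

    Interior outliers are placed on the straight line between the neighbouring
    valid points (linear interpolation). Leading and trailing outliers are
    extrapolated with the mean displacement per frame of the valid part, an
    assumed reading of "the average direction of the trajectory".
    Needs at least two valid points.
    """
    frames = np.asarray(frames, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    valid = ~np.asarray(is_outlier, dtype=bool)
    fv, xv, yv = frames[valid], x[valid], y[valid]

    # np.interp interpolates between valid frames (and would clamp outside them)
    new_x = np.interp(frames, fv, xv)
    new_y = np.interp(frames, fv, yv)

    # mean displacement per frame between the first and the last valid point
    span = fv[-1] - fv[0]
    vx = (xv[-1] - xv[0]) / span
    vy = (yv[-1] - yv[0]) / span

    # overwrite the clamped values at both ends by linear extrapolation
    before = frames < fv[0]
    after = frames > fv[-1]
    new_x[before] = xv[0] + vx * (frames[before] - fv[0])
    new_y[before] = yv[0] + vy * (frames[before] - fv[0])
    new_x[after] = xv[-1] + vx * (frames[after] - fv[-1])
    new_y[after] = yv[-1] + vy * (frames[after] - fv[-1])
    return new_x, new_y


frames = np.arange(10)
x = 0.1 * frames
y = 0.05 * frames
is_outlier = np.zeros(10, dtype=bool)
is_outlier[[0, 5, 9]] = True  # one outlier at each end, one in the middle
x_bad, y_bad = x.copy(), y.copy()
x_bad[is_outlier] = [4.0, -3.0, 8.0]
y_bad[is_outlier] = [2.0, 7.0, -1.0]
new_x, new_y = correct_outliers(frames, x_bad, y_bad, is_outlier)
print(np.allclose(new_x, x), np.allclose(new_y, y))  # True True
```

On a straight walk, both rules recover the original points exactly. On curved trajectories, the extrapolated ends only follow the overall heading.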

import pathlib

import pedpy

trajectory_data = pedpy.load_trajectory(
    trajectory_file=pathlib.Path("demo-data/preprocessing/uni_corr_500_08_modified.txt"),
    default_unit=pedpy.TrajectoryUnit.METER,
)
# Returns the corrected data and the affected person IDs before and after the correction
trajectory_data_corrected, changed_index_orig, changed_index_new = pedpy.detect_anomalies_in_trajectories(
    trajectory_data, tolerance=6, quantile=0.98
)
INFO - Outliers found: personID 55 at frames [2018] 
INFO - Outliers found: personID 89 at frames [438] 
INFO - Outliers found: personID 123 at frames [3081, 3082, 3083, 3084, 3085, 3086, 3087] 
INFO - Outliers found: personID 209 at frames [479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490] 
INFO - Outliers found: personID 210 at frames [1387, 1388, 1389] 
INFO - Outliers found: personID 211 at frames [1186, 1187, 2055, 2056, 2057, 2058, 2059, 2060, 2061, 2062, 2063, 2064, 2065, 2066, 2067, 2068, 2069, 2070, 2071, 2072, 2073, 2074, 2075, 2076, 2077, 2078] 
INFO - Outliers found: personID 221 at frames [2063, 2064, 2065, 2066, 2067] 
INFO - Trajectory with personID 422 has to many invalid points and cannot be corrected
INFO - Trajectory with personID 422 will be returned unchanged
INFO - Outliers found: personID 463 at frames [350, 713, 714, 715, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766] 

Invalid trajectories

If more than a certain percentage of all frames in the trajectory data of a single person ID are considered outliers, that trajectory is considered invalid and is not corrected.

percentage_invalid:

This percentage can be set with the percentage_invalid parameter, an integer between 1 and 100. The default value is 20%.

deleting:

The boolean parameter deleting determines whether invalid trajectories are removed from the returned trajectory data. If they are not deleted, they are returned unchanged.
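Both parameters can be sketched in a few lines of plain Python. This is an illustration under assumptions, not PedPy's code. The function `split_invalid` and its dict-based inputs are hypothetical, and "more than" is read as a strict comparison.

```python
def split_invalid(outlier_frames, frame_counts, percentage_invalid=20, deleting=False):
    """Hypothetical sketch of percentage_invalid and deleting.

    outlier_frames maps a person ID to its outlier frames, and frame_counts maps
    a person ID to its number of frames. Returns the invalid person IDs and the
    person IDs that remain in the returned data.
    """
    invalid = sorted(
        pid
        for pid, frames in outlier_frames.items()
        # invalid if strictly more than percentage_invalid % of the frames are outliers
        if 100 * len(frames) / frame_counts[pid] > percentage_invalid
    )
    # without deleting, invalid trajectories stay in the data (unchanged)
    kept = [pid for pid in sorted(frame_counts) if not (deleting and pid in invalid)]
    return invalid, kept


frame_counts = {1: 100, 2: 50, 3: 10}
outlier_frames = {1: list(range(5)), 2: list(range(11)), 3: [0, 1]}  # 5 %, 22 %, 20 %
print(split_invalid(outlier_frames, frame_counts))  # ([2], [1, 2, 3])
print(split_invalid(outlier_frames, frame_counts, deleting=True))  # ([2], [1, 3])
```

Person 3 sits exactly at the 20% threshold and therefore stays valid under the strict reading of "more than".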

Focus on displacement detection

displacements_only:

Setting displacements_only = True restricts the function to displacements. In this case, anomalies that occur neither at the very beginning nor at the very end of a trajectory are ignored, and the trajectory data of the affected person ID is only cropped after a displacement. Outliers at the very beginning are removed as well.
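The cropping rule can be sketched in plain Python, and the sketch reproduces the log entries for personIDs 463 and 211 shown in the output that follows. It remains a hypothetical illustration, not PedPy's code. The functions are made up. The sketch assumes that a displacement is a run of more than max_length consecutive outlier frames (see max_length under Other parameters). The frame ranges of both persons are also invented for this example. Only their outlier frames come from the log.

```python
def split_into_runs(frames):
    """Group frame numbers into runs of consecutive frames."""
    runs = []
    for frame in sorted(frames):
        if runs and frame == runs[-1][-1] + 1:
            runs[-1].append(frame)
        else:
            runs.append([frame])
    return runs


def crop_range(frames, outlier_frames, max_length=8):
    """Hypothetical sketch of displacements_only for one person.

    Returns the first and last kept frame. Leading outliers are cut off, short
    outlier runs in the middle are ignored, and the trajectory is cropped before
    the first run longer than max_length (taken as a displacement) or before a
    run that reaches the last frame.
    """
    first, last = min(frames), max(frames)
    keep_first, keep_last = first, last
    for run in split_into_runs(outlier_frames):
        if run[0] == first:
            keep_first = run[-1] + 1  # outliers at the very beginning
        elif len(run) > max_length or run[-1] == last:
            keep_last = run[0] - 1  # crop after the last valid frame
            break
    return keep_first, keep_last


# Outlier frames of personID 463 from the log; the frame range 350..799 is assumed
outliers_463 = [350, 713, 714, 715] + list(range(750, 767))
print(crop_range(range(350, 800), outliers_463))  # (351, 749)
```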

trajectory_data_jumps_only, index_orig, index_new = pedpy.detect_anomalies_in_trajectories(
    trajectory_data, displacements_only=True
)
INFO - Frames in trajectory with original personID 55 were cropped after frame 2017 
INFO - Frames in trajectory with original personID 209 were cropped after frame 478 
INFO - Frames in trajectory with original personID 210 were cut before frame 1389
INFO - Frames in trajectory with original personID 211 were cut before frame 1188 and after frame 2054 
INFO - Trajectory with personID 422 has to many invalid points and cannot be corrected
INFO - Trajectory with personID 422 will be returned unchanged
INFO - Frames in trajectory with original personID 463 were cut before frame 351 and after frame 749 

Other parameters

In the following descriptions, the term trajectory refers to the trajectory data of a single person ID.

max_length:

An integer value. Sometimes several outliers occur one after another without the trajectory jumping back to its correct course. The max_length parameter defines how many consecutive outlier frames are allowed before the function checks whether they indicate a vertical displacement in the trajectory. The default value is 8.
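The role of max_length can be illustrated by grouping outlier frames into runs and keeping the runs that are too long. This is a hypothetical sketch that assumes a strict "longer than max_length" comparison. The function names are invented, and the input frames are taken from the log output of the first example above.

```python
def split_into_runs(frames):
    """Group frame numbers into runs of consecutive frames."""
    runs = []
    for frame in sorted(frames):
        if runs and frame == runs[-1][-1] + 1:
            runs[-1].append(frame)
        else:
            runs.append([frame])
    return runs


def displacement_candidates(outlier_frames, max_length=8):
    """Hypothetical sketch: runs of consecutive outlier frames longer than
    max_length, which would be checked for a displacement."""
    return [run for run in split_into_runs(outlier_frames) if len(run) > max_length]


# Outlier frames reported in the log for personIDs 123 and 211
frames_123 = list(range(3081, 3088))  # one run of 7 frames
frames_211 = [1186, 1187] + list(range(2055, 2079))  # runs of 2 and 24 frames
print(displacement_candidates(frames_123))  # []
print([(run[0], run[-1]) for run in displacement_candidates(frames_211)])  # [(2055, 2078)]
```

This matches the displacements_only log, where personID 211 is cropped after frame 2054 and personID 123 is not mentioned. That agreement does not prove the sketch matches PedPy's exact rule.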

critical_length_traj:

The minimum length of a trajectory. This integer value is only relevant when there appears to be a displacement in the trajectory. If the supposed displacement occurs before the preceding frames are numerous enough to count as a trajectory in their own right, every frame before the detected anomaly is assumed to be an outlier. If the minimum length has already been reached, the trajectory is cropped at the displacement. The default value is 10% of the trajectory’s length.
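The decision described above can be sketched as a small function that returns the affected frames. This is a hypothetical illustration, not PedPy's code. The function name and its return labels are invented, and rounding the 10% default down with `int` is an assumption.

```python
def resolve_displacement(frames, displacement_start, critical_length_traj=None):
    """Hypothetical sketch of how critical_length_traj could be applied.

    Returns a decision and the frames it affects:
    - ("outliers_before", frames before the displacement) if that part is too
      short to be a trajectory of its own,
    - ("crop_after", frames from the displacement on) otherwise.
    """
    frames = sorted(frames)
    if critical_length_traj is None:
        # default from the text: 10 % of the trajectory's length (rounded down here)
        critical_length_traj = int(0.1 * len(frames))
    before = [f for f in frames if f < displacement_start]
    if len(before) < critical_length_traj:
        return "outliers_before", before
    return "crop_after", [f for f in frames if f >= displacement_start]


decision, affected = resolve_displacement(range(100), displacement_start=5)
print(decision, len(affected))  # outliers_before 5
decision, affected = resolve_displacement(range(100), displacement_start=50)
print(decision, len(affected))  # crop_after 50
```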

traj_data_low_tolerance = pedpy.detect_anomalies_in_trajectories(
    trajectory_data, tolerance=3, quantile=0.95, percentage_invalid=20, deleting=True, max_length=10
)[0]
INFO - Outliers found: personID 3 at frames [144, 145, 146, 147, 148, 149, 150, 151] 
INFO - Outliers found: personID 11 at frames [86, 87, 88, 89, 90, 91, 92, 93] 
INFO - Outliers found: personID 29 at frames [3324, 3325] 
INFO - Outliers found: personID 55 at frames [2018] 
INFO - Outliers found: personID 59 at frames [1182, 1183] 
INFO - Outliers found: personID 67 at frames [124, 125, 126, 127, 128, 129, 130, 131] 
INFO - Outliers found: personID 89 at frames [438] 
INFO - Outliers found: personID 117 at frames [1184, 1185] 
INFO - Outliers found: personID 123 at frames [3081, 3082, 3083, 3084, 3085, 3086, 3087] 
INFO - Outliers found: personID 204 at frames [156, 157, 158] 
INFO - Outliers found: personID 209 at frames [479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490] 
INFO - Outliers found: personID 210 at frames [1387, 1388, 1389] 
INFO - Outliers found: personID 211 at frames [1186, 1187, 2055, 2056, 2057, 2058, 2059, 2060, 2061, 2062, 2063, 2064, 2065, 2066, 2067, 2068, 2069, 2070, 2071, 2072, 2073, 2074, 2075, 2076, 2077, 2078] 
INFO - Outliers found: personID 221 at frames [2063, 2064, 2065, 2066, 2067, 2068, 2069, 2070] 
INFO - Outliers found: personID 227 at frames [2136, 2137, 2138] 
INFO - Outliers found: personID 255 at frames [3178, 3179, 3180] 
INFO - Outliers found: personID 260 at frames [2905] 
INFO - Outliers found: personID 262 at frames [140, 141, 142, 143, 144] 
INFO - Outliers found: personID 268 at frames [2183, 2184, 2185] 
INFO - Outliers found: personID 282 at frames [2151, 2152, 2153] 
INFO - Outliers found: personID 370 at frames [111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127] 
INFO - Outliers found: personID 413 at frames [1722, 1838, 1839] 
INFO - Trajectory with personID 422 has to many invalid points and cannot be corrected
INFO - Trajectory with personID 422 was deleted
INFO - Outliers found: personID 463 at frames [350, 713, 714, 715, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766] 

Compare original and corrected trajectory

The function returns a corrected copy of the input trajectory data together with two lists. The first contains the person IDs in the original trajectory data for which anomalies were found. The second contains the corresponding person IDs in the corrected trajectory data. In most cases the two are identical. They differ only when trajectories were deleted, because the subsequent person IDs then shift.
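How deletion might shift the IDs can be sketched as follows. The renumbering rule is an assumption: each remaining ID is taken to move down by the number of deleted IDs below it. The text above only states that subsequent IDs shift, and PedPy's actual renumbering may differ. The function `shift_ids` is hypothetical.

```python
def shift_ids(original_ids, deleted_ids):
    """Hypothetical sketch: new person ID for every remaining original ID.

    Assumes each remaining ID moves down by the number of deleted IDs below it;
    PedPy's actual renumbering may differ.
    """
    deleted = set(deleted_ids)
    return {
        pid: pid - sum(1 for d in deleted if d < pid)
        for pid in sorted(original_ids)
        if pid not in deleted
    }


mapping = shift_ids(range(1, 7), deleted_ids=[3])
print(mapping)  # {1: 1, 2: 2, 4: 3, 5: 4, 6: 5}
```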

The two ID lists can be used to plot the affected trajectories and get an impression of the outliers and how they were corrected. In the plots, the wide petrol line represents the original trajectory and the thin orange line represents the corrected one.

import matplotlib.pyplot as plt
import pedpy
import shapely
from matplotlib.lines import Line2D
from pedpy.plotting.plotting import PEDPY_ORANGE, PEDPY_PETROL

walk_area = pedpy.WalkableArea(
    shapely.from_wkt(
        "POLYGON ((10 -2, -10 -2, -10 7, 10 7, 10 -2), (9 6, -9 6, -9 5, 9 5, 9 6), (-9 -1, 9 -1, 9 0, -9 0, -9 -1))"
    )
)

%config InlineBackend.figure_format = 'retina'


# Overlay each affected trajectory: original (wide petrol) and corrected (thin orange)
for orig_id, new_id in zip(changed_index_orig, changed_index_new):
    original_trajectory = trajectory_data.data[trajectory_data.data["id"] == orig_id]
    trajectory_corrected = trajectory_data_corrected.data[trajectory_data_corrected.data["id"] == new_id]
    pedpy.plot_trajectories(
        traj=pedpy.TrajectoryData(data=original_trajectory, frame_rate=trajectory_data.frame_rate),
        walkable_area=walk_area,
        traj_width=1.75,
        traj_color=PEDPY_PETROL,
    ).set_aspect("equal")
    pedpy.plot_trajectories(
        traj=pedpy.TrajectoryData(data=trajectory_corrected, frame_rate=trajectory_data.frame_rate),
        walkable_area=walk_area,
        traj_width=0.5,
        traj_color=PEDPY_ORANGE,
    ).set_aspect("equal")
    legend_elements = [
        Line2D([0], [0], color=PEDPY_PETROL, lw=2, label="Original"),
        Line2D([0], [0], color=PEDPY_ORANGE, lw=2, label="Corrected"),
    ]

    plt.legend(handles=legend_elements, bbox_to_anchor=(1, 1), fontsize=8)
    plt.xlabel(f"personID {orig_id} / {new_id}")
    plt.show()

Correct invalid trajectories

When working with head trajectories, participants may occasionally lean over obstacles. As a result, their trajectories can leave the walkable area for some frames, and this data cannot be processed by PedPy.

To address this, PedPy provides the function correct_invalid_trajectories, which moves trajectory points that lie inside a wall or too close to it. The distance that remains between a point and the wall after the correction is calculated by linear interpolation and lies within the interval between min_distance and max_distance:

\[ d' = (d - b) \cdot \frac{e - s}{e - b} + s \]
  • d’ is the new distance to the wall

  • d is the original distance to the wall

  • b corresponds to back_distance

  • s corresponds to min_distance

  • e corresponds to max_distance
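The formula can be evaluated directly to show how the interval [b, e] is mapped onto [s, e]. The function `new_distance` is hypothetical. Two behaviours in it are assumptions not stated in the text: points that are already at least max_distance away are left unchanged, and points deeper than back_distance are treated as if they lay at back_distance. The parameter values are the obstacle settings used with correct_invalid_trajectories later in this section.

```python
def new_distance(d, back_distance, min_distance, max_distance):
    """d' = (d - b) * (e - s) / (e - b) + s with b = back_distance,
    s = min_distance and e = max_distance."""
    if back_distance >= 0:
        raise ValueError("back_distance must be negative")
    if max_distance <= min_distance:
        raise ValueError("max_distance must be larger than min_distance")
    if d >= max_distance:
        return d  # assumption: points at least max_distance away are not moved
    d = max(d, back_distance)  # assumption: deeper points count as lying at back_distance
    b, s, e = back_distance, min_distance, max_distance
    return (d - b) * (e - s) / (e - b) + s


# Signed distances: negative inside the wall, positive outside
for d in (-1.0, -0.5, 0.0, 0.03, 0.2):
    print(d, round(new_distance(d, back_distance=-1, min_distance=0.01, max_distance=0.05), 4))
```

With these values, a point 1 m inside the wall ends up at min_distance (0.01 m), a point on the boundary ends up at about 0.048 m, and a point 0.2 m away is not moved. Deeper points always receive smaller new distances, because the mapping is linear and increasing.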

[Figure: illustration of the preprocessing parameters]

If a point lies inside a wall or obstacle, or too close to it, it is pushed outward. The distance interval for these points starts at back_distance, which must be negative because it represents the maximum depth inside the wall, and ends at max_distance. Points located deeper inside an obstacle are assigned a smaller new distance than points located near the outer end of the interval.

For example, a point that lies deep inside an obstacle receives a new distance close to min_distance, the smallest possible value of d’. A point that is already outside the obstacle but closer than max_distance is adjusted as well, so that the corrected trajectory stays smooth. Its new distance is only slightly larger than its original distance.

It is essential that max_distance is larger than min_distance, and that back_distance is negative. Depending on the geometry and the parameter values, it can also be beneficial to buffer the geometry beforehand to create thicker walls. If the walls are too thin, the function may accidentally move a point to the wrong side.
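The geometric step can be sketched for a single polygonal obstacle. The sketch computes the signed distance of a point (negative inside), maps it with the formula above, and places the point at the new distance from the nearest boundary point, on the outer side. All helper names are hypothetical. PedPy's correct_invalid_trajectories presumably handles outer walls, several obstacles and boundary cases differently. Shapely could replace the distance helpers, but plain NumPy keeps this sketch self-contained.

```python
import numpy as np


def closest_point_on_segment(p, a, b):
    """Closest point to p on the segment from a to b."""
    ab = b - a
    denom = float(np.dot(ab, ab))
    if denom == 0.0:
        return a
    t = np.clip(np.dot(p - a, ab) / denom, 0.0, 1.0)
    return a + t * ab


def inside_polygon(p, polygon):
    """Even-odd ray casting test (boundary points are not treated specially)."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, np.roll(polygon, -1, axis=0)):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def push_out_of_obstacle(p, obstacle, back_distance, min_distance, max_distance):
    """Hypothetical sketch: move point p away from one polygonal obstacle."""
    p = np.asarray(p, dtype=float)
    poly = np.asarray(obstacle, dtype=float)
    edges = zip(poly, np.roll(poly, -1, axis=0))
    nearest = min(
        (closest_point_on_segment(p, a, b) for a, b in edges),
        key=lambda c: np.linalg.norm(p - c),
    )
    dist = float(np.linalg.norm(p - nearest))
    inside = inside_polygon(p, poly)
    d = -dist if inside else dist  # signed distance: negative inside the obstacle
    if d >= max_distance or dist == 0.0:
        # far enough away, or exactly on the boundary (no direction in this sketch)
        return p
    d = max(d, back_distance)
    b, s, e = back_distance, min_distance, max_distance
    d_new = (d - b) * (e - s) / (e - b) + s
    # unit vector pointing from the boundary towards the outside of the obstacle
    outward = (nearest - p) / dist if inside else (p - nearest) / dist
    return nearest + d_new * outward


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
params = dict(back_distance=-1, min_distance=0.01, max_distance=0.05)
print(push_out_of_obstacle((0.5, 0.9), square, **params))  # inside: moved above y = 1
print(push_out_of_obstacle((0.5, 1.02), square, **params))  # close: moved slightly further out
print(push_out_of_obstacle((0.5, 1.2), square, **params))  # far enough: unchanged
```

Because only the nearest boundary point is used, a point inside a wall that is thin compared to the push distance may be moved out on the wrong side. This is the situation in which buffering the geometry, as recommended above, helps.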

The function returns two values. The first is a pedpy.TrajectoryData that contains either the corrected trajectory data or, if the original data were already valid, the original trajectory data. The second is a list of all person IDs whose trajectories contained invalid points.

import pathlib

import pedpy

trajectory_data = pedpy.load_trajectory(
    trajectory_file=pathlib.Path("demo-data/preprocessing/030_c_56_h0_invalid.txt"),
    default_unit=pedpy.TrajectoryUnit.METER,
)

walk_area = pedpy.WalkableArea(
    [
        (3.5, -2),
        (3.5, 8),
        (-3.5, 8),
        (-3.5, -2),
    ],
    obstacles=[
        [
            (-0.7, -1.1),
            (-0.25, -1.1),
            (-0.25, -0.15),
            (-0.4, 0.0),
            (-2.8, 0.0),
            (-2.8, 6.7),
            (-3.05, 6.7),
            (-3.05, -0.3),
            (-0.7, -0.3),
            (-0.7, -1.0),
        ],
        [
            (0.25, -1.1),
            (0.7, -1.1),
            (0.7, -0.3),
            (3.05, -0.3),
            (3.05, 6.7),
            (2.8, 6.7),
            (2.8, 0.0),
            (0.4, 0.0),
            (0.25, -0.15),
            (0.25, -1.1),
        ],
    ],
)

print("Valid before: ", pedpy.is_trajectory_valid(traj_data=trajectory_data, walkable_area=walk_area))

valid_trajectory, invalid_person_ids = pedpy.correct_invalid_trajectories(
    trajectory_data=trajectory_data,
    walkable_area=walk_area,
    min_distance_obst=0.01,
    max_distance_obst=0.05,
    back_distance_obst=-1,
    min_distance_wall=0.01,
    max_distance_wall=0.05,
    back_distance_wall=-1,
)
print("Valid after: ", pedpy.is_trajectory_valid(traj_data=valid_trajectory, walkable_area=walk_area))
Valid before:  False
Valid after:  True

The values of min_distance, max_distance and back_distance can be chosen separately for the walls surrounding the walkable area (the *_wall parameters) and for the obstacles within it (the *_obst parameters).

Compare original and corrected trajectory

Because the function also returns a list of all person IDs whose trajectories were invalid, the affected trajectories can be plotted before and after the correction.


import matplotlib.pyplot as plt

# For every corrected person, plot the original and the corrected trajectory
for person_id in invalid_person_ids:
    original_trajectory = trajectory_data.data[trajectory_data.data["id"] == person_id]
    trajectory_corrected = valid_trajectory.data[valid_trajectory.data["id"] == person_id]
    pedpy.plot_trajectories(
        traj=pedpy.TrajectoryData(data=original_trajectory, frame_rate=trajectory_data.frame_rate),
        walkable_area=walk_area,
        hole_alpha=0,
    ).set_aspect("equal")
    plt.xlabel(f"personID {person_id} / original")
    plt.show()
    traj_corr = pedpy.TrajectoryData(data=trajectory_corrected, frame_rate=trajectory_data.frame_rate)
    pedpy.plot_trajectories(traj=traj_corr, walkable_area=walk_area, hole_alpha=0).set_aspect("equal")
    plt.xlabel(f"personID {person_id} / corrected")
    plt.show()