Continuing my work on Machine Learning with point clouds in the realm of autonomous robots, and coming from working with image data, I was faced with the following question: does 3D data need normalization like image data does? The answer is a clear YES (duh!). Normalization, or feature scaling, is an important preprocessing step for many machine learning algorithms. The main benefit is that it encloses all features in a common boundary without losing information. This makes the flow of algorithms like gradient descent smooth and avoids bias toward features with values higher in magnitude.
Take the following image captured by my robot during one of our exploratory trips. The pixels in the image have the following statistics, min. value: 0, max value: 255, mean: 94.170, standard deviation: 74.270. This large spread in the values does not play nicely with machine learning algorithms.
There are multiple ways to scale the pixels of an image. A common one is to enclose all of the values within a range of -1.0 and 1.0. The simple code snippet below achieves just that and changes the values of the above image to obtain the following statistics, min. value: -1.0, max. value: 1.0, mean: -0.261, and standard deviation: 0.583.
def normalize_image(image): image = tf.cast(image, dtype=tf.float32) image = image/127.5 image -= 1 return image
In the case of point clouds, where the data is composed of at least the XYZ coordinates of the points, the range of values can also be large. Take my RoboSense lidar, which has a horizontal resolution of 360°, a vertical resolution of 32°, and a detection distance of about 150 meters, the values each point can take vary widely. My initial question was, what would it mean to normalize this data? After spending some time searching, I found that many researchers do something similar to what I described above for images, which is to enclose the points within values of -1.0 and 1.0. In the case of points in 3D, this is equivalent to scaling down the point cloud to fit within a unit sphere.
So, for the point cloud shown below, which was taken exactly where the image above was taken, the statistics of the points are min. value: -96.804, max. value: 98.091, mean: -0.320, and standard deviation: 11.373.
In order to enclose all points within a unit sphere, the mean values for X, Y, and Z are computed and subtracted from the values of every point, this results in moving the entire point cloud to the origin (X = 0, Y= 0, Z = 0). Then the distances between all points and the origin are computed, and the coordinates of every point are divided by the maximum of such distances, effectively scaling all distances to the range -1.0 and 1.0. The code snipped below achieves this and the following three animations show the result.
def normalize_pc(points): centroid = np.mean(points, axis=0) points -= centroid furthest_distance = np.max(np.sqrt(np.sum(abs(points)**2,axis=-1))) points /= furthest_distance return points
In the first animation, the distance tool is used to check the distance from where the robot was to some random points. The shown distances are the original unnormalized ones.
The second animation shows the same point cloud after the points have been scaled. Using the distance measurement tool it can be seen that no distance is larger than one meter.
Finally, the third animation uses the point measurement tool to verify that every coordinate falls within the range [-1.0, 1.0]. The statistics for the scaled point cloud are min. value: -0.931, max. value: 0.985, mean: 0.0, and standard Deviation: 0.111.
With this normalization, my point cloud data is ready to play nicely with the deep learning algorithms that I will be using soon.