Normalizing (Feature Scaling) Point Clouds for Machine Learning

Capturing some point cloud and image data with my robot.

Continuing my work on Machine Learning with point clouds in the realm of autonomous robots, and coming from working with image data, I was faced with the following question: does 3D data need normalization the way image data does? The answer is a clear YES (duh!). Normalization, or feature scaling, is an important preprocessing step for many machine learning algorithms. Its main benefit is that it brings all features into a common range without losing information. This keeps optimization algorithms like gradient descent well behaved and avoids biasing the model toward features with larger magnitudes.

Take the following image, captured by my robot during one of our exploratory trips. Its pixels have the following statistics: min. value: 0, max. value: 255, mean: 94.170, standard deviation: 74.270. Such a large spread in values does not play nicely with machine learning algorithms.

Image captured by the robot while exploring the city

There are multiple ways to scale the pixels of an image. A common one is to enclose all of the values within the range [-1.0, 1.0]. The simple code snippet below achieves just that and changes the values of the above image to obtain the following statistics: min. value: -1.0, max. value: 1.0, mean: -0.261, and standard deviation: 0.583.

import tensorflow as tf

def normalize_image(image):
  # Convert to float so the division does not truncate integer pixel values.
  image = tf.cast(image, dtype=tf.float32)
  # Map [0, 255] to [0, 2], then shift to [-1, 1].
  image = image / 127.5
  image -= 1.0

  return image
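If you want to check the before-and-after statistics quoted above, something like the following sketch works. The file name "robot_frame.jpg" is just a placeholder for the image shown above, and the numbers in the comments are simply the ones reported in this post.

import tensorflow as tf

def image_stats(image):
  # Global statistics over all pixels and channels.
  image = tf.cast(image, dtype=tf.float32)
  return (tf.reduce_min(image).numpy(), tf.reduce_max(image).numpy(),
          tf.reduce_mean(image).numpy(), tf.math.reduce_std(image).numpy())

# "robot_frame.jpg" is a placeholder path for the image shown above.
image = tf.io.decode_jpeg(tf.io.read_file("robot_frame.jpg"))

print(image_stats(image))                   # roughly (0, 255, 94.17, 74.27)
print(image_stats(normalize_image(image)))  # values now within [-1.0, 1.0]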

In the case of point clouds, where the data consists of at least the XYZ coordinates of the points, the range of values can also be large. My RoboSense LiDAR covers 360° horizontally and about 32° vertically, with a detection range of roughly 150 meters, so the values each point can take vary widely. My initial question was: what would it mean to normalize this data? After spending some time searching, I found that many researchers do something similar to what I described above for images, which is to enclose the points within values of -1.0 and 1.0. For points in 3D, this is equivalent to scaling down the point cloud to fit within a unit sphere.

So, for the point cloud shown below, which was captured at the exact spot where the image above was taken, the statistics of the points are min. value: -96.804, max. value: 98.091, mean: -0.320, and standard deviation: 11.373.

Point cloud captured by the robot while exploring the city

To enclose all points within a unit sphere, the mean values of X, Y, and Z are computed and subtracted from every point, which moves the entire point cloud to the origin (X = 0, Y = 0, Z = 0). Then the distance between each point and the origin is computed, and the coordinates of every point are divided by the maximum of those distances, so that no point ends up farther than 1.0 from the origin and every coordinate falls within [-1.0, 1.0]. The code snippet below achieves this, and the following three animations show the result.

import numpy as np

def normalize_pc(points):
    # Center the cloud on the origin by subtracting the per-axis mean.
    centroid = np.mean(points, axis=0)
    points = points - centroid
    # Scale by the distance of the farthest point from the origin,
    # so the whole cloud fits inside the unit sphere.
    furthest_distance = np.max(np.linalg.norm(points, axis=1))
    points /= furthest_distance

    return points
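A quick way to confirm that the scaled cloud really fits inside the unit sphere is sketched below. The random array is just a stand-in; in practice, points would be the (N, 3) array of XYZ coordinates from the LiDAR scan.

import numpy as np

# A random stand-in cloud; in practice this would be the real LiDAR points.
points = np.random.uniform(-100.0, 100.0, size=(1000, 3))
normalized = normalize_pc(points)

# Every coordinate now falls within [-1.0, 1.0] ...
print(normalized.min(), normalized.max())
# ... and no point is farther than 1.0 from the origin.
print(np.linalg.norm(normalized, axis=1).max())  # exactly 1.0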

In the first animation, the distance tool is used to check the distance from the robot's position to some random points. The distances shown are the original, unnormalized ones.

Measuring some distances from robot location in an unnormalized point cloud.

The second animation shows the same point cloud after the points have been scaled. Using the distance measurement tool, it can be seen that no measured distance is larger than 1.0 (the values are unitless after scaling).

Measuring some distances from robot location in a normalized point cloud.

Finally, the third animation uses the point measurement tool to verify that every coordinate falls within the range [-1.0, 1.0]. The statistics for the scaled point cloud are min. value: -0.931, max. value: 0.985, mean: 0.0, and standard deviation: 0.111.

Measuring some points in a normalized point cloud.

With this normalization, my point cloud data is ready to play nicely with the deep learning algorithms that I will be using soon.

Creating a Point Cloud Dataset for 3D Deep Learning

For the past two years, I have been working with robots. Earlier this year, I stopped focusing only on cameras and decided to start working with LiDARs. After much research, I settled on a 32-beam RoboSense device.

RoboSense Helios LiDAR

I had to spend some time setting it up, especially building a suitable mount that could also carry a camera. After some playing around, the LiDAR was finally ready, and I must declare that I am in love with this kind of data.

Testing the LiDAR at night

The next step for my project was to start developing a system to detect and track objects in 3D using LiDAR point clouds. The applications are multiple but include detecting fixed objects (buildings, traffic signs, etc.) to create 3D maps, as well as detecting moving objects (pedestrians, cars, etc.) to avoid collisions.

Before any of the above-mentioned applications could be developed, I first needed to learn how to efficiently load point cloud data into TensorFlow, the tool that I use for Deep Learning. For now, my dataset consists of 12,200 point cloud-image pairs. The image is used as context to know what the LiDAR was looking at. I also pre-processed all point clouds to only show data approximately within the field of view of the camera, as opposed to the original 360° view.

Filtered point cloud with context image

Trying to load the data into TensorFlow was more challenging than I had expected. First, the point clouds were stored as PCD (Point Cloud Data) files, a file format for storing 3D point cloud data. TensorFlow cannot work with this type of file directly, so conversion was needed. Enter the Open3D library, an easy-to-use tool for manipulating point clouds. With it, I could easily load a PCD file and extract the points as NumPy arrays of X, Y, and Z coordinates. Another tool, PyPotree, a point cloud viewer for large datasets, was used on Google Colab to visualize the data and confirm that the points were extracted correctly.

Visualizing a point cloud within Google Colab with PyPotree
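The Open3D part of that workflow boils down to a few lines. The sketch below uses a placeholder file name and simply extracts the XYZ coordinates as a NumPy array.

import numpy as np
import open3d as o3d

# Load a PCD file (placeholder name) and extract the XYZ coordinates
# as an (N, 3) NumPy array that TensorFlow can work with.
pcd = o3d.io.read_point_cloud("cloud_000001.pcd")
points = np.asarray(pcd.points)

print(points.shape)  # (number_of_points, 3)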

So, armed with the new tools, I uploaded 12,200 PCDs and 12,200 JPGs to my Google Drive and connected it to a Google Colab notebook. I then wrote some code to load the PCDs, extract the points, and put them in a NumPy array, a structure that TensorFlow can easily process. I ran the code confidently and watched in horror as, after several minutes, Colab complained that it had run out of memory while converting the point clouds. Bad news, as I plan to collect and process a lot more data than I currently have.

Fortunately, this is a common problem when dealing with large datasets, and tools like TensorFlow have functionality to deal with such situations. The solution is the Dataset API (tf.data), which offers methods to create efficient input pipelines. Quoting the API’s documentation, Dataset usage follows a common pattern:

  1. Create a source dataset from your input data.
  2. Apply dataset transformations to preprocess the data.
  3. Iterate over the dataset and process the elements.

Iteration happens in a streaming fashion, so the full dataset does not need to fit into memory.
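In code, the three steps of the pattern look roughly like this. This is a minimal sketch with toy data, not my actual pipeline.

import tensorflow as tf

# 1. Create a source dataset from the input data (here, a toy tensor).
dataset = tf.data.Dataset.from_tensor_slices(tf.range(10))

# 2. Apply dataset transformations to preprocess the data.
dataset = dataset.map(lambda x: tf.cast(x, tf.float32) / 10.0).batch(4)

# 3. Iterate over the dataset; elements are produced in a streaming fashion.
for batch in dataset:
    print(batch)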

So, in essence, the Dataset API lets me create a pipeline in which the data is loaded in parts as the TensorFlow training loop requests it, avoiding running out of memory. I reviewed how to use the API and wrote some code to build a data pipeline. Following step 1 of the above-mentioned pattern, the code first loaded a list of URLs for all of the PCDs and the images; then, in step 2, the PCDs were to be loaded and converted to NumPy points, and the images loaded and normalized. But here is where I ran into trouble again.

To be efficient, everything in the Dataset API (and apparently all TensorFlow APIs) runs as Tensors in a graph. The Dataset API provides functions to load data from different formats, but none for PCDs. After studying different possible solutions, I decided that instead of keeping my data as multiple PCD and JPEG files and having TensorFlow load and pre-process them, I would pre-process all of the data offline and pack it into an HDF5 file.

The Hierarchical Data Format version 5 (HDF5) is an open-source file format that supports large, complex, heterogeneous data. I of course verified that this type of file can be consumed through the Dataset API. The main advantage of the format, apart from playing nicely with TensorFlow, is that I can pack all of my data into one nicely structured large file that I can easily move around. I created a simple Python script to load all of the PCDs, extract the points, and pack them together with their corresponding context images into a nice HDF5 file.

Python script to pack point clouds and images into an HDF5 file
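The actual script is in the image above; its core is roughly the following sketch, assuming h5py and Open3D, and a placeholder folder layout ("pcds/" and "images/") with matching, sorted PCD and JPEG files.

import glob
import h5py
import numpy as np
import open3d as o3d

# Hypothetical folder layout with matching, sorted PCD and JPEG files.
pcd_files = sorted(glob.glob("pcds/*.pcd"))
image_files = sorted(glob.glob("images/*.jpg"))

with h5py.File("lidar_dataset.h5", "w") as out:
    for i, (pcd_path, img_path) in enumerate(zip(pcd_files, image_files)):
        # XYZ coordinates as an (N, 3) float array.
        points = np.asarray(o3d.io.read_point_cloud(pcd_path).points)
        # Store the JPEG as raw bytes; it can be decoded later in TensorFlow.
        image = np.fromfile(img_path, dtype=np.uint8)

        group = out.create_group(f"pair_{i:05d}")
        group.create_dataset("points", data=points)
        group.create_dataset("image", data=image)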

After uploading the HDF5 file (approx. 18 GB) to my Drive, I went back to Colab and added the corresponding Dataset API code. Essentially, step 1 of the pattern loaded the images and points from the HDF5 file and created the corresponding pairs, step 2 randomly selected points from each point cloud (I will explain why in a later post) and normalized the images, and step 3 was ready to nicely serve the data upon request.

The final Dataset API code to load and serve data during training
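The real code is in the image above; the sketch below captures the gist under a few assumptions: it reads the HDF5 file produced by the previous script, NUM_POINTS is a made-up sampling size, every cloud is assumed to have at least that many points, and all camera frames share the same resolution.

import h5py
import tensorflow as tf

NUM_POINTS = 4096  # assumed number of randomly sampled points per cloud

def pairs_from_hdf5(path="lidar_dataset.h5"):
    # Step 1: yield one (points, jpeg_bytes) pair at a time from the HDF5 file.
    with h5py.File(path, "r") as f:
        for name in f:
            yield (f[name]["points"][:].astype("float32"),
                   f[name]["image"][:].tobytes())

def preprocess(points, jpeg_bytes):
    # Step 2: randomly sample a fixed number of points from the cloud
    # and normalize the decoded image to [-1.0, 1.0].
    idx = tf.random.shuffle(tf.range(tf.shape(points)[0]))[:NUM_POINTS]
    points = tf.gather(points, idx)
    image = tf.cast(tf.io.decode_jpeg(jpeg_bytes), tf.float32) / 127.5 - 1.0
    return points, image

dataset = tf.data.Dataset.from_generator(
    pairs_from_hdf5,
    output_signature=(
        tf.TensorSpec(shape=(None, 3), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.string),
    ),
)

# Step 3: serve batches in a streaming fashion during training.
dataset = dataset.map(preprocess).batch(8).prefetch(tf.data.AUTOTUNE)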

I tried the data pipeline with some very basic training code, and it worked beautifully. No more out-of-memory errors. I am not sure this is the most efficient way to serve my data, but it did the trick, and above all, creating the pipeline was a great first exercise in point cloud data manipulation. Next up: training the first TensorFlow model using point clouds.

Simplifying Point Cloud Labeling with Contextual Images and Point Cloud Filtering

Annotating point clouds from a multi-beam 360° LiDAR is exceedingly difficult. Providing context in the form of camera frames and limiting the point cloud to the Field Of View (FOV) of the camera simplifies things. To achieve this, we first had to replace our old and not-so-stable LiDAR mount with a sturdier one that can also hold a camera.

With the LiDAR and the camera now close together, the next step was to synchronize and store the data coming from both sensors. The amount of collected data was huge, so a simple Python script was created to allow for the selection of sequences of interest. Once the sequences were visually selected, they were saved in a format that the annotation tool CVAT can understand.

Although CVAT now provided the camera frames as context for annotating the LiDAR point clouds, the point clouds were still too large (360° horizontally, 32° vertically). It was not easy to tell which part of the cloud corresponded to the camera frame (the visual context), and many objects were still hard to identify.

To solve the issue, a C++ program using the Point Cloud Library (PCL) was created, built around PCL’s FrustumCulling filter. The filter lets the user define a vertical and horizontal Field Of View (FOV), as well as near and far plane distances. After some testing, parameters were chosen to approximate the FOV of the camera. Points of the input cloud that fall outside of the approximated FOV are filtered out, so the points of the output closely match what the camera sees. This greatly facilitates the annotation of objects in the point cloud. Watch the above video for more exciting details.
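For illustration only, here is a rough NumPy equivalent of the same idea (this is not the PCL program itself; the FOV angles, plane distances, and axis convention are placeholders).

import numpy as np

def filter_to_camera_fov(points, h_fov_deg=90.0, v_fov_deg=60.0,
                         near=0.5, far=100.0):
    # Keep only points inside an approximate camera frustum.
    # Assumes X forward, Y left, Z up, and a camera looking along +X.
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    azimuth = np.degrees(np.arctan2(y, x))                  # horizontal angle
    elevation = np.degrees(np.arctan2(z, np.hypot(x, y)))   # vertical angle
    distance = np.linalg.norm(points, axis=1)

    mask = ((np.abs(azimuth) <= h_fov_deg / 2.0) &
            (np.abs(elevation) <= v_fov_deg / 2.0) &
            (distance >= near) & (distance <= far))
    return points[mask]

# Example with a random stand-in cloud.
cloud = np.random.uniform(-50.0, 50.0, size=(10000, 3))
print(filter_to_camera_fov(cloud).shape)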

Demonstrating SEDRAD, The Self-Driving Ads Robot, at AppWorks

On Saturday, May 14, 2022, we demonstrated SEDRAD at the AppWorks offices in Taipei, Taiwan. The goal was to get approval to use the robot during their upcoming Demo Day #24. The demonstration was a big success, and SEDRAD is set to navigate autonomously while showing information about the startups participating in the event.

AppWorks is the leading startup accelerator in Taiwan. It helps early-stage startups with resources and advice and facilitates their access to industry experts and potential investors. AppWorks admits two batches of startups per year, and at the end of each period, they hold their awaited Demo Day. We are excited to showcase SEDRAD during the upcoming event. Stay tuned for more information.

Testing 3D Annotation Tools

Before you can train a supervised Deep Learning model, you must first label your data.

Today I am testing Intel OpenVINO’s CVAT and MATLAB’s Lidar Labeler annotation tools for 3D data. First impressions: CVAT makes it easier to navigate the point cloud, but a small bug makes it hard to place the initial cuboid, which slows things down overall. Lidar Labeler’s navigation is a little more difficult, but since it has no bug when placing the cuboid, it ends up being faster to use. CVAT being free, if the bug were fixed it would become the preferred tool for now.

#DeepLearning #MachineLearning #AI #SelfDrivingCars #LiDAR

RTK Heading + LiDAR (Temporary) Mount Ready

After a few days of playing with some Makeblock (blue) metal pieces, I finally created a temporary mount for my RTK Heading (dual GNSS) + 32-beam LiDAR system. It should be enough to test the sensors while a more stable mount is built. I also conducted a quick indoor test of the LiDAR; it has been raining for two weeks, so there has been no chance to go outdoors yet.

SEDRAD, The Self-Driving Ads Robot is Finally Here

I am pleased to announce that the first version of SEDRAD, The Self-Driving Ads Robot, is finally here. I have released it as part of the final submission of the OpenCV Spatial AI Competition #Oak2021. Watch the video to learn more about what SEDRAD is capable of doing, and if you have any questions, don’t hesitate to contact me.

Drivable Path Segmentation Test 1

A couple of weeks ago I was collecting and labeling driving images to teach #SEDRAD how to recognize the surface to drive on using semantic segmentation.

The first DeepLabv3+ model running fully on Oak-D cameras is ready, and we took it for a spin. It is not perfect, but it is a first step toward improving the safety of our #SelfDriving #Ads #Robot.

#Oak2021 #OpenCV #robotics #AI #MachineLearning #SuperAnnotate #autonomousdriving