Testing Open3D-ML for 3D Object Detection and Segmentation

Point clouds with semantic labels from the Semantic KITTI dataset.

When starting new research, my approach is usually to test different related things until I have enough experience to begin connecting the dots. Before I could start building custom models for 3D object detection, I acquired a LiDAR and played around with some data. The obvious next step was to find out how the research world was labeling such data before I could label my own.

There are some very popular point cloud datasets for autonomous driving out there, with the most popular being the KITTI dataset, nuScenes, and the Waymo Open Dataset, among others. I spent some time studying the KITTI dataset a while ago, and in general, noticed how hard it was to find the right tools to visualize the data. That was until I discovered Open3D, which made it simple for me to process and visualize point clouds. Open3D can be optionally bundled with Open3D-ML, which includes tools to visualize annotated point cloud data, and train/build/test 3D machine learning models (more on that in a future post).

Visualizing bounding boxes with Open3D. Image by Open3D via https://github.com/isl-org/Open3D-ML

The Open3D-ML GitHub page provides easy instructions to install the library with pip, but this only works with specific versions of CUDA and TensorFlow. Because I wanted to use the newer versions of such libraries, I decided to build Open3D from source. When doing this, I noticed that some steps were missing or were not clear enough. To simplify the life of anyone interested in building this library, I include below the steps that I followed to install and test Open3D-ML. Note that my system is Ubuntu 20.04.4 LTS and I have a CUDA-enabled GPU, so the instructions presented here may vary depending on your system.

Step 1: Install Conda

Using Conda is the recommended way to try anything new without risking breaking your system. To install Conda follow the official steps here.

Step 2: Create and activate a Conda environment

Make sure to replace myenv with the actual name that you want to use.

conda create --name myenv
conda activate myenv

Step 3: Install Node.js

To install Node.js you can follow the steps below:

curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt-get install -y nodejs
sudo npm install -g yarn

Step 4: Install TensorFlow

To install TensorFlow follow the official steps here.

Step 5: Install Jupyter Lab

conda install -c conda-forge jupyterlab

Step 6: Clone Open3D

git clone https://github.com/isl-org/Open3D

Step 7: Install dependencies

cd Open3D
util/install_deps_ubuntu.sh

Step 8: Create the build directory and clone Open3D-ML

mkdir build
cd build
git clone https://github.com/isl-org/Open3D-ML.git

Step 9: Configure the installation

This assumes you have a CUDA-enabled GPU. Make sure to replace /path/to/your/conda/env/bin/python with the correct path to your Python. Also, do not forget the two dots at the end of the command.


Step 10: Build the library

make -j$(nproc)

Step 11: Install as Python package

make install-pip-package

Step 12: Test Open3D installation

python -c "import open3d"

Step 13: Test Open3D-ML with TensorFlow installation

python -c "import open3d.ml.tf as ml3d"

Step 14: Downloading and preparing a dataset

In this step, we will be downloading the SemanticKITTI dataset. This dataset is over 80 GB so make sure to have plenty of space and time. The following steps will download and prepare the dataset. Make sure to replace /path/to/save/dataset with the desired path.

cd Open3D-ML/scripts/
./download_semantickitti.sh /path/to/save/dataset

Step 15: Loading and visualizing the dataset

In order to visualize the SemanticKITTI dataset, save the following Python code in a file and run it. Remember to replace /path/to/save/dataset/ with the path where the SemanticKITTI dataset was saved.

import open3d.ml.tf as ml3d

# construct a dataset by specifying dataset_path
dataset = ml3d.datasets.SemanticKITTI(dataset_path='/path/to/save/dataset/SemanticKitti/')

# get the 'all' split that combines training, validation and test set
all_split = dataset.get_split('all')

# print the attributes of the first datum
print(all_split.get_attr(0))

# print the shape of the first point cloud
print(all_split.get_data(0)['point'].shape)

# show the first 340 frames using the visualizer
vis = ml3d.vis.Visualizer()
vis.visualize_dataset(dataset, 'all', indices=range(340))

As soon as you run the Python script, a visualizer opens and loads the first 340 data frames. You can change the number of frames loaded in the code. Once opened, you can explore the point clouds based on intensity, but the most interesting part is to explore the point clouds based on the semantic label of each point. The videos below show two examples.

In the first video, you can see how by selecting multiple frames you can play them as an animation. Make sure to select labels as the data type from the presented options.

Viewing some frames with their semantic labels as an animation with Open3D-ML

The second video shows how you can select a given frame and inspect the semantic objects present by activating and de-activating certain labels. When certain colors are too light and difficult to see, you can change the color to improve visibility.

Inspecting the semantic objects present in a frame with Open3D-ML
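If you want to look at the labels outside the visualizer, the raw files can also be read with NumPy. The sketch below assumes the published SemanticKITTI layout: each scan is a .bin file of float32 values in groups of four (x, y, z, remission), and each .label file holds one uint32 per point, with the semantic class in the lower 16 bits and the instance id in the upper 16 bits. The function names are my own, not part of Open3D-ML.

```python
import numpy as np


def load_scan(bin_path):
    """Read one scan: float32 rows of (x, y, z, remission)."""
    scan = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)
    return scan[:, :3], scan[:, 3]


def load_labels(label_path):
    """Read per-point labels: one uint32 per point."""
    raw = np.fromfile(label_path, dtype=np.uint32)
    semantic = raw & 0xFFFF  # lower 16 bits: semantic class id
    instance = raw >> 16     # upper 16 bits: instance id
    return semantic, instance


def count_points_per_class(semantic):
    """Return {class_id: number_of_points} for one frame."""
    ids, counts = np.unique(semantic, return_counts=True)
    return dict(zip(ids.tolist(), counts.tolist()))
```

Calling load_scan and load_labels on a matching pair of files from a sequence folder, then count_points_per_class on the semantic ids, gives a quick idea of which classes dominate a frame.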

Step 16: Troubleshooting

While performing the steps above, I ran into the following exceptions. In case you hit them as well, they are easy to fix.

If you get ModuleNotFoundError: No module named 'yapf'

pip install yapf

If you get ModuleNotFoundError: No module named 'jupyter_packaging'

pip install jupyter-packaging

And that’s it. Open3D-ML is a great tool for visualizing point cloud datasets. The next step is to study the datasets to see how they were labeled. Then, I will go over training/testing 3D models with Open3D. Hopefully, this will bring me closer to performing the same operations with my custom data.

Normalizing (Feature Scaling) Point Clouds for Machine Learning

Capturing some point cloud and image data with my robot.

Continuing my work on Machine Learning with point clouds in the realm of autonomous robots, and coming from working with image data, I was faced with the following question: does 3D data need normalization like image data does? The answer is a clear YES (duh!). Normalization, or feature scaling, is an important preprocessing step for many machine learning algorithms. The main benefit is that it encloses all features in a common boundary without losing information. This makes the flow of algorithms like gradient descent smooth and avoids bias toward features with values higher in magnitude.

Take the following image captured by my robot during one of our exploratory trips. The pixels in the image have the following statistics, min. value: 0, max value: 255, mean: 94.170, standard deviation: 74.270. This large spread in the values does not play nicely with machine learning algorithms.

Image captured by the robot while exploring the city

There are multiple ways to scale the pixels of an image. A common one is to enclose all of the values within a range of -1.0 and 1.0. The simple code snippet below achieves just that and changes the values of the above image to obtain the following statistics, min. value: -1.0, max. value: 1.0, mean: -0.261, and standard deviation: 0.583.

import tensorflow as tf

def normalize_image(image):
  # Convert to float, then map pixel values from [0, 255] to [-1.0, 1.0]
  image = tf.cast(image, dtype=tf.float32)
  image = image/127.5
  image -= 1

  return image

In the case of point clouds, where the data is composed of at least the XYZ coordinates of the points, the range of values can also be large. Take my RoboSense LiDAR: with a horizontal field of view of 360°, a vertical field of view of 32°, and a detection distance of about 150 meters, the values each point can take vary widely. My initial question was, what would it mean to normalize this data? After spending some time searching, I found that many researchers do something similar to what I described above for images, which is to enclose the points within values of -1.0 and 1.0. In the case of points in 3D, this is equivalent to scaling down the point cloud to fit within a unit sphere.

So, for the point cloud shown below, which was taken exactly where the image above was taken, the statistics of the points are min. value: -96.804, max. value: 98.091, mean: -0.320, and standard deviation: 11.373.

Point cloud captured by the robot while exploring the city

In order to enclose all points within a unit sphere, the mean values for X, Y, and Z are computed and subtracted from the values of every point, which moves the centroid of the point cloud to the origin (X = 0, Y = 0, Z = 0). Then the distances between all points and the origin are computed, and the coordinates of every point are divided by the maximum of such distances. This scales every distance from the origin to at most 1.0, so every coordinate falls within the range -1.0 to 1.0. The code snippet below achieves this and the following three animations show the result.

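import numpy as np

def normalize_pc(points):
	# Center the cloud on its centroid (works on a copy, so the caller's array is untouched)
	centroid = np.mean(points, axis=0)
	points = points - centroid
	# Divide by the largest distance from the origin so the cloud fits in a unit sphere
	furthest_distance = np.max(np.linalg.norm(points, axis=-1))
	points = points / furthest_distance

	return points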

In the first animation, the distance tool is used to check the distance from where the robot was to some random points. The shown distances are the original unnormalized ones.

Measuring some distances from robot location in an unnormalized point cloud.

The second animation shows the same point cloud after the points have been scaled. Using the distance measurement tool, it can be seen that no measured distance is larger than one unit.

Measuring some distances from robot location in a normalized point cloud.

Finally, the third animation uses the point measurement tool to verify that every coordinate falls within the range [-1.0, 1.0]. The statistics for the scaled point cloud are min. value: -0.931, max. value: 0.985, mean: 0.0, and standard deviation: 0.111.

Measuring some points in a normalized point cloud.
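The same checks can also be done numerically instead of with the measurement tools. The sketch below uses a synthetic random cloud as a stand-in for my real data (an assumption, so the printed statistics will not match the ones above), normalizes it the same way, and confirms that the farthest point sits on the unit sphere. The summarize helper is only illustrative.

```python
import numpy as np


def normalize_pc(points):
    """Center the cloud on its centroid and scale it into a unit sphere."""
    centered = points - points.mean(axis=0)
    furthest_distance = np.linalg.norm(centered, axis=1).max()
    return centered / furthest_distance


def summarize(points):
    """Return the same four statistics quoted in the text."""
    return {
        "min": float(points.min()),
        "max": float(points.max()),
        "mean": float(points.mean()),
        "std": float(points.std()),
    }


# Synthetic stand-in for a real scan: 1000 points within +/- 100 m.
rng = np.random.default_rng(0)
cloud = rng.uniform(-100.0, 100.0, size=(1000, 3))

scaled = normalize_pc(cloud)
radii = np.linalg.norm(scaled, axis=1)  # distance of every point from the origin

print("before:", summarize(cloud))
print("after: ", summarize(scaled))
print("largest distance from origin:", radii.max())
```

Because the largest radius is 1.0 up to floating-point error, no single coordinate can fall outside [-1.0, 1.0].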

With this normalization, my point cloud data is ready to play nicely with the deep learning algorithms that I will be using soon.

Creating a Point Cloud Dataset for 3D Deep Learning

For the past two years, I have been working with robots. Earlier this year I stopped focusing on cameras only and decided to start working with LiDARs. So after much research, I settled on a 32-beam RoboSense device.

RoboSense Helios LiDAR

I had to spend some time setting it up, especially creating a suitable mount able to also carry a camera. After some playing around, the LiDAR was finally ready and I declare that I am in love with this kind of data.

Testing the LiDAR at night

The next step for my project was to start developing a system to detect and track objects in 3D using LiDAR point clouds. The applications are multiple but include detecting fixed objects (buildings, traffic signs, etc.) to create 3D maps, as well as detecting moving objects (pedestrians, cars, etc.) to avoid collisions.

Before any of the above-mentioned applications could be developed, I first needed to learn how to efficiently load point cloud data into TensorFlow, the tool that I use for Deep Learning. For now, my dataset consists of 12,200 point cloud-image pairs. The image is used as context to know what the LiDAR was looking at. I also pre-processed all point clouds to only show data approximately within the field of view of the camera, as opposed to the original 360° view.

Filtered point cloud with context image

Trying to load the data into TensorFlow was more challenging than I had expected. First, the point clouds were stored as PCD (Point Cloud Data) files, which is a file format for storing 3D point cloud data. TensorFlow cannot directly work with this type of file, so conversion was needed. Enter the Open3D library, an easy-to-use tool to manipulate point clouds. Using this tool I could easily load a PCD file and extract the points as NumPy arrays of X, Y, and Z coordinates. Another tool, PyPotree, a point cloud viewer for large datasets, was used to visualize the points on Google Colab and confirm that they were extracted correctly.

Visualizing a point cloud within Google Colab with PyPotree

So, armed with the new tools, I uploaded 12,200 PCDs and 12,200 JPGs to my Google Drive and connected it to a Google Colab. I then created some code to load the PCDs, extract the points and put them in a NumPy array, a structure that TensorFlow can easily process. I ran the code confidently and watched in horror as, after several minutes of waiting, the Colab complained that it had run out of memory while converting the point clouds. Bad news, as I plan to collect and process a lot more data than I currently have.

Fortunately, this is a common problem when dealing with large datasets, and tools like TensorFlow have the functionality to deal with such situations. The needed solution is the Dataset API, which offers methods to create efficient input pipelines. Quoting the API’s documentation: Dataset usage follows a common pattern:

  1. Create a source dataset from your input data.
  2. Apply dataset transformations to preprocess the data.
  3. Iterate over the dataset and process the elements.

Iteration happens in a streaming fashion, so the full dataset does not need to fit into memory.
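As a minimal sketch of that three-step pattern, the code below streams synthetic image and point cloud pairs from a Python generator through tf.data. The shapes, the generator, and the preprocess function are placeholders I made up for illustration, not my actual pipeline.

```python
import numpy as np
import tensorflow as tf


def sample_generator():
    """Yield one (image, cloud) pair at a time so nothing is held in memory."""
    rng = np.random.default_rng(0)
    for _ in range(8):
        image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
        cloud = rng.uniform(-100.0, 100.0, size=(16, 3)).astype(np.float32)
        yield image, cloud


def preprocess(image, cloud):
    """Scale pixel values from [0, 255] to [-1.0, 1.0]; leave the cloud as is."""
    image = tf.cast(image, tf.float32) / 127.5 - 1.0
    return image, cloud


# Step 1: create a source dataset from the input data.
dataset = tf.data.Dataset.from_generator(
    sample_generator,
    output_signature=(
        tf.TensorSpec(shape=(4, 4, 3), dtype=tf.uint8),
        tf.TensorSpec(shape=(16, 3), dtype=tf.float32),
    ),
)

# Step 2: apply transformations to preprocess the data.
dataset = (
    dataset.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)

# Step 3: iterate over the dataset and process the elements.
for image_batch, cloud_batch in dataset:
    print(image_batch.shape, cloud_batch.shape)
```

The generator is only called as batches are requested, which is what keeps memory use flat regardless of how many samples exist.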

So, in essence, the Dataset API will allow me to create a pipeline, and the data will be loaded in parts as the training loop in TensorFlow requests it, which avoids running out of memory. So, I reviewed how to use the API, and created some code to make a data pipeline. Following step 1 of the abovementioned pattern, the code first loaded a list of URLs for all of the PCDs and the images. Then, in step 2, the PCDs were to be loaded and converted to points in NumPy, and the images loaded and normalized. But this is where I ran into trouble again.

To be efficient, everything in the Dataset API (and all TensorFlow APIs apparently) runs as Tensors in a graph. The Dataset API provides functions to load data from different formats, but there were none for PCDs. After studying different possible solutions, I decided that instead of having my data as multiple PCD and JPEG files and having TensorFlow load them and pre-process them, I would instead pre-process all of the data offline, and pack it in an HDF5 file.

The Hierarchical Data Format version 5 (HDF5) is an open-source file format that supports large, complex, heterogeneous data. I obviously verified that the Dataset API supports this type of file. The main advantage of using this format, apart from playing nicely with TensorFlow, is that I can pack all of my data into one nicely structured large file that I can easily move around. I created a simple Python script to load all of the PCDs, extract the points, and pack them together with their corresponding context file into a nice HDF5 file.

Python script to pack point clouds and images into an HDF5 file
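The packing idea can be sketched roughly as follows. This is a hypothetical layout using h5py, with one group per point cloud and image pair and synthetic arrays standing in for the real data. With real files, the points could come from Open3D via np.asarray(o3d.io.read_point_cloud(path).points). The group and dataset names are assumptions of mine.

```python
import os
import tempfile

import h5py
import numpy as np


def pack_to_hdf5(samples, out_path):
    """Write (points, image) pairs into one HDF5 file, one group per pair."""
    count = 0
    with h5py.File(out_path, "w") as f:
        for index, (points, image) in enumerate(samples):
            group = f.create_group(f"{index:06d}")
            # Point counts differ between clouds, so each cloud gets its own dataset.
            group.create_dataset("points", data=np.asarray(points, dtype=np.float32))
            group.create_dataset("image", data=np.asarray(image, dtype=np.uint8))
            count += 1
        f.attrs["num_samples"] = count
    return count


# Synthetic stand-ins for real PCD/JPG pairs, with varying point counts.
rng = np.random.default_rng(0)
samples = [
    (rng.uniform(-100.0, 100.0, size=(n, 3)), rng.integers(0, 256, size=(4, 4, 3)))
    for n in (50, 80, 65)
]

out_path = os.path.join(tempfile.mkdtemp(), "dataset.h5")
pack_to_hdf5(samples, out_path)

# Read one sample back to confirm the layout.
with h5py.File(out_path, "r") as f:
    print(f.attrs["num_samples"], f["000001/points"].shape)
```

Storing each cloud as its own dataset avoids padding clouds to a common size at packing time, at the cost of one lookup per sample when reading.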

After loading the HDF5 file (approx 18 GB) into my Drive, I went back to Colab and added the corresponding Dataset API code. Essentially, step 1 of the pattern loaded the images and points from the HDF5 file and created the corresponding pairs, step 2 did some random selection of points from the point cloud (I will explain why in a later post), and normalized the images, and step 3 was ready to nicely serve the data upon request.

The final Dataset API code to load and serve data during training
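For the random selection of points in step 2, a simple NumPy version could look like the sketch below. It assumes the goal is a fixed number of points per cloud, which is a common requirement for batching but is my assumption here. The function name and the fallback to sampling with replacement for small clouds are also illustrative choices.

```python
import numpy as np


def sample_points(points, num_points, rng=None):
    """Return exactly num_points rows picked at random from an (N, 3) cloud."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample without replacement when the cloud is large enough;
    # otherwise repeat points so the output size is still num_points.
    replace = points.shape[0] < num_points
    indices = rng.choice(points.shape[0], size=num_points, replace=replace)
    return points[indices]


cloud = np.random.default_rng(0).uniform(-1.0, 1.0, size=(5000, 3))
subset = sample_points(cloud, 2048)
print(subset.shape)
```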

I tried the data pipeline with a very basic training code, and it worked beautifully. No more out-of-memory errors. I am not sure if this is the most efficient way to serve my data, but it did the trick, and creating the pipeline was a great first exercise in point cloud data manipulation. Next up, training the first TensorFlow model using point clouds.

Simplifying Point Cloud Labeling with Contextual Images and Point Cloud Filtering

Annotating point clouds from multi-line 360° LiDAR is exceedingly difficult. Providing context in the form of camera frames and limiting the point cloud to the Field Of View (FOV) of the camera simplifies things. To achieve this, we first had to replace our old, and not so stable LiDAR mount, with a sturdier one capable of also holding a camera.

With the LiDAR and the camera closer together, the next step was to synchronize and store the data coming from both sensors. The collected data was huge, so a simple Python script was created to allow for the selection of sequences of interest. Once the sequences were visually selected, they were saved in a format that the annotation tool CVAT can understand.

It was noted that although CVAT now provided the camera frames as context to annotate the LiDAR point clouds, the point clouds were still too large (360° horizontally, 32° vertically). It was not easy to know which part of the cloud corresponded to the camera frame (visual context), and many objects were still hard to identify.

To solve the issue, a C++ program using the Point Cloud Library (PCL) was created. PCL’s FrustumCulling filter was used for this purpose. The filter lets the user define a vertical and horizontal Field Of View (FOV), as well as the near and far plane distance. After some testing, the best parameters were defined to approximate the FOV of the camera. The points of the input cloud that fall outside of the approximated FOV are filtered out, and the points of the output closely match what the camera sees. This greatly facilitates the annotation of objects in the point cloud. Watch the above video for more exciting details.
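The idea behind the filter can be illustrated with a few lines of NumPy. This is not PCL's FrustumCulling implementation, only a simplified sketch of the same geometry. It assumes the sensor sits at the origin looking along +X with Y to the left and Z up, and the FOV and plane values in the example are placeholders, not the parameters we settled on.

```python
import numpy as np


def filter_to_fov(points, h_fov_deg, v_fov_deg, near, far):
    """Keep points inside a viewing frustum that looks along +X.

    Assumed axes: X forward, Y left, Z up; near must be positive
    and both FOV angles must be below 180 degrees.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    half_h = np.tan(np.radians(h_fov_deg) / 2.0)
    half_v = np.tan(np.radians(v_fov_deg) / 2.0)
    inside = (
        (x >= near) & (x <= far)       # between the near and far planes
        & (np.abs(y) <= x * half_h)    # within the horizontal FOV
        & (np.abs(z) <= x * half_v)    # within the vertical FOV
    )
    return points[inside]


cloud = np.array([
    [10.0, 0.0, 0.0],    # straight ahead
    [-10.0, 0.0, 0.0],   # behind the sensor
    [10.0, 20.0, 0.0],   # too far to the side
    [10.0, 5.0, 0.0],    # ahead, slightly left
    [200.0, 0.0, 0.0],   # beyond the far plane
    [10.0, 0.0, 9.0],    # too high
])
visible = filter_to_fov(cloud, h_fov_deg=90.0, v_fov_deg=60.0, near=0.5, far=100.0)
print(visible)
```

Only the points straight ahead and slightly to the left survive in this example, which mirrors how the filtered cloud ends up matching what the camera sees.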

Demonstrating SEDRAD, The Self-Driving Ads Robot, at AppWorks

On Saturday, May 14, 2022, we demonstrated SEDRAD at the AppWorks offices in Taipei, Taiwan. The goal was to get approval to use the robot during their upcoming Demo Day #24. The demonstration was a big success and SEDRAD is set to navigate autonomously while showing information about the participating startups in the event.

AppWorks is the leading startup accelerator in Taiwan. It helps early-stage startups with resources and advice and facilitates their access to industry experts and potential investors. AppWorks admits two batches of startups per year, and at the end of each period, they hold their awaited Demo Day. We are excited to showcase SEDRAD during the upcoming event. Stay tuned for more information.

Testing 3D Annotation Tools

Before you can train a supervised Deep Learning model, you must first label your data.

Today I am testing Intel OpenVINO’s CVAT and MATLAB’s Lidar Labeler annotation tools for 3D data. First impressions: CVAT makes it easier to navigate the point cloud, but a small bug makes it hard to place the initial cuboid, which makes things a little slower overall. Lidar Labeler’s navigation is a little more difficult, but because it has no such bug when placing the cuboid, it ends up being faster to use. Since CVAT is free, it would become my preferred tool if that bug were fixed.

#DeepLearning #MachineLearning #AI #SelfDrivingCars #LiDAR

RTK Heading + LiDAR (Temporary) Mount Ready

After a few days of playing with some Makeblock (blue) metal pieces, I finally created a temporary mount for my RTK Heading (Dual GNSS) + 32-beam LiDAR system. It should be enough to test the sensors while a more stable one is built. I also conducted a quick indoor test of the LiDAR; it has been raining for two weeks, so there has been no chance to go outdoors yet.

SEDRAD, The Self-Driving Ads Robot is Finally Here

I am pleased to announce that the first version of SEDRAD, The Self-Driving Ads Robot, is finally here. I have released it as part of the final submission of the OpenCV Spatial AI Competition #Oak2021. Watch the video to learn more about what SEDRAD is capable of doing, and if you have any questions, don’t hesitate to contact me.