For the past two years, I have been working with robots. Earlier this year I stopped focusing only on cameras and decided to start working with LiDARs. So after much research, I settled on a 32-beam RoboSense device.
I had to spend some time setting it up, especially creating a suitable mount that could also carry a camera. After some playing around, the LiDAR was finally ready, and I must declare that I am in love with this kind of data.
The next step for my project was to start developing a system to detect and track objects in 3D using LiDAR point clouds. The applications are multiple but include detecting fixed objects (buildings, traffic signs, etc.) to create 3D maps, as well as detecting moving objects (pedestrians, cars, etc.) to avoid collisions.
Before any of the above-mentioned applications could be developed, I first needed to learn how to efficiently load point cloud data into TensorFlow, the tool that I use for Deep Learning. For now, my dataset consists of 12,200 point cloud-image pairs. The image is used as context to know what the LiDAR was looking at. I also pre-processed all point clouds to keep only the data approximately within the field of view of the camera, as opposed to the original 360° view.
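That field-of-view crop can be sketched with plain NumPy. This is an illustrative version, assuming x points forward and y points left in the LiDAR frame, and a roughly 90° horizontal camera FOV (both assumptions of mine, not measurements from the actual rig):

```python
import numpy as np

def crop_to_fov(points, half_fov_deg=45.0):
    """Keep only points whose horizontal angle falls inside the camera's
    field of view. Axis convention and FOV width are illustrative."""
    angles = np.degrees(np.arctan2(points[:, 1], points[:, 0]))
    mask = np.abs(angles) <= half_fov_deg
    return points[mask]

# One point straight ahead (kept), one directly behind the sensor (dropped).
pts = np.array([[1.0, 0.0, 0.0],
                [-1.0, 0.0, 0.0]])
print(crop_to_fov(pts))  # [[1. 0. 0.]]
```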
Trying to load the data into TensorFlow was more challenging than I had expected. First, the point clouds were stored as PCD (Point Cloud Data) files, a file format for storing 3D point cloud data. TensorFlow cannot directly work with this type of file, so conversion was needed. Enter the Open3D library, an easy-to-use tool for manipulating point clouds. Using this tool I could easily load a PCD file and extract the points as NumPy arrays of X, Y, and Z coordinates. Another tool, PyPotree, a point cloud viewer for large datasets, was used to visualize and confirm that the points were extracted correctly on Google Colab.
So armed with the new tools, I uploaded 12,200 PCDs and 12,200 JPGs to my Google Drive and connected it to a Google Colab notebook. I then created some code to load the PCDs, extract the points, and put them in a NumPy array, a structure that TensorFlow can easily process. I ran the code confidently and watched in horror as, after several minutes, Colab complained that it had run out of memory while converting the point clouds. Bad news, as I plan to collect and process a lot more data than I currently have.
Fortunately, this is a common problem when dealing with large datasets, and tools like TensorFlow have functionality to deal with such situations. The solution is the Dataset API, which offers methods to create efficient input pipelines. Quoting the API's documentation: Dataset usage follows a common pattern:
- Create a source dataset from your input data.
- Apply dataset transformations to preprocess the data.
- Iterate over the dataset and process the elements.
Iteration happens in a streaming fashion, so the full dataset does not need to fit into memory.
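The three steps of the pattern map directly onto `tf.data` calls. A minimal sketch, with small synthetic arrays standing in for the real point/image pairs:

```python
import numpy as np
import tensorflow as tf

# Step 1: create a source dataset (stand-in arrays for the real data).
points = np.random.rand(8, 1024, 3).astype(np.float32)
labels = np.arange(8, dtype=np.int64)
dataset = tf.data.Dataset.from_tensor_slices((points, labels))

# Step 2: apply transformations -- here a toy centering step, then batching.
dataset = dataset.map(lambda p, l: (p - tf.reduce_mean(p, axis=0), l))
dataset = dataset.batch(4)

# Step 3: iterate; only one batch is materialized at a time,
# so the full dataset never needs to fit in memory.
for batch_points, batch_labels in dataset:
    print(batch_points.shape)  # (4, 1024, 3)
```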
So, in essence, the Dataset API allows me to create a pipeline in which the data is loaded in parts as the training loop requests it, avoiding running out of memory. I reviewed how to use the API and created some code for a data pipeline. Following step 1 of the above-mentioned pattern, the code first loaded a list of URLs for all of the PCDs and images; then in step 2, the PCDs would be loaded and converted to NumPy points, and the images loaded and normalized. But here is where I ran into trouble again.
To be efficient, everything in the Dataset API (and all TensorFlow APIs apparently) runs as Tensors in a graph. The Dataset API provides functions to load data from different formats, but there were none for PCDs. After studying different possible solutions, I decided that instead of having my data as multiple PCD and JPEG files and having TensorFlow load them and pre-process them, I would instead pre-process all of the data offline, and pack it in an HDF5 file.
The Hierarchical Data Format version 5 (HDF5) is an open-source file format that supports large, complex, heterogeneous data. Naturally, I first verified that the Dataset API supports this type of file. The main advantage of this format, apart from playing nicely with TensorFlow, is that I can pack all of my data into one nicely structured large file that I can easily move around. I created a simple Python script to load all of the PCDs, extract the points, and pack them together with their corresponding context image into a nice HDF5 file.
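A packing script along these lines can be sketched with `h5py`. The dataset names, fixed point count, and image size below are illustrative choices of mine, and synthetic data stands in for the real PCD/JPEG loaders:

```python
import h5py
import numpy as np

N_PAIRS, N_POINTS = 4, 1024  # tiny stand-ins for the 12,200 real pairs

with h5py.File("dataset.h5", "w") as f:
    # One dataset for the point clouds, one for the context images.
    pts = f.create_dataset("points", (N_PAIRS, N_POINTS, 3), dtype="f4")
    imgs = f.create_dataset("images", (N_PAIRS, 64, 64, 3), dtype="u1")
    for i in range(N_PAIRS):
        # In the real script: Open3D-extracted points and the decoded JPEG.
        pts[i] = np.random.rand(N_POINTS, 3)
        imgs[i] = np.random.randint(0, 255, (64, 64, 3))

with h5py.File("dataset.h5", "r") as f:
    print(f["points"].shape, f["images"].shape)
```

Everything ends up in a single file with a clear internal structure, which is what makes the archive easy to move around.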
After loading the HDF5 file (approximately 18 GB) into my Drive, I went back to Colab and added the corresponding Dataset API code. Essentially, step 1 of the pattern loaded the images and points from the HDF5 file and created the corresponding pairs, step 2 did some random selection of points from the point cloud (I will explain why in a later post) and normalized the images, and step 3 was ready to nicely serve the data upon request.
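One way to wire those three steps together is `tf.data.Dataset.from_generator` streaming pairs straight from the HDF5 file. This is a sketch under my own assumptions (a tiny synthetic file, illustrative dataset names, and 256 sampled points), not the exact Colab code:

```python
import h5py
import numpy as np
import tensorflow as tf

# A tiny stand-in for the real 18 GB HDF5 file.
with h5py.File("pairs.h5", "w") as f:
    f.create_dataset("points", data=np.random.rand(4, 1024, 3).astype("f4"))
    f.create_dataset("images",
                     data=np.random.randint(0, 255, (4, 64, 64, 3), dtype="u1"))

def pairs():
    # Step 1: stream point/image pairs from disk one at a time.
    with h5py.File("pairs.h5", "r") as f:
        for p, im in zip(f["points"], f["images"]):
            yield p, im

dataset = tf.data.Dataset.from_generator(
    pairs,
    output_signature=(tf.TensorSpec((1024, 3), tf.float32),
                      tf.TensorSpec((64, 64, 3), tf.uint8)))

def preprocess(points, image, n_sample=256):
    # Step 2: random point selection plus image normalization.
    idx = tf.random.shuffle(tf.range(tf.shape(points)[0]))[:n_sample]
    return tf.gather(points, idx), tf.cast(image, tf.float32) / 255.0

dataset = dataset.map(preprocess).batch(2)

# Step 3: serve batches upon request; only one batch lives in memory.
for pts, imgs in dataset:
    print(pts.shape, imgs.shape)  # (2, 256, 3) (2, 64, 64, 3)
```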
I tried the data pipeline with some very basic training code, and it worked beautifully. No more out-of-memory errors. I am not sure this is the most efficient way to serve my data, but it did the trick, and above all, building the pipeline was a great first exercise in point cloud data manipulation. Next up: training the first TensorFlow model using point clouds.