Synchronizing LiDAR and Camera Data for Offline Processing Using ROS

Synchronized point cloud – image pair

Robots and other complex machines are usually equipped with a wide variety of sensors. Often, when performing tasks such as driving autonomously or collecting data for offline processing, it is useful to synchronize the different sensors. For example, in my work with 3D Machine Learning, I had to collect a dataset consisting of LiDAR point clouds as well as camera frames to serve as the context.

Why is it important to synchronize this data? There are several reasons. One is that different sensors collect data at different rates. For instance, my LiDAR collects data at 10 Hz (10 samples per second) while my camera takes images at 3 Hz, so for every image we have roughly 3 point clouds. Another reason is related to how fast your machine is moving. If you were collecting data from sensors mounted on a car, the difference between a point cloud collected at a given time and a context image collected about a second later could be large when the car is moving fast: at 20 m/s (72 km/h), the car covers 20 meters in that second, and the image might show things that the LiDAR could not capture earlier (occluded objects, etc.).

My LiDAR’s ROS Topic with point clouds coming at 10 Hz
My camera’s ROS Topic with images coming at 3 Hz

Synchronizing sensor data has two clear advantages. First, it allows us to drop unnecessary data from the faster sensors, saving space and simplifying subsequent processing. Second, by making sure the data is collected at approximately the same time, we can be confident that all of it represents the same state of the world, as perceived by the sensors, at that time. In the rest of this article, I will describe how I synchronized data from a LiDAR and a camera using the Robot Operating System (ROS). The same approach can be used for an arbitrary number and type of sources.

First, what is ROS, and why did I use it? ROS is an open-source robotics middleware suite. Its main goal is to provide services on top of an operating system (Ubuntu in my case) that facilitate the development of robotics applications. The two main reasons to use ROS here are that it provides commonly used functionality (in our case, synchronizing data) and message passing between processes (in my case, the nodes provided by the LiDAR and camera SDKs publish the messages). Collecting and synchronizing the data without a tool like ROS would require considerably more code and complexity. If you are working with robots, ROS is worth learning.

So, my robot’s computer runs a ROS node provided by my LiDAR’s manufacturer that publishes messages of type PointCloud2 at 10 Hz, and another node provided by my camera’s manufacturer that publishes Image data at 3 Hz. My own node subscribes to both topics (ROS’s publish/subscribe messaging mechanism) in order to receive the data. A ROS package called message_filters is used to synchronize the topics via an ApproximateTime policy synchronizer. This policy is useful when the data from the different sensors does not arrive at exactly the same time but falls within a small enough time window.

Some of the most important aspects of the code include creating subscribers to the LiDAR and camera data topics, creating a synchronizer with the ApproximateTime policy, and registering a callback for the synchronizer, as shown in the code snippet below.

// Create the subscribers
message_filters::Subscriber<Image> image_sub(nh, "/zed/rgb/image_rect_color", 1);
message_filters::Subscriber<CameraInfo> camera_info_sub(nh, "/zed/rgb/camera_info", 1);
message_filters::Subscriber<PointCloud2> point_cloud_sub(nh, "/rslidar_points", 1);

// Create the synchronizer
typedef sync_policies::ApproximateTime<Image, CameraInfo, PointCloud2> MySyncPolicy;
  
// ApproximateTime takes a queue size as its constructor argument, hence MySyncPolicy(10)
Synchronizer<MySyncPolicy> sync(MySyncPolicy(10), image_sub, camera_info_sub, point_cloud_sub);
sync.registerCallback(boost::bind(&callback, _1, _2, _3));

The registered callback will be triggered whenever LiDAR and camera data arrive at nearly the same time; any data that cannot be matched is dropped. Multiple things can be done within the callback, but in my case, I just save all synchronized point cloud-image pairs for offline processing.

// Synchronizer Callback, it will be called when camera and LiDAR data are in sync.
void callback(const ImageConstPtr& image, const CameraInfoConstPtr& camera_info, const PointCloud2ConstPtr& point_cloud)
{
  printf("%s\n", "All data in sync!" );

  // Process data

}
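If you prefer Python, the same synchronization logic can be sketched with rospy and the message_filters package. The snippet below is only an illustrative sketch that assumes the same topic names; the node I actually ran is the C++ one shown in this article.

import rospy
import message_filters
from sensor_msgs.msg import Image, CameraInfo, PointCloud2

# Called only when an image, camera info, and point cloud arrive close enough in time
def callback(image, camera_info, point_cloud):
    rospy.loginfo("All data in sync!")
    # Process or save the synchronized pair here

rospy.init_node("lidar_and_cam_synchronizer_py")

image_sub = message_filters.Subscriber("/zed/rgb/image_rect_color", Image)
camera_info_sub = message_filters.Subscriber("/zed/rgb/camera_info", CameraInfo)
point_cloud_sub = message_filters.Subscriber("/rslidar_points", PointCloud2)

# queue_size plays the same role as in the C++ version; slop is the maximum
# allowed time difference (in seconds) between matched messages
sync = message_filters.ApproximateTimeSynchronizer(
    [image_sub, camera_info_sub, point_cloud_sub], queue_size=10, slop=0.1)
sync.registerCallback(callback)

rospy.spin()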

The main reason for collecting such data is that I can, for instance, use the context image to annotate the point cloud for object detection. It is a lot simpler to know what is in front of the robot from the context image than from the point cloud. Finally, as described in a previous post, some filtering can be done on the point cloud to only retain the points directly visible to the camera, as shown in the image below.

Top: filtered point cloud and context image. Bottom: full point cloud.

The full code is presented below. Note that to run it you need ROS installed, as well as several other libraries; instructions for installing them are beyond the scope of this article.

#include <message_filters/subscriber.h>
#include <message_filters/synchronizer.h>
#include <message_filters/sync_policies/approximate_time.h>

#include <sensor_msgs/Image.h>
#include <sensor_msgs/CameraInfo.h>
#include <sensor_msgs/PointCloud2.h>

#include <ros/ros.h>
#include <pcl_ros/point_cloud.h>
#include <pcl/io/pcd_io.h>

#include <cv_bridge/cv_bridge.h>
#include <sensor_msgs/image_encodings.h>

#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <iostream>
#include <fstream>
#include <sstream>
#include <iomanip>
#include <ctime>

#include <boost/filesystem.hpp>                                    
#include <ros/package.h>

static const std::string OPENCV_WINDOW = "Image window";

using namespace sensor_msgs;
using namespace message_filters;
using namespace std;


std::string root_output_dir, lidar_output_dir, camera_output_dir;
bool debug;

// Synchronizer Callback, it will be called when camera and LiDAR data are in sync.
void callback(const ImageConstPtr& image, const CameraInfoConstPtr& camera_info, const PointCloud2ConstPtr& point_cloud)
{
  printf("%s\n", "All data in sync!" );

  // Convert ROS PointCloud2 to PCL point cloud
  pcl::PointCloud<pcl::PointXYZI> cloud;
  pcl::fromROSMsg(*point_cloud, cloud);

  // Create a date string from the point cloud's timestamp to use in the file name of the saved data
  const int output_size = 100;
  char output[output_size];
  std::time_t raw_time = static_cast<time_t>(point_cloud->header.stamp.sec);
  struct tm* timeinfo = localtime(&raw_time);
  std::strftime(output, output_size, "lidar_%Y_%m_%d_%H_%M_%S", timeinfo);

  // Create a string with the fractional seconds (from the nanoseconds field) to append to the date string
  std::stringstream ss; 
  ss << std::setw(9) << std::setfill('0') << point_cloud->header.stamp.nsec;  
  const size_t fractional_second_digits = 4;
  
  // Combine all of the pieces to get the output file name
  std::string output_file = lidar_output_dir + "/" + std::string(output) + "." + ss.str().substr(0, fractional_second_digits)+".pcd";

  // Save the point cloud as a PCD file
  pcl::io::savePCDFileASCII (output_file, cloud);
  printf("%s\n", output_file.c_str() );

  // Convert the ROS image to an OpenCV image
  cv_bridge::CvImagePtr cv_ptr;
  try{
    cv_ptr = cv_bridge::toCvCopy(image, sensor_msgs::image_encodings::BGR8);
  }catch (cv_bridge::Exception& e){
    ROS_ERROR("cv_bridge exception: %s", e.what());
    return;
  }

  // Update GUI Window
  if (debug){
    cv::imshow(OPENCV_WINDOW, cv_ptr->image);
    cv::waitKey(3);
  }

  // Create the filename for the image
  output_file = camera_output_dir + "/" + std::string(output) + "." + ss.str().substr(0, fractional_second_digits)+".jpg";
  
  // Save the image
  cv::imwrite(output_file, cv_ptr->image);
  
}


int main(int argc, char** argv)
{

  // Initialize the ROS node
  ros::init(argc, argv, "lidar_and_cam_synchronizer");
  ros::NodeHandle nh("~");

  // Get the parameters or use default values
  nh.param("root_output_dir", root_output_dir, ros::package::getPath("shl_robot")+"/data/lidar"); 
  nh.param("debug", debug, false);

  // Create the subscribers
  message_filters::Subscriber<Image> image_sub(nh, "/zed/rgb/image_rect_color", 1);
  message_filters::Subscriber<CameraInfo> camera_info_sub(nh, "/zed/rgb/camera_info", 1);
  message_filters::Subscriber<PointCloud2> point_cloud_sub(nh, "/rslidar_points", 1);

  // Create the synchronizer
  typedef sync_policies::ApproximateTime<Image, CameraInfo, PointCloud2> MySyncPolicy;
  
  // ApproximateTime takes a queue size as its constructor argument, hence MySyncPolicy(10)
  Synchronizer<MySyncPolicy> sync(MySyncPolicy(10), image_sub, camera_info_sub, point_cloud_sub);
  sync.registerCallback(boost::bind(&callback, _1, _2, _3));

  // Create the folder for the current run. A new folder will be created each time the node runs
  const int output_size = 100;
  char output[output_size];
  std::time_t raw_time = static_cast<time_t>(ros::Time::now().sec);
  struct tm* timeinfo = localtime(&raw_time);
  std::strftime(output, output_size, "run_%Y_%m_%d_%H_%M_%S", timeinfo);

  // Combine all of the pieces to get the output folders for this run
  lidar_output_dir = root_output_dir + "/" + std::string(output)+"/pcd";
  camera_output_dir = root_output_dir + "/" + std::string(output)+"/jpg";

  boost::filesystem::create_directories(lidar_output_dir);
  boost::filesystem::create_directories(camera_output_dir);

  std::cout << lidar_output_dir << std::endl;

  // Call spin() to let the node run until stopped.
  ros::spin();

  return 0;
}

3D Object Detection with Open3D-ML and PyTorch Backend

3D Object Detection on the Kitti Dataset, photo provided by Open3D

In previous articles, I described how I used Open3D-ML to do Semantic Segmentation on the SemanticKITTI dataset and on my own dataset. Now it is time to move to another important aspect of the Perception Stack for Autonomous Vehicles and Robots, which is Object Detection from Point Clouds. Make sure to install Open3D-ML with PyTorch support if you want to run the code described in this article.

Visualizing the Kitti Dataset

The first thing to do is to download the popular Kitti dataset and visualize it. We are interested only in the data containing 3D bounding box annotations of different objects (pedestrians, cars, etc) seen by an autonomous vehicle. The steps are described below:

Step 1: Download the Kitti dataset

cd path-to-Open3D-ML/scripts/download_datasets
./download_kitti.sh path/to/save/dataset

Make sure to replace path-to-Open3D-ML with the path to the Open3D-ML installation and path/to/save/dataset with the path where you wish to save the dataset.

Step 2: Clone or update my repository

If you have not done it yet, clone my repository containing the Python code.

git clone https://github.com/carlos-argueta/open3d_experiments.git

If you previously cloned it, then just pull the new code.

git pull

Step 3: Navigate to the repository and activate the Conda environment

conda activate myenv
cd open3d_experiments

Make sure to replace myenv with your actual environment’s name.

Step 4: Run the script to visualize the first 400 point clouds with their bounding boxes

python3 view_detection_torch.py

The code is simple. First, a dataset is built by passing the path where Kitti was saved, so make sure to replace /path/to/save/dataset/Kitti with the path to your Kitti dataset. Some attributes are then printed as a sanity check; you can ignore that part. Finally, a visualizer is created, passing as parameters the dataset and the indices of the point clouds we want to visualize. In this case we use range(400) to visualize the first 400 frames, but you can change this to view other parts of the dataset.

import open3d.ml.torch as ml3d  # or open3d.ml.tf as ml3d
# construct a dataset by specifying dataset_path
dataset = ml3d.datasets.KITTI(dataset_path='/path/to/save/dataset/Kitti')

# get the 'all' split that combines training, validation and test set
all_split = dataset.get_split('all')

# print the attributes of the first datum
print(all_split.get_attr(0))

# print the shape of the first point cloud
print(all_split.get_data(0)['point'].shape)

# show the first 400 frames using the visualizer
vis = ml3d.vis.Visualizer()
vis.visualize_dataset(dataset, "training", indices=range(400))
Visualizing the Kitti Dataset with Open3D-ML

As you can see from the previous video, a window will open where you can select different point clouds and view the different bounding boxes included. These boxes were manually created and are part of the training set portion of Kitti.

Object Detection on the Kitti Testing Set and on Custom Data

Let’s now use a pre-trained object detection model on unannotated data. We will use both the testing portion of the Kitti dataset, as well as my own custom data, and we will see how the model performs on these two different datasets.

Assuming you have cloned or updated the repository as described above, in order to run the object detection script, do the following:

conda activate myenv
cd open3d_experiments
python3 detection_torch.py

The code first loads a pipeline configuration file for the PointPillars model and then creates the model from it. Make sure to replace /path/to/Open3D/ with the path where you cloned the Open3D repository when installing it. Next, the paths to both the Kitti dataset and a small part of my personal dataset (provided with the repo) are added to the configuration. The Kitti dataset is then loaded with the utilities provided by Open3D-ML, and the custom dataset with the custom function provided within the script.

# Load an ML configuration file
cfg_file = "/path/to/Open3D/build/Open3D-ML/ml3d/configs/pointpillars_kitti.yml"
cfg = _ml3d.utils.Config.load_from_file(cfg_file)

# Load the PointPillars model
model = ml3d.models.PointPillars(**cfg.model)

# Add path to the Kitti dataset and your own custom dataset
cfg.dataset['dataset_path'] = '/path/to/save/dataset/Kitti'
cfg.dataset['custom_dataset_path'] = './pcds'

# Load the datasets
dataset = ml3d.datasets.KITTI(cfg.dataset.pop('dataset_path', None), **cfg.dataset)
custom_dataset = load_custom_dataset(cfg.dataset.pop('custom_dataset_path', None))

Next, the object detection pipeline is created using the model and the configuration object as parameters, and the model weights (checkpoint) are loaded, downloading them first if they are not yet present locally. The test split of the Kitti dataset is then selected and the Open3D visualizer is created.

# Create the ML pipeline
pipeline = ml3d.pipelines.ObjectDetection(model, dataset=dataset, device="gpu", **cfg.pipeline)

# download the weights.
ckpt_folder = "./logs/"
os.makedirs(ckpt_folder, exist_ok=True)
ckpt_path = ckpt_folder + "pointpillars_kitti_202012221652utc.pth"
pointpillar_url = "https://storage.googleapis.com/open3d-releases/model-zoo/pointpillars_kitti_202012221652utc.pth"
if not os.path.exists(ckpt_path):
 cmd = "wget {} -O {}".format(pointpillar_url, ckpt_path)
 os.system(cmd)

# load the parameters of the model
pipeline.load_ckpt(ckpt_path=ckpt_path)

# Select the test split of the Kitti dataset
test_split = dataset.get_split("test")

# Prepare the visualizer 
vis = Visualizer()

Finally, an empty list is created to store the point clouds with the detections for later visualization. The first loop in the following code gets the first 10 data frames from the Kitti test set, runs the inference, uses a custom function to filter out detections with low scores, and creates a dictionary pred with the format expected by the visualizer. The dictionary is then added to the data list.

The second loop obtains frames from the custom dataset, pre-processes them to make them compatible with the pipeline using the provided prepare_point_cloud_for_inference method, and then runs the inference, adding the results to the list just like in the first loop. The last line runs the visualizer so that we can inspect the results. For more details about how the provided custom methods described in the paragraphs above work, please consult the source code included in the repository.
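As a rough idea of what the score filter does, filter_detections can be as simple as the sketch below. It assumes each detection exposes a confidence attribute, as Open3D-ML's 3D bounding boxes do; the repository's actual implementation may differ.

# Sketch of a confidence-based filter; the default threshold is an arbitrary choice
def filter_detections(detections, min_conf=0.5):
    return [det for det in detections if det.confidence >= min_conf]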

# Variable to accumulate the predictions
data_list = []

# Let's detect objects in the first few point clouds of the Kitti set
for idx in tqdm(range(10)):
    # Get one test point cloud from the Kitti dataset
    data = test_split.get_data(idx)
    
    # Run the inference
    result = pipeline.run_inference(data)[0]
    
    # Filter out results with low confidence
    result = filter_detections(result)
    
    # Prepare a dictionary usable by the visualization tool
    pred = {
    "name": 'KITTI' + '_' + str(idx),
    'points': data['point'],
    'bounding_boxes': result
    }
    
    # Append the data to the list    
    data_list.append(pred)
   
    
# Let's detect objects in the first few point clouds of the custom set
for idx in tqdm(range(len(custom_dataset))):
    # Get one point cloud and format it for inference
    data, pcd = prepare_point_cloud_for_inference(custom_dataset[idx])
 
    # Run the inference
    result = pipeline.run_inference(data)[0]
    # Filter out results with low confidence
    result = filter_detections(result, min_conf = 0.3)
    
    # Prepare a dictionary usable by the visualization tool
    pred = {
    "name": 'Custom' + '_' + str(idx),
    'points': data['point'],
    'bounding_boxes': result
    }
    # Append the data to the list  
    data_list.append(pred)

# Visualize the results
vis.visualize(data_list, None, bounding_boxes=None)
3D Object Detection with a Point Pillars Model on the Kitti and Custom Datasets

As you can see in the video above, on the Kitti test set the model detects pedestrians and cars without much issue. On my custom dataset, pedestrians were detected but cars were not. This probably means that re-training with my own data is needed, which is often the case when machine learning models encounter data that differs from what they were trained on. In future articles, I will explore training models with Open3D-ML on my own data.

Filtering a Point Cloud to Match the Field of View of the Camera

LiDAR-Camera Setup of my Research Robot

In a previous post, I described why and how I was collecting a point cloud dataset. My setup is depicted in the image above, where a 360°, 32-beam LiDAR is placed above a stereo camera. One of the steps mentioned in that article was to crop (or filter) the point cloud to only keep points approximately within the field of view (FoV) of my robot's camera. The reason is that the camera frames provide the context needed to understand what the LiDAR is seeing, which makes tasks like annotating the clouds easier.

In this post, I will describe how to process a full point cloud so that only the points within the camera FoV are retained. There are multiple libraries that could be used to achieve this, but I used a popular and powerful C++ library called the Point Cloud Library (PCL). I am an Ubuntu user, so as usual my instructions are for Ubuntu and may work with few modifications on other Linux distros.

Here are the steps:

Step 1: Install the PCL library

sudo apt install libpcl-dev

Step 2: Clone my PCL experiments repo

git clone https://github.com/carlos-argueta/pcl_experiments.git

The two folders that interest us are filter_camera_fov, which contains the C++ code, and pcds, which contains a small dataset composed of full 360° LiDAR scans and JPG images captured by the front-facing camera of my robot. The idea is to filter the scans to only retain content that can be seen by the camera.

Step 3: Go to the filter_camera_fov folder and create the build folder

cd pcl_experiments/filter_camera_fov
mkdir build
cd build

Step 4: Build the filter

cmake ..
make

Step 5: Run the filter

Running the filter will overwrite the PCD files. If you intend to do anything with the original files, make a copy of the pcds folder before running the code below.

./filter ../../pcds

That’s it. The program will load all the files in the pcds folder (or any other folder with PCD files that you specify) and then filter them one by one, overwriting the original files with the filtered point clouds. The following video shows the entire process.

Running the filter and viewing the results

If you want to quickly visualize the PCDs (before or after filtering), you can use the pcl_viewer utility. To install and use it you can follow these simple steps:

Step 1: Install pcl_tools

sudo apt-get install pcl-tools

Step 2: Visualize PCD file with pcl_viewer

Assuming you are at the root of the repo’s directory, run the following code.

Change the name of the file to view another one of the provided Point Clouds.

pcl_viewer pcds/lidar_2022_05_05_16_04_21.2595.pcd
Visualizing the filtered Point Clouds with pcl_viewer

For details about how the code works please see the filter.cpp file inside the filter_camera_fov folder. In this article, I will just briefly describe how the filter function of the program works.

void filter(std::vector<PCD, Eigen::aligned_allocator<PCD> > &data){
  cout<<endl<<endl<<"Applying Filter"<<endl;

  PointCloud::Ptr cloud_filtered (new PointCloud);

  // Create the filter
  pcl::FrustumCulling<PointT> fc;
  // The following parameters were defined by trial and error. 
  // You can modify them to better match your expected results
  fc.setVerticalFOV (100);
  fc.setHorizontalFOV (100);
  fc.setNearPlaneDistance (0.0);
  fc.setFarPlaneDistance (150);
   
  // Define the camera pose as a rotation and translation with respect to the LiDAR pose.
  Eigen::Matrix4f camera_pose = Eigen::Matrix4f::Identity();
  Eigen::Matrix3f rotation = Eigen::Quaternionf(0.9969173, 0, 0, 0.0784591  ).toRotationMatrix();
  Eigen::RowVector3f translation(0, 0, 0);
  // This is the most important part, it tells you in which direction to look in the Point Cloud
  camera_pose.block(0,0,3,3) = rotation; 
  camera_pose.block(3,0,1,3) = translation;
  cout<<"Camera Pose "<<endl<<camera_pose<<endl<<endl;
  fc.setCameraPose (camera_pose);
   
  // Go over each Point Cloud and filter it
  for (auto & d : data){
    // Run the filter on the cloud
    PointCloud::Ptr cloud_filtered (new PointCloud);
    fc.setInputCloud (d.cloud);
    fc.filter(*cloud_filtered);
    // Update the cloud 
    d.cloud = cloud_filtered;
    // Replace the PCD file with the filtered data
    pcl::io::savePCDFileASCII (d.f_name, *d.cloud);
    
  }

}

The filter function receives a vector (like a list) of loaded PCD structures. The PCD structure is a C struct with a file name field f_name and a point cloud field cloud. At the top of the function, a FrustumCulling filter is created. The parameters shown are good defaults that I found by trial and error, but feel free to experiment with different values. They define a frustum, and only points falling within it will be kept. This is exactly the idea behind the viewing frustum, which is in turn related to the field of view of a camera, so it maps nicely to our problem.

Next, we define a transformation matrix representing the camera pose, which is essentially a rotation and translation of the frustum. Think of it as moving the frustum to point in the direction in which we want to see, and hence keep only the points within that view. The values provided for the rotation and translation roughly correspond to the transformation needed to align my camera's coordinate system with my LiDAR's coordinate system. If you capture your own cloud-image pairs, you may need to adjust them to match your hardware setup.

Finally, the loop at the end of the function picks one point cloud at a time from the input vector, applies the filter, replaces the old cloud with the filtered cloud in the vector, and overwrites the old PCD file with the filtered content. That is essentially the heart of the program. Other functions in the program deal with loading all PCD files into the vector, visualizing a cloud before and after filtering, and so on. Please consult the source code for more details.
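If you would rather stay in Python, a rough alternative to the PCL-based program is to keep only the points whose angles fall inside an assumed camera field of view, for example with Open3D and NumPy. The sketch below is not the code from the repository; the forward axis and the FoV angles are assumptions that you would need to adapt to your own setup.

import numpy as np
import open3d as o3d

def filter_fov(pcd, h_fov_deg=100.0, v_fov_deg=100.0):
    # Keep only points whose horizontal/vertical angles fall inside the FoV,
    # assuming the camera looks along the +x axis of the LiDAR frame
    pts = np.asarray(pcd.points)
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    h_angle = np.degrees(np.arctan2(y, x))
    v_angle = np.degrees(np.arctan2(z, np.sqrt(x ** 2 + y ** 2)))
    mask = (np.abs(h_angle) <= h_fov_deg / 2.0) & (np.abs(v_angle) <= v_fov_deg / 2.0)
    return pcd.select_by_index(np.where(mask)[0].tolist())

# Example usage with one of the provided scans
pcd = o3d.io.read_point_cloud("pcds/lidar_2022_05_05_16_04_21.2595.pcd")
o3d.visualization.draw_geometries([filter_fov(pcd)])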

Semantic Segmentation with Open3D-ML, PyTorch Backend, and a Custom Dataset

Note: Instructions to download, run, and troubleshoot the code introduced in this article are provided at the end.

As part of my experimentation with Open3D-ML for point clouds, I wrote articles explaining how to install this library with TensorFlow and PyTorch support. To test the installation, I explained how to run a simple Python script to visualize a labeled dataset for Semantic Segmentation called SemanticKITTI. In this article, I go over the steps I followed to do inference on any point cloud, including the test portion of SemanticKITTI as well as my private dataset.

The rest of this article assumes that you have successfully installed and tested Open3D-ML with PyTorch backend by following my previous article. Having done so also means you have downloaded the SemanticKITTI dataset. To run a Semantic Segmentation model on unlabeled data, you need to load an Open3D-ML pipeline. The pipeline will consist of a Semantic Segmentation model, a dataset, and probably other pre/post-processing steps. Open3D-ML comes with modules and configuration files to easily load and run popular pipelines.

To do inference on new Point Clouds, we will use a popular model called RandLA-Net presented in a 2019 paper titled RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds. Conveniently, Open3D-ML has an implementation of this method and has configurations to load and run such a method on the SemanticKITTI dataset without much effort.

To load the configuration file we need the following code, making sure to replace /path/to/Open3D/ with the path where you cloned the Open3D repository when installing.

# Load an ML configuration file
cfg_file = "/path/to/Open3D/build/Open3D-ML/ml3d/configs/randlanet_semantickitti.yml"
cfg = _ml3d.utils.Config.load_from_file(cfg_file)

Next, we will create a RandLANet model using the configuration object, and we will add the paths to the SemanticKITTI dataset as well as to our custom dataset. Make sure to replace /path/to/save/dataset/SemanticKitti/ with the path where you saved the SemanticKITTI data when installing Open3D-ML. For now, the custom dataset is pointing to some of my personal Point Clouds collected with my robot and provided in the repo accompanying this article.

# Load the RandLANet model
model = ml3d.models.RandLANet(**cfg.model)

# Add path to the SemanticKitti dataset and your own custom dataset
cfg.dataset['dataset_path'] = '/path/to/save/dataset/SemanticKitti/'
cfg.dataset['custom_dataset_path'] = './pcds'

The next step is to load the datasets. To load the SemanticKITTI dataset, Open3D-ML has convenient helper classes and methods.

# Load the datasets
dataset = ml3d.datasets.SemanticKITTI(cfg.dataset.pop('dataset_path', None), **cfg.dataset)

custom_dataset = load_custom_dataset(cfg.dataset.pop('custom_dataset_path', None))

A simple custom function is added to load the custom dataset. Notice that this dataset has to be in the PCD format.

def load_custom_dataset(dataset_path):
 print("Loading custom dataset")
 pcd_paths = glob.glob(dataset_path+"/*.pcd")
 pcds = []
 for pcd_path in pcd_paths:
  pcds.append(o3d.io.read_point_cloud(pcd_path))
 return pcds

Next, a pipeline is created using the configuration, model, and dataset objects. If not available, the model parameters (checkpoint) are downloaded before being loaded into the pipeline.

# Create the ML pipeline
pipeline = ml3d.pipelines.SemanticSegmentation(model, dataset=dataset, device="gpu", **cfg.pipeline)

# Download the weights.
ckpt_folder = "./logs/"
os.makedirs(ckpt_folder, exist_ok=True)
ckpt_path = ckpt_folder + "randlanet_semantickitti_202201071330utc.pth"
randlanet_url = "https://storage.googleapis.com/open3d-releases/model-zoo/randlanet_semantickitti_202201071330utc.pth"

if not os.path.exists(ckpt_path):
    cmd = "wget {} -O {}".format(randlanet_url, ckpt_path)
    os.system(cmd)

# Load the parameters of the model.
pipeline.load_ckpt(ckpt_path=ckpt_path)

To run the model on an unlabeled Point Cloud from the SemanticKITTI test set, we first pick a given data point by its index, then we run the inference action from the pipeline. You can change the value of the variable pc_idx to select another Point Cloud.

# Get one test point cloud from the SemanticKitti dataset
pc_idx = 58 # change the index to get a different point cloud
test_split = dataset.get_split("test")
data = test_split.get_data(pc_idx)

# run inference on a single example.
# returns dict with 'predict_labels' and 'predict_scores'.
result = pipeline.run_inference(data)

A point cloud data instance in the SemanticKITTI dataset is loaded as a Python dictionary containing the keys “point”, “feat”, and “label”. The last two hold None and a NumPy array filled with zeros, respectively, and are not used during inference. The “point” key is associated with a NumPy array containing the x, y, and z coordinates of the LiDAR points. To visualize the result of the inference using the Open3D visualizer, we need to create a point cloud object from the “point” part of the dictionary and then colorize the points with the labels returned by the inference.

# Create a pcd to be visualized 
pcd = o3d.geometry.PointCloud()
xyz = data["point"] # Get the points
pcd.points = o3d.utility.Vector3dVector(xyz)

# Get the color associated with each predicted label
colors = [COLOR_MAP[clr] for clr in list(result['predict_labels'])] 
pcd.colors = o3d.utility.Vector3dVector(colors) # Add color data to the point cloud

# Create visualization
custom_draw_geometry(pcd)

The SemanticKITTI dataset has 19 classes plus a background class. A color mapping from class label to point color has to be provided. For readability, the RGB colors are defined as integers, but the visualizer uses doubles from 0.0 to 1.0 so some code to do the conversion is provided.

# Class colors, RGB values as ints for easy reading
COLOR_MAP = {
    0: (0, 0, 0),
    1: (245, 150, 100),
    2: (245, 230, 100),
    3: (150, 60, 30),
    4: (180, 30, 80),
    5: (255, 0., 0),
    6: (30, 30, 255),
    7: (200, 40, 255),
    8: (90, 30, 150),
    9: (255, 0, 255),
    10: (255, 150, 255),
    11: (75, 0, 75),
    12: (75, 0., 175),
    13: (0, 200, 255),
    14: (50, 120, 255),
    15: (0, 175, 0),
    16: (0, 60, 135),
    17: (80, 240, 150),
    18: (150, 240, 255),
    19: (0, 0, 255),
}

# Convert class colors to doubles from 0 to 1, as expected by the visualizer
for label in COLOR_MAP:
 COLOR_MAP[label] = tuple(val/255 for val in COLOR_MAP[label])

The custom function that draws the Point Cloud with the result of the semantic segmentation is as follows.

def custom_draw_geometry(pcd):
 vis = o3d.visualization.Visualizer()
 vis.create_window()
 vis.get_render_option().point_size = 2.0
 vis.get_render_option().background_color = np.asarray([1.0, 1.0, 1.0])
 vis.add_geometry(pcd)
 vis.run()
 vis.destroy_window()

To run the inference on our private data, we follow a similar process. An index for the desired data point is provided, and a custom function to load and pre-process a PCD file is executed, before passing the resulting dictionary to the pipeline. Then, we colorize and display the segmented Point Cloud.

# Get one test point cloud from the custom dataset
pc_idx = 2 # change the index to get a different point cloud
data, pcd = prepare_point_cloud_for_inference(custom_dataset[pc_idx])

# Run inference
result = pipeline.run_inference(data)

# Colorize the point cloud with predicted labels
colors = [COLOR_MAP[clr] for clr in list(result['predict_labels'])]
pcd.colors = o3d.utility.Vector3dVector(colors)

# Create visualization
custom_draw_geometry(pcd)

The custom function to prepare the data receives a PCD obtained from the list of PCDs, removes non-finite points (nan and +/-inf values), obtains the points data from the PCD, and constructs a dictionary appropriate for the pipeline with it. It then returns the PCD and the dictionary.

def prepare_point_cloud_for_inference(pcd):
 # Remove NaNs and infinity values
 pcd.remove_non_finite_points()
 
 # Extract the xyz points
 xyz = np.asarray(pcd.points)
 
 # Set the points to the correct format for inference
 data = {"point":xyz, 'feat': None, 
 'label':np.zeros((len(xyz),), dtype=np.int32)}
 
 return data, pcd

To run the code and see the results yourself, activate your Conda environment and follow these steps.

Step 1: Clone the repo

git clone https://github.com/carlos-argueta/open3d_experiments.git

Step 2: Run the code

cd open3d_experiments
python3 semantic_torch.py

Step 3: Troubleshooting

If you get the error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_gather)

Open the file /path/to/your/conda-env/lib/python3.9/site-packages/open3d/_ml3d/torch/modules/losses/semseg_loss.py, making sure to replace /path/to/your/conda-env with the path of your Conda environment and python3.9 with your version of Python.

semseg_loss.py code before the fix

Next, find line 9 and add .to(device) at the end of it.

semseg_loss.py code after the fix

Close and save the file, and that should fix the problem.

If you get the error: ModuleNotFoundError: No module named ‘tensorboard’ then run:

pip install tensorboard

Step 4: Enjoy!

Semantic Segmentation on SemanticKITTI and private data.
Selecting other Point Clouds from SemanticKITTI and private dataset.

Installing Open3D-ML for 3D Computer Vision with PyTorch

3D Semantic Segmentation with Open3D-ML using PyTorch backend.

In a previous post, I introduced my reasons to test Open3D-ML and the steps to install it with TensorFlow as the backend. In this post, I go over the steps to install the same library with PyTorch as the backend. Many of the steps are similar but there are some important differences. I hope this post is helpful for people interested in testing Open3D-ML.

To install Open3D-ML with PyTorch, follow the steps below. Note that my system is Ubuntu 20.04.4 LTS and I have a Cuda-enabled GPU, therefore, the instructions here presented may vary depending on your system.

Step 1: Install Conda

Using Conda is the recommended way to try anything new without risking breaking your system. To install Conda follow the official steps here.

Step 2: Create and activate a Conda environment

Make sure to replace myenv with the actual name that you want to use.

conda create --name myenv
conda activate myenv

Step 3: Install Node.js and Yarn

To install Node.js and Yarn you can follow the steps below:

curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt-get install -y nodejs
sudo npm install -g yarn

Step 4: Install PyTorch with GPU support

conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge

Step 5: Install the cuDNN library

conda install -c anaconda cudnn

Step 6: Test PyTorch installation

To test the installation, run the following Python code. If the output is True, then all is working fine.

import torch
torch.cuda.is_available()

Step 7: Install Jupyter Lab

conda install -c conda-forge jupyterlab

Step 8: Clone Open3D

git clone https://github.com/isl-org/Open3D

Step 9: Install dependencies

cd Open3D
./util/install_deps_ubuntu.sh

Step 10: Create the build directory and clone Open3D-ML

mkdir build
cd build
git clone https://github.com/isl-org/Open3D-ML.git

Step 11: Configure the installation

This is assuming you have a Cuda-enabled GPU. Make sure to replace /path/to/your/conda-env/bin/python with the correct path to your Python. You can get this path by typing which python in a terminal. Also do not forget the two dots at the end of the command.

cmake -DBUILD_CUDA_MODULE=ON -DGLIBCXX_USE_CXX11_ABI=OFF -DBUILD_PYTORCH_OPS=ON -DBUNDLE_OPEN3D_ML=ON -DOPEN3D_ML_ROOT=Open3D-ML -DBUILD_JUPYTER_EXTENSION:BOOL=ON -DBUILD_WEBRTC=ON -DPython3_ROOT=/path/to/your/conda-env/bin/python ..

If you get the following error:

CMake Error at /path/to/your/conda-env/lib/python/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message):
Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN
libraries. Please set the proper cuDNN prefixes and / or install cuDNN.

Open the CMakeCache.txt file inside the build directory and edit the following lines to point to the cuDNN library. Make sure to replace /path/to/your/conda-env/ with the path to your Conda environment.

//Folder containing NVIDIA cuDNN header files
CUDNN_INCLUDE_DIR:FILEPATH=/path/to/your/conda-env/include

//Path to a file.
CUDNN_INCLUDE_PATH:PATH=/path/to/your/conda-env/include

//Path to the cudnn library file (e.g., libcudnn.so)
CUDNN_LIBRARY:FILEPATH=/path/to/your/conda-env/lib/libcudnn.so

//Path to a library.
CUDNN_LIBRARY_PATH:FILEPATH=/path/to/your/conda-env/lib/libcudnn.so

Then run this again:

cmake -DBUILD_CUDA_MODULE=ON -DGLIBCXX_USE_CXX11_ABI=OFF -DBUILD_PYTORCH_OPS=ON -DBUNDLE_OPEN3D_ML=ON -DOPEN3D_ML_ROOT=Open3D-ML -DBUILD_JUPYTER_EXTENSION:BOOL=ON -DBUILD_WEBRTC=ON -DPython3_ROOT=/path/to/your/conda-env/bin/python ..

Step 12: Build the library

make -j$(nproc)

Step 13: Install as a Python package

make install-pip-package

If you get ModuleNotFoundError: No module named ‘yapf’

pip install yapf

If you get ModuleNotFoundError: No module named ‘jupyter_packaging’

pip install jupyter-packaging

If you get ModuleNotFoundError: No module named ‘ipywidgets’

pip install ipywidgets

Then try installing again.

make install-pip-package

Step 14: Test Open3D installation

python -c "import open3d"

Step 15: Downloading and preparing a dataset

In this step, we will be downloading the SemanticKITTI dataset. This dataset is over 80 GB so make sure to have plenty of space and time. The following steps will download and prepare the dataset. Make sure to replace /path/to/save/dataset with the desired path.

cd Open3D-ML/scripts/download_datasets
./download_semantickitti.sh /path/to/save/dataset

Step 16: Loading and visualizing the dataset

In order to visualize the SemanticKITTI dataset, save the following Python code in a file and run it. Remember to replace /path/to/save/dataset/ with the path where the SemanticKITTI dataset was saved.

import open3d.ml.torch as ml3d  

# construct a dataset by specifying dataset_path
dataset = ml3d.datasets.SemanticKITTI(dataset_path='/path/to/save/dataset/')

# get the 'all' split that combines training, validation and test set
all_split = dataset.get_split('all')

# print the attributes of the first datum
print(all_split.get_attr(0))

# print the shape of the first point cloud
print(all_split.get_data(0)['point'].shape)

# show the first 340 frames using the visualizer
vis = ml3d.vis.Visualizer()
vis.visualize_dataset(dataset, 'all', indices=range(340))

Right when you run the Python script, a visualizer opens and loads the first 340 data frames. You can change the number of frames loaded in the code. Once opened, you can explore the point clouds based on intensity, but the most interesting part is to explore the point clouds based on the semantic label of each point. The videos below show two examples.

In the first video, you can see how by selecting multiple frames you can play them as an animation. Make sure to select labels as the data type from the presented options.

Viewing some frames with their semantic labels as an animation with Open3D-ML

The second video shows how you can select a given frame and inspect the semantic objects present by activating and deactivating certain labels. When certain colors are too light and difficult to see, you can change the color to improve visibility.

Inspecting the semantic objects present in a frame with Open3D-ML

And that’s it. Open3D-ML is a great tool for visualizing point cloud datasets. It can be used with both TensorFlow and PyTorch as the backend. This post explained how to install it with PyTorch.

The next step is to study the datasets to see how they were labeled. Then, I will go over training/testing 3D models with Open3D. Hopefully, this will bring me closer to performing the same operations with my custom data.

Selecting a Dataset for a Natural Language Processing Paper

The following article describes the process I followed to discover and download a new text dataset. Often researchers have to go through this process and I hope that with this article I can make someone’s life easier.

A while ago I was working on an area of Natural Language Processing named Sentiment and Emotion analysis, which in simple words aims to determine the sentiment (negative, positive, neutral), or emotion (happy, sad, etc.) expressed in a text. After a few years of not working on NLP, I find myself doing part-time research on Sarcasm Detection, which is closely related to my previous work.

The main goal of this research is to develop a sarcasm detection algorithm that can compare or surpass the state-of-the-art (SOTA), and publish a paper about it. To achieve this, the shortest path is to find a dataset widely used by many researchers in their latest work and develop algorithms using it. The best way to find such a dataset is through the Papers With Code academic website.

To find both the SOTA and a dataset for Sarcasm Detection, I visited that awesome website, searched for “Sarcasm”, and checked the date of publication of the papers returned. I focused on papers published in and after 2020. I was lucky to discover a 2020 paper introducing a completely new and “unbiased” dataset for Sarcasm Detection named iSarcasm, and later a series of 2022 papers competing on Task 6 (Sarcasm Detection) of The 16th International Workshop on Semantic Evaluation SemEval-2022. The best part was that the dataset used during SemEval-2022 Task 6 was an extended version of iSarcasm called iSarcasmEval.

The rest of this article will describe the steps taken to download and prepare the datasets, which are composed of Twitter data. Oftentimes when working with Twitter data, researchers have to follow similar steps, and therefore I hope this article is helpful to someone.

The iSarcasm Dataset

The first dataset I processed was the one introduced in 2020 by the paper titled iSarcasm: A Dataset of Intended Sarcasm. The idea was to present a dataset of tweets labeled by the authors themselves as opposed to labeled by a third party, which often introduced bias or labeling errors.

We show the limitations of previous labelling methods in capturing intended sarcasm and introduce the iSarcasm dataset of tweets labeled for sarcasm directly by their authors.

From the Papers With Code page, you can access the GitHub repository that contains the data. The repo contains two CSV files with the IDs of the tweets, a sarcasm_label column with two possible values (sarcastic and not_sarcastic), and a sarcasm_type column that identifies the specific type of sarcasm expressed. Why the IDs of the tweets and not the texts themselves? Researchers often do this for privacy reasons and to respect some of Twitter's guidelines. Having only an ID implies two things: first, we need to use a Twitter crawler to obtain the actual texts; second, we may not be able to retrieve all texts, as some tweets might have been deleted by their authors.

A quick view of the iSarcasm dataset

To get this dataset ready, perform the following steps.

Step 1: Clone the repo

git clone https://github.com/silviu-oprea/iSarcasm.git

Step 2: Generate query for Tweets Lookup endpoint

The Twitter API allows developers to do many things within the Twitter platform. We are particularly interested in retrieving tweets based on their IDs. For that, we can use the Tweets Lookup endpoint of the API, which allows us to look up up to 100 tweets per call. We therefore need a simple piece of Python code that groups all the IDs in the dataset into query strings of 100 IDs each.

Python script to group tweet IDs into query strings of 100 IDs each

import pandas as pd
import math

# Read the CSV file with Pandas
data = pd.read_csv("iSarcasm/isarcasm_train.csv")

# Extract the column containing the ids and convert it to a list
tweet_ids = data.loc[:, "tweet_id"].tolist()

# Create a file that will contain the queries
file = open("isarcasm_queries_train.txt", "w")
# Go over all of the ids and create query strings with groups of 100 ids
query = ""
for idx, tweet_id in enumerate(tweet_ids):
    query += str(tweet_id) + ","
    
    if (idx + 1 ) % 100 == 0:
        query = query[:-1] + "\n"
        file.write(query)
        query = ""

# Do not forget the last line
query = query[:-1] + "\n"
file.write(query)

# Close the file
file.close()

Make sure to have this script next to the folder containing the dataset. Replace “iSarcasm/isarcasm_train.csv” with “iSarcasm/isarcasm_test.csv” and “isarcasm_queries_train.txt” with “isarcasm_queries_test.txt” to process the test file in the same way.

The queries file with 100 IDs per line

Step 3: Get the tweets with the Tweets Lookup endpoint

We can now retrieve the tweets from Twitter using the Tweets Lookup endpoint. To do so, you first need to fulfill a few requirements, like having a developer account and creating a project. These easy-to-follow steps are fully described in this quick-start document.

Prerequisites

To complete this guide, you will need to have a set of keys and tokens to authenticate your request. You can generate these keys and tokens by following these steps:

Sign up for a developer account and receive approval.

Create a Project and an associated developer App in the developer portal.

Navigate to your App’s “Keys and tokens” page to generate the required credentials. Make sure to save all credentials in a secure location.

Assuming you have met all of the requirements, we can retrieve the tweets with a simple Python script. The script will query the Twitter API and obtain 100 results at a time, some of which will contain a tweet and others an error (the tweet has been deleted or became private). In the end, all the retrieved tweets are matched with their labels in the dataset and saved in a new text file, with one tweet and label, tab-separated, per line.

Python script to retrieve tweets and save them with their corresponding labels

import os
import pandas as pd

import requests
import json

# Get this token from your developer portal
os.environ["BEARER_TOKEN"] = '<your_bearer_token>'


# To set your environment variables in your terminal run the following line:
# export 'BEARER_TOKEN'='<your_bearer_token>'
bearer_token = os.environ.get("BEARER_TOKEN")


'''
This function creates the url for our GET /tweets request.
Arg: query - the string with 100 comma-separated ids to retrieve
Return: a valid url for the request
'''
def create_url(query):
    tweet_fields = ""
    ids = "ids="+query
    # You can adjust ids to include a single Tweets.
    # Or you can add to up to 100 comma-separated IDs
    url = "https://api.twitter.com/2/tweets?{}&{}".format(ids, tweet_fields)
    return url


'''
This is a method required by bearer token authentication.
Arg: r - a request
Return: a valid request
'''
def bearer_oauth(r):
    """
    Method required by bearer token authentication.
    """

    r.headers["Authorization"] = f"Bearer {bearer_token}"
    r.headers["User-Agent"] = "v2TweetLookupPython"
    return r


'''
This method connects to the endpoint and tries to retrieve the tweets
Arg: url - a valid GET /tweets url
Return: a json response, hopefully with the tweets
'''
def connect_to_endpoint(url):
    response = requests.request("GET", url, auth=bearer_oauth)
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(
            "Request returned an error: {} {}".format(
                response.status_code, response.text
            )
        )
    return response.json()

'''
This method connects to the endpoint multiple times to retrieve all of the tweets 
Arg: queries_path - the path of the file containing all of the queries (comma-separated tweet ids)
Return: ids - a list of the ids of the retrieved tweets 
Return: tweets - a list of texts, the actual tweets 
Return: errors - a list of errors, one per each unavailable tweet 
'''
def get_tweets(queries_path):
    
    # Open the queries file
    file = open(queries_path, "r")

    # Empty lists to accumulate the results
    tweets = []
    ids = []
    errors = []

    # Go over every query string
    for line in file:
        
        # Create the GET request with the current query
        line = line.strip()
        url = create_url(line)
        # Retrieve the tweets
        json_response = connect_to_endpoint(url)
        
        # Accumulate the tweets
        if "data" in json_response:
            for tweet in json_response["data"]:
                
                tweets.append( tweet["text"])
                ids.append(int(tweet["id"]))
                
        # Accumulate the errors     
        if "errors" in json_response:
            for error in json_response["errors"]:
                errors.append(error["title"])
                
    return ids, tweets, errors
    

ids, tweets, errors = get_tweets("isarcasm_queries_train.txt")

print("Retrieved tweets:", len(tweets))
print("Missing tweets:", len(errors))

# Load the original dataset from the CSV file and match ids to labels
data = pd.read_csv("iSarcasm/isarcasm_train.csv")
labels = data.loc[data['tweet_id'].isin(ids), "sarcasm_label"].tolist()

# Save the tweets and their matching labels into a new tab-separated file
file = open("isarcasm_train.txt", "w")
for tweet, label in zip(tweets, labels):
    tweet = " ".join(tweet.split("\n"))
    file.write(tweet + "\t" + label + "\n")
file.close()

Before running the above script, make sure to replace the <your_bearer_token> with your actual token. Also, to process the test dataset, change isarcasm_queries_train.txt to isarcasm_queries_test.txt, iSarcasm/isarcasm_train.csv to iSarcasm/isarcasm_test.csv, and isarcasm_train.txt to isarcasm_test.txt.

The retrieved tweets dataset, where each line has a tweet and its label separated by a tab

The iSarcasmEval Dataset

The second dataset, which I consider an extension of the first, was introduced in the paper describing Task 6 of the 16th International Workshop on Semantic Evaluation (SemEval-2022). Luckily, this dataset already provides the text of each tweet and not just its ID. Apart from the column indicating whether the tweet is sarcastic or not, it has other columns for the different types of sarcasm.

Viewing parts of the iSarcasmEval dataset with pandas

The information is the same as the previous dataset but in a different format. Like with the previous data, I was only interested in knowing if the tweet is sarcastic or not. The following steps helped me process this dataset and format it just like the previous one.

Step 1: Clone the repo

git clone https://github.com/iabufarha/iSarcasmEval.git

Step 2: Reformat the dataset

The following script reformats the dataset to match the format of the previous one (iSarcasm). I use “sarcastic” and “not_sarcastic” as my labels; feel free to change them to whatever you like.

Python script to reformat the iSarcasmEval dataset

import pandas as pd
import math

# Load the CSV file with Pandas
train_data = pd.read_csv("iSarcasmEval/train/train.En.csv")

# Get only the columns with the text and the sarcastic label
tweets_and_labels = train_data.loc[:, ["tweet", "sarcastic"]].values.tolist()

# Create a new file to save the dataset
file = open("isarcasmeval_train.txt", "w")

# For every tweet-label pair
for count, (tweet, label) in enumerate(tweets_and_labels):
    # If the label is 1, then save the tweet with the "sarcastic" label, otherwise use "not_sarcastic"
    if label == 1:
        label = "sarcastic"
    else:
        label = "not_sarcastic"

    # Some rows had no tweet and Pandas interpreted the empty cell as a nan.
    # The code below skips those 
    if not isinstance(tweet, str) and math.isnan(tweet):
        #print(count, tweet)
        continue
    
    # This line removes new lines in the tweet    
    tweet = " ".join(tweet.split("\n"))

    file.write(tweet + "\t" + label + "\n")
file.close()

Finally, to process the test set change “iSarcasmEval/train/train.En.csv” to “iSarcasmEval/test/task_A_En_test.csv” and “isarcasmeval_train.txt” to “isarcasmeval_test.txt”.

The reformatted iSarcasmEval dataset, where each line has a tweet and its label separated by a tab

Next Step: Do your Machine Learning!

After having retrieved and formatted your data, you are ready to do some cool sarcasm detection. Optionally, you may want to save the tweets in two different files, one for sarcastic and one for non-sarcastic tweets, avoiding having to include the labels inside the files. To do this you just need to make some simple changes to the scripts above. Have fun!
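As an example, a small sketch of that change is shown below. It reads one of the tab-separated files produced above and writes the tweets into two label-specific files (the output file names here are just placeholders).

# Split a tab-separated "tweet<TAB>label" file into one file per label
sarcastic_file = open("sarcastic_train.txt", "w")
not_sarcastic_file = open("not_sarcastic_train.txt", "w")

with open("isarcasm_train.txt", "r") as file:
    for line in file:
        tweet, label = line.rstrip("\n").split("\t")
        if label == "sarcastic":
            sarcastic_file.write(tweet + "\n")
        else:
            not_sarcastic_file.write(tweet + "\n")

sarcastic_file.close()
not_sarcastic_file.close()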

Testing Open3D-ML for 3D Object Detection and Segmentation

Point clouds with semantic labels from the SemanticKITTI dataset.

When starting out new research, my approach is usually to test different related things until enough experience allows me to begin connecting the dots. Before I could start building custom models for 3D object detection, I acquired a LiDAR and played around with some data. One next obvious step was to find out how the research world was labeling such data before I could label my own.

There are some very popular point cloud datasets for autonomous driving out there, the most popular being the KITTI dataset, NuScenes, and the Waymo Open Dataset, among others. I spent some time studying the KITTI dataset a while ago and, in general, noticed how hard it was to find the right tools to visualize the data. That was until I discovered Open3D, which made it simple for me to process and visualize point clouds. Open3D can be optionally bundled with Open3D-ML, which includes tools to visualize annotated point cloud data and train/build/test 3D machine learning models (more on that in a future post).

Visualizing bounding boxes with Open3D. Image by Open3D via https://github.com/isl-org/Open3D-ML

The Open3D-ML GitHub page provides easy instructions to install the library with pip, but this only works with specific versions of CUDA and TensorFlow. Because I wanted to use the newer versions of such libraries, I decided to build Open3D from source. When doing this, I noticed that some steps were missing or were not clear enough. To simplify the life of anyone interested in building this library, I include below the steps that I followed to install and test Open3D-ML. Note that my system is Ubuntu 20.04.4 LTS and I have a Cuda-enabled GPU, therefore, the instructions here presented may vary depending on your system.

Step 1: Install Conda

Using Conda is the recommended way to try anything new without risking breaking your system. To install Conda follow the official steps here.

Step 2: Create and activate a Conda environment

Make sure to replace myenv with the actual name that you want to use.

conda create --name myenv
conda activate myenv

Step 3: Install Node.js

To install Node.js you can follow the steps below:

curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt-get install -y nodejs
sudo npm install -g yarn

Step 4: Install TensorFlow

To install TensorFlow follow the official steps here.

Step 5: Install Jupyter Lab

conda install -c conda-forge jupyterlab

Step 6: Clone Open3D

git clone https://github.com/isl-org/Open3D

Step 7: Install dependencies

cd Open3D
./util/install_deps_ubuntu.sh

Step 8: Create the build directory and clone Open3D-ML

mkdir build
cd build
git clone https://github.com/isl-org/Open3D-ML.git

Step 9: Configure the installation

This is assuming you have a Cuda-enabled GPU. Make sure to replace /path/to/your/conda/env/bin/python with the correct path to your Python. Also do not forget the two dots at the end of the command.

cmake -DBUILD_CUDA_MODULE=ON -DGLIBCXX_USE_CXX11_ABI=ON -DBUILD_TENSORFLOW_OPS=ON -DBUNDLE_OPEN3D_ML=ON -DOPEN3D_ML_ROOT=Open3D-ML -DBUILD_JUPYTER_EXTENSION:BOOL=ON -DBUILD_WEBRTC=ON -DPython3_ROOT=/path/to/your/conda/env/bin/python ..

Step 10: Build the library

make -j$(nproc)

Step 11: Install as Python package

make install-pip-package

Step 12: Test Open3D installation

python -c "import open3d"

Step 13: Test Open3D-ML with TensorFlow installation

python -c "import open3d.ml.tf as ml3d"

Step 14: Downloading and preparing a dataset

In this step, we will be downloading the SemanticKITTI dataset. This dataset is over 80 GB so make sure to have plenty of space and time. The following steps will download and prepare the dataset. Make sure to replace /path/to/save/dataset with the desired path.

cd Open3D-ML/scripts/
./download_semantickitti.sh /path/to/save/dataset

Step 15: Loading and visualizing the dataset

In order to visualize the SemanticKITTI dataset, save the following Python code in a file and run it. Remember to replace /path/to/save/dataset/ with the path where the SemanticKITTI dataset was saved.

import open3d.ml.tf as ml3d

# construct a dataset by specifying dataset_path
dataset = ml3d.datasets.SemanticKITTI(dataset_path='/path/to/save/dataset/SemanticKitti/')

# get the 'all' split that combines training, validation and test set
all_split = dataset.get_split('all')

# print the attributes of the first datum
print(all_split.get_attr(0))

# print the shape of the first point cloud
print(all_split.get_data(0)['point'].shape)

# show the first 340 frames using the visualizer
vis = ml3d.vis.Visualizer()
vis.visualize_dataset(dataset, 'all', indices=range(340))

When you run the Python script, a visualizer opens and loads the first 340 data frames. You can change the number of loaded frames in the code. Once the visualizer is open, you can explore the point clouds based on intensity, but the most interesting part is exploring them based on the semantic label of each point. The videos below show two examples.

In the first video, you can see how by selecting multiple frames you can play them as an animation. Make sure to select labels as the data type from the presented options.

Viewing some frames with their semantic labels as an animation with Open3D-ML

The second video shows how you can select a given frame and inspect the semantic objects present by activating and de-activating certain labels. When certain colors are too light and difficult to see, you can change the color to improve visibility.

Inspecting the semantic objects present in a frame with Open3D-ML

Step 16: Troubleshooting

When performing the steps above, I encountered the following exceptions. They were easy to fix; I list the solutions below in case you run into them as well.

If you get ModuleNotFoundError: No module named ‘yapf’

pip install yapf

If you get ModuleNotFoundError: No module named ‘jupyter_packaging’

pip install jupyter-packaging

And that’s it. Open3D-ML is a great tool for visualizing point cloud datasets. The next step is to study the datasets to see how they were labeled. Then, I will go over training/testing 3D models with Open3D. Hopefully, this will bring me closer to performing the same operations with my custom data.

Normalizing (Feature Scaling) Point Clouds for Machine Learning

Capturing some point cloud and image data with my robot.

Continuing my work on Machine Learning with point clouds in the realm of autonomous robots, and coming from working with image data, I was faced with the following question: does 3D data need normalization like image data does? The answer is a clear YES (duh!). Normalization, or feature scaling, is an important preprocessing step for many machine learning algorithms. The main benefit is that it encloses all features within a common boundary without losing information. This helps algorithms like gradient descent converge smoothly and avoids biasing the model toward features with larger magnitudes.

Take the following image captured by my robot during one of our exploratory trips. The pixels in the image have the following statistics: min. value: 0, max. value: 255, mean: 94.170, standard deviation: 74.270. This large spread in the values does not play nicely with machine learning algorithms.

Image captured by the robot while exploring the city
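For reference, these statistics can be computed with a couple of lines of NumPy and Pillow (the file name below is just a placeholder):

import numpy as np
from PIL import Image

# Hypothetical file name for the captured frame
image = np.asarray(Image.open('robot_city_frame.jpg'))
print(image.min(), image.max(), image.mean(), image.std())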

There are multiple ways to scale the pixels of an image. A common one is to enclose all of the values within the range -1.0 to 1.0. The simple code snippet below achieves just that and changes the values of the above image to obtain the following statistics: min. value: -1.0, max. value: 1.0, mean: -0.261, and standard deviation: 0.583.

import tensorflow as tf

def normalize_image(image):
  # Cast to float and scale pixel values from [0, 255] to [-1.0, 1.0]
  image = tf.cast(image, dtype=tf.float32)
  image = image / 127.5
  image -= 1.0

  return image

In the case of point clouds, where the data consists of at least the XYZ coordinates of the points, the range of values can also be large. Take my RoboSense LiDAR: with a 360° horizontal field of view, a 32° vertical field of view, and a detection distance of about 150 meters, the values each point can take vary widely. My initial question was, what would it mean to normalize this data? After spending some time searching, I found that many researchers do something similar to what I described above for images, which is to enclose the points within values of -1.0 and 1.0. For points in 3D, this is equivalent to scaling down the point cloud to fit within a unit sphere.

So, for the point cloud shown below, which was taken exactly where the image above was taken, the statistics of the points are min. value: -96.804, max. value: 98.091, mean: -0.320, and standard deviation: 11.373.

Point cloud captured by the robot while exploring the city

In order to enclose all points within a unit sphere, the mean values for X, Y, and Z are computed and subtracted from every point, which moves the entire point cloud to the origin (X = 0, Y = 0, Z = 0). Then the distances between all points and the origin are computed, and the coordinates of every point are divided by the maximum of those distances, effectively scaling all coordinates into the range [-1.0, 1.0]. The code snippet below achieves this, and the following three animations show the result.

import numpy as np

def normalize_pc(points):
    # Center the point cloud at the origin by subtracting the centroid
    centroid = np.mean(points, axis=0)
    points -= centroid
    # Scale so that the furthest point from the origin lies on the unit sphere
    furthest_distance = np.max(np.sqrt(np.sum(points**2, axis=-1)))
    points /= furthest_distance

    return points
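As a quick sanity check, the function can be applied to a synthetic cloud (the values below are made up, not from my robot):

# Synthetic cloud with a spread loosely similar to a LiDAR scan
points = np.random.uniform(-100.0, 100.0, size=(50000, 3)).astype(np.float32)
normalized = normalize_pc(points)
print(normalized.min(), normalized.max())          # both within [-1.0, 1.0]
print(np.linalg.norm(normalized, axis=1).max())    # 1.0 for the furthest point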

In the first animation, the distance tool is used to check the distance from where the robot was to some random points. The shown distances are the original unnormalized ones.

Measuring some distances from robot location in an unnormalized point cloud.

The second animation shows the same point cloud after the points have been scaled. Using the distance measurement tool it can be seen that no distance is larger than one meter.

Measuring some distances from robot location in a normalized point cloud.

Finally, the third animation uses the point measurement tool to verify that every coordinate falls within the range [-1.0, 1.0]. The statistics for the scaled point cloud are min. value: -0.931, max. value: 0.985, mean: 0.0, and standard deviation: 0.111.

Measuring some points in a normalized point cloud.

With this normalization, my point cloud data is ready to play nicely with the deep learning algorithms that I will be using soon.

Creating a Point Cloud Dataset for 3D Deep Learning

For the past two years, I have been working with robots. Earlier this year, I stopped focusing only on cameras and decided to start working with LiDARs. So, after much research, I settled on a 32-beam RoboSense device.

RoboSense Helios LiDAR

I had to spend some time setting it up, especially creating a suitable mount that could also carry a camera. After some playing around, the LiDAR was finally ready, and I must declare that I am in love with this kind of data.

Testing the LiDAR at night

The next step for my project was to start developing a system to detect and track objects in 3D using LiDAR point clouds. The applications are multiple but include detecting fixed objects (buildings, traffic signs, etc.) to create 3D maps, as well as detecting moving objects (pedestrians, cars, etc.) to avoid collisions.

Before any of the above-mentioned applications could be developed, I first needed to learn how to efficiently load point cloud data into TensorFlow, the tool that I use for Deep Learning. For now, my dataset consists of 12,200 point cloud-image pairs. The image is used as context to know what the LiDAR was looking at. I also pre-processed all point clouds to only show data approximately within the field of view of the camera, as opposed to the original 360° view.

Filtered point cloud with context image

Trying to load the data into TensorFlow was more challenging than I had expected. First, the point clouds were stored as PCD (Point Cloud Data) files, a file format for storing 3D point cloud data. TensorFlow cannot work with this type of file directly, so a conversion was needed. Enter the Open3D library, an easy-to-use tool for manipulating point clouds. With it, I could easily load a PCD file and extract the points as NumPy arrays of X, Y, and Z coordinates. Another tool, PyPotree, a point cloud viewer for large datasets, was used on Google Colab to visualize the points and confirm that they were extracted correctly.

Visualizing a point cloud within Google Colab with PyPotree
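For reference, loading a PCD and extracting the points with Open3D boils down to a couple of lines (the file name below is just a placeholder):

import numpy as np
import open3d as o3d

# Hypothetical path to one of the captured PCD files
pcd = o3d.io.read_point_cloud('cloud_000001.pcd')
points = np.asarray(pcd.points)   # (N, 3) array of X, Y, Z coordinates
print(points.shape)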

So, armed with the new tools, I uploaded 12,200 PCDs and 12,200 JPGs to my Google Drive and connected it to a Google Colab. I then wrote some code to load the PCDs, extract the points, and put them in a NumPy array, a structure that TensorFlow can easily process. I ran the code confidently and watched in horror as, after several minutes, the Colab complained that it had run out of memory while converting the point clouds. Bad news, as I plan to collect and process a lot more data than I currently have.

Fortunately, this is a common problem when dealing with large datasets, and tools like TensorFlow have the functionality to deal with such situations. The needed solution is the Dataset API, which offers methods to create efficient input pipelines. Quoting the API’s documentation: Dataset usage follows a common pattern:

  1. Create a source dataset from your input data.
  2. Apply dataset transformations to preprocess the data.
  3. Iterate over the dataset and process the elements.

Iteration happens in a streaming fashion, so the full dataset does not need to fit into memory.
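As a toy illustration of this three-step pattern (not my actual pipeline):

import tensorflow as tf

# 1. Create a source dataset from input data (here, just a range of numbers)
dataset = tf.data.Dataset.range(10)

# 2. Apply dataset transformations to preprocess the data
dataset = dataset.map(lambda x: x * 2).batch(4)

# 3. Iterate over the dataset; elements are produced in a streaming fashion
for batch in dataset:
    print(batch.numpy())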

In essence, the Dataset API lets me create a pipeline in which the data is loaded in parts as the TensorFlow training loop requests it, avoiding running out of memory. So I reviewed how to use the API and wrote some code to build a data pipeline. Following step 1 of the pattern above, the code first loaded a list of URLs for all of the PCDs and the images; then, in step 2, the PCDs were to be loaded and converted to NumPy points, and the images loaded and normalized. But here is where I ran into trouble again.

To be efficient, everything in the Dataset API (and all TensorFlow APIs apparently) runs as Tensors in a graph. The Dataset API provides functions to load data from different formats, but there were none for PCDs. After studying different possible solutions, I decided that instead of having my data as multiple PCD and JPEG files and having TensorFlow load them and pre-process them, I would instead pre-process all of the data offline, and pack it in an HDF5 file.

The Hierarchical Data Format version 5 (HDF5) is an open-source file format that supports large, complex, heterogeneous data. I verified, of course, that this type of file can be used with the Dataset API. The main advantage of using this format, apart from playing nicely with TensorFlow, is that I can pack all of my data into one nicely structured large file that I can easily move around. I created a simple Python script to load all of the PCDs, extract the points, and pack them together with their corresponding context images into a single HDF5 file.

Python script to pack point clouds and images into an HDF5 file
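The exact script is shown above; a rough sketch of the same idea could look like the following. The folder names, file layout, and the use of h5py and Pillow here are my assumptions, not necessarily what the original script does.

import glob

import h5py
import numpy as np
import open3d as o3d
from PIL import Image

# Hypothetical folders containing the matched point cloud - image pairs
pcd_files = sorted(glob.glob('pcds/*.pcd'))
img_files = sorted(glob.glob('images/*.jpg'))

with h5py.File('dataset.h5', 'w') as f:
    for i, (pcd_path, img_path) in enumerate(zip(pcd_files, img_files)):
        # Extract the XYZ points and load the corresponding context image
        points = np.asarray(o3d.io.read_point_cloud(pcd_path).points, dtype=np.float32)
        image = np.asarray(Image.open(img_path), dtype=np.uint8)
        pair = f.create_group(f'pair_{i:05d}')
        pair.create_dataset('points', data=points)
        pair.create_dataset('image', data=image)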

After uploading the HDF5 file (approximately 18 GB) to my Drive, I went back to Colab and added the corresponding Dataset API code. Essentially, step 1 of the pattern loaded the images and points from the HDF5 file and created the corresponding pairs, step 2 did some random selection of points from each point cloud (I will explain why in a later post) and normalized the images, and step 3 was ready to serve the data nicely upon request.

The final Dataset API code to load and serve data during training
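Again, the exact code is shown above; a minimal, hedged sketch of such a pipeline, built on the hypothetical HDF5 layout from the packing sketch earlier (the file name, group names, and number of sampled points are all assumptions), could look like this:

import h5py
import numpy as np
import tensorflow as tf

NUM_POINTS = 4096  # hypothetical number of points sampled per cloud

def pair_generator(h5_path='dataset.h5'):
    # Step 1: yield (points, image) pairs straight from the HDF5 file
    with h5py.File(h5_path, 'r') as f:
        for key in f:
            yield f[key]['points'][:].astype(np.float32), f[key]['image'][:]

def preprocess(points, image):
    # Step 2: randomly sample a fixed number of points (with replacement)
    # and normalize the image to [-1.0, 1.0]
    idx = tf.random.uniform([NUM_POINTS], maxval=tf.shape(points)[0], dtype=tf.int32)
    points = tf.gather(points, idx)
    image = tf.cast(image, tf.float32) / 127.5 - 1.0
    return points, image

# Step 3: the dataset serves batches on request, without loading everything into memory
dataset = (tf.data.Dataset.from_generator(
               pair_generator,
               output_signature=(tf.TensorSpec(shape=(None, 3), dtype=tf.float32),
                                 tf.TensorSpec(shape=(None, None, 3), dtype=tf.uint8)))
           .map(preprocess)
           .batch(8)
           .prefetch(tf.data.AUTOTUNE))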

I tried the data pipeline with some very basic training code, and it worked beautifully. No more out-of-memory errors. I am not sure this is the most efficient way to serve my data, but it did the trick, and, more importantly, creating the pipeline was a great first exercise in point cloud data manipulation. Next up: training the first TensorFlow model using point clouds.

Simplifying Point Cloud Labeling with Contextual Images and Point Cloud Filtering

Annotating point clouds from a multi-line 360° LiDAR is exceedingly difficult. Providing context in the form of camera frames and limiting the point cloud to the Field Of View (FOV) of the camera simplifies things. To achieve this, we first had to replace our old, not-so-stable LiDAR mount with a sturdier one capable of also holding a camera.

With the LiDAR and the camera closer together, the next step was to synchronize and store the data coming from both sensors. The collected data was huge, so a simple Python script was created to allow for the selection of sequences of interest. Once the sequences were visually selected, they were saved in a format that the annotation tool CVAT can understand.

Although CVAT now provided the camera frames as context for annotating the LiDAR point clouds, the point clouds were still too large (360° horizontally, 32° vertically). It was not easy to tell which part of the cloud corresponded to the camera frame (the visual context), and many objects were still hard to identify.

To solve the issue, a C++ program using the Point Cloud Library (PCL) was created. PCL’s FrustumCulling filter was used for this purpose. The filter lets the user define a vertical and horizontal Field Of View (FOV), as well as the near and far plane distance. After some testing, the best parameters were defined to approximate the FOV of the camera. The points of the input cloud that fall outside of the approximated FOV are filtered out, and the points of the output closely match what the camera sees. This greatly facilitates the annotation of objects in the point cloud. Watch the above video for more exciting details.
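The actual program uses PCL's FrustumCulling filter in C++; to keep the examples in this write-up in Python, below is a rough NumPy approximation of the same idea. The FOV angles and plane distances are placeholders, and the camera is assumed to look along the LiDAR's +X axis.

import numpy as np

def filter_to_camera_fov(points, h_fov=90.0, v_fov=60.0, near=0.5, far=100.0):
    # Angles of every point relative to the assumed camera axis (+X)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    distance = np.linalg.norm(points, axis=1)
    h_angle = np.degrees(np.arctan2(y, x))                # azimuth from the X axis
    v_angle = np.degrees(np.arctan2(z, np.hypot(x, y)))   # elevation above the XY plane
    # Keep points inside the horizontal/vertical FOV and between the near/far planes
    mask = ((np.abs(h_angle) <= h_fov / 2.0) &
            (np.abs(v_angle) <= v_fov / 2.0) &
            (distance >= near) & (distance <= far))
    return points[mask]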