Multimodal-LLMs for Explanations in Automated Driving

Evaluate LLMs for enhancing trustworthiness in automated driving by identifying relevant objects, anticipating their future behaviour, and explaining the actions taken by the car.

A key challenge in automated driving is to understand the environment and to identify the crucial objects and actors in the scene. These actors strongly influence any decision taken, whether by a self-driving car or a human driver. It is therefore critical to understand why a certain action was taken. The goal of this project is to test whether LLMs are capable of identifying the relevant objects in a scene that caused an action, and of explaining them to a human user. We will use state-of-the-art multimodal LLMs that take both images and text as inputs and apply them to real-world driving datasets. There is some freedom for the student to expand the scope of the study and its experiments; additional experiments might include fine-tuning the LLMs for this specific task.
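As a rough illustration of what such a query could look like, the sketch below asks an off-the-shelf multimodal LLM to name the relevant objects in a driving scene and explain an action. The checkpoint, image file, and prompt wording are illustrative assumptions, not part of the project specification.

```python
# Illustrative sketch: query a multimodal LLM with a driving-scene image and
# a textual question about relevant objects and the ego vehicle's action.
# The checkpoint, image path, and prompt are assumed examples only.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed example checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

image = Image.open("front_camera.jpg")  # hypothetical camera frame
prompt = (
    "USER: <image>\n"
    "Which objects in this scene are most relevant to the ego vehicle, "
    "and why should it slow down? ASSISTANT:"
)

inputs = processor(text=prompt, images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```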

Goal

Design a pipeline that processes the DriveLM dataset and feeds it to an implemented MLLM.

Evaluation will be based on the DriveLM challenge presented at CVPR.
https://github.com/OpenDriveLab/DriveLM/tree/main
https://opendrivelab.com/DriveLM/
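A minimal sketch of such a pipeline is given below, assuming the DriveLM annotations are stored as JSON with per-frame question-answer pairs. The field names ("key_frames", "QA", "perception", "Q", "A") are assumptions based on the DriveLM-nuScenes format; the exact schema is documented in the repository linked above.

```python
# Pipeline sketch: iterate over DriveLM-style QA annotations and query an
# MLLM with each question.  Field names are assumed, not verified.
import json
from pathlib import Path

def load_qa_pairs(annotation_file: str):
    """Yield (frame, question, reference_answer) triples from the annotation JSON."""
    data = json.loads(Path(annotation_file).read_text())
    for scene_token, scene in data.items():
        for frame_token, frame in scene.get("key_frames", {}).items():
            for qa in frame.get("QA", {}).get("perception", []):
                yield frame, qa["Q"], qa["A"]

def run_pipeline(annotation_file: str, ask_mllm):
    """ask_mllm(frame, question) -> answer, e.g. a wrapper around an MLLM call."""
    predictions = []
    for frame, question, reference in load_qa_pairs(annotation_file):
        predictions.append({
            "question": question,
            "prediction": ask_mllm(frame, question),
            "reference": reference,
        })
    return predictions
```

The collected predictions can then be scored against the reference answers with the official DriveLM challenge evaluation tooling.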

Learning outcome

  • Automated Driving
  • Generative AI (LLM + MLLM)
  • MLLM tuning
  • Model evaluation & tuning
  • Experimental design & execution
  • HPC experience

Qualifications

  • Programming experience in Python
  • Familiarity with Large Language Models

Supervisors

  • Nassim Belmecheri
  • Helge Spieker

Associated contacts

  • Helge Spieker, Research Scientist
  • Nassim Belmecheri, Postdoctoral Fellow