Multimodal LLMs for Explanations in Automated Driving
Evaluate LLMs for enhancing trustworthiness in automated driving by identifying relevant objects, anticipating their future behaviour, and explaining the actions taken by the car.
A key challenge in automated driving is to understand the environment and to identify the crucial objects and actors in the scene. These actors strongly influence any decision taken, whether by a self-driving car or by a human driver. It is therefore critical to understand why a certain action was taken. The goal of this project is to test whether LLMs are capable of identifying the relevant objects in a scene that caused an action, and then explaining that action to a human user. We will use state-of-the-art multimodal LLMs (MLLMs) that take both images and text as inputs and apply them to real-world driving datasets. The student has some freedom to expand the scope of the study and its experiments. Additional experiments might include fine-tuning LLMs for the specific task.
Goal
Design a pipeline that processes the DriveLM dataset and feeds it to the implemented MLLMs.
Evaluation will be based on the DriveLM challenge presented at CVPR.
https://github.com/OpenDriveLab/DriveLM/tree/main
https://opendrivelab.com/DriveLM/
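To make the intended pipeline concrete, the following is a minimal sketch in Python of how question-answer pairs could be read, turned into prompts, and passed to a model. It rests on several assumptions that should be checked against the DriveLM repository: the nested layout (scene, then `key_frames`, then `QA` grouped by stage, with `Q` and `A` fields and an `image_paths` entry) is assumed rather than verified, and `SAMPLE`, `QASample`, `iter_samples`, `build_prompt`, `run_pipeline`, `echo_model`, and `exact_match_rate` are hypothetical names introduced only for illustration. The exact-match score is a simple stand-in and is not the official challenge metric.

```python
from dataclasses import dataclass
from typing import Callable, Iterator

# Hypothetical miniature sample. The nesting below is an ASSUMPTION about the
# dataset layout and must be verified against the real DriveLM files.
SAMPLE = {
    "scene_0": {
        "key_frames": {
            "frame_0": {
                "image_paths": {"CAM_FRONT": "samples/CAM_FRONT/example.jpg"},
                "QA": {
                    "perception": [
                        {"Q": "What are the important objects in the scene?",
                         "A": "A pedestrian crossing ahead."}
                    ],
                    "planning": [
                        {"Q": "What should the ego vehicle do?",
                         "A": "Brake and wait."}
                    ],
                },
            }
        }
    }
}


@dataclass
class QASample:
    scene_id: str
    frame_id: str
    stage: str
    image_paths: dict
    question: str
    reference: str


def iter_samples(data: dict) -> Iterator[QASample]:
    """Flatten the nested structure into one record per question."""
    for scene_id, scene in data.items():
        for frame_id, frame in scene.get("key_frames", {}).items():
            images = frame.get("image_paths", {})
            for stage, pairs in frame.get("QA", {}).items():
                for pair in pairs:
                    yield QASample(scene_id, frame_id, stage, images,
                                   pair["Q"], pair["A"])


def build_prompt(sample: QASample) -> str:
    """Compose the text part of the model input (wording is illustrative)."""
    return (
        "You are assisting with an automated-driving scene.\n"
        f"Task stage: {sample.stage}\n"
        f"Question: {sample.question}\n"
        "Answer briefly and name the objects that justify your answer."
    )


def run_pipeline(data: dict, model: Callable[[str, dict], str]) -> list:
    """Query the model for every question and keep the reference answer."""
    results = []
    for sample in iter_samples(data):
        prediction = model(build_prompt(sample), sample.image_paths)
        results.append({
            "id": f"{sample.scene_id}/{sample.frame_id}",
            "stage": sample.stage,
            "question": sample.question,
            "prediction": prediction,
            "reference": sample.reference,
        })
    return results


def echo_model(prompt: str, image_paths: dict) -> str:
    """Placeholder for a real MLLM call; it ignores its inputs."""
    return "Brake and wait."


def exact_match_rate(results: list) -> float:
    """Stand-in score: share of case-insensitive exact matches."""
    if not results:
        return 0.0
    hits = sum(
        r["prediction"].strip().lower() == r["reference"].strip().lower()
        for r in results
    )
    return hits / len(results)


results = run_pipeline(SAMPLE, echo_model)
```

Keeping the model behind a plain callable that takes a prompt and a dictionary of image paths means different MLLMs, or a fine-tuned variant, could be swapped in without touching the data handling. In an actual study, `exact_match_rate` would be replaced by the metrics defined for the DriveLM challenge.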
Learning outcome
- Automated Driving
- Generative AI (LLM + MLLM)
- MLLM tuning
- Model evaluation & tuning
- Experimental design & execution
- HPC experience
Qualifications
- Programming experience in Python
- Familiarity with Large Language Models
Supervisors
- Nassim Belmecheri
- Helge Spieker
References
- [1] Sima, C., Renz, K., Chitta, K., Chen, L., Zhang, H., Xie, C., Luo, P., Geiger, A., & Li, H. (2023). DriveLM: Driving with Graph Visual Question Answering.