Datasets and Code

ROad event Awareness Dataset for Autonomous Driving
Multi-domain Endoscopic Surgical Action Dataset
Surgical Action Dataset

March 2021

ROAD - The ROad event Awareness Dataset for Autonomous Driving

We are proud to announce the release of the new ROad event Awareness Dataset for Autonomous Driving (ROAD). The dataset is publicly available on GitHub.

ROAD is the first benchmark of its kind, designed to allow the autonomous vehicles community to investigate the use of semantically meaningful representations of dynamic road scenes to facilitate situation awareness and decision making for autonomous driving.

ROAD is a multi-label dataset containing 22 long-duration videos (ca. 8 minutes each), comprising 122K frames annotated in terms of *road events*, defined as triplets E = (Agent, Action, Location) and represented as ‘tubes’, i.e., series of frame-wise bounding box detections.
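As a concrete illustration, a road-event tube of this kind could be represented as below. This is a hypothetical sketch: the class name, field names, and example labels are assumptions made for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class RoadEvent:
    """A road event E = (Agent, Action, Location) with its 'tube' of boxes.

    Illustrative sketch only; not the actual ROAD annotation format.
    """
    agent: str                        # e.g. "Pedestrian"
    action: str                       # e.g. "Crossing"
    location: str                     # e.g. "AtJunction"
    # The 'tube': frame index -> bounding box (x1, y1, x2, y2)
    tube: dict = field(default_factory=dict)

    def add_box(self, frame: int, box: tuple) -> None:
        self.tube[frame] = box

    def duration(self) -> int:
        """Number of frames spanned, from first to last annotated frame."""
        frames = sorted(self.tube)
        return frames[-1] - frames[0] + 1 if frames else 0

# A two-frame tube for a pedestrian crossing at a junction:
event = RoadEvent("Pedestrian", "Crossing", "AtJunction")
event.add_box(100, (12, 40, 60, 200))
event.add_box(101, (14, 41, 62, 201))
print(event.duration())  # -> 2
```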

ROAD aims to become the reference benchmark for agent and event detection, intention and trajectory prediction, future event anticipation, modelling of complex road activities, instance- and class-incremental continual learning, machine theory of mind, and automated decision making.

An original 3D-RetinaNet baseline model is also available.

Please cite our arXiv preprint when using ROAD in your work.

March 2021

The SARAS-MESAD Multi-domain Endoscopic Surgical Action Dataset

In our SARAS work, we have captured endoscopic video data during radical prostatectomy under two different settings ('domains'): real procedures on real patients, and simplified procedures on artificial anatomies ('phantoms'). As shown in our MIDL 2020 challenge (on real data only), variations due to patient anatomy, surgeon style and so on dramatically reduce the performance of even state-of-the-art detectors compared to non-surgical benchmark datasets. Videos captured in an artificial setting can provide more data, but differ significantly in appearance from real videos and are subject to variations in the appearance of the phantoms over time. Inspired by these all-too-real issues, this challenge's goal is to test the possibility of learning more robust models across domains (e.g. across different procedures which, however, share some types of tools or surgeon actions; or, in the SARAS case, learning from both real and artificial settings whose lists of actions overlap but do not coincide).

The challenge provides two datasets for surgeon action detection: the first (Dataset-R) is composed of 4 annotated videos of real surgeries on human patients, while the second (Dataset-A) contains 6 annotated videos of surgical procedures on artificial human anatomies. All videos capture instances of the same procedure, Robotic-Assisted Radical Prostatectomy (RARP), but with some differences in the set of classes: the two datasets share a subset of 10 action classes, while differing in the remaining classes (because of the requirements of the SARAS demonstrators). Together, they provide a perfect opportunity to explore how multi-domain datasets designed for similar objectives can be exploited to improve performance on each individual task.

Link to full challenge proposal description.
July 2020

The SARAS-ESAD Endoscopic Surgical Action Dataset

Minimally Invasive Surgery (MIS) is a very sensitive medical procedure, whose success depends on the competence of the human surgeons and the effectiveness of their coordination. The SARAS (Smart Autonomous Robotic Assistant Surgeon) EU consortium is working towards replacing the assistant surgeon in MIS with two assistive robotic arms. This requires an artificial intelligence-based system which can not only understand the complete surgical scene, but also detect the actions being performed by the main surgeon. This information can later be used to infer the response required from the autonomous assistant surgeon. The correct detection and localisation of surgeon actions is critical for designing the trajectories of the robotic arms. For this challenge, four sessions of complete prostatectomy procedures, performed by expert surgeons on real patients with prostate cancer, were recorded. Expert AI and medical professionals then annotated these complete surgical procedures for actions. Multiple action instances may be present at any point during the procedure (as, e.g., the right and left arms of the da Vinci robot operated by the main surgeon may perform different coordinated actions). Hence, each frame is labelled with multiple actions, and these actions can have overlapping bounding boxes.

The bounding boxes in the training data are selected to cover both the ‘tool performing the action’ and the ‘organ under operation’. A set of 21 action classes was selected for the challenge after consultation with expert medical professionals. From a technical point of view, then, a suitable online surgeon action detection system must be able to: (1) locate and classify multiple action instances in real time; (2) associate the detected bounding boxes across frames.
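To make the multi-label, overlapping-box annotation described above concrete, here is a minimal sketch; the class names, record layout, and coordinates are illustrative assumptions, not the actual ESAD annotation format.

```python
# Hypothetical sketch of per-frame annotations where two simultaneous
# actions (e.g. the left and right robot arms) have overlapping boxes.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

# One frame carrying two action instances at once (illustrative labels):
frame_annotations = [
    {"action": "CuttingTissue", "box": (100, 120, 300, 340)},
    {"action": "PullingTissue", "box": (220, 150, 420, 360)},
]

# The two boxes overlap, which a detector must handle per frame:
overlap = iou(frame_annotations[0]["box"], frame_annotations[1]["box"])
print(round(overlap, 3))  # -> 0.215
```

Standard non-maximum suppression would tend to discard one of two such heavily overlapping detections, which is one reason overlapping multi-label annotations make this setting harder than typical single-label detection.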

To the best of our knowledge, this challenge presents the first benchmark dataset for action detection in the surgical domain, paving the way for the introduction of partial or full autonomy in surgical robotics. Within computer vision, other datasets for action detection exist, but they are of limited size.

Link to the ESAD Grand Challenge website.