In this new project we consider vision-based autonomous driving, i.e., the problem of enabling cars to
drive themselves based on streaming video captured by cameras mounted on them. In such a setting, which closely mimics how human drivers ‘work’, the car needs to reconstruct
and understand the surrounding environment from the incoming video sequence(s). A crucial task of video
understanding is to recognise and localise (in space and time) different actions or events appearing in the video:
for instance, the vehicle needs to perceive the behaviour of pedestrians by identifying which kinds of activities (e.g.,
‘moving’ versus ‘stopping’) they are performing, and when and where these take place.
In the computer vision literature this problem is termed spatio-temporal action localisation or, in short, action detection.
Unlike current human action detection datasets such
as J-HMDB, UCF-101, LIRIS-HARL, DALY or AVA, the Road Event and Activity Detection (READ) dataset we introduce here is specifically designed
from the perspective of self-driving cars, and includes spatio-temporal actions performed not just by humans but by all
road users, including cyclists, motorcyclists, drivers of vehicles large and small, and, of course, pedestrians.
We strongly believe, a belief backed up by clear evidence, that an awareness of all the actions and events taking place,
and of their location within the road scene, is essential for inherently safe self-driving cars.