The goal of the project is to couple our current work on deep learning for multiple action tube detection with part-based modelling, creating a new deep learning architecture able to represent complex activities composed of a number of 'atomic' actions (e.g. cooking a meal is made up of 'opening the fridge', 'taking ingredients out of the fridge', 'chopping an ingredient on the counter', etc.).
The system will need to work in real time. To this end, we propose a new architecture, termed DL-DPM+, which builds on a recent CNN implementation of part-based models for object detection. The architecture will be implemented, tested, and iteratively refined based on feedback from empirical evaluation.
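To make the part-based idea concrete, the sketch below shows one plausible way a complex activity could be scored from its atomic-action detections: an appearance term (the sum of atomic detector scores) plus a deformation term penalising temporal gaps between consecutive parts, by analogy with the spatial deformation costs of a deformable-part model. All class names, labels, and weights here are hypothetical illustrations, not the actual DL-DPM+ formulation.

```python
# Illustrative sketch only: a DPM-style score for a complex activity
# built from atomic-action tube detections. Names and penalty values
# are assumptions for exposition, not the proposed architecture.

from dataclasses import dataclass


@dataclass
class AtomicDetection:
    label: str       # atomic action, e.g. 'open_fridge'
    score: float     # detector confidence for this action tube
    t_start: float   # start time of the tube (seconds)
    t_end: float     # end time of the tube (seconds)


def activity_score(parts, expected_order, gap_penalty=0.1):
    """Score a candidate complex activity from its atomic parts.

    Appearance term: sum of atomic detector scores.
    Deformation term: penalise temporal gaps or overlaps between
    consecutive parts, analogous to DPM spatial deformation costs.
    """
    ordered = sorted(parts, key=lambda p: expected_order.index(p.label))
    score = sum(p.score for p in ordered)
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt.t_start - prev.t_end
        score -= gap_penalty * abs(gap)  # deviation from contiguity
    return score


# Toy 'cooking a meal' example with three atomic action tubes.
parts = [
    AtomicDetection('open_fridge', 0.9, 0.0, 2.0),
    AtomicDetection('take_ingredients', 0.8, 2.5, 6.0),
    AtomicDetection('chop_ingredient', 0.7, 6.0, 15.0),
]
order = ['open_fridge', 'take_ingredients', 'chop_ingredient']
print(round(activity_score(parts, order), 3))  # → 2.35
```

In this toy setting the activity score is the sum of the three detector confidences (2.4) minus a small penalty for the 0.5 s gap between the first two tubes; a learned version would replace the fixed penalty with trainable deformation parameters.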