Research Project: Neural-Symbolic Video Captioning
Current video search engines on the Web are text-based, i.e., they process queries using the words posted around the videos in Web pages. Any search method based on such textual 'labels' is bound to deliver frustrating results on all but the simplest queries (Fig. 1).

More sophisticated video retrieval would be extremely useful in a plethora of scenarios, as diverse as (1) online movie/TV/music/news/sports video databases, (2) private videos in social networks, (3) advertising/instruction videos of products, (4) educational/course videos in online schools/universities, and (5) live surveillance video streams to detect potential crimes and/or dangerous situations for children, the elderly, or the mentally or physically disabled.

The example in Fig. 1 suggests that queries referring to semantic notions such as places and activities are precisely those on which text-based search performs poorly.

We propose to push the boundaries of video search and querying by developing the concepts behind a novel class of search engines that automatically extract semantic content from videos (e.g., what happens in the video, who is involved, and where) and use this content for semantic video search and querying.
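To make the idea concrete, the extracted semantic content can be pictured as structured event descriptions (who, what, where) attached to each video, which a search engine then matches against semantic queries. The sketch below illustrates this with a toy in-memory index; the `Event`/`VideoIndex` names and the triple-like representation are illustrative assumptions, not the project's actual data model.

```python
# Hypothetical sketch of semantic video search over structured descriptions.
# Event fields (actor, action, place) and the VideoIndex class are
# illustrative assumptions, not the project's actual representation.

from dataclasses import dataclass, field


@dataclass
class Event:
    actor: str   # who is involved
    action: str  # what happens
    place: str   # where it happens


@dataclass
class VideoIndex:
    # video id -> list of semantic events extracted from that video
    events: dict = field(default_factory=dict)

    def add(self, video_id: str, event: Event) -> None:
        self.events.setdefault(video_id, []).append(event)

    def search(self, actor=None, action=None, place=None):
        """Return ids of videos with at least one event matching all given fields."""
        hits = []
        for vid, evs in self.events.items():
            for e in evs:
                if ((actor is None or e.actor == actor)
                        and (action is None or e.action == action)
                        and (place is None or e.place == place)):
                    hits.append(vid)
                    break  # one matching event suffices for this video
        return hits


index = VideoIndex()
index.add("v1", Event("child", "swimming", "pool"))
index.add("v2", Event("dog", "running", "park"))
index.add("v2", Event("man", "throwing", "park"))

print(index.search(place="park"))       # semantic query: anything happening in a park
print(index.search(action="swimming"))  # semantic query: swimming, anywhere
```

A text-based engine could only match these queries if the surrounding page text happened to mention "park" or "swimming"; indexing extracted events makes such queries answerable directly.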
Relevant papers:
  • Daniel Vasile and Thomas Lukasiewicz
    Learning Structured Video Descriptions: Automated Video Knowledge Extraction for Video Understanding Tasks
    In: Panetto H., Debruyne C., Proper H., Ardagna C., Roman D., Meersman R. (eds) On the Move to Meaningful Internet Systems. OTM 2018 Conferences. OTM 2018. Lecture Notes in Computer Science, vol 11230. Springer, Cham
  • Silvio Olivastri, Gurkirt Singh, and Fabio Cuzzolin
    An end-to-end baseline for video captioning
    arXiv preprint, March 2019
Lab Member(s): Silvio Olivastri, Ruomei Yan, Thomas Lukasiewicz