Current video search engines on the Web are text-based, i.e., they match queries against the words that surround the videos on Web pages.
Any search method based on such textual 'labels' is bound to perform poorly on all but the simplest queries (Fig.1).
More sophisticated video retrieval would be extremely useful in a wide range of scenarios, including (1) online movie/TV/music/news/sports video databases, (2) private videos in social networks, (3) advertising/instruction videos of products, (4) educational/course videos in online schools/universities, and (5) live surveillance video streams, monitored to detect potential crimes and/or situations that are dangerous for children, the elderly, or people with mental or physical disabilities.
The example in Fig.1 suggests that queries referring to semantic notions, such as places and activities, are a likely cause of the poor performance of text-based search.
We propose to push the boundaries of video search and querying by developing the concepts behind a novel class of search engines that automatically extract semantic content from videos (e.g., what happens in the video, who is involved, and where) and use this content for semantic video search and querying.