A fully implemented audio-visual search engine

The large thesaurus of detectors can be viewed as the core of a dictionary for video. The elements in such a thesaurus, individually or in combination, provide a semantic understanding of video content.

To reach this goal of semantic understanding, VIDI-Video will improve on machine learning techniques, visual and audio analysis techniques, and interactive search methods. The approach is to let the system learn many, mostly weak, semantic detectors instead of carefully modelling a few of them. The combination of many detectors, each describing a different aspect of the video content, will provide a much richer basis for the semantics than existing methods.
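The idea of describing content through many weak detectors can be illustrated with a minimal Python sketch. It is not the project's implementation: the detector names, the feature names (brightness, voice_energy, face_count_norm) and the averaging rule used for ranking are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]            # low-level measurements for one video shot
Detector = Callable[[Features], float]  # maps features to a confidence in [0, 1]

# Hypothetical weak detectors: each looks at one narrow aspect of the content
# and is unreliable on its own.
DETECTORS: Dict[str, Detector] = {
    "outdoor": lambda f: f.get("brightness", 0.0),
    "speech": lambda f: f.get("voice_energy", 0.0),
    "crowd": lambda f: 0.5 * f.get("face_count_norm", 0.0)
    + 0.5 * f.get("voice_energy", 0.0),
}


def describe(features: Features) -> Dict[str, float]:
    """Run every detector and return one score per concept."""
    return {name: detect(features) for name, detect in DETECTORS.items()}


def score(description: Dict[str, float], query: List[str]) -> float:
    """Rank a shot against a query by averaging the scores of the queried concepts.

    Averaging is an assumed combination rule, chosen only for simplicity.
    """
    if not query:
        return 0.0
    return sum(description.get(concept, 0.0) for concept in query) / len(query)


if __name__ == "__main__":
    shot = {"brightness": 0.8, "voice_energy": 0.2, "face_count_norm": 0.6}
    description = describe(shot)
    print(description)
    print(score(description, ["outdoor", "crowd"]))
```

No single score in the resulting description is trustworthy by itself, but the vector of scores taken together gives a search method more to work with than one carefully tuned detector would.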

The concrete output of the project will be a system consisting of a learning part and a runtime system. The learning part will consist of units for video processing, visual analysis, audio analysis, and learning integrated feature detectors. The runtime system applies the learned detectors to incoming video streams, after which users can query the system through an interactive, ontology-based multimedia user interface.
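The split between a learning part and a runtime system can be sketched as follows. This is a toy Python illustration under stated assumptions, not the project's architecture: the function names, the single-feature threshold detector and the set-based index are all hypothetical, and the ontology-based interface is reduced to a lookup by concept name.

```python
from statistics import mean
from typing import Dict, Iterable, List, Set, Tuple

Features = Dict[str, float]
# Assumed model format: concept name -> (feature it reads, learned threshold).
Model = Dict[str, Tuple[str, float]]


def learn_threshold(examples: List[Tuple[Features, bool]], feature: str) -> float:
    """Learning part: place a threshold midway between the mean feature value
    of the positive examples and that of the negative examples."""
    positives = [f[feature] for f, label in examples if label]
    negatives = [f[feature] for f, label in examples if not label]
    return (mean(positives) + mean(negatives)) / 2


def index_stream(stream: Iterable[Features], model: Model) -> List[Set[str]]:
    """Runtime system: apply every learned detector to each incoming shot and
    record which concepts fired."""
    return [
        {
            concept
            for concept, (feature, threshold) in model.items()
            if shot.get(feature, 0.0) > threshold
        }
        for shot in stream
    ]


def query(index: List[Set[str]], concept: str) -> List[int]:
    """Return the positions of the shots in which the concept was detected."""
    return [i for i, concepts in enumerate(index) if concept in concepts]


if __name__ == "__main__":
    training = [
        ({"voice_energy": 0.9}, True),
        ({"voice_energy": 0.7}, True),
        ({"voice_energy": 0.1}, False),
        ({"voice_energy": 0.3}, False),
    ]
    model = {"speech": ("voice_energy", learn_threshold(training, "voice_energy"))}
    incoming = [{"voice_energy": 0.8}, {"voice_energy": 0.2}, {"voice_energy": 0.6}]
    print(query(index_stream(incoming, model), "speech"))
```

Keeping the learned model as plain data passed from the learning step to the runtime step mirrors the separation described above: detectors are trained once, offline, and then applied to new video as it arrives.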