A fully implemented audio-visual search engine
The large thesaurus of detectors can be viewed as the core
of a dictionary for video. The elements in such a
thesaurus, individually or in combination, provide a
semantic understanding of video content.
To reach this goal of semantic understanding,
VIDI-Video will improve machine learning techniques,
visual and audio analysis techniques, and interactive search
methods. The approach is to let the system learn many,
mostly weak, semantic detectors instead of carefully modelling
a few. Combining many detectors, each describing a different
aspect of the video content, will provide a much richer
semantic basis than existing methods.
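As a rough illustration of how many weak detectors might be combined, the sketch below uses weighted late fusion: each detector returns a confidence in [0, 1] for a shot, and the concept score is the weighted average of those confidences. The detector names, features, and weights are hypothetical, and weighted averaging is one common fusion choice, not necessarily the one VIDI-Video uses. In a learning system the weights would be estimated from labelled data rather than set by hand.

```python
from typing import Callable, Dict

# Hypothetical shot representation: precomputed audio-visual features for one shot.
Shot = Dict[str, float]
# A weak detector maps a shot to a confidence in [0, 1].
Detector = Callable[[Shot], float]


def fuse_scores(detectors: Dict[str, Detector], weights: Dict[str, float], shot: Shot) -> float:
    """Weighted late fusion: average the detector confidences using per-detector weights."""
    total = sum(weights[name] for name in detectors)
    if total <= 0:
        raise ValueError("detector weights must sum to a positive value")
    return sum(weights[name] * detect(shot) for name, detect in detectors.items()) / total


# Hypothetical weak detectors for a "beach" concept: two visual cues and one audio cue.
beach_detectors: Dict[str, Detector] = {
    "sand_colour": lambda s: s["yellow_ratio"],
    "blue_sky": lambda s: s["blue_ratio"],
    "wave_sound": lambda s: s["low_freq_noise"],
}
# Hand-set weights for illustration only; a real system would learn these.
beach_weights = {"sand_colour": 1.0, "blue_sky": 1.0, "wave_sound": 2.0}

shot = {"yellow_ratio": 0.6, "blue_ratio": 0.8, "low_freq_noise": 0.5}
score = fuse_scores(beach_detectors, beach_weights, shot)
print(f"beach score: {score:.2f}")  # beach score: 0.60
```

Because each detector contributes only part of the final score, a single unreliable detector moves the result less than it would if it were used on its own.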
The concrete output of the project will be a system
consisting of a learning part and a runtime system. The
learning part will consist of units for video processing,
visual analysis, audio analysis, and the learning of integrated
feature detectors. The runtime system applies the learned
detectors to incoming video streams, after which users can
query the system through an interactive, ontology-based
multimedia user interface.
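The split between the learning part and the runtime system can be sketched as follows, assuming the learning part has already produced one scoring function per concept. The `RuntimeIndex` class, the feature names, and the `ontology` mapping are hypothetical; the ontology is reduced to a dictionary that expands a query term into the concepts it covers, and each shot is ranked by its highest score among those concepts.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical shot representation: precomputed features for one video shot.
Shot = Dict[str, float]
# A learned detector, assumed to be produced by the (not shown) learning part.
Detector = Callable[[Shot], float]


@dataclass
class RuntimeIndex:
    detectors: Dict[str, Detector]
    ontology: Dict[str, List[str]]  # hypothetical: query term -> concept names
    scores: Dict[str, List[Tuple[str, float]]] = field(default_factory=dict)

    def ingest(self, shot_id: str, shot: Shot) -> None:
        """Apply every learned detector to an incoming shot and store its score."""
        for concept, detect in self.detectors.items():
            self.scores.setdefault(concept, []).append((shot_id, detect(shot)))

    def query(self, term: str, top_k: int = 5) -> List[Tuple[str, float]]:
        """Expand the term via the ontology and rank shots by their best concept score."""
        concepts = self.ontology.get(term, [term])
        best: Dict[str, float] = {}
        for concept in concepts:
            for shot_id, score in self.scores.get(concept, []):
                best[shot_id] = max(best.get(shot_id, 0.0), score)
        return sorted(best.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical detectors that simply read one precomputed feature each.
index = RuntimeIndex(
    detectors={
        "car": lambda s: s.get("car_feat", 0.0),
        "bus": lambda s: s.get("bus_feat", 0.0),
    },
    ontology={"vehicle": ["car", "bus"]},
)
index.ingest("shot_1", {"car_feat": 0.9})
index.ingest("shot_2", {"bus_feat": 0.7})
index.ingest("shot_3", {"car_feat": 0.1, "bus_feat": 0.2})
print(index.query("vehicle"))  # [('shot_1', 0.9), ('shot_2', 0.7), ('shot_3', 0.2)]
```

Taking the maximum over the expanded concepts treats the query as a disjunction ("car or bus"); a conjunctive query would need a different combination, for example the minimum or product of the concept scores.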