This paper is concerned with producing high-level text reports and explanations of human activity in video from a single, static camera. The motivation is to enable surveillance analysts to maintain situational awareness despite the presence of large volumes of data. The scenario we focus on is urban surveillance, where people are imaged at medium or low resolution. The final output consists of text descriptions that not only describe, in human-readable terms, what is happening but also explain the interactions that take place. The input to the reasoning process is the information obtained from video processing methods that provide an abstraction from the image data to qualitative (i.e., human-readable) descriptions of observed human activity. Explanations of global scene activity, particularly where interesting events have occurred, are produced using an extensible, rule-based method. The complete system represents a general technique for video understanding that requires a guided training phase led by an experienced analyst.
- Visual surveillance