We present a neural-network-based approach to key frame extraction in the compressed domain. The proposed method combines two MPEG-7 descriptors: the motion intensity descriptor and the spatial activity descriptor. Shot boundary detection and block motion estimation are performed before the descriptors are extracted. The motion intensity (“pace of action”) is obtained with a fuzzy system that classifies it into five categories of increasing intensity. The spatial activity matrix captures the spatial distribution of activity (“active regions”) in a frame. A neural network then selects as key frames those frames that have high motion intensity and maximum spatial activity at the center of the frame. The results are compared against two well-known key frame extraction techniques, selecting the first frame of each shot and selecting the middle frame of each shot, to demonstrate the advantage and robustness of the proposed approach. The neural network approach performs much better than both of these baselines.
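The abstract does not give the fuzzy rule base, the block size, or the network architecture. The following is therefore only a minimal Python sketch of the pipeline under stated assumptions:

- Each frame's block motion vectors arrive as an `(H, W, 2)` NumPy array.
- Intensity is measured as the standard deviation of motion-vector magnitudes, similar to the MPEG-7 motion activity intensity. It is classified by five triangular fuzzy sets whose breakpoints (`CENTERS`) are hypothetical.
- "Active" blocks are those above the frame's mean magnitude.
- A hand-written lexicographic score (`pick_key_frame`, hypothetical) stands in for the trained neural network. It is not the authors' model.

```python
import numpy as np

# HYPOTHETICAL centres (pixels per block) of five triangular fuzzy sets over the
# std of motion-vector magnitudes; the paper's actual fuzzy system is not given.
CENTERS = np.array([0.0, 2.0, 4.0, 6.0, 8.0])


def triangular(x, left, center, right):
    """Membership of x in the triangular fuzzy set (left, center, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)


def motion_intensity(mv):
    """Fuzzy motion-intensity class 1..5 from an (H, W, 2) motion-vector array."""
    mag = np.hypot(mv[..., 0], mv[..., 1])
    # Saturate so values outside the outer centres still fall into class 1 or 5.
    x = float(np.clip(mag.std(), CENTERS[0], CENTERS[-1]))
    step = CENTERS[1] - CENTERS[0]
    memberships = [triangular(x, c - step, c, c + step) for c in CENTERS]
    return int(np.argmax(memberships)) + 1  # class with the highest membership


def spatial_activity_matrix(mv):
    """Binary matrix marking blocks whose motion magnitude exceeds the frame mean."""
    mag = np.hypot(mv[..., 0], mv[..., 1])
    return (mag > mag.mean()).astype(np.uint8)


def center_activity(active):
    """Fraction of active blocks that lie in the central half of the frame."""
    h, w = active.shape
    center = active[h // 4: h - h // 4, w // 4: w - w // 4]
    total = active.sum()
    return float(center.sum()) / total if total else 0.0


def pick_key_frame(shot_mvs):
    """HYPOTHETICAL stand-in for the neural network: highest intensity class,
    ties broken by the share of activity at the frame centre."""
    scores = [(motion_intensity(mv), center_activity(spatial_activity_matrix(mv)))
              for mv in shot_mvs]
    return max(range(len(shot_mvs)), key=lambda i: scores[i])


def first_frame(shot_mvs):
    """Baseline: the first frame of the shot."""
    return 0


def middle_frame(shot_mvs):
    """Baseline: the middle frame of the shot."""
    return len(shot_mvs) // 2


# Usage: a 10-frame shot of 18x22 blocks, static except for a central burst in frame 6.
shot = [np.zeros((18, 22, 2)) for _ in range(10)]
shot[6][6:12, 8:14] = 7.0
print(pick_key_frame(shot), first_frame(shot), middle_frame(shot))  # 6 0 5
```

In this toy shot both baselines miss the only frame with central motion, while the activity-based score finds it. That contrast is the kind of difference the abstract's comparison targets. A real system would learn the decision boundary from labelled shots rather than use a fixed tuple ordering.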

Copyright 2003 Society of Photo-Optical Instrumentation Engineers. One print or electronic copy may be made for personal use only. Systematic reproduction and distribution, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited.

Note: imported from RIT’s Digital Media Library running on DSpace to RIT Scholar Works in February 2014.

Department, Program, or Center

Chester F. Carlson Center for Imaging Science (COS)


RIT – Main Campus