Title: Dynamic video summarisation using stacked encoder-decoder architecture with residual learning network

Authors: M. Dhanushree; R. Priya; P. Aruna; R. Bhavani

Addresses: Department of Computer Science and Engineering, Faculty of Engineering and Technology, Annamalai University, Annamalai Nagar, Tamil Nadu, India (all four authors)

Abstract: Over the past decade, video summarisation has emerged as one of the most challenging research areas in video understanding. Video summarisation abstracts an original video by extracting its most informative parts or key events. Generic video summarisation is particularly challenging because the key events are not tied to specific activities, so extensive spatial features are needed to identify them. To address this, a stacked encoder-decoder architecture with a residual learning network (SERNet) is proposed for generating dynamic summaries of generic videos. In the proposed model, GoogleNet features are extracted for each frame. A bi-directional gated recurrent unit encodes the video features, and a gated recurrent unit decodes them. Both the encoder and the decoder use residual learning to extract hierarchical dense spatial features, with the aim of increasing the summarisation F-score. Experiments are conducted on the SumMe and TVSum datasets. The results show that the proposed SERNet model achieves F-scores of 55.6 and 64.23 on SumMe and TVSum, respectively. A comparison with state-of-the-art approaches indicates the robustness of the proposed model.
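
The abstract reports F-scores but does not say how they are computed. As a minimal sketch, assuming the frame-level overlap convention commonly used for SumMe and TVSum (precision and recall of selected frames against a reference summary, combined as a harmonic mean), the metric could look as follows. The function name `summary_f_score` and the 10-frame example are hypothetical illustrations, not taken from the paper.

```python
def summary_f_score(predicted, reference):
    """Frame-level F-score (in percent) between two binary summaries.

    Both arguments are equal-length sequences of 0/1 flags, one per frame,
    where 1 means the frame is included in the summary.
    Assumption: overlap-based precision/recall, not confirmed by the abstract.
    """
    if len(predicted) != len(reference):
        raise ValueError("summaries must cover the same number of frames")

    # Frames selected by both the predicted and the reference summary.
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    n_pred = sum(1 for p in predicted if p)
    n_ref = sum(1 for r in reference if r)

    # No shared frames (this also covers empty summaries): score is zero.
    if overlap == 0:
        return 0.0

    precision = overlap / n_pred
    recall = overlap / n_ref
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical 10-frame video: the two summaries share 2 selected frames.
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
reference = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
score = summary_f_score(predicted, reference)
```

Here precision is 2/3 and recall is 2/4, so the harmonic mean is 4/7, or roughly 57.1 when expressed as a percentage on the same scale as the figures quoted in the abstract.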

Keywords: video abstraction; dynamic video summarisation; deep learning; residual learning; skip connections; GoogleNet; long-term memory; gated recurrent unit; stacked encoder; key shot selection; kernel temporal segmentation.

DOI: 10.1504/IJIEI.2024.137702

International Journal of Intelligent Engineering Informatics, 2024 Vol.12 No.1, pp.27 - 59

Received: 02 Sep 2023
Accepted: 29 Dec 2023

Published online: 02 Apr 2024
