Title: Prediction of fine-grained human activities in videos using pose-based and object-based features
Authors: Ashwini S. Gavali; S.N. Kakarwal
Addresses: CSMSS, Chh. Shahu College of Engineering, Aurangabad, Maharashtra, India; CSE Department, ICEEM, Aurangabad, India
Abstract: Human activity prediction in videos deals with anticipating the intention of a human activity before it is fully observed. Activity prediction becomes more challenging when fine-grained details must be considered. This paper presents a deep learning-based approach for predicting complex, fine-grained, and long-duration human actions in videos. Along with prediction, our approach also localises human actions spatially with bounding boxes. The approach exploits the sequential nature of activities in video: each high-level activity is represented as a sequence of local actions (low-level activities). Given a partially observed video, local actions are first detected and tracked, and these local detections are then used to predict future high-level actions. Fine-grained activities involve interactions with different objects, so we use a combination of human pose features and object features to predict fine-grained activities more accurately. We evaluate our approach on the publicly available MPII cooking activity dataset.
Keywords: activity prediction; fine-grained activity; local actions; ResNet-50; YOLO object detection; compact prediction tree; convolutional neural network.
DOI: 10.1504/IJCVR.2025.148211
International Journal of Computational Vision and Robotics, 2025 Vol.15 No.5, pp.624 - 639
Accepted: 25 Nov 2023
Published online: 01 Sep 2025