Title: Sparse landmarks for facial action unit detection using vision transformer and perceiver

Authors: Duygu Cakir; Gorkem Yilmaz; Nafiz Arica

Addresses: Faculty of Engineering and Natural Sciences, Department of Software Engineering, Bahcesehir University, Istanbul, Turkey; Faculty of Engineering and Natural Sciences, Department of Computer Engineering, Bahcesehir University, Istanbul, Turkey; Faculty of Engineering, Department of Information Systems Engineering, Piri Reis University, Istanbul, Turkey

Abstract: The ability to accurately detect facial expressions, represented by facial action units (AUs), holds significant implications across diverse fields such as mental health diagnosis, security, and human-computer interaction. Although earlier approaches have made progress, the burgeoning complexity of facial actions demands more nuanced, computationally efficient techniques. This study pioneers the integration of sparse learning with vision transformer (ViT) and perceiver networks, focusing on the most active and descriptive landmarks for AU detection across both controlled (DISFA, BP4D) and in-the-wild (EmotioNet) datasets. Our novel approach, employing active landmark patches instead of the whole face, not only attains state-of-the-art performance but also uncovers insights into the differing attention mechanisms of ViT and perceiver. This fusion of techniques marks a significant advancement in facial analysis, potentially reshaping strategies in noise reduction and patch optimisation, setting a robust foundation for future research in the domain.
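The core idea in the abstract, using small patches cropped around the most active facial landmarks as transformer input tokens rather than tiling the whole face, can be illustrated with a minimal sketch. This is not the authors' implementation: the helper name, patch size, and landmark coordinates are hypothetical, and only the patch-to-token step is shown.

```python
import numpy as np

def extract_landmark_patches(image, landmarks, patch_size=16):
    """Crop a square patch centred on each landmark (hypothetical helper).

    Sparse-landmark AU pipelines of the kind the abstract describes feed
    such patches, rather than a full-face grid, to a ViT or perceiver.
    """
    half = patch_size // 2
    h, w = image.shape[:2]
    patches = []
    for (x, y) in landmarks:
        # Clamp so the patch stays inside the image bounds.
        x0 = int(np.clip(x - half, 0, w - patch_size))
        y0 = int(np.clip(y - half, 0, h - patch_size))
        patches.append(image[y0:y0 + patch_size, x0:x0 + patch_size])
    return np.stack(patches)

# Toy example: a 128x128 grey image and three assumed "active" landmarks.
img = np.random.rand(128, 128)
active_landmarks = [(30, 40), (64, 64), (100, 90)]
patches = extract_landmark_patches(img, active_landmarks)
tokens = patches.reshape(len(active_landmarks), -1)  # flatten patches to tokens
print(tokens.shape)  # (3, 256)
```

The resulting token sequence (here 3 tokens of 256 values each) would then be linearly projected and passed to the transformer's attention layers; using only active landmarks keeps the sequence short and discards background regions.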

Keywords: action unit detection; sparse learning; vision transformer; perceiver.

DOI: 10.1504/IJCSE.2024.141343

International Journal of Computational Science and Engineering, 2024 Vol.27 No.5, pp.607 - 620

Received: 01 Apr 2023
Accepted: 09 Sep 2023

Published online: 09 Sep 2024
