Forthcoming Articles
International Journal of Arts and Technology

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.
Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.
Online First articles are also listed here. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.
Register for our alerting service, which notifies you by email when new issues are published online.
International Journal of Arts and Technology (33 papers in press) Regular Issues
Abstract: In response to the problems of insufficient personalised feedback and poor real-time performance in traditional music teaching, this paper proposes an interactive teaching mode based on wireless sensor networks and multimodal data fusion. Multiple sensors collect real-time data on students' performance movements, audio, and physiological state; the data are fused and denoised with a Gaussian-Bayesian algorithm before being uploaded to the cloud platform, where a weighted matrix factorisation algorithm generates and pushes personalised error-correction content. Experiments show that this mode reduces sensor data transmission by 78.8%, cuts energy consumption to 0.33 J, and achieves a push hit rate of 95.2%, forming an efficient interactive loop and providing a precise solution for the digitisation of music teaching. Keywords: Wireless Sensor Network; Multimodal; Data Fusion; Music Interactive Teaching; Gaussian-Bayesian; Personalized Push. DOI: 10.1504/IJART.2026.10077421
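The push step in the abstract above rests on weighted matrix factorisation. As a hedged illustration of that general technique only (not the paper's implementation; the toy matrix, rank, weights, and regularisation below are invented for demonstration), a minimal alternating-least-squares sketch:

```python
import numpy as np

def weighted_als(R, W, k=2, reg=0.1, iters=20, seed=0):
    """Factor R (students x items) as U @ V.T under per-entry weights W
    (weight 0 = unobserved), by alternating ridge-regularised solves."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    I = reg * np.eye(k)
    for _ in range(iters):
        for i in range(m):                      # fix V, solve student row i
            G = (V.T * W[i]) @ V + I
            U[i] = np.linalg.solve(G, (V.T * W[i]) @ R[i])
        for j in range(n):                      # fix U, solve item row j
            G = (U.T * W[:, j]) @ U + I
            V[j] = np.linalg.solve(G, (U.T * W[:, j]) @ R[:, j])
    return U, V

# Toy data: 3 students x 4 practice items; zero weight marks unobserved cells.
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.]])
W = (R > 0).astype(float)
U, V = weighted_als(R, W)
pred = U @ V.T   # predicted scores; unobserved cells with high values drive the push
```

Here the zero weights simply exclude unobserved cells from the fit; how the paper derives its weights from the fused sensor data is not specified in the abstract.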
Abstract: To tackle the declining precision of LDA-based sentiment analysis of online educational website reviews, we propose the TWBEWC-TFWW-LDA algorithm, which integrates emotion word co-occurrence-based theme word bags (TWBEWC) and topic feature word weighting (TFWW). It constructs emotional topic word bags, extracts sentiment-laden topic words via semantic similarity, weights them by significance and distribution, and performs LDA clustering. Experiments show that with 15 emotion topic feature words, its text clustering accuracy, recall and F1 reach 0.812, 0.802 and 0.810 respectively; it also achieves 88%, 96% and 90% accuracy in classifying aversion, surprise and neutrality. This enhanced accuracy refines sentiment analysis of online education for college students, optimising course design and teaching methods. Keywords: Online education; LDA topic model; sentiment classification; sentiment word co-occurrence; feature word weighting. DOI: 10.1504/IJART.2026.10077461
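The core idea named in the abstract above — weighting candidate topic words by their co-occurrence with emotion words before clustering — can be sketched generically. This is an illustrative toy, not the paper's TWBEWC-TFWW definitions: the seed list, corpus, and scoring formula are invented.

```python
from collections import Counter
from math import log

SEEDS = {"love", "hate", "boring", "great"}   # invented seed emotion words

def emotion_weighted_words(docs, top_n=3):
    """Score non-seed words by co-occurrence rate with seed emotion words,
    scaled by how widely the word is spread across documents."""
    tokenised = [doc.lower().split() for doc in docs]
    df = Counter(w for toks in tokenised for w in set(toks))  # document frequency
    cooc = Counter()
    for toks in tokenised:
        words = set(toks)
        if words & SEEDS:                     # document mentions an emotion seed
            cooc.update(words - SEEDS)
    scores = {w: (cooc[w] / df[w]) * log(1 + df[w]) for w in cooc}
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]]

docs = [
    "great course videos and quizzes",
    "the quizzes are boring",
    "love the videos",
    "the syllabus is long",
]
top = emotion_weighted_words(docs)
```

Words that consistently appear alongside emotion seeds (here "videos", "quizzes") are promoted into the emotional word bag; in the paper these weights then feed LDA clustering.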
Abstract: This paper proposes a generative adversarial network (GAN) with a shared latent space (sLS-GAN) to improve controllability and cultural adaptability in art style transfer. By integrating a variational autoencoder (VAE) with adversarial learning, it constructs a shared latent space that enables high-quality bidirectional translation between greyscale and colour image domains. Latent-space alignment improves realism and semantic coherence, while cycle-consistency regularises forward-backward mappings. A dual up-sampling/dual down-sampling design enhances structural stability across domains. In addition, a residual saliency network strengthens salient-region modelling and improves efficiency, reducing reliance on explicit content-preservation constraints. Experiments on the WikiArt Paintings and SemArt datasets show that sLS-GAN achieves an FID of 106.45 on WikiArt and outperforms representative baselines in Inception Score and PSNR, indicating improved semantic consistency, diversity, and perceptual quality. In greyscale colourisation, sLS-GAN reduces parameters by 89.5% and FLOPs by 87.8% versus conventional models, delivering substantial computational savings. Keywords: Generative Adversarial Network; Grayscale–Color Image Translation; Bidirectional Variational Autoencoder; Cycle Consistency; Residual Saliency Network. DOI: 10.1504/IJART.2026.10077967

Application of Digital Image Media Technology in Film Animation
by Yan Liu, Feng Tang
Abstract: The development of media technology in the digital age has enabled many traditional artists to take their creations in new directions, and the resulting forms of expression have greatly enriched artistic practice. This paper studies the application of digital image media technology in film and television animation. In the experimental part, the technology is applied to teaching, and experiments are conducted on several stages of animation production.
The experimental results show that its application effect is most obvious in the post-production stage. The scores of the experimental classes are concentrated in the 80-100 point range: 20 students in Class A and 18 students in Class B reached 90-100 points, compared with only five in Class C and six in Class D. The application of digital image media technology has therefore effectively improved the production effect of animation. The paper closes with a brief summary of the specific technologies and applications. Keywords: Film and Television Animation; Digital Image Technology; Post Production; Image Fusion Algorithm. DOI: 10.1504/IJART.2026.10074102

Multimedia Art Data Optimisation by Integrating UO-CRUSH and Q-learning Algorithm VR Technology
by Hongying Song, Xiaohong Wang
Abstract: A resource management framework based on virtual reality (VR) technology is proposed to address the limitations and poor presentation of traditional flat multimedia art data creation. It integrates the UO-CRUSH algorithm, driven by resource interest, and the Q-learning algorithm, driven by user interest, to optimise the storage and scheduling of multimedia art data. The experimental results show that the proposed model has an average computation time of 426 seconds, completes an average of 752 tasks in the maximum cycle, and distributes resource placement groups uniformly (137-153). In the instance verification, classification accuracy is no less than 90%, interactive response time is 172 ms, and the frame rate reaches 78 fps, combining good immersion and economy. This model effectively improves the optimisation of multimedia art data. Keywords: Multimedia art data; Virtual reality; Scalable replica hashing algorithm; Reinforcement learning algorithm; Resource management.
DOI: 10.1504/IJART.2026.10074165

Developing a Community Participation Strategy for the Preservation of Historic Buildings in Shanghai
by Lu Chen
Abstract: This study introduces a novel approach, specifically designed to align with China's unique national circumstances, that allows for the active involvement of the community. We propose a "community participation framework" incorporating distinctive local features, formulated through an extensive review and synthesis of both domestic and international literature and drawing on effective community governance practices in China. The framework seeks to harmonise the interactions between community members, historic structures, and municipal authorities. Within this structure, these stakeholders can create a cohesive and self-sustaining system, mutually benefiting from the preservation of historical architecture. To guarantee ongoing advancement of this initiative, oversight and management practices were implemented, using performance evaluation metrics to measure the effectiveness of the protective measures. Our findings offer valuable insights for furthering the engagement of social forces in the conservation of historical buildings in China. Keywords: Community Engagement; Conservation of Historic Buildings; Layered Analysis; Framework Development. DOI: 10.1504/IJART.2026.10074166

Quantitative Evaluation of the Effectiveness of Preservation of Modern Urban Residential Buildings in Shanghai (1910-1949)
by Lu Chen
Abstract: This study examines modern residential buildings in Shanghai (1910-1949), analysing the dynamic relationships among architecture, inhabitants, the environment, and society. Addressing the urgent need to conserve Shanghai's historical structures, it establishes key evaluation metrics for preservation effectiveness.
Employing a quantitative approach validated by a BP neural network model, the analysis demonstrates that current conservation strategies fail to fully harness their potential to drive economic growth, enhance residents' quality of life, foster environmental sustainability, or enrich urban cultural heritage. Recognising the intrinsic link between historical preservation and collective well-being, encompassing human, environmental, and societal dimensions, the research develops optimisation strategies to maximise conservation outcomes. The goal is to significantly elevate the impact of preservation efforts on these buildings, establishing a benchmark for scholarly research, policy formulation, and heritage management in Shanghai and analogous contexts. Keywords: Quantitative Assessment of Conservation Effectiveness; AHP; FCE; BP. DOI: 10.1504/IJART.2026.10074167

Image-based Digital Processing Technology and System for Music Signals
by Huan Li
Abstract: Traditional digital processing of music signals focuses only on the time-frequency characteristics of the audio, ignoring image features related to the signal, which limits the overall understanding and representation of music. By jointly exploiting the characteristics of audio and image signals, a more comprehensive and accurate music signal processing method is proposed for understanding musical emotion. Music emotion data from the MediaEval Emotion in Music, MagnaTagATune, EmoReact, and DEAM datasets were selected, preprocessed, and mapped to valence-arousal representations of musical emotion. A Long Short-Term Memory (LSTM)-Residual Network (ResNet) model was constructed, with the LSTM module extracting audio features and the ResNet module extracting image features; in the fusion layer, the two sets of features were fused 1:1.
A music recommendation system was then built on users' historical preferences and the recognised music emotions. The experimental results showed that the average accuracy of the LSTM-ResNet model in music emotion classification on the DEAM dataset reached 98.5%. Combining LSTM and ResNet can enhance music emotion classification performance and provide new methods for music recommendation tasks. Keywords: Music Signals; Emotional Classification; Music Recommendations; Music Images; Long Short-Term Memory; Residual Network. DOI: 10.1504/IJART.2026.10074476

Classification and Recognition of Visual Communication Elements using Multimodal Fusion Affective Computing
by Yihan Yang, Xuehang Wu
Abstract: This paper studies a method for classifying and recognising visual communication elements based on multimodal fusion affective computing, in order to improve the accuracy of information transmission and the ability to express emotion. First, Python's Scrapy library is used to automatically collect, filter, and preprocess image and text data, with the help of open-source computer vision libraries and regular expressions. Then, a ResNet model extracts image features and a bidirectional encoder representations from Transformers (BERT) model extracts text features, which are fused through an attention mechanism. Finally, a support vector machine (SVM) classifies the fused features, completing the classification and recognition of visual communication elements. Experimental results show that the proposed model performs well in emotion classification and recognition tasks, with high accuracy and stability. Keywords: Multimodal Fusion Affective Computing; Visual Communication Element; Residual Network; Bidirectional Encoder Representations from Transformers; Attention Mechanism.
DOI: 10.1504/IJART.2026.10075026

Exploration on High Dynamic Dance Video Keyframe Extraction Based on Clustering Algorithm
by Zhuoying Qi
Abstract: The number of dance videos keeps growing, and watching them quickly and efficiently has become a pressing problem; an efficient way to view the key information of a dance video would greatly help dance learning and dance posture analysis. On this basis, this paper studies keyframe extraction for highly dynamic dance video and proposes an extraction method based on the K-means clustering algorithm (KMA). First, the paper proposes shot segmentation using histogram features for edge extraction, and then compares the similarity between video frames. Finally, the KMA matches video frames to the nearest cluster, and the dance video keyframes are determined by evaluating the sum of the similarities between each cluster centre and all sample frames of its cluster. After presenting the method, the paper analyses its extraction effect and draws the following conclusion from the experiments: compared with the results of a traditional keyframe extraction method, the precision of the improved KMA was 6.5% higher and the recall rate 8.8% higher. The keyframe extraction method based on the improved KMA achieves good results. Keywords: Keyframe Extraction; High Dynamic Dance Video; Clustering Algorithm; K-means Clustering Algorithm. DOI: 10.1504/IJART.2026.10075220

Aesthetic Relevance of Generative Artificial Intelligence
by Umberto Roncoroni
Abstract: This article examines the impact of generative artificial intelligence (GAI) on contemporary art, creative practice, and education. Evaluating the benefits and drawbacks of GAI is difficult because of the accelerated development of the technology and the weak academic relationship between aesthetics and computer science.
To elucidate GAI, we propose setting aside metaphysical dilemmas and concentrating on the aesthetic problems of AI, such as romantic influences, technocentric approaches to creativity, black boxes, and misunderstandings about the properties of digital media. Through an approach that combines philosophical analysis, computer science, and art-based research, we compare Tzara's Dadaist Poem with GAI to verify its coherence with the development of contemporary art. We find that GAI contradicts the innovations of contemporary art and the aesthetic potential of digital media. The results demonstrate why GAI will jeopardize the development of creative and significant art, and we offer a review of theories, methods, and interdisciplinary references. Keywords: Aesthetics; Avant-gardes; Computational Creativity; Contemporary Art; Dadaism; Digital Media; Generative Artificial Intelligence; Interactivity; Postmodernism; Public Art. DOI: 10.1504/IJART.2026.10075314

Design of Three-Dimensional System of Computer-Aided Dance Teaching Technology Management
by Hongmei Li
Abstract: Traditional dance teaching suffers from problems such as a single teaching means, low efficiency, and difficulty in intelligently scheduling teaching content.
To this end, this paper designs a computer-aided three-dimensional dance teaching technology management system. It builds a three-layer architecture with a representation layer, a business logic layer, and a database at its core, and integrates embedded communication protocols to achieve stable transmission and response of teaching data between modules. For action recognition, the system extracts multi-resolution features with a high-resolution network (HRNet) and performs convolution, interpolation, and cascade operations across resolutions through a sequential multi-scale feature fusion model, obtaining joint-point heat maps that are semantically rich and accurately positioned. A three-layer geometric relationship estimation network then predicts positional relationships from trunk and limb joints and matches their connections, completing accurate modelling of dance movements. Experimental results show the system reaches up to 95.3% accuracy in dance movement recognition and 85.3% in pose estimation, verifying the effectiveness and practical value of combining multi-module system design with a multi-scale sequential fusion algorithm in dance teaching. Keywords: Dance Teaching; Technology Management; Embedded System; Feature Fusion Algorithm; Pose Estimation. DOI: 10.1504/IJART.2026.10075649

Modelling of Outdoor Building Facade Automated Image Decoration Design System Based on Point Cloud Semantic Segmentation
by Jing Liu
Abstract: This paper addresses the insufficient accuracy of point cloud semantic segmentation and the poor scalability of systems for automated building facade decoration design, constructing a three-module system comprising point cloud semantic segmentation, semantic-driven image decoration design, and result mapping.
To handle heterogeneous, complex facade components and fuzzy boundaries, the point cloud semantic segmentation is optimised with multi-scale feature fusion and boundary refinement, achieving a classification accuracy of 0.82-0.93, an mIoU of 0.87, and an improved structural IoU of 0.84, with a 4.7-pixel reduction in keypoint error. A semantic-constrained decoration module integrating style rules and geometric alignment yields a style-matching score of 8.0 and a structural-alignment score of 8.46. The system automates 138 tasks per hour with a 92.54% completion rate and a 7.76% manual intervention rate, demonstrating efficient structure-semantics-decoration integration. Keywords: Point Cloud Semantic Segmentation; Building Facade Modeling; Automated Image Decoration Design; Three-dimensional Structure Recognition; Multi-style Adaptability. DOI: 10.1504/IJART.2026.10075668

Two-Dimensional Digital Art Animation Synthesis System Based on BP Neural Network
by Zhouzhou Cheng, Xiao Xia
Abstract: 2D animation still struggles to move beyond production methods based mainly on manual creation, with low creation efficiency. This paper uses a BP neural network to construct and study a two-dimensional digital art animation synthesis system, in order to improve production efficiency. It first introduces the structure, advantages, and algorithmic principles of the BP neural network, then explains the technology of two-dimensional digital art animation production. After building the synthesis system model, the paper evaluates the system's performance and image processing, examines the model's strengths and weaknesses, and gives a rough overview of the resulting system. The results showed that the maximum error of the BP neural network in the experiment was below 9%, and the average error was reduced to 5%.
Keywords: 2D Digital Art Animation; BP Neural Network; Poisson Equation; Bezier Curve. DOI: 10.1504/IJART.2026.10075881

Cross-Platform Music Recommendation Method and Innovation Path Based on the Internet of Things and Blockchain
by Jin Ma, Yi Li
Abstract: Traditional music recommendation systems have not yet achieved cross-platform operation, and gains in recommendation effectiveness are constrained by the limitations of the recommendation system itself. This paper studies cross-platform music recommendation methods based on IoT and blockchain technology, testing four recommendation algorithm models over different recommendation list lengths. Experimental data show that at a recommendation list length of 60, the accuracy, recall, and F1 score of the hybrid recommendation algorithm are 62.14%, 50.64%, and 0.559, respectively, outperforming the other three algorithms. In addition, with 60 users on the system, system security reaches 97.80%. These data show that the proposed cross-platform music recommendation system based on the Internet of Things and blockchain is feasible and worth further promotion and application. Keywords: Music Recommendation Method; Internet of Things; Blockchain Technology; Music Platform. DOI: 10.1504/IJART.2027.10076014

Integrating Actor-Network Theory and Speculative Design: Exploring Innovations in HCI Education from a More-than-Human Perspective
by Jiawei Li, Zhiyong Fu, Jiayue Wang, Lin Zhu, Jiaxuan Xu
Abstract: This study explores integrating Actor-Network Theory (ANT) and speculative design into Human-Computer Interaction (HCI) education to cultivate students' More-than-Human Design capabilities. Traditional human-centered approaches are insufficient for complex socio-technical challenges; students need new frameworks to understand the interactions of human and non-human actors.
ANT is used to analyze HCI systems, emphasizing human-technology co-construction while recognizing human agency, and speculative design provides innovative methods for exploring HCI possibilities through provocative artifacts and narratives. A workshop guided students in applying ANT and speculative design to analyze More-than-Human systems, comparing the results of human participation with those of AI-assisted generation; this demonstrated AI's potential to enrich HCI education. The research offers a comprehensive framework combining theory and practice, fostering students' critical thinking and innovative awareness for envisioning future human-computer interaction designs. Keywords: Human-Computer Interaction education; More-than-Human Design; Actor-Network Theory; Speculative Design; Design pedagogy. DOI: 10.1504/IJART.2027.10076214

Personalised News Recommendation System Based on Computer Artificial Intelligence Technology
by Qiang Wang
Abstract: In this paper, intelligent recommendation, an artificial intelligence technique, is incorporated into the design of digital media. A collaborative filtering algorithm is used to build a personalised news recommendation system: based on the current user's reading history, likes, and sharing behaviour, the algorithm recommends news reports of interest. Specific types of news are pushed to users according to the algorithm's recommendations and made available through personalised news display pages. In validation on experimental data, the calculated similarity after processing was 0.85638; the target objects with the highest similarity were selected and pushed to users. Finally, data and opinions were collected through a survey questionnaire.
The research results showed that integrating artificial intelligence technology into digital media can improve its design, innovation, and user experience, and provide users with more intelligent and personalised services. Keywords: Collaborative Filtering; Digital Media; Personalize News; Intellectualized System; Artificial Intelligence. DOI: 10.1504/IJART.2027.10076522

Analysis of Brushstrokes during the Creation of a Painting of a Peach
by Otoniel Igno-Rosario, Claudia Hernández-Aguilar, Luis Manuel Hernández-Simón, Flavio Arturo Domínguez-Pacheco, Jose Alberto Medina-Pérez
Abstract: This study presents a systematic video-based analysis of brushstrokes during the creation of a peach painting. Using a dual-camera setup that simultaneously captured top and side views, we recorded 23 preliminary strokes and the complete sequence of 281 strokes that formed the final artwork. Each stroke was processed frame by frame to extract spatial-temporal features, including three-dimensional coordinates, orientation angles, and stroke velocity. Our research was motivated by the limited availability of annotated brushstroke data and the need for interpretability of human stroke dynamics. The study is presented as a preliminary case that generates evidence-based data to support further research in computational art and robotic painting. Code is available at https://github.com/oton-lab/brushstrokes. Keywords: brushstroke analysis; video processing; artistic painting; computational creativity.
DOI: 10.1504/IJART.2027.10076644

Knowledge Mapping and Mechanistic Insights in Emotional Digital Design: An Empirical Study Combining Clustering and Thematic Analysis
by Lan Ma, Yiyuan Ding, Chenxi Dong, Tang Liu, Zhenyu Li, Wenlei Mao, Tianyi Liu, Fernando Jorge Matias Sanches Oliveir, Xue Zhu, Lianfa Xu, Guangyi Tang, Feng Sha
Abstract: This study maps the knowledge structure of emotional digital design using bibliometric co-word clustering and a three-round thematic analysis of 3,817 Web of Science records. Six major research clusters were identified: design and technology, health and care, education and learning, mental health and emotion regulation, workplace studies, and conversational agents, revealing their evolution over time. Thematic analysis highlights research hotspots and cross-disciplinary trends that guide the field's development. By integrating quantitative mapping with qualitative interpretation, the study provides both a structural overview and detailed thematic insights, enabling researchers and practitioners to better understand the field's current state and future directions. It also addresses the limitations of traditional keyword co-occurrence analysis, namely its lack of semantic depth and mechanistic insight, by combining bibliometric clustering with multi-round thematic analysis, delivering both a structural map and an explanatory understanding of emotional digital design. Keywords: Emotional Digital Design; Affective Design Environments; Emotion-Centered Interaction; Experiential Design; User Engagement. DOI: 10.1504/IJART.2027.10076828

Application of Deep Learning Algorithms in the Transfer of Ethnic and Folk Art Design Styles in Cultural and Creative Products
by Wanli Gu
Abstract: This paper addresses the problem of results that are 'similar in form but different in spirit' when transferring ethnic folk art styles to cultural and creative products.
It proposes a deep transfer framework with cultural semantic guidance and multi-scale style decoupling. First, it constructs annotated data covering typical artistic styles of various ethnic groups and designs a semantic embedding module that encodes intangible attributes, such as pattern symbolism, colour taboos, and composition paradigms, as style guidance signals. Second, it designs a dual-path feature decoupling network in the content encoder to preserve the structural semantics of the product carrier. The experimental results show that the proposed method achieves an FID of 27.3. Keywords: Ethnic Folk Art; Deep Learning; Style Transfer; Cultural and Creative Product Design; Cultural Semantic Embedding. DOI: 10.1504/IJART.2027.10076929

Intelligent Data Processing and Visual Design: Big Data Computing Promotes Innovation in Graphic Communication
by Xinchun Wang, Dichen Li, Haolin Xiong
Abstract: Traditional visual design is often limited to static presentation, lacking dynamic interaction and intelligent analysis, making it difficult to fully present the complexity and changing trends of data. This paper combines the contrastive language-image pre-training (CLIP) model and a generative adversarial network (GAN) to extract semantic information from text descriptions, generate high-quality graphic designs, and achieve semantics-based visualisation of creative data. The CLIP model analyses user-provided text descriptions, extracts the semantic features of the data, and converts them into high-dimensional feature vectors; these are then input to a GAN generator, which produces high-quality graphics matching the semantic description through the generative adversarial mechanism. The results showed that the SSIM and PSNR of the generated images were 0.95 and 33.5 dB, respectively, and the frame rate and response time of dynamic interaction were 45 FPS and 32 ms, respectively.
Keywords: Intelligent Data Processing; Visual Design; Big Data Computing; Graphical Communication; Contrastive Language-Image Pretraining. DOI: 10.1504/IJART.2027.10077157

Analysis on the Construction Strategy of Intelligent Music Teaching Classroom Based on Emotional Education
by Xinyu Du
Abstract: This paper proposes and validates a human-computer interaction teaching scheme for emotional education, addressing the teacher-centred, monotonous interactive activities of middle school music classrooms. It combines Bayesian skin-colour modelling, elliptical contour fitting, GMM tracking, and a hybrid multi-class SVM to design and implement a gesture recognition system suited to teaching scenarios. Experiments at school A, comprising questionnaires and interviews (600 students, 6 teachers) and system validation (approximately 2,400 gesture samples), evaluated the current teaching situation and the method's performance. The main results showed that the proposed gesture recognition method achieved a testing accuracy of 92.5% on teaching terminals while maintaining low latency (about 25 milliseconds) on resource-limited devices, striking a good balance between accuracy and real-time performance. Keywords: Interactive Teaching in Music Classroom; Gesture Recognition Algorithm; Human-computer Interaction; Skin Color Feature Extraction. DOI: 10.1504/IJART.2028.10077158

WayangFusionNet: Multi-Scale Cross-Modal Transformer for the Sustainability of Wayang Kulit Character Heritage Preservation
by Andy Pramono, I-Cheng Chang, Betty Dewi Puspasari
Abstract: The conservation of intangible cultural heritage is essential in a rapidly globalising world. Wayang Kulit, a traditional Indonesian art form, is an important entertainment medium and an embodiment of important moral and cultural values.
However, its existence is increasingly threatened by waning youth interest and limited digital documentation. Artificial intelligence facilitates deeper analysis, improved recognition, and sustainable digital cultural preservation. This paper describes WayangFusionNet, a new hybrid model that combines multi-scale feature extraction from EfficientNetV2B3 with a cross-modal transformer for character recognition. It outperforms existing models, achieving a test accuracy of 99.27%. Qualitative analyses, such as confusion matrices, class activation maps, and t-SNE visualisations, validate the model's ability to establish and convey distinct features. The experimental results confirm the ability of digital technology to preserve endangered art forms and revive their values for future use. Keywords: Multi-Scale Cross-Modal Network; Feature Pyramid Networks; Hybrid Network; Indonesian Wayang Kulit Classification. DOI: 10.1504/IJART.2027.10077160

A Self-supervised Sub-style Separation Framework for Artistic Painting Classification
by Rui Huang, J.I.A. CUI, Che Jiang, Chengran Hu, Meng Qi, Zhelin Li
Abstract: Art style classification remains challenging because of subjective style delineation and the coexistence of multiple stylistic features within a single painting, which limit the effectiveness of conventional feature-learning strategies. This study proposes a Self-Supervised Learning-based sub-style Modelling method (SSLM) that models the sub-style distribution in augmented views of a painting, using variance and covariance losses to extract a more stable and discriminative style representation. Experiments on style databases demonstrate that SSLM outperforms state-of-the-art methods in handling style ambiguity. Furthermore, we introduce a style uncertainty index to quantify the dominance of principal styles over sub-styles and, based on this metric, construct a new dataset, P2, using a style-cleaning algorithm to enhance style purity.
Supervised models achieve higher accuracy on P2 in our experiments, demonstrating the effectiveness of cleaning style uncertainty. The proposed study offers new insights into art style classification through its sub-style modelling mechanism and style uncertainty quantification. Keywords: Style classification; Sub-style modelling; Style uncertainty; Image representation learning; Art style recognition. DOI: 10.1504/IJART.2028.10077162

Image Style Transfer and Visual Expression Based on Neural Networks
by Lianlian He, Wei Sun, Dongxian Yu

Abstract: Given the difficulty of balancing style and content and the limited visual expression of current image style transfer, this paper applies an improved DiffStyler model. First, ResNet-50 extracts multi-scale features (shallow conv1, middle res3, deep res5) with spatial alignment via upsampling and channel concatenation. The SE (squeeze-and-excitation) module dynamically adjusts channel weights through sigmoid-constrained intervals. Second, a dual-path Transformer architecture uses VGG19 (Visual Geometry Group)-extracted style features as Key vectors and content features as Query vectors, achieving cross-domain alignment via similarity-matrix calculations. Third, the DDPM (denoising diffusion probabilistic models) framework injects with a
Keywords: Neural Networks; Image Style Transfer; Visual Expression; Style-Content Balance; Diffusion Model. DOI: 10.1504/IJART.2027.10077167

Practice of Multimodal Music Teaching Mode Based on Artificial Intelligence
by Jie Liu

Abstract: This paper takes an experimental group (EG) and a control group (CG) as research objects to explore the differences between them. The pre-test mainly measures the difference in music ability between EG and CG, for comparison with the students after the experiment. The post-experiment phase is divided into three parts: a post-test, a questionnaire, and interviews.
The post-test uses students' final exam scores as indicators to compare the learning effects of EG and CG and to test the effect of the multimodal teaching model in music teaching. After the test, the same questionnaire was distributed to EG and CG to determine whether their musical interest had improved. On this basis, the teacher randomly selected 10 students from EG to investigate their interest in multimodal teaching methods. The average pre-test and post-test scores of EG were 21.85 and 27.03, respectively, indicating that multimodal teaching can promote students' music learning. Keywords: Multimodal Music Teaching Model in Practice; Artificial Intelligence; Mel-Frequency Cepstral Coefficient; Multimodal Teaching Model. DOI: 10.1504/IJART.2027.10077173

Emotion Recognition in Artworks: Multimodal Data Analysis Based on Deep Learning Algorithms
by Ying Bai, Liping Ouyang

Abstract: Traditional art emotion recognition methods suffer from limitations such as single-modality feature extraction, strong subjectivity, and low recognition accuracy. This paper proposes a deep multimodal feature fusion method based on a cross-attention mechanism, which achieves fine-grained bidirectional interaction among visual, textual, and metadata modalities. The study uses an accurate cross-database matching strategy to construct a high-quality multimodal dataset containing 32,178 valid samples, integrating visual images, text descriptions, and structured metadata. The method achieves an accuracy of 92.7% on the test set, with an F1 score of 91.4%, significantly better than visual-only input (85.2%) and text-only input (79.8%). Compared with early and late fusion strategies, it improves accuracy and F1 score by approximately 4.3 to 5.7 percentage points, demonstrating its potential for application in digital humanities, intelligent curation, and other fields. Keywords: Artworks Research; Emotion Recognition; Multimodal Fusion; Cross-Attention Mechanism; Transformer Architecture.
DOI: 10.1504/IJART.2027.10077282

3D Geometric Reconstruction Method of Damaged Cultural Relics Based on Multimodal Data Fusion and Deep Learning
by Feng Li, Yajie Bai

Abstract: Current 3D geometric restoration methods for damaged cultural relics are susceptible to noise and information loss when using single-modal data. To address this, this paper applies multimodal point-cloud feature fusion and 3D generative adversarial processing. Firstly, a multimodal mapping function encodes the laser-scanned point cloud, image sequence, and computed tomography (CT) slices into a dense 3D feature tensor. Next, the discriminator uses multi-scale convolution kernels to judge the geometric consistency between the generated point cloud and the ground-truth point cloud at both local and global levels, and approximates the true distribution of lost artefacts by jointly optimising the adversarial loss and the geometric reconstruction error. The results show that as the defect rate rises from 10% to 40%, the chamfer distance of the proposed method increases from 0.30 mm to 0.47 mm, and the average point-spacing deviation increases from 0.13 mm to 0.22 mm. Keywords: 3D Geometric Reconstruction; Multimodal Feature Fusion; Generative Adversarial Loss; Geometric Consistency; Digital Heritage Preservation. DOI: 10.1504/IJART.2027.10077430

Design and Implementation of a Chinese Painting Style Copying System based on Image Recognition and Style Transfer Algorithm
by Yifan Xue

Abstract: Neural replication of Chinese ink painting demands stroke-level semantics and physics-aware modelling of ink-wash gradients and paper-fibre micro-textures.
We integrated a stroke recogniser with a physics-guided, multi-scale style-transfer module incorporating Darcy and anisotropic-diffusion priors, and evaluated fidelity, robustness, and efficiency. Trained on 2,900 Chinese ink paintings (2,400 internal; 500 external) spanning shan shui and hua niao genres, at resolutions up to 4,096² and across various Xuan papers and inks, our multi-branch CNN stroke recogniser and stroke-conditioned encoder-decoder outperformed baselines (Gatys NST, AdaIN, dual-path diffusion). Metrics showed superior performance: SSIM 0.928 (vs. 0.902), LPIPS 0.168–0.186 (vs. 0.198–0.218), SCDS 0.816 (vs. 0.772), stroke recognition macro-F1 0.892, and mIoU 0.803. Expert ratings (n=15) favoured our method (8.1–8.4 vs. 7.3–7.8). External validation SSIM reached 0.921, with efficient 4,096² inference (12.4 s). This stroke-conditioned, physics-guided system achieves higher fidelity, cultural authenticity, cross-domain robustness, and high-resolution efficiency, advancing conservation-grade digital replication of cultural heritage. Keywords: Artificial Intelligence; Image Processing; Computer-Assisted; Pattern Recognition; Automated; Algorithms; Reproducibility of Results; Cultural Characteristics. DOI: 10.1504/IJART.2027.10077567

Digital display and virtual experience of cultural heritage based on image processing
by Yuting Deng

Abstract: In response to the contradiction between high-fidelity reconstruction and a lightweight virtual experience in the digital display of cultural heritage, this paper proposes an end-to-end digital display framework that integrates multi-view image processing and neural rendering optimisation. Firstly, based on the ETH3D and MuralDH public datasets, the paper improves input consistency through preprocessing methods such as illumination normalisation, multispectral registration, and highlight separation.
Secondly, in the 3D reconstruction stage, the paper introduces texture confidence weighting and multispectral feature fusion mechanisms, and then constructs a lightweight neural radiance field (NeRF) model, embedding a dynamic level-of-detail mechanism for texture perception. Experiments show that this method compresses the model volume to 16.6 MB and, at a high-fidelity level of 32.7 dB PSNR and 0.936 SSIM, achieves an average frame rate of 42.3 fps on the network side, significantly better than the baseline scheme. Keywords: image processing; cultural heritage; digital display; neural radiance field; lightweight rendering; virtual experience. DOI: 10.1504/IJART.2028.10078101

Special Issue on: OA Intelligent Media Arts Convergence of Technology, Creativity, and Performance
Abstract: To address the structural mismatch, the lack of controllability over harmony and texture, and the insufficient realism of generated samples caused by exposure bias in existing artificial intelligence models for polyphonic music generation, this study designs a novel polyphonic artificial intelligence (AI) music generation algorithm based on transformer and adversarial mechanisms. The findings show that the controllable choral transformer model achieves a test accuracy of up to 93.52% in the alto part, with a note error rate as low as 6.48%. The proposed model also achieves a training accuracy of 94.86%, a note-chord consistency score of 0.752, and a melody-chord pitch distance of only 0.853. By combining relative-position attention, multidimensional conditional control, and adversarial training, the proposed algorithm effectively improves the structural rationality and harmonic consistency of generated music, providing an efficient and feasible technical solution for AI-assisted music creation and personalised music generation. Keywords: transformer; music generation; polyphony; adversarial mechanism; relative position attention mechanism. DOI: 10.1504/IJART.2026.10078013
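The relative-position attention mentioned in the abstract above can be sketched as follows. This is a minimal single-head NumPy illustration in the spirit of the common Shaw-style formulation, where a learned embedding per clipped pairwise offset is added to the query-key scores; the weight names, dimensions, and the single-head setup are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relative_position_attention(X, Wq, Wk, Wv, rel_emb, max_dist):
    """Single-head attention with additive relative-position scores.

    X:       (seq_len, d_model) note embeddings
    rel_emb: (2*max_dist + 1, d_k) one embedding per clipped offset
    """
    n, _ = X.shape
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Pairwise offsets j - i, clipped to [-max_dist, max_dist],
    # then shifted to [0, 2*max_dist] to index the embedding table.
    idx = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None],
                  -max_dist, max_dist) + max_dist
    R = rel_emb[idx]                                   # (n, n, d_k)
    # Content-content term plus content-position term, scaled.
    scores = (Q @ K.T + np.einsum('id,ijd->ij', Q, R)) / np.sqrt(d_k)
    return softmax(scores) @ V
```

Because the offset is clipped, two notes a bar apart and two notes many bars apart beyond `max_dist` share one positional embedding, which is what lets such models generalise across sequence lengths.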
Abstract: Music style defines a musical work's overall characteristics, beats carry emotional undertones, and dance consists of rhythmic movements. This study introduces a music style and beat recognition model based on a genetic algorithm, optimising feature extraction and model selection to improve recognition accuracy. The model classifies music into three styles and reclassifies the beat dataset accordingly. Combined with the preceding style recognition results, the most suitable model is selected from multiple beat recognition models through adaptive selection of the fitness function, realising the final beat recognition of the song under test. Experimental results show that the model performs well on F-measure and other metrics and can effectively identify beat characteristics across different music styles. Through error analysis, it identifies failure patterns for blues, rock, and other styles, providing data support for the accurate matching of performance actions to musical emotions. Keywords: biological model; style recognition; beat detection; deep learning. DOI: 10.1504/IJART.2026.10078102
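As a hedged illustration of the genetic-algorithm selection described in the abstract above, the sketch below evolves a bitstring (which could encode, say, which features or candidate models are active) against a user-supplied fitness function. Every name and hyperparameter here is a hypothetical toy, not the paper's configuration:

```python
import random

def evolve(fitness, n_bits, pop_size=30, generations=60, p_mut=0.02, seed=0):
    """Elitist genetic algorithm over fixed-length bitstrings."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]          # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)        # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each gene flips with probability p_mut.
            child = [g ^ (rng.random() < p_mut) for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

With a toy fitness that counts active bits, the loop converges towards the all-ones string; in the paper's setting, the fitness would instead score a candidate model's beat-recognition accuracy on the style-matched data.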
Abstract: With the advance of AI in artistic creation, deep learning-based music generation has become a direction of integration between intelligent media and digital art. This paper proposes an automatic melody generation and arrangement-assistance system for music production based on the transformer architecture. A self-attention mechanism and a style embedding model are constructed so that, during melody generation, the system captures long-range dependencies between notes, realising the dynamic coordination of rhythm and pitch. The results demonstrate the effectiveness of the transformer structure for music sequence generation and show the application potential of artificial intelligence in music creation assistance, automatic arrangement, and personalised melody generation. The system can be applied in intelligent arrangement, digital music education, film and game music, and related fields, providing a feasible path and technical basis for human-machine collaborative creation. Keywords: transformer architecture; music melody; automatically generated; arranging auxiliary system.
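A minimal sketch of the two ingredients this abstract names, style embedding and self-attention over note sequences, might look like the following in NumPy. The vocabulary size, dimensions, and random embedding tables are illustrative assumptions, not the paper's system:

```python
import numpy as np

def embed_sequence(notes, style_id, vocab, n_styles, d_model, seed=0):
    """Token + position + style embeddings for a list of MIDI note numbers."""
    rng = np.random.default_rng(seed)
    tok = rng.normal(size=(vocab, d_model))        # note-token table
    pos = rng.normal(size=(512, d_model))          # position table
    sty = rng.normal(size=(n_styles, d_model))     # style table
    # Style conditioning is added to every timestep of the sequence.
    return tok[notes] + pos[:len(notes)] + sty[style_id]

def causal_attention(X):
    """Single-head self-attention with a causal mask (no learned weights)."""
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)
    # Mask out future positions so step i attends only to steps <= i,
    # as required for autoregressive melody generation.
    scores[np.triu(np.ones((n, n), dtype=bool), k=1)] = -np.inf
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ X
```

The causal mask is what makes the model usable for generation: each output depends only on earlier notes, so changing a later note leaves all earlier outputs untouched, while attention itself can still reach arbitrarily far back within the context window.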