Title: Artificial intelligence-based drug structure extraction and representation in chemistry documents
Authors: Xin Xu; Jiaheng Pan; Dazhou Li
Addresses: School of Computer Science and Technology, Shenyang University of Chemical Technology, No. 11 Street, Shenyang Economic and Technological Development Zone, Liaoning Province, Shenyang, China (all three authors)
Abstract: Extracting and representing bond-line molecular structures from images in chemistry publications is important for rediscovering the properties of chemical structures, but both rule-based methods and existing deep learning methods suffer from low recognition rates. This paper presents ChemRAL, a model that automatically converts images of chemical molecules into chemical identifiers. ChemRAL uses an encoder-decoder architecture: a ResNet residual network extracts image features, and an attention-based long short-term memory (LSTM) network decodes those features into a chemical identifier. Comparative evaluations show that ChemRAL outperforms existing methods in terms of cross-entropy loss and longest-common-subsequence accuracy. The experiments indicate that ChemRAL improves the precision and effectiveness of molecular image feature extraction and representation, and that it provides a useful benchmark.
Keywords: artificial intelligence; drug discovery; drug structure extraction and representation; image characteristics extraction; drug information representation; LSTM.
DOI: 10.1504/IJCAT.2025.148139
International Journal of Computer Applications in Technology, 2025 Vol. 76, No. 1/2, pp. 17-26
Received: 15 Sep 2023
Accepted: 18 Jan 2024
Published online: 27 Aug 2025
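
The abstract names "accuracy of the longest common subsequence" as an evaluation metric but does not define it. The following is a minimal sketch of one plausible reading, not the authors' implementation: it scores a predicted identifier string against a reference string by the length of their longest common subsequence (LCS). The function names `lcs_length` and `lcs_accuracy`, and the choice to normalise by the longer of the two strings, are assumptions made for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching characters extend the best subsequence by one.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of skipping a char in a or b.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_accuracy(predicted: str, reference: str) -> float:
    """Hypothetical LCS-based score in [0, 1].

    Assumption: normalise by the longer string, so the score is 1.0 only
    when the two strings are identical. The paper may normalise differently.
    """
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 1.0  # two empty strings are treated as a perfect match
    return lcs_length(predicted, reference) / longest


if __name__ == "__main__":
    # Illustrative strings only; not data from the paper.
    print(lcs_accuracy("CCO", "CCO"))
    print(lcs_accuracy("CCO", "CCN"))
```

Unlike exact-match accuracy, a score of this kind gives partial credit when a predicted identifier differs from the reference in only a few characters, which may be why a subsequence-based measure is reported alongside cross-entropy loss.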