International Journal of Computational Vision and Robotics (IJCVR) Inderscience Publishers - linking academia, business and industry through research

Forthcoming Articles

International Journal of Computational Vision and Robotics

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Online First articles are also listed here. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

International Journal of Computational Vision and Robotics (85 papers in press)

Regular Issues

Fusion of periocular and forehead features for masked face recognition
by Diwakar Agarwal, Atual Bansal
Abstract: Most automated face recognition systems cannot identify authorised users whose faces are masked. This paper proposes a face recognition system based on the fusion of information obtained from the regions of a masked face that mostly remain uncovered, i.e., the periocular region and the forehead. Deep features are extracted from the periocular region using the pre-trained Inception-v3 model, while handcrafted features, namely local binary pattern (LBP) and histogram of oriented gradients (HOG), are extracted from the forehead region. The performance of the proposed method is evaluated on the Georgia Tech Face Database (GTDB), the Color FERET face image database, and a self-acquired masked face database. Experimental results show that fusing periocular features with forehead LBP and HOG features achieves verification and recognition rates of 76.39% and 52%, respectively, on GTDB; 96.63% and 95.62%, respectively, on Color FERET; and 89.95% and 67%, respectively, on the self-acquired masked face database.
Keywords: biometrics; forehead; fusion; masked face; multimodal; periocular.
DOI: 10.1504/IJCVR.2024.10064605
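
As a rough illustration of the handcrafted LBP features mentioned in the abstract above, the sketch below computes the basic 3x3 local binary pattern code and a histogram of codes over an image. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the clockwise bit ordering, and the "greater than or equal to the centre" threshold are all illustrative choices, and the abstract does not say which LBP variant the paper uses.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code for the centre pixel of a 3x3 patch."""
    centre = patch[1][1]
    # Neighbours visited clockwise from the top-left corner (assumed ordering).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        # A neighbour contributes a 1-bit when it is at least as bright as the centre.
        if patch[r][c] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels of a 2D list image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

A histogram of this kind is what would typically be concatenated with other descriptors before fusion; how the paper combines it with HOG and the deep periocular features is not described in the abstract.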

Skin cancer recognition with novel deep learning methodology on mobile platform
by Phillip Ly, Abhishek Verma, Doina Bein
Abstract: The goal of this research is to create mobile applications that leverage deep learning to detect skin cancer at an early stage and save lives. In this paper: 1) we present a novel deep learning-based methodology, feature extraction with data augmentation and fine-tuning (FEDAFT), for developing compact, mobile-compatible models that perform effectively in both experimental and real-world situations; 2) the methodology is based on advanced data augmentation, transfer learning, and fine-tuning techniques and obtains a top-1 accuracy of 88.35%, which is better than several other studies on skin cancer datasets; 3) the model is successfully deployed on the iOS and Android mobile systems; 4) furthermore, we create a composite dataset from several existing datasets for improved recognition accuracy.
Keywords: deep learning; skin cancer; melanoma; neural network; convolutional neural network; CNN; PHDB.
DOI: 10.1504/IJCVR.2024.10064682

Enhanced feature extraction technique using multiple pre-trained convolutional neural networks for improved brain tumour detection in MRI images
by Michael Chi Seng Tang
Abstract: Accurate brain tumour detection in MRI images remains challenging due to the tumours' variability in appearance from image to image. This paper proposes an improved technique for identifying brain tumours in MRI images. To begin, features are extracted from MRI images using ResNet50 and ResNet101. The chi-square test is then used to identify the five most significant features from each network. Concatenating these features results in ten features per image. These features are used to train a K-nearest neighbour (KNN) classifier. The trained classifier is then evaluated on images from the testing set. The proposed method demonstrated accuracy, sensitivity, specificity, and precision values of 0.8492, 0.9351, 0.7143, and 0.8372, respectively. A comparison of performance demonstrates that the proposed method outperforms previously used methods for brain tumour detection on the same dataset in terms of accuracy, sensitivity, and precision.
Keywords: brain tumour detection; feature extraction; image processing; machine learning; medical imaging; computer vision; convolutional neural network; CNN; MRI images; artificial intelligence; feature selection.
DOI: 10.1504/IJCVR.2024.10065038
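
For readers unfamiliar with the four metrics quoted in the abstract above, the following sketch shows how each is derived from a binary confusion matrix. The counts used in the example (72 true positives, 5 false negatives, 35 true negatives, 14 false positives) are hypothetical: they do not come from the paper and were chosen only because their ratios round to the reported values. The function name is likewise an assumption.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # all correct / all cases
        "sensitivity": tp / (tp + fn),      # true positive rate (recall)
        "specificity": tn / (tn + fp),      # true negative rate
        "precision": tp / (tp + fp),        # positive predictive value
    }


# Hypothetical counts, NOT taken from the paper.
metrics = binary_metrics(tp=72, fn=5, tn=35, fp=14)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The gap between sensitivity and specificity in such a table indicates that a classifier misses few tumours but raises comparatively more false alarms.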

A novel spatial domain enhancement based image segmentation method with mean shift filtering and automated thresholding for leaf images on natural background
by Muhammed Shafi Koodakkal, Mohammed Ismail Bellary, Narayanan Sarala Sreekanth
Abstract: Identification of plants is essential for the production of fuel, healthy foods, medications, and animal feed, and for biodiversity conservation. It is difficult to recognise all of the plant species in the world and to identify features that can be used to classify them. Plant recognition relies on the shape and colour features of the leaves. The difficulty of segmenting a region from an image depends on background complexity: segmentation is simpler if the leaf is on a light background and more challenging on a natural background. We propose a method for the segmentation of leaf images on a natural background. A deep convolutional neural network is also trained to segment leaves from four different datasets. Experimental results show that our method achieves an accuracy rate of 98%, with high PSNR and low RMSE scores, which is significantly better than the current state-of-the-art methods.
Keywords: image segmentation; enhancement; mean shift filtering; mask RCNN.
DOI: 10.1504/IJCVR.2024.10065087

A brief review on HCR works of Indian scripts
by Kalyanbrat Medhi, Diganta Kumar Pathak, Sanjib Kumar Kalita
Abstract: The aim of this paper is to describe the challenges faced by researchers in the fields of optical character recognition (OCR) and handwritten character recognition (HCR) for Indian scripts. Several studies have been performed on OCR and HCR in Indian languages during the last few decades. The 11 Indian regional scripts are Devanagari, Bangla, Gujarati, Kannada, Malayalam, Oriya, Gurmukhi, Tamil, Assamese, Manipuri and Telugu. The paper reviews recent work on OCR and HCR of Indian scripts published in relevant conferences and journals over the last few years. The survey is organised into five sections. Section 1 provides an introduction to various works, followed by the components of OCR in Section 2. The properties of Indian scripts are presented in Section 3. Section 4 discusses different OCR methods and OCR research work done on Indian scripts. Section 5 concludes the paper.
Keywords: Indian scripts OCR; Indian handwritten character recognition; OCR survey; document analysis.
DOI: 10.1504/IJCVR.2024.10065452

Colour-spin: a novel PCL descriptor for 3D object recognition and detection
by Mohamed Hannat, Nabila Zrira, Fatima Zahra Ouadiay, Mohammed Majid Himmi, El Houssine Bouyakhf
Abstract: This paper addresses the challenge of three-dimensional (3D) object recognition and detection in real-world scenes by introducing a new feature descriptor, the colour-spin (CSpin). CSpin combines spin image features and RGB colour information, leading to enhanced feature robustness. The performance evaluation shows that CSpin significantly outperforms the traditional spin descriptor by improving accuracy by 11%. Although the colour-signature of histograms of orientations (CSHOT) descriptor presented better accuracy, CSpin’s lower computational time makes it more suitable for real-time applications. Additionally, the 3D bag of words (3D BoW) model was optimised using a support vector machine (SVM) with a nonlinear radial basis function (RBF) kernel and a codebook size of 250. Our approach achieved an overall recognition accuracy of 96.42% on the publicly available RGBD Washington dataset, outperforming other state-of-the-art methods. The proposed solution shows considerable promise for practical applications in 3D object recognition and detection.
Keywords: 3D object recognition; 3D object detection; colour-spin descriptor; CSpin; support vector machine; SVM; bag of words; BoW; point cloud library; PCL; RGBD Washington dataset.
DOI: 10.1504/IJCVR.2024.10065156

Maximum power generation and conversion using hybrid technique for grid connected PV system
by A. Renjith, P. Selvam
Abstract: This paper presents an adaptive dingo optimiser (ADO) and a coyote optimisation algorithm (COA) for maximum power generation from PV panels, even under low irradiance and changing climatic conditions. ADO is a combination of the arithmetic optimisation algorithm (AOA) and the dingo optimiser (DO), based on dingo behaviour and mathematical calculations. COA, which is based on the social behaviour of coyotes, is used to ensure power stability during the entire process. The novelty of this paper is that it produces maximum power under different conditions and maintains stable power for any type of load and demand by controlling the inverter and converter signals using the proposed algorithms. The maximum PV panel power is modelled on the MATLAB/Simulink platform, and the findings of the system efficiency study are also validated. The simulation results show that the generated output power is maximised and that stability is better than with existing technology for any type of load.
Keywords: irradiance; PV panel; maximum power generation; adaptive dingo optimiser; ADO; coyote optimisation algorithm; COA.
DOI: 10.1504/IJCVR.2024.10065174

Quantum support vector machine: a novel approach for predicting students' academic performance in higher education
by Iti Batra, Subhranil Som, Pawan Whig
Abstract: This research paper investigates the impact of non-intellectual parameters on students' academic growth, emphasising the transformative role of data mining in education. Through psychometric analyses of students' behaviour and learning patterns, we aim to enhance academic performance. Previous studies, employing various mining techniques such as neural networks, decision trees, KNN, naive Bayes, and SVM, have yielded accuracy rates below 89%. In response, our study proposes a novel approach that uses the quantum support vector machine (QSVM) for data classification to predict learners' cumulative grade point index (CGPI). Results demonstrate a significant advancement with the radial basis function kernel in the QSVM model, which achieves an accuracy of approximately 98%. This result underscores the predictive capabilities of QSVM, marking a substantial improvement over existing methodologies. The findings suggest that QSVM holds great promise in accurately predicting academic performance based on psychological factors, paving the way for enhanced educational outcomes.
Keywords: non-intellectual parameters; academic performance; study habits; data mining; psychometric analyses; educational outcomes; psychological factors; neural networks; decision trees; K-nearest neighbours; KNN; naïve Bayes; support vector machines; SVM; quantum support vector machine; QSVM.
DOI: 10.1504/IJCVR.2024.10065562

ROI adaptive lossy compression in dermatology using deep learning and superpixels algorithm
by Saida Lemnadjlia, Melouah Ahlem, Amel Slim
Abstract: Medical images play a vital role in diagnosing and monitoring diseases, yet storage challenges arise due to their size. The necessity for compression becomes evident, though traditional methods compromise image quality. This study proposes a solution by selectively degrading non-critical image portions, adjusting the compression ratio based on content. The relevance of each segment is determined using a deep learning (DL) and super-pixels combination. Initially, a super-pixels method groups homogeneous pixels, and then the image is split into partitions, each classified by a DL model. A compression factor guides the ratio and degradation. Evaluation on skin images reveals superior results (CR: 42.07, PSNR: 63.87, MSE: 1.15, SS: 97.62) compared to existing techniques, affirming the method’s success.
Keywords: deep learning; super pixel; image compression; jpeg; adaptive; loss of data; medical images; quality factor; semantic.
DOI: 10.1504/IJCVR.2024.10065654

A novel approach of multi-classification of brain tumour with MRI images using transfer learning
by Rashmi Jolhe, Sudhir D. Sawarkar
Abstract: Brain tumour identification via MRI is vital for early diagnosis and treatment planning, yet manual detection is time-consuming and accuracy varies. Our framework employs transfer learning, using fine-tuned pre-trained networks to extract deep features from MR images. We fine-tuned and evaluated six pre-trained neural networks for a brain tumour classification challenge, aiming to enhance computation efficiency and mitigate overfitting risks. Standardised techniques were applied to tune model weights and extract relevant features, tailored for the 4-class tumour categorisation task. Both quantitative measures like classification accuracy and convergence speed, and qualitative aspects like required training resources, were analysed. Findings highlight the promise of properly fine-tuned pre-trained models to boost accuracy and efficiency for critical healthcare tasks. This evaluation establishes transfer learning as impactful for adapting computer vision models to real medical applications. Achieving 98.74% accuracy, our model addressed overfitting issues and yielded favourable computational results.
Keywords: deep learning; magnetic resonance imaging; MRI; glioma; meningioma and pituitary; perfusion MRI.
DOI: 10.1504/IJCVR.2024.10065707

Energy efficient and secure data aggregation based on rock hyrax swarm optimisation and deep recurrent auto encoder in WSNs
by K. Hemalatha, M. Amanullah
Abstract: In wireless sensor networks (WSNs), the sensor nodes typically have limited capabilities in terms of sensing, computation, and communication. To improve the overall lifetime and reliability of these networks, it is crucial to implement energy-efficient and secure data aggregation techniques. These approaches aim to optimise the usage of available resources while ensuring the integrity and confidentiality of the aggregated data. By employing such methods, WSNs can operate more efficiently and effectively, enabling a longer lifespan and improved reliability. This study presents a novel hybrid methodology to enhance data aggregation performance and security in WSNs. The methodology utilises rock hyrax swarm optimisation (RHSO) for feature extraction, inspired by the natural behaviours of rock hyrax swarms in search of food. Additionally, an attention recurrent autoencoder (ARAE) model, which combines autoencoder architecture with attention mechanisms and recurrent neural networks, is employed for classification.
Keywords: data aggregation; rock hyrax swarm optimisation; RHSO model; optimisation; recurrent method; auto encoder model; attack detection; wireless sensor networks; WSNs.
DOI: 10.1504/IJCVR.2024.10065783

Efficient high-speed and low-area hardware architectures of Lilliput cipher for resource-constrained RFID applications
by Pulkit Singh, Rachana Chandupatla, Archana Nagar, Bibhudendra Acharya
Abstract: The development of communication networks has made information security more important than ever for both transmission and storage. Secure communication suffers from resource limitations. Lightweight cryptographic algorithms are developed to work on constrained devices where conventional cryptography cannot be applied efficiently. In this paper, an 8-bit hardware architecture of the Lilliput cipher is proposed, which takes 8 bits of the 64-bit plaintext at a time to perform encryption and decryption operations. The proposed hardware architecture suits devices that operate with a low-bit input-output configuration, such as microcontrollers. Moreover, a loop unrolling technique is proposed to decrease latency and improve the throughput of the hardware architecture. The proposed 8-bit and unrolled architectures are implemented on Spartan and Virtex FPGA platforms for different unroll factors. The results demonstrate that the proposed designs have advantages over previous similar state-of-the-art papers in terms of throughput/area, throughput, and execution speed.
Keywords: architecture; radio frequency identification; RFID; field programmable gate array; FPGA; resource-constrained; cipher.
DOI: 10.1504/IJCVR.2024.10065889

Self-supervised learning - recent advancements, global trends and research directions
by Narinder Kaur Seera, Neha Gupta
Abstract: Self-supervised learning (SSL) has emerged as a novel machine learning paradigm that enables learning feature representations when annotated data is scarce. In SSL, a model is trained on pretext tasks which extract useful representations from the available data to perform various downstream tasks. This study conducts a comprehensive review of the different approaches to self-supervised learning (predictive, generative and contrastive) and of recent developments in this field. This paper also highlights the significant contributions of SSL in computer vision and natural language processing (NLP), along with its various applications in similar domains. It has been found that generative and contrastive models have made significant contributions in the fields of medical imaging and video classification. Finally, the study presents a set of open research problems in SSL and future directions for readers. This paper attempts to provide a complete study of self-supervised learning covering all major achievements, challenges and progress since its inception.
Keywords: self-supervised learning; SSL; pretext task; generative learning; contrastive learning; computer vision; natural language processing; NLP.
DOI: 10.1504/IJCVR.2024.10065969

A very effective approach for the generation of cryptographic keys based on safe EEG features
by T. Senthil Kumar, L. Mohana Sundari, M. Bharaneedharan, C. Saravanakumar
Abstract: The term biometric-based cryptographic key generation refers to a specific kind of data mining in which knowledge discovery methods are applied to biometric data in order to create cryptographic keys for encrypting secure data. Cryptography is the science of concealing information, and when it is combined with cognitive science, its solutions establish a new branch of the field: cognitive cryptography. In this article, we provide a novel approach that we believe will reduce the inaccuracy associated with the electroencephalogram-based key creation process. The new approach is based on the window segmentation protocol, and all of the characteristics are first transformed to binary form before being used as the input for the new method. We determined that the mean HTER for the single-channel system was 0.48%.
Keywords: security; cryptography; electroencephalogram; EEG; biometric cryptosystem.
DOI: 10.1504/IJCVR.2024.10066093

Improving license plate detection with YOLO-LPD algorithm
by Sahil Khokhar, Deepak Kedia
Abstract: License plate detection is a pivotal task with implications across diverse domains, including traffic management, law enforcement, and parking administration. This research paper introduces YOLO-LPD, an evolution of the YOLOv7 algorithm engineered for license plate detection. YOLO-LPD integrates novel data augmentation techniques that significantly enhance the diversity and richness of the training dataset, and it optimises the detection head of the YOLO algorithm to augment feature extraction and representation. In systematic comparisons against the baseline YOLOv7 algorithm and contemporary state-of-the-art methodologies, YOLO-LPD performs best, with an F-score of 98.97%. This performance holds across the difficult cases of license plate detection, overcoming challenges posed by adverse lighting conditions, partial occlusion, and low-visibility scenarios. The practical ramifications of YOLO-LPD's advancements are extensive, from enhancing traffic flow to empowering law enforcement and optimising parking management.
Keywords: automatic license plate recognition; ALPR; object detection; deep learning; machine learning; computer vision; intelligent transportation system.
DOI: 10.1504/IJCVR.2024.10066190

Detecting sleepiness of the driver using computer vision and deep learning techniques
by V. Pradeep, Nisha Tellis, P. Namratha, S. Shravya, Vshker Mayengbam
Abstract: The majority of auto accidents are caused by drowsy driving, so we present a reliable and intelligent method for detecting it. The proposed method obtains its input from a camera installed inside the vehicle, which captures the facial features of the driver. The system applies the Viola-Jones object detection framework, which uses Haar-like features and AdaBoost learning for feature selection, first to identify and extract the face from the image and then to identify the eye region on the face. A deep convolutional neural network model built with Keras then classifies the input data into different levels of drowsiness. Eye closure techniques are used to determine whether the driver is feeling fatigued. The system evaluates the driver's level of fatigue and, if necessary, issues a warning message. Our experimental results validate the effectiveness of our approach.
Keywords: drowsiness detection; computer vision; facial expression recognition; convolutional neural network; deep learning.
DOI: 10.1504/IJCVR.2024.10066345

Weed identification from field crop images using deep learning techniques
by Sulakshana B. Mane, Kiran Shrimant Kakade, L. Sudha, Sankarsan Panda
Abstract: The identification of weeds is very important in constructing a weed control system based on deep learning. Deep learning techniques use photos of both weeds and crops to build a weed identification model. Only a limited number of studies have evaluated the potential impact that different photo backgrounds may have on deep learning algorithms used for weed identification. The models attained average F1-scores of 77.5% and 68.4%, respectively. Using both uniform and non-uniform background photographs when building the model enhanced the performance of both the VGG16 and ResNet50 models, with average F1-scores in the range of 92% to 99%.
Keywords: learning techniques; convolutional neural; visual geometry group; moving cameras.
DOI: 10.1504/IJCVR.2024.10066346

Parallel U-Net based virtual try-on system - VIRTUALTN
by Ved Patel, Yash Dalvi, Ameya Kulkarni, Anuj Neema, Dipti Jadhav
Abstract: Virtual try-on is a configurable person creation framework that includes virtual try-on and other fashion modification tasks. The authors propose a virtual try-on method, VIRTUALTN, that generates clothing effects that are not possible with existing work, such as varied garment interactions (for example, wearing a top tucked into or over a bottom) and layering of multiple garments of the same type. VIRTUALTN specifically records each garment's shape and texture, allowing these elements to be changed independently. The authors present a virtual try-on feature, based on a parallel U-Net architecture and image processing, that shows customers how clothes look on actual models of various body shapes and sizes. This feature is achieved by a new generative AI model that generates life-like depictions of clothing. The objective evaluation of the proposed technique based on SSIM and FID shows acceptable results. The performance analysis shows that the proposed technique is comparable with many state-of-the-art techniques.
Keywords: virtual try on; DeepFashion; multimodal dataset; image processing; structural similarity index; Fréchet Inception Distance; FID.
DOI: 10.1504/IJCVR.2024.10066444

UAV-based detection of landmines using infrared thermography
by Muhammad Umair Akram Butt, Zaighum Naveed, Usama Javed
Abstract: Landmines remain a pervasive threat in conflict-affected regions worldwide, exacting a toll on innocent lives. Approximately every two hours, another individual becomes a victim of these lethal explosive devices (Landmines Monitor 2023, 2023), with a significant proportion being innocent civilians. Current methods for landmine detection suffer from inefficiency, high costs, and risks to the operator and system integrity. In this paper, we present a novel, efficient, safe, and cost-effective approach to unearth these hidden dangers. Our proposed method integrates an unmanned aerial vehicle (UAV) with a thermal camera to capture high-resolution images of minefields. These images are subsequently transmitted to a base computer, where a state-of-the-art image processing algorithm is applied to identify the presence of landmines. Notably, our solution performs exceptionally well, particularly during evening hours, achieving an impressive detection accuracy of nearly 88%. These results exhibit great promise when compared to existing methods constrained by their design limitations.
Keywords: IR thermography; landmines detection; unmanned aerial vehicle; UAV; remote sensing; drone.
DOI: 10.1504/IJCVR.2024.10066473

Preventing crop damage due to animals using deep learning
by Naeem Ahmad, Rizwan Alam, Shuchi Sethi
Abstract: Even though the technology has matured, farmers still follow old methods of dealing with animals in crop fields. Along with fences and repellents for crop protection, the best method is constant monitoring of crop fields, which is often used in farming practice. Constant monitoring makes sure that everything in the field goes as planned. However, constant monitoring through traditional methods is very difficult for farmers, especially on rainy or foggy days and when the land is near forests. To deal with this perennial problem, a framework is proposed that utilises modern technology such as deep learning and agriculture internet of things (AIoT) devices. In this framework, a deep learning technique is applied to video segments generated by video surveillance systems to monitor animal gatherings on agricultural land. The Faster R-CNN prediction method applied in this framework has shown good accuracy. To justify our results, we evaluated the model on different performance metrics using a modified open image dataset of animal classes.
Keywords: smart agriculture; deep learning; crop damage; animals; object detection.
DOI: 10.1504/IJCVR.2024.10066616

Human activity recognition using binarised spiking neural network with remora optimisation algorithm
by T. Senthil Prakash, Francis H. Shajin, Rajesh Kumar Singh, T. Eswarlal
Abstract: Human activity recognition (HAR) has great advantages in many applications, such as healthcare, entertainment, and human activity monitoring. Several existing classification methods do not accurately identify human activities. To overcome this issue, human activity recognition using a binarised spiking neural network (BSNN) with the remora optimisation algorithm is proposed to recognise human transition motion. Initially, data are taken from the human activities and postural transitions (HAPT) dataset. The input data are then pre-processed by a structural interval gradient filter to remove noise. The pre-processed data are fed to ternary pattern and discrete wavelet transforms to extract features from the data collected by the sensors. The BSNN model is used to improve the human activity identification rate. The weight parameters of the BSNN are optimised by the remora optimisation algorithm (ROA) to accurately recognise transitions and their activities. The method is implemented in Python.
Keywords: binarised spiking neural network; BSNN; structural interval gradient filter; human activity recognition; HAR; remora optimisation algorithm; ROA; ternary pattern; discrete wavelet transforms.
DOI: 10.1504/IJCVR.2024.10066617

Artificial intelligence control learning for autonomous industrial robots
by Amit Yadav, Jyoti Vimal, Aseem Chandel, Trapti Jain, Dushyant Kumar Singh
Abstract: A technology shift towards artificial intelligence can be clearly observed in recent work on intelligent robots, i.e., robots whose control algorithms are developed using artificial intelligence. The idea of the intelligent robot comes from human intelligence: speed of thinking, the ability to balance, and real-time decision making in crowded environments. Such a robot should avoid obstacles and reach its destination safely. The research involves two main components. The first relates to the mathematical modelling of the robot in terms of kinematic, dynamic, and electric drive models. The second concerns filtering, optimisation, system identification, and the development of a search algorithm for the robot. These methods should handle three cases: 1) the obstacle is static; 2) the obstacle is moving slowly; 3) the obstacle is moving fast. The search algorithm will provide human-like intelligence to the robot. The approach assumes external sensors and a control algorithm to complete the task.
Keywords: control learning techniques; CLT; computer vision; expert system.
DOI: 10.1504/IJCVR.2024.10066618

Performance evaluation of neural network models to classify day-5 human embryos
by B.R. Shobha Rani, S. Bharathi, Balachander Agoramurthy, Srivatsa Sanath
Abstract: The success rate of the most recent ART treatments for infertility has increased significantly over time, rising from 25%-30% to 65%-70%, while miscarriages and birth defects have been minimised. This paper presents an in-depth comparison of neural network models and proposes a state-of-the-art model to automate the embryo detection process using conventional microscopic images. Faster R-CNN, YOLO and Mask R-CNN were implemented and assessed, and Mask R-CNN was found to be suitable for embryo detection in the process of embryo grading. Detection is an important step in automating an embryo grading system that helps embryologists transfer viable embryos, resulting in successful pregnancy. In the comparison with Faster R-CNN and YOLO, Mask R-CNN with a ResNet101 backbone outperformed the other models in precision and recall. Selecting an optimised detection algorithm further helps the grading process, contributing to successful pregnancy.
Keywords: in vitro fertilisation; IVF; inner cell mass; ICM; trophectoderm; TE; region proposal network; RPN; Mask R-CNN; Faster R-CNN; you only look once; YOLO; hyperparameters.
DOI: 10.1504/IJCVR.2024.10066666

A combined approach of radon and Fourier transform for multimodal medical image registration
by Manjusha Deshmukh, Sheetal Bukkawar
Abstract: Image registration is an essential imaging technique used to match two or more images of the same scene. However, different image registration methods are required to address specific situations, as geometric and radiometric distortions, noise disturbance, data characteristics, and accuracy threshold levels vary across different tasks. This paper proposes a method to identify features from a given set of radon projections. The radon transform converts rotation into translation, and the magnitude of the DFT is invariant to translation. The features extracted using their combination are therefore invariant to rotation. The RTFT approach is used to register images in different frequency bands for image fusion applications. The objective of this work is to address variations in scale, translation, and rotation. The algorithm is tested on PET and MRI image datasets, using around 23 images, to correct scale, translation, and rotation. The algorithm has been successfully tested with different datasets, and the required average computation time is in seconds, depending on the size of the images, with an accuracy of 98%.
Keywords: image registration; image fusion; radon transform; Fourier transform; image matching.
DOI: 10.1504/IJCVR.2024.10066771
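
Illustrative sketch (not taken from the paper): the abstract relies on the fact that the DFT magnitude is unchanged by translation. The short Python check below demonstrates this for a circular shift of a 1D sequence. The `profile` values are made up, and treating a rotation as a circular shift along the projection-angle axis assumes the projections cover a full turn; the paper's actual feature construction may differ.

```python
import cmath

def dft_magnitude(signal):
    """Return |X[k]| for the discrete Fourier transform of a real sequence."""
    n = len(signal)
    magnitudes = []
    for k in range(n):
        acc = 0j
        for t, value in enumerate(signal):
            acc += value * cmath.exp(-2j * cmath.pi * k * t / n)
        magnitudes.append(abs(acc))
    return magnitudes

def circular_shift(signal, shift):
    """Shift a sequence circularly to the right by `shift` samples."""
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift]

# Hypothetical profile: one value per projection angle (illustrative numbers only).
profile = [0.0, 1.0, 4.0, 2.0, 0.5, 0.0, 3.0, 1.5]
# Under the stated assumption, rotating the image shifts this profile circularly.
rotated = circular_shift(profile, 3)

# The magnitudes agree up to floating-point error even though the sequences differ.
magnitudes_match = all(
    abs(a - b) < 1e-9
    for a, b in zip(dft_magnitude(profile), dft_magnitude(rotated))
)
```

Only the phase of each DFT coefficient changes under the shift, which is why a magnitude-based descriptor can ignore it.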

Enhancing crop disease prediction in tropical regions through GAN-based transfer learning
by Amol Bhilare, Niraj Patel, Debabrata Swain, Hargeet Kaur
Abstract: Agribusiness is one of the critical areas for the endurance of humanity. Simultaneously, digitisation influenced many areas, making it easier to execute many complex operations. Utilising technologies is essential for farming to benefit both the producer and the client. Agriculture has a significant impact on the economy of developing countries. In India, 40%-50% of people still depend on agriculture as an essential source of revenue. But sustaining the crops gets harder when they are afflicted with multiple diseases. It is tough for the average pupil to understand the pattern of the conditions in plant leaves and find its solution. Through technologies and continuous vigilance, pathogens can be recognised early and treated, resulting in higher agricultural production. Here a transfer learning-based framework is proposed to detect the diseases in tropical region crops such as potato and tomato. This study has shown a promising solution to the issue by using EfficientNet-B5, a transfer learning-based model. This work applies re-sampling techniques such as augmentation and generative adversarial networks (GANs). The EfficientNetB5 model has shown the highest validation accuracy of 99.73% on GANs images.
Keywords: tropical region crop; generative adversarial network; augmentation; EfficientNet; Softmax; Adam optimiser; transfer learning.
DOI: 10.1504/IJCVR.2024.10066824

Electronic health information systems to improve the detection of disease
by S.B. Mohan, Mukil Alagirisamy, Leo John Baptist Andrews, S. Allwin Devaraj
Abstract: The use of an electronic health information system (EHIS) makes it easier for healthcare workers to give clients personalised care, and it also makes it easier for service providers to share information. EHIS are being used more and more to diagnose and treat both communicable and non-communicable diseases. This is because of the growing amount of patient data and the need to provide ongoing care. Also, the fact that the COVID-19 pandemic happened in low- and middle-income countries (LMICs) with a lot of sick people has shown how important a strong EHIS is for keeping track of pandemics. The goal of this study is to figure out how durable and resilient hospital information systems are in light of the big changes happening in the world of healthcare. The study conducts a comprehensive evaluation of existing research on HIS, with a particular focus on their implementations in healthcare settings. The accuracy rate provided by the proposed system is 92.5%.
Keywords: electronic health information system; EHIS; healthcare apps; healthcare; HIS applications.
DOI: 10.1504/IJCVR.2024.10066870

Automated lesion segmentation of COVID-19 chest CT scan images using magneto-static active counter model
by S. Salini, B. Selvapriya
Abstract: The novel coronavirus disease, often referred to as COVID-19, is a highly infectious airborne disease that has been responsible for a substantial amount of harm all over the globe. Recent years have witnessed the appearance of more COVID-19 variants, which has made the current scenario considerably more complex and possibly more harmful. Evaluation and measurement of COVID-19 chest anomalies may help in all of these areas. In this investigation, we use CT images to test the applicability of magneto-static active contour models (MACMs) for the segmentation of pneumonia caused by COVID-19. MACMs are a productive method for image segmentation. The proposed way of adding more data to the training set can raise the average DSC from 0.7599 to 0.781. This method can lower the average MAE from 0.0074 to 0.0067 and raise the average SEN from 0.8104 to 0.8308.
Keywords: chest CT scans; COVID-19 infection; active contour models; level set methods; region-based models; edge-based models.
DOI: 10.1504/IJCVR.2024.10067000

Zigbee-based wireless sensor network in smart agriculture for crop production
by K. Sathiya Priya, C. Rajabhushanam
Abstract: When it comes to high-tech farm tracking cycles that are built on networks, "smart farming" is a new trend that uses tools, equipment, and monitors to put a focus on information and communication technology. Cloud computing, IoT, and other smart technologies could accelerate advancements and increase the use of AI and robotics in agriculture. Not only are these drastic changes causing a great deal of issues, but they are also disrupting the current agricultural practices. This research aims to investigate the kinds of tools and equipment used in Internet of Things agriculture to employ portable devices, as well as the issues that are expected to arise when technology is combined with conventional agricultural practices. This technology understanding also helps farmers during the whole food cycle, from growing to gathering. The ZigBee routing system drops the number of times routing is found by 15.2%.
Keywords: crop management; sustainable agriculture; smart farming; internet-of-things (IoT); advanced agriculture practices; issues and problems.
DOI: 10.1504/IJCVR.2024.10067001

Transfer learning based ResNet50v2 model for classification of COVID-19 in chest X-ray Images
by Farha Nausheen
Abstract: Coronavirus disease 2019 (COVID-19) is a lung-specific strain of influenza. One of the techniques used to diagnose it is the chest X-ray (CXR). One such technique is a convolutional neural network (CNN), one of the most well-liked deep learning (DL) designs. DL requires training a CNN for a purpose like classification using a big dataset. The transfer learning (TL) approach is utilised because the available datasets are either too small or too heterogeneous to build efficient feature representations. The popular ResNet architecture, which is TL-based, is well known for its great performance and generalisation on recognition tasks. In the proposed study, classification is carried out on a dataset that includes 10,192 normal and 3,616 COVID-19 chest X-ray images using the ResNet50v2 model. To evaluate the accuracy, specificity, sensitivity, and precision of the provided model, an experimental analysis is carried out. The testing accuracy is 98.746% and the testing loss is 0.04482 after 20 epochs. In order to expand the training set and enhance generalisation, we have included data augmentation.
Keywords: COVID-19-detection; deep learning; transfer learning; ResNet50v2; chest X-ray images.
DOI: 10.1504/IJCVR.2024.10067028

Deep learning based real time automated scrap identification through robotic vision
by Ami Munshi, Sapna Shah, Prateeksha Shanoj
Abstract: Every day, billions of tons of scrap are generated globally by industries, households, institutes, etc. People throughout the world have acknowledged this as a problem and a serious threat to our environment. Many ideas and techniques have been implemented to address this issue. In this paper, we implement a robotic vision-based real-time process to segregate scrap material such as cardboard, metal, paper, plastic and glass. Transfer learning-based deep neural networks such as MobileNet, DenseNet, VGG and ResNet are compared. Further, an idea to estimate the distance of the classified object from the camera is also suggested. This segregation can be useful in settings such as waste management and the automobile scrap industry. Furthermore, the segregated waste can facilitate processes such as recycling, reuse and safe disposal of the material.
Keywords: transfer learning; MobileNet; VGG; DenseNet; ResNet; segregation.
DOI: 10.1504/IJCVR.2024.10067096

Object identification and alerting method for pattern analysis
by Sulakshana B. Mane, Kiran Shrimant Kakade, S.M. Patil, Jayant Brahmane
Abstract: Security and surveillance is a less noticeable sector in civilian usage but is a panacea for defense and law enforcement agencies. This study suggests a method, TIAS. TIAS utilises computer vision (CV) and machine learning (ML) methods by forming a connection between the surveillance system, ML logic and the user interface. The proposed system creates a portal to infix data about targets from authorities/security agencies, then processes the data (textual/visual) received via security agencies and trains a model to recognise them in the public spaces via surveillance systems. The public spaces like ticket booths/stations/malls, etc., provide data to the system for verification in textual form as contact details, ID numbers, and visual form as image & video feed. The KNN classifier has been used along with OpenCV to provide a simple & robust ML and CV structure along with the Flask framework to provide an efficient user interface.
Keywords: face recognition; computer vision; security and surveillance; machine learning; image processing.
DOI: 10.1504/IJCVR.2024.10067181

Handling network congestion origins using recurrent neural networks to counter botnet-driven overloads
by Shallu Hassija, Khusboo Tripathi, Meenu Vijarania
Abstract: Network congestion caused by botnets is a critical threat to the stability and security of modern communication infrastructures. This research explores the proactive use of recurrent neural networks (RNNs) to address network congestion stemming from botnet-driven overloads. By forecasting the expected traffic patterns of each host through time series analysis, our proposed methodology aims to mitigate the negative impacts of botnet activities on network performance. Our study encompasses in-depth exploratory data analysis, RNN model training, and predictive analysis, all conducted on a comprehensive computer network traffic dataset. The results unequivocally demonstrate the potential of RNNs in identifying and flagging anomalous network patterns associated with botnet activities, presenting a promising strategy for addressing network congestion at its source. The primary findings of our research underscore the remarkable effectiveness of RNNs in accurately predicting and detecting network congestion events, achieving an accuracy rate that consistently surpasses 90%. This breakthrough in congestion detection has far-reaching implications for bolstering the security and resilience of modern communication infrastructures against the ever-evolving threat of botnets.
Keywords: network congestion; botnet; recurrent neural networks; RNNs; anomaly detection; predictive analysis; time series; communication infrastructure.
DOI: 10.1504/IJCVR.2024.10067212

A convolutional autoencoder-based cataract classification and disease localisation using fundus images
by Pammi Kumari, Priyank Saxena
Abstract: A cataract is a condition where vision becomes increasingly blurry as the eye's lens gradually becomes opaque. Early detection of cataract is crucial for a better prognosis. This work proposes the integration of an autoencoder designed for feature extraction with the pre-trained deep learning (DL) models used for classification. The dataset used in this work has been meticulously curated from multiple repositories available on retinal images, ensuring a comprehensive representation of cataract cases. The DL model trained on the extracted features from the autoencoder yields higher accuracy because it captures nonlinear correlations between features. Also, the gradient-weighted class activation mapping (Grad-CAM) method was incorporated at the last convolutional layer, highlighting the crucial regions of an image. The proposed technique achieves an exceptional 98% accuracy in cataract detection and localisation, as established by quantitative and qualitative data. This high level of accuracy makes it a valuable tool for early cataract detection.
Keywords: fundus images; cataract detection; deep learning; disease localisation; autoencoder; heatmap techniques.
DOI: 10.1504/IJCVR.2025.10067242

Designing of personalised ubiquitous healthcare service providing system for elderly patients
by Preeti Khanwalkar, Swetha Ravikanti, Vinod B. Durdi
Abstract: The recent advancements in wearable/sensing/smart devices, wireless communications, artificial intelligence, and cloud computing technologies have enabled personalised ubiquitous healthcare services for elderly patients. Elderly patients need special care due to declining cognitive capabilities, problems with memory/social interactions, etc. and thus need personalised ubiquitous healthcare services without their requests or interventions to make them active and independent. These services range from support in their daily tasks such as medicine reminders, managing nutritional diet, exercise routines, and other non-critical/critical services. For providing timely and relevant services, from the available plethora of services, this paper presents a design of a personalised ubiquitous healthcare service-providing system that acquires and analyses elderly patients' personal and contextual information and builds a patient model to identify and provide healthcare services according to their requirements. The simulation results show that the proposed system, using the elderly patient model, significantly reduces the time to identify and provide required services.
Keywords: personalisation; elderly care; ubiquitous computing; context-awareness; healthcare service providing system.
DOI: 10.1504/IJCVR.2024.10067302

Effective identification of MRI brain tumour images through artificial intelligence techniques
by Sateesh Amarneni, R.S. Valarmathi
Abstract: The research objective is to classify and segment the tumour portion in MRI brain images. The work begins with pre-processing, followed by feature extraction using the Gray level co-occurrence matrix (GLCM); the extracted features are then used to classify images as normal, glioma, meningioma or pituitary. For classification, the research uses a deep neural network (DNN), and optimisation techniques are then employed to identify the optimal weight parameters and further enhance performance. The optimisation techniques involved are the rider optimisation algorithm (ROA), artificial fish swarm optimisation (AFSO), and Gray wolf optimisation (GWO). Afterwards, the abnormal classes are segmented using modified region growing techniques. The proposed ROA combined with DNN achieves 96.70% accuracy, which is superior to the other compared techniques and the existing DNN technique.
Keywords: Gray level co-occurrence matrix; GLCM; rider optimisation algorithm; ROA; artificial fish swarm optimisation; AFSO; Gray wolf optimisation; GWO; deep neural network; DNN.
DOI: 10.1504/IJCVR.2024.10067316
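
Illustrative sketch (not taken from the paper): the abstract names the grey level co-occurrence matrix as its feature extractor. The Python below shows, under simple assumptions, how a GLCM is counted for one pixel offset and how one texture feature (contrast) is derived from it. The 4x4 image, the four grey levels, the horizontal offset, and the function names `glcm` and `contrast` are all chosen for illustration; the paper's offsets, quantisation and feature set are not stated in the abstract.

```python
def glcm(image, dx=1, dy=0, levels=4):
    """Count how often grey level i has grey level j at offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix

def contrast(matrix):
    """Contrast feature: sum of (i - j)^2 weighted by co-occurrence probability."""
    total = sum(sum(row) for row in matrix)
    return sum(
        (i - j) ** 2 * count / total
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )

# Toy image with grey levels 0..3 (illustrative values only).
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

cooccurrence = glcm(toy_image)             # horizontal neighbour, distance 1
texture_contrast = contrast(cooccurrence)  # larger values mean sharper local changes
```

In practice several offsets and several features (for example energy, homogeneity, correlation) are usually combined into one feature vector before classification.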

Using AI and computer vision to allocate parking spaces at seaports
by N. Krishnakumar, Suresh Babu Jugunta, Surya Kiran Chebrolu, C. Antony, B. Supraja
Abstract: Much research has been carried out to find solutions for parking lot management based on computer vision. In this research, we surveyed and analysed robust image datasets that are publicly accessible. These datasets were specifically constructed to evaluate computer vision-based algorithms for parking lot management. We give a systematic and complete assessment of previous studies that utilise such datasets. The evaluation of the relevant literature revealed a number of pertinent gaps that call for more study. One of these gaps is the need for dataset-independent methods suited to the autonomous identification of the location of parking spots. Furthermore, the study of the datasets indicated that some characteristics that need to be included when building new benchmarks have not been integrated. These characteristics include the availability of video sequences and photographs recorded in a wider variety of settings.
Keywords: computer vision; artificial intelligence; AI; sea ports; parking system.
DOI: 10.1504/IJCVR.2024.10067376

A deep convolutional neural network for detecting volcanic thermal images
by Kiran Shrimant Kakade, Sulakshana B. Mane, R.A. Mabel Rose, Leo John Baptist
Abstract: A lot of volcanoes around the world have visible-wavelength cameras on them. These cameras are often thought to be one of the easiest types of tracking equipment. Keeping an eye on when powerful volcanic events happen could give us useful knowledge that we can use to study and keep an eye on volcano hazards. As the title of this paper suggests, we want to make deep learning models using convolutional neural networks (CNNs) to separate the meanings of visual pictures of powerful volcanic plumes. On 24 December 2018, 560 pictures were taken from three different places during the eruption. The data that the two CNN models were fed came from high-resolution video cameras on Mount Etna in Italy. These cameras were part of a ground-based network called Etna_NETVIS. Furthermore, semantic segmentation makes it possible to automatically track volcanic clouds and figure out geometric factors.
Keywords: volcano monitoring; thermal images; deep learning; nature.
DOI: 10.1504/IJCVR.2024.10067378

An optimised image dehazing framework using dark channel prior and MSRCR algorithms
by Divya Sharma, Shilpa Sharma, Harshal Patil, Vaibhav Bhatnagar
Abstract: Computer vision-based systems are typically designed for clear weather conditions but face challenges in adverse conditions like fog or haze. These conditions degrade image quality, making object recognition difficult due to blurry object boundaries and reduced visibility. Image dehazing aims to reduce noise and restore clarity. However, existing haze removal algorithms often result in issues like light scattering, colour distortion, and poor contrast. This paper introduces an improved image dehazing framework that uses a masked dark channel prior (MDCP) algorithm. The framework estimates atmospheric light by analysing the sky's average intensity and incorporates unsharp masking to enhance image sharpness. Additionally, the multi-scale retinex colour restoration (MSRCR) algorithm is applied to further improve colour balance. The framework's effectiveness is evaluated using PSNR and SSIM metrics, showing significant visual enhancement. A paired t-test comparison with He's DCP algorithm indicates a notable improvement in the proposed method's performance.
Keywords: dark channel; unsharp masking; transmission; atmospheric light; multi scale retinex colour restoration; MSRCR; T-test.
DOI: 10.1504/IJCVR.2024.10067559
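
Illustrative sketch (not taken from the paper): the abstract builds on the dark channel prior. The Python below sketches the standard, unmasked dark channel and the usual transmission estimate t = 1 - omega * dark(I / A). It is not the paper's masked variant (MDCP), and the atmospheric light is simply passed in rather than estimated from the sky region as the abstract describes. The tiny 2x2 image, the patch radius and `omega = 0.95` are assumptions made for the example.

```python
def dark_channel(image, radius=1):
    """Per-pixel minimum over colour channels and over a square patch.

    `image` is a list of rows, each row a list of (r, g, b) tuples in [0, 1].
    """
    rows, cols = len(image), len(image[0])
    channel_min = [[min(pixel) for pixel in row] for row in image]
    dark = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            dark[r][c] = min(
                channel_min[rr][cc]
                for rr in range(r0, r1)
                for cc in range(c0, c1)
            )
    return dark

def transmission(image, atmospheric_light, omega=0.95, radius=1):
    """Estimate transmission from the dark channel of the light-normalised image."""
    normalised = [
        [tuple(ch / a for ch, a in zip(pixel, atmospheric_light)) for pixel in row]
        for row in image
    ]
    return [[1.0 - omega * d for d in row] for row in dark_channel(normalised, radius)]

# Toy hazy image (illustrative values only); a radius-1 patch covers all of it.
hazy = [
    [(0.8, 0.7, 0.9), (0.6, 0.65, 0.7)],
    [(0.9, 0.9, 0.9), (0.75, 0.8, 0.85)],
]
dark = dark_channel(hazy)
t_map = transmission(hazy, atmospheric_light=(1.0, 1.0, 1.0))
```

A high dark-channel value signals dense haze, so the estimated transmission falls as the dark channel rises.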

Paired pixels mean-based near reversible watermarking scheme for digital images
by Bharathi Chidirala, Bibhudendra Acharya
Abstract: In digital image watermarking, a host image carries secret information to protect copyrights. Reversible schemes aim to extract secret data and restore the host image, but they often involve hiding overflow/underflow information, leading to high computational complexity. Near-reversible data hiding schemes provide a practical compromise between reversibility and computational complexity, making them suitable for real-time and embedded systems where the efficiency of extraction is prioritised over perfect reversibility. This article introduces a near-reversible watermarking scheme based on mean values of paired pixels. In extraction, the scheme perfectly restores secret data and the host image with a minor deviation from the original values. The proposed scheme, tested on a standard image database, shows improved visual quality metrics compared to other methods, with an average PSNR of 53.32 dB for a watermarking capacity of 131,072 bits and an approximate SSIM of 0.9984 between the original and embedded images.
Keywords: near reversible; data hiding; pixel based; watermarking; copyright protection; data embedding; data extraction.
DOI: 10.1504/IJCVR.2024.10067685
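
Illustrative sketch (not taken from the paper): the abstract reports visual quality as PSNR. The Python below shows how PSNR follows from the mean squared error between original and embedded pixel values. It assumes 8-bit images (peak value 255), which the abstract does not state, and the pixel lists are made up. Under that assumption, a PSNR of 53.32 dB corresponds to a mean squared error of roughly 0.30.

```python
import math

def psnr(original, modified, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(original, modified)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no distortion at all
    return 10 * math.log10(peak ** 2 / mse)

# Toy pixel values (illustrative only): two of four pixels change by one level.
host = [100, 120, 130, 140]
embedded = [101, 120, 129, 140]

quality_db = psnr(host, embedded)
```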

Fruit quality monitoring and classification using convolutional neural network and fuzzy system
by K. Periyarselvam, S. Palanikumar, R. Ramamoorthi, K.B. Aruna
Abstract: The traditional techniques of hand plucking are human-made and hence susceptible to human effort. Conventional machine learning techniques for the categorisation of apple diseases rely on manually produced characteristics that are not robust and are difficult to understand. Artificial approaches like convolutional neural networks (CNNs) have emerged as a potentially useful approach to obtaining better levels of accuracy. In order to achieve a greater level of accuracy, this study examines several deep convolutional neural network (DCNN) applications to apple disease classification by making use of deep generative pictures. The purpose of our research is to achieve this objective by progressively modifying a baseline model. To do this, we use a fully learned DCNN model that has fewer features and better recognition accuracy than other models like ResNet, SqueezeNet, and MiniVGGNet. We compared our methods to both the most advanced CNN methods and the more standard methods reported previously. These methods use feature extraction to build a feature description with an accuracy rate of 93% to 95.6%.
Keywords: apple diseases; blotch; scab; rot; classification; deep learning.
DOI: 10.1504/IJCVR.2024.10067686

Underwater image processing and object detection based on deep learning
by M. Nalini, C. Pandi, Vinod Arunachalam, Sheshang Degadwala
Abstract: Object recognition from underwater sea images based on deep learning algorithms offers results that are superior. The purpose of this study is to offer a deep learning technique that may be used to perceive underwater items from improved deep underwater photos. It is used to improve the quality of the underwater picture with the very deep super-resolution network (VDSR), which has better visual clarity. Based on the suggested Border Collie Flamingo optimisation-based deep CNN classifier (BCFO-based deep CNN), the item is then found in the better underwater picture. The BCFO-based programme is the most important part of the work. Both the UIEB and DUO records are used. The suggested model got accuracy scores of 93.89% and 95.24%, sensitivity scores of 95.93% and 97.29%, and specificity scores of 98.64% and 99% when the training percentage was set to 80% and the k-fold was set to 10.
Keywords: object detection; underwater; deep learning; computer vision; very deep super-resolution network; VDSR.
DOI: 10.1504/IJCVR.2024.10067687

A self-driven action taking robot integrated with a multipurpose drone
by Chetan Vyas, Manisha Kumawat, Tanmay Kapil, Jay Shrivastava, Manisha Kowdiki, Kisanprasad Gunale
Abstract: This research paper investigates alternative materials for load-carrying drone construction to improve their durability and lifting capacity. The paper explores the potential benefits of using a load-carrying drone in conjunction with a robot that uses the Klann mechanism for movement. A load-carrying drone system that incorporates a self-driven robot capable of taking action is introduced. A user-friendly graphical user interface (GUI) is also developed, allowing users to map the drone's trajectory on a 3D region by inputting GPS coordinates through a radio frequency (RF) transmitter. It is unique due to its integration with the Klann mechanised robot, which enables it to operate on any terrain. It compares different materials to find the most efficient for a load-carrying drone system. The system costs less than half of existing drones with payloads of 14 kg or more, including the self-driven action-taking robot. The system features a user-friendly GUI that enables real-time tracking of the drone.
Keywords: disaster management; heavy lifting drone; Klann mechanism; logistics; self-driven robot; user-friendly GUI; versatile.
DOI: 10.1504/IJCVR.2024.10067810

Memetic SAPSO approach for VLSI non-slicing floorplanning
by Sony Snigdha Sahoo, Prafulla Kumar Behera
Abstract: Swarm intelligence-based algorithms have become immensely popular in the last few years for dealing with NP-hard problems. Although no actual solutions exist for NP-hard problems, swarm intelligence algorithms have been used to provide optimal solutions for such problems in an optimal bounded time. Many real-life problems including travelling salesman problem, subset sum, packing, sequencing, constraint satisfaction, partitioning, etc are NP-hard. VLSI floorplanning problem has also been proved to be NP-hard, and its solution using various swarm intelligence algorithms is being explored. This paper proposes a memetic simulated annealing-based particle swarm optimisation (m-SAPSO) for generating optimal solutions to non-slicing floorplanning problems over Microelectronics Center of North Carolina (MCNC) benchmark circuits. A performance comparison has been provided among particle swarm optimisation (PSO), constrained PSO (C-PSO), simulated annealing-based particle swarm optimisation (SAPSO), and m-SAPSO over the MCNC benchmark circuits on the basis of area and deadspace. According to experimental results, m-SAPSO exhibits reductions of 28.9%, 22.6%, and 2.3% in area and 40.1%, 22.3%, and 26.9% in deadspace (DS) when compared to PSO, C-PSO, and SAPSO respectively.
Keywords: VLSI floorplanning; non-slicing floorplan; NP-hard; simulated annealing; SA; particle swarm optimisation; PSO; memetic algorithm; swarm intelligence.
DOI: 10.1504/IJCVR.2024.10067811

A new ECG segmentation for compression in e-health devices using wavelet transform and mathematical morphology
by T. Sripriya, V. Vasudhevan, M. Bharaneedharan, S. Ramesh
Abstract: There is a high incidence of cardiopulmonary disorders in the aged population. These diseases include cardiovascular disease (CVD) and chronic obstructive pulmonary disorder (COPD). For the purpose of correcting organ damage and avoiding additional harm, it is essential to diagnose cardiopulmonary illness as early as possible, to continue monitoring one's health throughout time, and to manage one's health. The majority of home health monitoring (HHM) devices, on the other hand, need to be coupled with a particular smartphone application. This makes it difficult to monitor numerous health indicators at once and reduces the likelihood that complete multi-indicator analysis will ever be possible. The proposed system demonstrates the capability to concurrently monitor multiple physiological parameters. The accuracy of the pretrained CNN-LSTM model in classifying ECG signals is 99.49%, demonstrating its high precision. Moreover, the model exhibits exceptional performance in terms of computational complexity and memory use.
Keywords: ECG arrhythmia; supporting vector machine; convolutional neural network; CNN; singular value decomposition.
DOI: 10.1504/IJCVR.2024.10067970

Computer vision and deep learning techniques for the classification of forest images
by C. Raju, C. Manonmani, Arumai Shiney Selvin Samuel, M.R. Kushalatha
Abstract: When applied to the categorisation of smoke and fire in forest photos, deep learning has the potential to identify forest conditions with greater precision. ForestResNet is a classification network that is suggested in this research to effectively identify forest conditions. It makes use of ResNet50 as a feature extraction network in order to provide speedy and accurate extraction of image feature information. We make use of two distinct deep learning architectures, namely ResNet50 and UNet, as well as two datasets that we have developed in-house, namely winter and coastal forest. Additionally, we concentrate on two distinct problem formalisations, namely multi-label patch (MLP) classification and semantic segmentation. There is a 75% true positive (TP) rate for pictures that show alien species and a 9% false positive (FP) rate. When it came to local trees, on the other hand, the TP rate was 95% and the FP rate was 10%.
Keywords: deep learning; segmentation; computer vision; forest images; multi-label patch; MLP.
DOI: 10.1504/IJCVR.2024.10068062

Autistic disorder diagnosis through EEG power spectrum
by A. Rajasekar, Nalini Chekuri, S.B. Mohan, G. Srinivas
Abstract: The objective of this study was to investigate the potential of electroencephalogram (EEG) power spectrum estimates as a biomarker for autism spectrum disorder (ASD). After establishing that there was a significant statistical difference in the features between the patients with autism and the control group, these power spectral density (PSD) estimates were classified with an average accuracy of 89.29% using the k nearest neighbour (KNN) classification algorithm. The present discovery implies the existence of potentially substantial disparities in the EEG patterns between individuals with autism and those without, indicating that the power spectrum of the EEG might potentially serve as a biomarker for autism. EEG recordings were obtained from a group of preschool-aged children, who had received an initial diagnosis of ASD from a multidisciplinary team. The electroencephalogram (EEG) recordings exhibited abnormal patterns in 78.0% of cases.
Keywords: autism; EEG; paroxysmal abnormalities; epileptic form abnormalities; diagnostic.
DOI: 10.1504/IJCVR.2024.10068278
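
Illustrative sketch (not taken from the paper): the abstract classifies power spectral density estimates with a k nearest neighbour classifier. The Python below shows the voting rule on two-dimensional feature vectors. The feature values, the labels, `k = 3` and the Euclidean distance are all assumptions for the example; they are not data or settings from the study.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up two-feature vectors (e.g. normalised power in two bands); NOT study data.
training_set = [
    ((0.90, 0.20), "group_a"),
    ((0.80, 0.30), "group_a"),
    ((0.85, 0.25), "group_a"),
    ((0.30, 0.70), "group_b"),
    ((0.20, 0.80), "group_b"),
    ((0.25, 0.75), "group_b"),
]

label_near_a = knn_predict(training_set, (0.82, 0.28))
label_near_b = knn_predict(training_set, (0.28, 0.72))
```

An odd `k` avoids tied votes when there are two classes.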

Analysing Otsu's thresholding and level-set algorithms' reliability in underwater image segmentation
by Geomol George, Anusuya S
Abstract: Accurate object segmentation in underwater images is challenging due to noise, low contrast, and fluctuating illumination. This study evaluates the effectiveness of Otsu's thresholding and level-set techniques for improving segmentation accuracy. The proposed algorithm first applies a color correction method to reduce color distortion in underwater images. Next, Otsu's entropy method is used to semi-automatically determine the optimal segmentation threshold. Finally, a regularised level-set strategy and cuckoo search algorithm enhance the segmentation and extract relevant features. Experimental results show significant improvements, with an average Jaccard Index (JI) of 90.42%, Dice coefficient (DC) of 93.93%, and F-score of 94.93%. However, the method performs poorly on images with high noise and low contrast, exhibiting a high false negative rate (FNR) of 0.212. The findings highlight that the choice of segmentation algorithm for underwater images should consider the specific image properties, especially in fields like oceanography and marine biology.
Keywords: Otsu thresholding; level-set algorithm; cuckoo search algorithm; underwater; image segmentation; image pre-processing.
DOI: 10.1504/IJCVR.2024.10068414
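
For readers unfamiliar with the thresholding step named in the abstract above, the classic Otsu criterion (choosing the grey level that maximises between-class variance) can be sketched as below. This is a generic illustration and not the authors' pipeline: the function name `otsu_threshold`, the flat-list pixel representation and the 256-level assumption are all choices made here for brevity.

```python
from typing import Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return the grey level t that maximises between-class variance.

    Pixels with value <= t form class 0; pixels with value > t form class 1.
    `pixels` is assumed to be a flat sequence of integers in [0, levels).
    """
    # Build the grey-level histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in class 0 so far
    sum0 = 0  # sum of intensities in class 0 so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # class 0 still empty
        w1 = total - w0
        if w1 == 0:
            break             # class 1 empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

A binary mask then follows from comparing each pixel with the returned level, for example `[p > t for p in pixels]`.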

Foreground and background image segmentation using HL herd EV with multilevel thresholding technique
by R. Sowmiya, P.D. Sathya
Abstract: The primary purpose of this work is to segment foreground and background images by multilevel thresholding using hybrid leader herd energy valley (HL herd-EV). Initially, the input image is pre-processed by non-local means (NLM) filtering and region of interest (ROI) extraction. Then, multiple thresholding methods are used to segment the background and foreground. The multilevel thresholding techniques are tuned using HL herd-EV, considering multiple objectives: Kapur's entropy, Otsu thresholding, Masi entropy, Renyi entropy, Tsallis entropy and minimum cross entropy. HL herd-EV is an amalgamation of the hybrid leader coronavirus herd optimiser (HLCHO) and energy valley optimisation (EVO). The HLCHO is the integration of hybrid leader-based optimisation (HLBO) with the coronavirus herd immunity optimiser (CHIO). HL herd-EV achieved a maximum peak signal-to-noise ratio (PSNR) of 40.826 dB, a dice coefficient of 0.889, and a uniformity measure of 0.916.
Keywords: image segmentation; multilevel thresholding; energy valley optimisation; EVO; masi entropy; foreground and background image.
DOI: 10.1504/IJCVR.2024.10068415
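
As a reminder of how the PSNR figure quoted in the abstract above is conventionally defined, the sketch below computes PSNR = 10 log10(MAX^2 / MSE) for two equally sized images. It is a generic illustration under stated assumptions: images are given as flat sequences of pixel values, the peak value defaults to 255 (8-bit data), and the function name `psnr` is hypothetical rather than taken from the paper.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate that the processed image is closer to the reference; identical images give an infinite PSNR.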

Deep learning approaches for video compression - AI in computer vision
by P. Chandrakanth, T. Vamsivardhan Reddy, N. Krishna Kumar, Surya Kiran Chebrolu
Abstract: The demand for video streaming has skyrocketed, and this is a significant obstacle for service providers in terms of both the streaming and storage of video content. These problems can be addressed thanks to advances in deep learning. This research presents a technique for compressing videos using neural networks that outperforms the H.264/AVC video coding standard. The performance of this approach is evaluated using the multi-scale structural similarity index (MS-SSIM). The proposed neural network model, a multi-layer architecture, is made up of two components: an encoder and a decoder. The two components are trained simultaneously. The objective of the whole model is to exploit the temporal and spatial correlations that exist between the frames of a video. The achieved compression ratios ranged from 27.87 to 34.48.
Keywords: general VQA datasets; preliminaries; spatial context; frequency domain supervision.
DOI: 10.1504/IJCVR.2024.10068572

Multi-view attention-driven recurrent neural network for faster 3D shape recognition
by Lalitha Swamy Thayumanavar, Sukumar Nandi, Bunil Kumar Balabantaray
Abstract: Recognition of 3D objects poses a significant challenge and remains a growing research area in computer vision and related fields. The existing view-based 3D object recognition methods have shown remarkable results in classifying object classes. However, striking an optimal balance between computational efficiency and model accuracy remains a challenging task in this field. In order to address this problem, we propose an end-to-end multi-view attention RNN for fast and rigorous classification of 3D objects. To reduce the computational cost, we introduce a pre-trained MobileNetV2 architecture as the backbone of our model. To correlate the multiple views for better understanding and to recognise complex interdependencies, we incorporate the RNN and the spatial self-attention modules, respectively. The proposed framework exhibits an accuracy of 96.63% while significantly reducing computational time, surpassing state-of-the-art methods in 3D object classification on ModelNet40.
Keywords: object classification; attention; recurrent neural network; RNN; MobileNet.
DOI: 10.1504/IJCVR.2025.10068625

Automated image captioning enabled by intelligent parameter tuning-based CNN and RNN with attention mechanism
by Kiranmai Rage, T.M. Minipriya, K. Hema Priya, K. Valarmathi
Abstract: Image captioning is a recent paradigm that is mainly employed to produce significant information from images. The existing image captioning systems do not achieve semantic discriminability in the resultant captions, so a new and intelligent model is needed to solve this problem. The major aim of this work is to develop a novel image captioning model using hybrid deep learning with an attention mechanism. The CNN is fixed as the encoder architecture, which is responsible for retrieving the features, and the RNN is fixed as the decoder architecture, which is responsible for creating the captions. Here, the parameter tuning of the CNN and RNN is performed by adaptive mixture ratio-based cat swarm optimisation (AMR-CSO). The outcomes show that the offered framework enhances the working efficacy of image captioning and image-text matching. From the simulation findings, the precision rate and the accuracy rate of the designed method are 94% and 96%.
Keywords: convolutional neural network; image captioning; encoder-decoder framework; adaptive mixture ratio-based cat swarm optimisation; recurrent neural network.
DOI: 10.1504/IJCVR.2025.10068725

Antibacterial activity against MDR pathogens with identification and profiling of marine actinomycetes
by M. Jayaprakashvel, V. Ramabhai
Abstract: This study explores the identification and characterisation of marine actinomycetes with antibacterial activity against multi-drug-resistant (MDR) pathogens. Molecular techniques, including DNA sequencing and phylogenetic analysis, were used to analyse the genetic composition and bioactive gene clusters of the strains. Out of 24 isolated marine actinomycetes, 15 demonstrated significant antibacterial activity against MDR pathogens such as Klebsiella pneumoniae, methicillin-resistant Staphylococcus aureus (MRSA), vancomycin-resistant Enterococcus (VRE), and Escherichia coli (E. coli). Strain KM2, identified as Streptomyces olivaceus (GenBank OQ299561.1), showed the highest inhibition rates, with zones of inhibition measuring 17 mm for MRSA, 16 mm for K. pneumoniae, 14 mm for VRE, and 13 mm for E. coli. These findings demonstrate the potential of marine actinomycetes as a promising source of new bioactive compounds to combat MDR pathogens, addressing the critical issue of antibiotic resistance.
Keywords: molecular profiling; marine actinomycetes; antibacterial activity; multi drug resistance pathogens; strains; gene sequencing.
DOI: 10.1504/IJCVR.2025.10068726

Design of ensemble model for real-time moving vehicles detection
by Bibhuprasad Mohanty, Tamanna Sahoo, Mihir Narayan Mohanty
Abstract: In this digital world, cities are generally being converted into smart cities, yet the traffic problem still exists. Authorities and the public need to detect moving vehicles whenever required. In this work, an ensemble model is proposed for vehicle detection that provides the desired level of accuracy. Different videos are taken from the database and fed in parallel to two different base models, both convolutional neural networks (CNNs). The input to CNN1 is the augmented raw input, while CNN2 receives the corresponding features. The wavelet transform is utilised to extract the features that are fed to base CNN model 2. The outputs of the base models are ensembled and fed to the meta classifier, chosen as an MLP. A unique vehicle detection model that performs feature extraction and classification, respectively, is presented. The experimental results are tested on a two-position PTZ camera sequence and an intermittent pan sequence with more than 94% average precision, with acceptable visual accuracy of moving vehicles detection.
Keywords: moving vehicles detection; feature extraction; deep learning techniques; ResNet 50; YOLO.

Hybrid CNN model for arrhythmia classification using ECG
by Shashank Dwivedi
Abstract: This paper presents a novel method for classifying heartbeats from ECG signals using a hybrid convolutional neural network (CNN) architecture. The approach aims to improve the efficiency and reliability of arrhythmia diagnosis. The raw ECG data undergoes noise reduction and heartbeat segmentation before being converted into 2D ECG images. The training dataset is augmented to address feature variations among different arrhythmia groups. Three pre-trained neural networks (VGG, DenseNet, and ResNet) are fine-tuned with 2D ECG images and combined using a stacking technique to create a hybrid CNN model. This model is trained and evaluated on a dataset with 38 distinct heartbeats and 37 unique arrhythmias, achieving a classification accuracy of 99.03%. The proposed framework incorporates transfer learning and ensemble learning techniques, enhancing its resilience and performance in classifying various irregular heartbeats or arrhythmias.
Keywords: arrhythmia classification; electrocardiogram; hybrid CNN; transfer learning; ensemble learning.
DOI: 10.1504/IJCVR.2025.10068903

Automated security model for data driven IoT ecosystem
by V.S. Saranya, G. Ramachandran
Abstract: The development of IoT systems has brought complex operational challenges, particularly in ensuring the robust security of data-based environments. To address these challenges, a new automated security model has been proposed that leverages advanced data analytics to detect anomalies in IoT systems. This model integrates deep learning and artificial intelligence technology to identify anomalous behaviour, reducing the risk of manual intervention errors. The use of optimal neural network architectures such as autoencoders improves the classification process and the control system's ability to mitigate potential security threats. By combining these methods, the proposed model offers a proactive approach to protecting IoT ecosystems, which is critical for data integrity and system reliability in today's connected world.
Keywords: IoT; artificial intelligence; deep learning; security; autoencoders; datacenters; BotIot datasets; data sampling; dimensionality reduction; resampling.
DOI: 10.1504/IJCVR.2025.10068939

Empowering gesture-controlled game for all abilities in the form of GestPlay using KNN
by Ganesh Kumar Yadav, Hardik Swarnkar, Harish Kumar, Dhruv Dutt Sharma
Abstract: Artificial intelligence can acquire knowledge from data and forecast future events. Activities might be developed to help people with disabilities hone their social, cognitive, and athletic abilities. Computational devices that support webcams can also be used with these games to turn them into smart devices. Additionally, such games make it possible to keep players from touching the devices while playing, enabling social distancing in trying times such as COVID-19. Games that use machine learning to recognise hand gestures are gaining popularity because they give players a fresh and engaging way to connect with technology. A camera can follow hand motions and recognise gestures in real time using computer vision techniques, enabling users to engage with the game through their gestures. Machine learning algorithms in the game adjust to the individual playing styles of each player, giving them a customised gaming experience. The game responds when players make certain hand movements that represent particular actions, like left and right. Machine learning hand gesture games could revolutionise how we play and interact with video games as technology advances.
Keywords: OpenCV; MediaPipe; Numpy; keyboard; OS; Tkinter.

Image super-resolution algorithm based on concatenating of Fourier transform and back-projection techniques
by Ahmad Faramarzi, Alireza Ahmadyfard, Hossein Khosravi
Abstract: This paper presents an algorithmic approach for image super-resolution, which is an essential application in image processing and machine vision. Despite the high accuracy of deep learning-based methods, they require a large number of data samples and have high computational complexity and runtime. The proposed method first magnifies the image using the Fourier transform and then increases the resolution using back-projection and Gaussian filter procedures. The results show a significant improvement in the PSNR and SSIM metrics, with an execution time that is at least four times faster than the best-performing methods previously examined. The proposed algorithm also exhibits superior performance compared to deep learning-based approaches in terms of execution time.
Keywords: image enhancement; Fourier transform; super-resolution; Gaussian filter; back-projection; BP.
DOI: 10.1504/IJCVR.2025.10069878

LAEM: LSD attention-driven enhanced MobileNetV2 for image forgery classification
by Phannong Youngkuk, Rajashree Nayak, Sonali Samal, Bunil Kumar Balabantaray
Abstract: Image forgery refers to the manipulation or tampering of an image to obscure or falsify information. Detection of forgeries poses significant challenges in handling subtle manipulations and in achieving high accuracy in low-data scenarios. In response to these challenges, a novel forgery detection model, LSD-attention-driven enhanced MobileNetV2 (LAEM), is designed to enhance detection efficiency, even with limited data. The model incorporates a learnable scaling dot product (LSD)-attention mechanism, which improves feature extraction by assigning attention weights between spatial locations, allowing the model to focus on relevant regions of the image. Integration of learnable scaling factors further enhances the model's ability to differentiate between genuine and forged sections, whereas MobileNetV2 is chosen for its computational efficiency and effectiveness in low-data scenarios. The LAEM model greatly enhances discriminative performance, achieving a remarkable 97% testing accuracy and 98% precision, surpassing the existing state-of-the-art methods.
Keywords: Image forgery; LSD-attention-driven enhanced MobileNetV2; LAEM; LSD-attention; MobileNetV2.
DOI: 10.1504/IJCVR.2025.10069696

Privacy-preserving machine learning and robust cryptography in TensorFlow for empowering secure IoT analytics
by Kritika Purohit, Surendra Yadav
Abstract: Integrating privacy-preserving machine learning with robust cryptographic techniques, particularly leveraging the TensorFlow platform, holds significant promise in extracting valuable insights from encrypted data while addressing privacy concerns. This collaboration bridges disciplines like encryption, machine learning, distributed systems, and high-performance computing. Machine learning's user-friendly interface enables the application of cutting-edge cryptographic methods, even for non-experts, ensuring security across the entirety of TensorFlow applications, from input data to models and code. This study introduces an innovative cryptographic framework within TensorFlow, demonstrating enhanced reliability and up to 90% improved efficiency over conventional methods. These findings underscore its potential to revolutionise privacy-conscious data utilisation, especially in the context of internet of things (IoT) applications.
Keywords: privacy-preserving; machine learning; robust cryptography; TensorFlow; Secure IoT; analytics; internet of things; data privacy.
DOI: 10.1504/IJCVR.2025.10070176

Heuristic-aided adaptive hybrid federated learning model for emotion recognition from face image and electroencephalography signals
by Dilsheen Kaur, Anuradha Misra, O.P. Vyas
Abstract: Facial expression recognition is useful for understanding individual expressions and different classes of facial expressions. The main novelty of this research work is an integrated deep learning model that recognises human emotions from facial images and EEG signals to achieve reliable results in the emotion recognition process. From the face images, global and local features are extracted and fused to form feature set 1, while deep features from the EEG signals form feature set 2. Weighted feature fusion is then conducted on the two feature sets, where the weight is optimised using enhanced position-based hybridised reptile search with artificial bee colony optimisation (EPHRS-ABCO), and the fused features are given to the efficient deep belief and gated recurrent unit (EDBGRU) for classifying the emotions. The suggested emotion recognition model attained an accuracy of 96.6%.
Keywords: emotion recognition; feature extraction and fusion; hybridised reptile search with artificial bee colony optimisation; face images; EEG signals; efficient deep belief and gated recurrent unit.
DOI: 10.1504/IJCVR.2025.10070635

Approach for mobile robot navigation and tracking using vision sensor in dynamic environment
by Sangram Keshari Das, Bala Karthikeya Nandula, B.K. Rout
Abstract: Researchers have recently been working on navigation and tracking algorithms for mobile robots exploring dynamic environments. This paper deals with a mobile robot with a vision sensor that must safely track and navigate the shortest possible path from any start to any goal in an unknown environment containing static and dynamic obstacles. The D* Lite path planning algorithm is implemented for robot navigation in the unknown environment, and an overhead vision sensor is used to capture the movement of the mobile robot. A tracking-learning-detection (TLD)-based algorithm was used to evaluate the tracking of the mobile robot's path. The tracking error after implementation, in terms of positional error, is observed to be in the range of 8-10%. This could be attributed to the non-uniform speed and the change in orientation of the mobile robot while navigating. A Kalman filter-based de-noising technique is also applied to the output of the TLD algorithm, which produced a tracking error in the range of 4-7%.
Keywords: D* path planning; tracking-learning detection; TLD; Kalman filter; Robot Operating System; ROS.
DOI: 10.1504/IJCVR.2025.10071882
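
To illustrate the kind of Kalman filter-based de-noising mentioned in the abstract above, the following minimal sketch smooths a one-dimensional track with a scalar Kalman filter. It is not the authors' implementation: the random-walk state model, the noise variances `q` and `r`, and the function name `kalman_smooth` are assumptions chosen for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence


def kalman_smooth(measurements: Sequence[float],
                  q: float = 1e-3, r: float = 0.25) -> List[float]:
    """Scalar Kalman filter with a random-walk state model.

    q: assumed process-noise variance (how fast the true value may drift).
    r: assumed measurement-noise variance (how noisy the tracker output is).
    """
    x = measurements[0]  # initial state estimate
    p = 1.0              # initial estimate variance
    estimates = [x]
    for z in measurements[1:]:
        # Predict: the state is assumed unchanged, uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates
```

For a 2D position track, one simple option under the same assumptions is to run the filter independently on the x and y coordinates; a constant-velocity model would be a natural refinement for a moving robot.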

Discretise indoor mobile robot enhance path planning algorithm
by Rahul Shivaji Pol, Vijaya N. Aher
Abstract: Trajectory planning for indoor mobile robots is attracting growing research interest, with the aim of finding more optimal solutions that reduce travelling time, power and path cost. The optimal path is the major performance parameter of any path planning algorithm (PPA), as it decides the trajectory the mobile robot follows to reach the target. The latest robotic systems are equipped with multiple on-board sensors that help them navigate through the environment by computing the shortest path and distance. The sensory system often fails to find the safest and most feasible path. The proposed novel enhanced optimal path planning algorithm (EOPPA) utilises unconstrained edge-based path connectivity along with cell pruning techniques to improve the explored path length. This methodology also follows a safe-distance rule that helps to avoid future collisions of the mobile robot with in-path obstacles. Thus, by utilising this uniquely designed set of rules, the method improves the overall computation time and path cost by 27% compared to a traditional path planning algorithm.
Keywords: path planning; optimise path planning; robotic navigation; grid-based path planning; indoor mobile robot navigation; path cost.
DOI: 10.1504/IJCVR.2025.10073038

Blockchain driven intrusion detection using slime mould optimisation algorithm with ensemble of deep learning models
by C. Ananth, Sathiyarani Sakthivel, Mohananthini Natarajan
Abstract: In recent decades, intrusion detection systems (IDS) have been crucial cyber-security tools that monitor network traffic for malicious activity and prompt actions to mitigate it. However, their accuracy is often limited by inadequate training data or incorrect thresholds, leading to high false positive rates. Blockchain (BC) technology enhances IDS accuracy by providing a secure, decentralised, and immutable ledger for tracking suspicious activities over time. This study develops a BC-driven intrusion detection approach using a slime mould optimisation algorithm and an ensemble of deep learning models (BID-SMOAEDL). The model uses a Z-score normalisation approach, SMOA for feature selection, and an ensemble of deep belief networks, Elman neural networks, and BiLSTM for identification and classification. In experiments, the BID-SMOAEDL approach achieved an accuracy of 99.19%, precision of 99.19%, recall of 99.19%, F-score of 99.19% and an error rate of 0.03. The BID-SMOAEDL model's performance analysis on the TONIOT dataset showed superior results compared to recent models.
Keywords: intrusion detection system; IDS; security; blockchain; ensemble learning; slime mould optimisation; hyperparameter selection; false positive rates; FPR; deep belief network; DBN; Elman neural network; ENN.
DOI: 10.1504/IJCVR.2025.10074464
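
The Z-score normalisation step mentioned in the abstract above rescales each feature to zero mean and unit standard deviation. A minimal sketch for a single feature column follows; the function name `z_score_normalise`, the use of the population standard deviation, and the handling of constant columns are assumptions made here, not details from the paper.

```python
import statistics
from typing import List, Sequence


def z_score_normalise(column: Sequence[float]) -> List[float]:
    """Rescale one feature column to zero mean and unit standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population standard deviation
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0] * len(column)
    return [(value - mean) / std for value in column]
```

In practice the mean and standard deviation would typically be estimated on the training split only and then reused for the test split, so that no information leaks from test data into the model.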

An efficient approach to detecting and locating abnormal regions on coronary artery image by directional vectors
by Tang Thi Phuong Linh, Le Nhi Lam Thuy, Gwang Hyun Yu, Jin Young Kim, Pham The Bao
Abstract: Coronary artery disease (CAD) is a major threat to human health and is the most frequent type of cardiovascular disease (CVD). In clinical practice, identifying abnormal regions (including aneurysms and stenosis) on the blood vessels is critical. When the abnormal regions become severe, they restrict blood flow and lead to symptoms such as angina or myocardial infarction. Thus, in this study, we propose an approach for detecting and locating abnormalities on coronary X-ray images. First, we use the direction of a vector to determine abnormal region candidates (ARCs) on the extracted blood vessels. Then, based on the width of the circuit, we determine the true abnormal regions from the ARCs and calculate the percentage of coronary artery abnormalities. The experimental results on our private dataset achieved a recall of 0.9633 and a precision of 0.9375. The comparison of results also demonstrated that our method outperforms existing stenosis detection methods. The proposed approach has great promise for clinical use and for supporting CAD diagnosis.
Keywords: abnormal location; directional vector; coronary artery.
DOI: 10.1504/IJCVR.2025.10075128

Securing digital transactions: empowering e-banking with a blockchain-powered fraud detection system enhanced by self-attention and augmented Wasserstein generative adversarial networks
by R. Akilandeswari, S. Malathi
Abstract: Blockchain technology has received considerable attention for its beneficial applications in data privacy, system security, and integrity. Malicious activities in bitcoin networks, e-banking, and online transactions, such as fraud and anomaly attacks, are continuously increasing due to the rapid development of fraudulent methodologies. In order to overcome the mentioned issues, this paper presents a blockchain-based efficient fraud detection system for e-banking and online transactions by developing a self-attention augmented Wasserstein GAN network, namely, BC-EFD-EOT-SAAWGAN. E-banking and online financial data are preprocessed by modifying a Hamilton filter and then split into two sets, namely, training and testing sets. The proposed system is divided into two parts, namely, a blockchain system for secure online transactions and a deep learning system for accurate fraud identification. A PoM-based smart contract is utilised for online legitimate transaction prediction, and an attacker model is used for increased system security. The proposed approach is validated by comparing its outcome for various parameters, such as accuracy, precision, F-score, sensitivity, specificity, and response time, with other existing models.
Keywords: self attention augmented Wasserstein generative adversarial; proof-of-majority; modified Hamilton filter; blockchain-based transaction.
DOI: 10.1504/IJCVR.2026.10076323

Remote heart rate estimation from facial video: a review to PPG and BCG methods
by Amal Adouani, Wiem Mimoun Ben Henia
Abstract: This paper evaluates the performance of different remote heart rate (HR) methods in extracting HR from facial videos. Recent works have demonstrated the possibility of extracting this parameter via recorded videos containing the human face without applying invasive sensors. These works are based on two types of methods: colour-based and motion-based, applied in different domains, especially healthcare. The multimodal MAHNOB-HCI database containing facial videos from participants with different gender, age and emotional state was used for comparison. Experiments were conducted using 20 facial video sequences. Average pulse rate and mean error (ME) were calculated to estimate accuracy. Standard deviation of error (SD) and root-mean-square error (RMSE) measured agreement and error between each method and its reference system. Bland-Altman plots were implemented for comparison. Results showed that remote HR estimation methods can be very accurate under constrained conditions: limited movement for motion-based methods and controlled illumination for colour-based methods.
Keywords: heart rate; physiological signals; independent component analysis; ICA; photoplethysmography; computer vision.
DOI: 10.1504/IJCVR.2026.10076918
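
The agreement measures listed in the abstract above (mean error, standard deviation of error, RMSE and Bland-Altman analysis) can be computed from paired heart rate readings as sketched below. This is a generic illustration, not the authors' evaluation code: the function name `agreement_stats`, the dictionary keys, and the conventional 1.96 SD limits of agreement are assumptions.

```python
import math
import statistics
from typing import Dict, Sequence


def agreement_stats(estimated: Sequence[float],
                    reference: Sequence[float]) -> Dict[str, float]:
    """Error statistics between estimated and reference heart rates (bpm)."""
    if len(estimated) != len(reference) or len(estimated) < 2:
        raise ValueError("need at least two paired readings")
    errors = [e - r for e, r in zip(estimated, reference)]
    me = statistics.fmean(errors)   # mean error (bias)
    sd = statistics.stdev(errors)   # sample standard deviation of the error
    rmse = math.sqrt(statistics.fmean([err * err for err in errors]))
    return {
        "me": me,
        "sd": sd,
        "rmse": rmse,
        # Bland-Altman limits of agreement: bias +/- 1.96 SD.
        "loa_lower": me - 1.96 * sd,
        "loa_upper": me + 1.96 * sd,
    }
```

A Bland-Altman plot would then show each pair's difference against its mean, with horizontal lines at the bias and at the two limits of agreement.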

Integrated IoT-WSN forest fire detection system using FoCnn for enhanced monitoring in Dalma Sanctuary
by Suprava Ranjan Laha, Binod Kumar Pattanayak, Rina Mahakud, Satyaprakash Swain, Debasish Swapnesh Kumar Nayak, Saumendra Pattnaik
Abstract: Forest fires substantially imperil human communities, ecosystems, and biodiversity in the fire-prone Dalma Sanctuary. By integrating IoT-WSN technologies with a specialised forest convolutional neural network (FoCnn), the study aims to enhance monitoring and early detection of fire risks. The primary objective is to develop and evaluate an integrated IoT-WSN forest fire detection system using FoCnn for enhanced monitoring in Dalma Sanctuary, focusing on improving accuracy and efficiency. Real-time data from IoT-based cameras and sensors are utilised for early fire risk detection. Evaluation on raw and augmented datasets demonstrates the model's robustness, consistency, and generalisability in accurately detecting and classifying forest fires. Results show the superior performance of the proposed FoCnn algorithm, reaffirming its efficacy in proactive wildfire management. The findings suggest significant progress in wildfire monitoring technologies, enabling proactive strategies to mitigate the impact of forest fires. The proposed system has implications for improving wildfire control strategies in the Dalma Sanctuary and beyond, contributing to forest conservation efforts globally. Our FoCnn model gives 85% accuracy on raw data and 98.82% when utilising the augmented data. Overall, our study presents an innovative integrated IoT-WSN forest fire detection system, offering a comprehensive approach to forest fire detection with potential applications in proactive wildfire management.
Keywords: forest fire monitoring; Dalma Sanctuary; IoT-WSN; augmentation; FoCnn; biodiversity.
DOI: 10.1504/IJCVR.2026.10077016

Towards lightweight Crystals-Kyber-based homomorphic encryption for PostQuantum secure internet of things
by Ganesh Kumar Mahato, Swarnendu Kumar Chakraborty, Anwesha Banik, Sumitra Nayak
Abstract: CRYSTALS-Kyber is the first quantum-resistant mechanism selected for standardisation by the National Institute of Standards and Technology (NIST). Due to its efficient implementation on field-programmable gate arrays (FPGAs), it is often used in internet of things (IoT) applications. This work is dedicated to developing an optimised hardware architecture for CRYSTALS-Kyber, with the main objective of maximising parallelisation and efficiency in IoT environments through techniques such as inter-module and intra-module pipelining. Implementation results on Artix-7 and Zynq UltraScale+ devices show significant speedups of 25-51% and reductions in digital signal processing (DSP) blocks by 50-75% across Kyber's security levels, making it ideal for resource-constrained IoT devices. Moreover, the proposed scheme achieves greater area-time (AT) product efficiencies of 21-35% compared to existing approaches, further enhancing its suitability for IoT applications. This work presents a high-performance and resource-efficient solution tailored for IoT deployments, contributing to the advancement of secure communication in IoT ecosystems.
Keywords: CRYSTALS-Kyber; polynomial multiplier decryption; PostQuantum cryptography; number theoretic transform; NTT.
DOI: 10.1504/IJCVR.2026.10077528

Optimisation for improving the efficiency, efficacy, and effectiveness in CBIR: an overview
by Mawloud Mosbah
Abstract: In this paper, we discuss the optimisation tools commonly used in CBIR systems and their direct repercussions on efficacy and efficiency, as well as their possible influence on accuracy. The paper, presented as an overview, answers the following questions: which tools are employed for optimisation, and in which CBIR components are they applied? We also deal with learning methods in relation to optimisation, as well as the combination of this scheme with other mechanisms for building a real-life CBIR system with good performance parameters.
Keywords: CBIR; optimisation; efficiency; efficacy; effectiveness.
DOI: 10.1504/IJCVR.2026.10077677

MPD-CapsNet: a model based on capsule network for determination of melting point of crystalline chemical substances
by Anurag Shrivastava
Abstract: The melting point is one of the distinguishing characteristics of crystalline solids. Its determination is the most common thermal investigation procedure used to characterise solid crystalline chemicals, identify chemical substances, and assess their purity in research and development. Change detection methods and a CNN-based deep learning model for the classification of chemical substance images have been used for determining the melting point of chemical substances. However, change detection methods have certain limitations: they require a threshold value to determine the changes, and CNN-based models are incapable of properly handling input transformations. Capsule networks are newly emerging machine learning architectures that were recently developed to address CNNs' inadequacies and that require a smaller training dataset, which is why we have used them in our proposed work. The objective of this proposed work is to design a capsule network-based model for melting point determination (MPD-CapsNet) that improves the efficiency of the current classification problem of classifying images of chemical substances' states (the DCSS dataset) and to compare its results with those of a previously used CNN-based model. Our findings suggest that the proposed MPD-CapsNet model can successfully overcome CNNs' inadequacies, and the maximum accuracy achieved by the proposed MPD-CapsNet model is 99.69%.
Keywords: capsule network; CNN; melting point; chemical substances; SegCaps.
DOI: 10.1504/IJCVR.2026.10078059

A comprehensive study of various types of computer aided liver cancer detection
by Mohammad Anwarul Siddique, Shailendra Kumar Singh, Moin Hasan, Tanveer Quazi
Abstract: Liver cancer is considered among the principal causes of cancer mortality worldwide. The survival rate in liver cancer is found to be extremely low because of the complexities of early diagnosis, rapid progression, and limited availability of targeted drugs. Like any other serious disease, successful treatment of liver cancer requires an early and accurate diagnosis. Implementation of an efficient cancer detection system requires not only technical know-how, but also a detailed and comprehensive study of the various works carried out in that field. Several eminent authors have endeavoured to review liver cancer detection. However, it has been observed that the available reviews are limited in various respects and do not provide an extensive knowledge base in the relevant area. The presented review covers the methodologies, datasets, evaluation metrics, and challenges/constraints of the various machine learning (ML) and deep learning (DL)-based liver cancer detection approaches.
Keywords: liver cancer detection; liver segmentation; machine learning; ML; deep learning; DL; medical imaging.
DOI: 10.1504/IJCVR.2026.10078060

Underwater image enhancement using anisotropic diffusion and multiscale fusion strategy
by Rekha Chaturvedi, Vishnu Soni, Jitendra Rajpurohit, Abhay Sharma
Abstract: Because light is scattered as it travels through water, underwater images are often afflicted by several types of degradation, such as poor contrast, haziness, blurring, and colour distortion. To resolve these problems, we devise a novel technique that uses anisotropic diffusion to split the L-channel of the LAB colour space into base and detail images, with the aim of reducing noise while preserving the salient features of underwater images. The method then performs a fusion process that computes three weight maps using a variety of strategies and yields normalised weight maps for each image. To obtain the enhanced final image, we consolidate the blended contributions of all levels after appropriate upsampling. Lastly, we restore the enhanced underwater image by converting the blended, enhanced LAB image to the RGB colour space. The enhancement in image quality is measured in terms of Entropy, PCQL and UIQM. The UIEB dataset has been used to implement our proposed method, and experimental findings show that it outperforms LAFFNet, deep residual, and retinex-based methods. It also works well for underwater images with colour distortion, poor contrast and detail loss.
Keywords: underwater image enhancement; multiscale fusion; anisotropic diffusion; weight maps; Laplacian pyramid.
DOI: 10.1504/IJCVR.2024.10063999

Mammogram mass segmentation using evolutionary algorithm-based single layer neural network
by Sunita Sarangi, Harish Kumar Sahoo
Abstract: Mammography is the most reliable method for detecting breast cancer in its early stages. Breast region segmentation is a fundamental procedure for analysing mammograms. This paper presents an improved segmentation approach based on a hybrid model that combines a functional link artificial neural network (FLANN) with particle swarm optimisation (PSO). The suggested segmentation technique uses a threshold that is adaptively adjusted according to the image attributes. Three expansion techniques used for the input to the FLANN are compared: exponential FLANN (EFLANN), Chebyshev FLANN (CFLANN), and Legendre FLANN (LFLANN). A total of 110 images from the mini-MIAS and DDSM databases are used for the comparison. The performance measures for CFLANN and LFLANN are found to be better than those for EFLANN.
Keywords: mammogram; adaptive threshold; EFLANN; CFLANN; LFLANN; particle swarm optimisation; PSO.
DOI: 10.1504/IJCVR.2024.10063552
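
As an illustration of the Chebyshev expansion named in the abstract above, the following minimal sketch expands a scalar input into Chebyshev polynomial terms using the standard recurrence T0(x) = 1, T1(x) = x, Tn+1(x) = 2x Tn(x) - Tn-1(x). The function name `chebyshev_expand`, the choice to keep the constant term T0, and the expansion order are assumptions made for illustration; they are not taken from the paper.

```python
def chebyshev_expand(x, order):
    """Return [T0(x), T1(x), ..., T_order(x)] for a scalar x.

    Hypothetical helper: a functional link network would feed terms like
    these, instead of the raw input, to a single layer of weights.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    terms = [1.0, float(x)]
    for _ in range(2, order + 1):
        # Standard Chebyshev recurrence: T_{n+1} = 2x * T_n - T_{n-1}
        terms.append(2.0 * x * terms[-1] - terms[-2])
    return terms[: order + 1]


if __name__ == "__main__":
    # Inputs are usually scaled to [-1, 1] before expansion.
    print(chebyshev_expand(0.5, 3))
```

Because the nonlinearity is carried by the expanded terms, the network itself can stay single-layer, which is the general appeal of functional link designs.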

Sewer shad fly optimisation based efficient skin lesion detection using capsule neural network
by Vineet Kumar Dubey, Vandana Dixit Kaushik
Abstract: In this research, sewer shad fly optimisation (SSFO) is developed to detect skin lesions using a capsule neural network. The HAM10000 dataset is first accessed for input, after which pre-processing is carried out. The ROI is segmented using an optimised clustering-based segmentation method based on sewer shad fly optimisation, which is created by combining mayfly and moth flame optimisation. The segmented region is sent for feature extraction, which is carried out using both grid-based statistical features and a hybrid ternary pattern. The resulting region is sent to the capsule neural network classifier, which uses the sewer shad fly optimisation algorithm to adjust the classifier's weights and bias so as to accurately detect the skin lesion. The proposed SSFO-CapsNet NN attains values of 96.45%, 98.00%, and 94.28% at TP 90, and 95.89%, 98.57%, and 95.76% at k-fold 10.
Keywords: capsule neural network; skin lesions classification; sewer shad fly optimisation; SSFO; transfer learning; ResNet-101.
DOI: 10.1504/IJCVR.2024.10064099

Melanoma skin cancer identification from dermatoscopy images by machine learning using Thepade SBTC and triangle thresholding
by Sudeep D. Thepade, Deepa Abin, Aasim Sayyad, Zahid Akthar, Rik Das
Abstract: Melanoma skin cancer is prevalent among all types of cancer and can be completely treated if detected in the initial stages. Even for medical practitioners, discriminating melanoma skin lesions from other skin scars resulting from sunburn or rashes is difficult. Machine learning (ML) can help in skin cancer detection (SCD) from dermatoscopy images. The study proposes the fusion of global and local features computed with Thepade_Sorted_Block_Truncation_Coding (TSBTC) and the triangle thresholding method. Nine variations of TSBTC are explored for feature formation (TSBTC 2-ary to TSBTC 10-ary). Each variation of the features is provided to eight classifiers (BayesNet, RandomForest, J48, NBTree, RandomTree, REPTree, IBK, and Kstar) and three ensembles. The classification is performed on the HAM10000 dataset. Compared to individual feature considerations, better SCD performance is observed with feature fusion of TSBTC and triangle thresholding. The ensemble 'RandomForest+IBK+NBTree' gives good accuracy for melanoma SCD.
Keywords: melanoma identification; triangle thresholding; ML classifiers; Thepade SBTC; global and local features.
DOI: 10.1504/IJCVR.2025.10070121
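
To make the "TSBTC n-ary" features in the abstract above more concrete, here is a minimal sketch of the sorted-block idea as it is commonly described: the intensities of one colour plane are sorted, split into n contiguous blocks, and each block's mean becomes one feature. The function name `sorted_block_means` and the way block boundaries are rounded are assumptions for illustration, and the paper's exact formulation may differ.

```python
def sorted_block_means(values, n):
    """Sort intensities, split them into n contiguous blocks, return block means.

    Hypothetical helper illustrating an n-ary sorted-block feature vector
    for a single colour plane given as a flat list of intensities.
    """
    if n < 1 or n > len(values):
        raise ValueError("n must be between 1 and the number of values")
    ordered = sorted(values)
    size = len(ordered)
    means = []
    for k in range(n):
        # Integer boundaries keep every block non-empty when n <= size.
        start = k * size // n
        stop = (k + 1) * size // n
        block = ordered[start:stop]
        means.append(sum(block) / len(block))
    return means


if __name__ == "__main__":
    plane = [4, 1, 3, 2]
    print(sorted_block_means(plane, 2))
```

Under this reading, running the function with n from 2 to 10 on each colour plane would give the nine feature variations the abstract mentions.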

Risk estimation of breast cancer patient with METABRIC clinical data: an elucidative study of machine learning algorithms with time sensitive information
by Rajan Prasad Tripathi, Sunil Kumar Khatri, Darelle Van Greunen, Danish Ather
Abstract: Breast cancer is a prevalent and life-altering disease that demands precise prognostic tools to guide treatment decisions. Machine learning (ML), with its data-driven capabilities, has emerged as a promising avenue for improving breast cancer prognosis. In this study, we harness the power of machine learning to predict breast cancer survival using clinical data sourced from the METABRIC dataset. Our research sheds light on the critical clinical factors that intimately influence patient outcomes. Among seven distinct algorithms evaluated, Logistic Regression stands out with the highest accuracy of 78%. Notably, our findings underscore the pivotal role of time-related data in enhancing predictive performance, advocating for its inclusion in future prognostic models. We identify positive correlations between survival and parameters such as tumour size and breast-conserving surgery, where the latter exhibits a correlation coefficient of 0.18. Conversely, a negative correlation emerges with breast mastectomy surgery, with a correlation coefficient of -0.18. This study not only points to robust machine learning models for prognosis but also highlights the intricate interplay between time-sensitive information and breast cancer prognosis. By doing so, it deepens our understanding of breast cancer prognosis and potentially informs more effective treatment strategies.
Keywords: breast cancer; METABRIC; machine learning; patient survival; risk estimation.
DOI: 10.1504/IJCVR.2024.10064463
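
The matching coefficients of 0.18 and -0.18 reported in the abstract above are what one would expect if the two surgery types are complementary binary indicators, so that each patient has exactly one of them. That is an assumption here, not something the abstract states. The sketch below computes the Pearson correlation on small made-up data to show the sign flip; the function name `pearson` and all values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up illustrative data, not METABRIC values.
    conserving = [1, 0, 1, 1, 0, 0]            # 1 = breast-conserving surgery
    mastectomy = [1 - c for c in conserving]   # assumed complementary indicator
    survived = [1, 0, 1, 0, 1, 0]
    print(pearson(conserving, survived), pearson(mastectomy, survived))
```

The two printed values have equal magnitude and opposite sign, because negating and shifting one variable only flips the sign of the correlation.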

Automated kitchen waste segregation system via convolutional neural network
by Teh Boon Hong, Sarah 'Atifah Saruchi, Ain Atiqa Mustapha, Nor Aziyatul Izni, Wan Zailah Wan Said, Noor Idayu Mohd Tahir
Abstract: Composting is one of the efficient and practical methods to manage kitchen waste. The initial step of a composting system is the segregation of kitchen waste into compostable and non-compostable categories. Currently, however, the segregation process is carried out by human labour. To reduce this burden, this study proposes an automated kitchen waste segregation system that uses a deep learning method to classify kitchen waste into two groups: compostable and non-compostable. A convolutional neural network (CNN) model with different learning algorithms and numbers of epochs is applied to perform the segregation. A prototype consisting of a camera, sensors, and motors is developed to validate the performance of the proposed model. Results show that integrating the CNN into the proposed kitchen waste segregation system segregates the waste successfully without human involvement. This output is expected to support waste management and composting campaigns, thus leading to a better environment.
Keywords: kitchen waste; composting; convolutional neural network; CNN; internet of things; IoT; automation.
DOI: 10.1504/IJCVR.2024.10065039

Application of artificial bee colony algorithm as numerical solution for first order IVPs and industrial robot arm control problem
by V. Murugesh, G. Sanjiv Rao, Manoj Singhal, Rajnesh Singh, Sunil Gupta
Abstract: This research article presents a novel analytical method that uses the artificial bee colony (ABC) algorithm to solve the industrial robot arm control problem and first-order initial value problems (IVPs) for ordinary differential equations (ODEs). The study considers ten problems in addition to the industrial robot arm control problem to establish the effectiveness of the ABC algorithm. Measured against the actual solutions, the outcomes were compared with the results from the RK-Gill and RK-Butcher algorithms and were found to be highly accurate. The results indicate that the ABC algorithm is easy to implement and obtains solutions for any time period.
Keywords: Runge-Kutta method; RK-Butcher algorithm; RK-Gill algorithm; ABC algorithm; ODEs; artificial bee colony; ordinary differential equations; first order IVPs.
DOI: 10.1504/IJCVR.2024.10065106
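
For readers unfamiliar with the Runge-Kutta baselines mentioned in the abstract above, the sketch below integrates a first-order IVP y' = f(t, y), y(t0) = y0 with the classical fourth-order Runge-Kutta scheme. This is the textbook RK4 method, swapped in for illustration; it is not the RK-Gill or RK-Butcher variant, and it is not the paper's ABC-based method. The function name `rk4` and the test problem are assumptions.

```python
import math


def rk4(f, t0, y0, t_end, steps):
    """Integrate y' = f(t, y) from t0 to t_end with classical RK4."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        # Weighted average of the four slope estimates.
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


if __name__ == "__main__":
    # Test problem y' = y, y(0) = 1, whose exact solution is exp(t).
    approx = rk4(lambda t, y: y, 0.0, 1.0, 1.0, 100)
    print(approx, abs(approx - math.e))
```

Comparing such a numerical result against a known exact solution is the kind of accuracy check the abstract describes.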

Fingerprint template protection: cancellable biometrics
by Ayesha S. Shaikh, Vibha D. Patel
Abstract: Biometric authentication systems have become more popular with the spread of mobile and other handheld devices, since they eliminate the need to remember a password or PIN. If an intruder compromises a person's biometric traits, there is no way to change them, because they are permanently attached to the person. Biometric traits are not replaceable like passwords; hence, privacy preservation is a key research area. To stop biometric traits from being stolen or used improperly, secure technology solutions must be developed. To provide a reliable and secure biometric authentication system, we present a cancellable biometrics technique. We propose a highly secure method for cancellable biometrics that uses a speeded-up robust features approach for image feature extraction, followed by a fast Fourier transform with index-of-max hashing and a Hadamard product vector to protect the biometric template. We tested and assessed the proposed strategy on the standard datasets FVC2002-DB1 and DB2 and obtained reasonably good results.
Keywords: fingerprint biometrics; template protection; cancellable biometric; security and privacy preservation.
DOI: 10.1504/IJCVR.2024.10063568

Exploring community detection algorithms for sustainable social networks
by Tabrej Ahamad Khan, Imran Hussain, Mohd Abdul Ahad, Siddhartha Sankar Biswas
Abstract: Community detection plays a vital role in understanding the structure and dynamics of social networks. The paper examines the effectiveness of three prominent community detection algorithms - Girvan-Newman, Walktrap, and Fluid Communities - in understanding social network dynamics. Utilising the Nashville Meetup Network dataset, it evaluates the algorithms' performance in identifying cohesive groups within the network. The study analyses results based on modularity, clustering coefficient, and community size distribution. Additionally, it explores different graph layouts to visually represent detected communities. The research aims to provide insights into algorithm strengths and limitations, aiding in selecting suitable community detection methods and graph layouts for similar social network analyses, particularly from online community platforms like Meetup.
Keywords: community detection; social networks; Girvan-Newman algorithm; Walktrap algorithm; Fluid Communities algorithm; network analysis; cohesive groups.
DOI: 10.1504/IJCVR.2025.10072731
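
Modularity, one of the evaluation measures named in the abstract above, can be computed directly from an edge list and a community assignment. The sketch below uses the standard definition for an undirected, unweighted graph without self-loops: for each community, the fraction of edges that fall inside it minus the squared fraction of edge endpoints attached to it. The function name `modularity` and the toy graph are assumptions for illustration and are not taken from the paper or its dataset.

```python
def modularity(edges, community_of):
    """Newman modularity of a partition of an undirected simple graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to a community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph must have at least one edge")
    internal = {}    # edges with both endpoints in the community
    degree_sum = {}  # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


if __name__ == "__main__":
    # Toy graph: two triangles joined by a single bridge edge.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(modularity(edges, split))
```

Placing every node in a single community gives a modularity of zero, so a positive score indicates more internal edges than a random graph with the same degrees would be expected to have.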

An efficient hybrid model for localisation and grading of diabetic retinopathy using fundus images
by Pammi Kumari, Priyank Saxena
Abstract: Diabetic retinopathy (DR) is the leading factor affecting the vision of many people. This study aims to develop a computationally efficient deep learning (DL) framework for DR grading (0 to 4) that overcomes the limitations of existing, computationally inefficient DL models. To this end, we use a small-scale architecture (MobileNetV2) integrated with a support vector machine (SVM) for DR grading on the APTOS dataset. The computationally light MobileNetV2 has considerably fewer trainable parameters, making it suitable for edge devices. The integration of the SVM provides flexibility in tuning to the essential characteristics of the dataset and enhances the grading performance. The gradient-weighted heatmap technique is incorporated for disease localisation to visualise the affected regions. The results substantiate the proposed architecture's efficiency over existing DL methods, achieving a test set accuracy of 80% for multilevel and 96% for binary classification with minimal testing loss.
Keywords: diabetic retinopathy; DR; support vector machine; SVM; Grad-CAM; deep learning; hybrid architecture; APTOS; MobileNetV2.
DOI: 10.1504/IJCVR.2024.10063875

A novel signature recognition system using a convolutional neural network and fuzzy classifier
by Ouafae El Melhaoui, Soukaina Benchaou, Redouan Zarrouk
Abstract: The present work provides a novel method for recognising signature images based on two machine learning algorithms: a convolutional neural network (CNN) and a fuzzy min max classifier (FMMC). The new system goes through three phases: pre-processing, feature extraction and classification. First, a variety of pre-processing techniques are used to isolate the signature pixels from the background. The resulting images are scanned with multiple filters to perform the convolution and ReLU procedure. The pooling process is then applied. Finally, the resulting image pixels are flattened and fed to the FMMC. Three systems built on widely used techniques (profile projection-FMMC, Loci-FMMC and CNN) have been compared to the proposed system. The first two models are used to prioritise the feature extraction method of our system, while the third model, CNN, is used to prioritise the FMMC as classifier. The experiments obtained a recognition rate of 97%, which confirms the effectiveness of the proposed structure.
Keywords: convolutional neural network; CNN; fuzzy min max classification; FMMC; offline signature recognition.
DOI: 10.1504/IJCVR.2024.10064681

Intelligent serial cascade of hybrid deep learning model for plant leaf disease identification and classification with multi-scale dilation assisted 3D-CNN features
by P. Vinay, G. Santhosh Kumar
Abstract: A novel deep learning framework is explored for plant leaf disease detection to resolve the challenges of existing leaf disease detection models. The input images are pre-processed through optimal weighted threshold histogram equalisation. The parameters of the histogram equalisation approach are optimised via a hybrid heuristic algorithm, rat aquila swarm optimisation (RASO). Subsequently, deep features are acquired from the pre-processed image through a multi-scale dilation assisted 3D-CNN. The result is then classified using a serial cascade of autoencoder and gated recurrent unit (GRU) (SC-AGRU). RASO is also used for parameter tuning to increase the classification performance. In the analysis, the accuracy and precision of the suggested method are 96% and 95%, respectively. The overall effectiveness of the proposed plant leaf disease classification technique is demonstrated through a comparative analysis against various plant leaf disease classification techniques on various evaluation measures.
Keywords: plant leaf disease identification; optimal weighted threshold histogram equalisation; rat Aquila swarm optimisation; serial cascade of autoencoder and gated recurrent unit neural network; multi-scale dilation assisted convolution neural network.
DOI: 10.1504/IJCVR.2024.10063504

Ensemble CNN model with novel optimisation technique for video content detection
by Sita M. Yadav, Sandeep M. Chaware
Abstract: This research develops and implements a CNN-BiLSTM model with chaser prairie wolf optimisation (CPW) for video content analysis. Initially, the input is collected from the CAMVID and DAVIS datasets, and the video is read. An optimised YOLO-4 model is proposed for detecting the objects in the video. The hybrid optimisation algorithm is developed from the characteristics of Albus and Falcon, and the role of the optimiser is to train the YOLO model. Then, to achieve enhanced performance for multiclass object classification from videos, the identified objects are classified by a deep learning model employing the suggested CNN-coupled LSTM model. Additionally, chaser prairie wolf optimisation is used to enhance the deep learning classifier's training, which improves convergence rates. At training percentage (TP) 90, the video content analysis model achieves an accuracy of 95.75%, a sensitivity of 97.30%, and a specificity of 96.88% on D1; on D2, the accuracy is 97.77%, the sensitivity is 99.00%, and the specificity is 98.90%.
Keywords: hybrid optimisation algorithm; chaser priori optimisation; object detection; object classification; CNN-coupled LSTM.
DOI: 10.1504/IJCVR.2024.10063874

Design analysis of compliant 3D printed thermoplastic polyurethane micro-gripper with screw-gear actuation
by N. Sahay, S. Chattopadhyay
Abstract: This work presents the design and analysis of a compliant micro-gripper made of thermoplastic polyurethane (TPU). With the proposed design, a prototype of the gripper will be developed by 3D printing using TPU. The material is lightweight, low-cost and very flexible, and it provides gripping through its deformation when force is applied at the actuation point. The gripper is designed, and finite element analysis (FEA) is carried out, using Pro Release 5.0 software, where stress and displacement are evaluated at every point of interest. Pressure has been applied in the range of 0.01 to 1.0 MPa to obtain the input characteristics in terms of the stress generated in the structure, which is found to be linear in the range of interest. The output characteristics are presented as a displacement curve with respect to the applied force. The actuating force has been calculated mathematically from the specified torque and other required parameters of the screw-gear actuation system.
Keywords: compliant mechanism; displacement analysis; micro-gripper; Pro Release 5.0; screw-gear; stress analysis; thermoplastic polyurethane.
DOI: 10.1504/IJCVR.2024.10063569
