Forthcoming and Online First Articles

International Journal of Computational Vision and Robotics (IJCVR)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with the Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

Register for our alerting service, which notifies you by email when new issues are published online.

We also offer RSS feeds which provide timely updates of tables of contents, newly published articles and calls for papers.

International Journal of Computational Vision and Robotics (43 papers in press)

Regular Issues

  • Edge feature enhanced convolutional neural networks for face recognition using IoT devices
    by Ankur, Mohit Kumar Rohilla, Rahul Gupta 
    Abstract: The COVID-19 pandemic has turned the world upside down, with almost everything coming to a halt. In the current period, where we are slowly returning to normal lives, organisations have become more concerned about safety and health. In the post-COVID period, biometric systems based on fingerprints can be dangerous; moreover, real-time attendance of employees and students joining from online mode is a challenge. Real-time face recognition is a challenging task in terms of accuracy and reliability, especially when deep convolutional neural networks (DCNN) are used for face recognition. DCNNs are data-hungry, and in real-life scenarios, the amount of data per subject or class is minimal, and the number of subjects/classes can be huge. Hence, the need arises for research on image processing and data augmentation for face recognition, as there are many scenarios where the number of classes (subjects) is vast.
    Keywords: face recognition; edge enhancement; face edge processing; deep convolutional neural network; DCNN; data augmentation; image processing.
    DOI: 10.1504/IJCVR.2022.10050239
  • Intelligent classification model for holy Quran recitation Maqams
    by Aaron Rasheed Rababaah 
    Abstract: Quranic recitation is a field that has been studied for centuries by scholars from different disciplines including tajweed scholars, musicians and historians. Maqams are a system of scales of melodic vocal patterns that have been established and practiced by Quran reciters all over the world for centuries. Traditionally, Maqams are taught by an expert of Quran recitation. We propose a process model for intelligent classification of Quran maqams using a comparative study of neural networks, deep learning and clustering techniques. We utilised a publicly available dataset of Maqam-labelled audio signals consisting of the eight primary Maqams: Ajam, Bayat, Hijaz, Kurd, Nahawand, Rast, Saba, and Seka. The experimental work showed that all three classifiers (nearest neighbour, multi-layered perceptron and deep learning) performed well. Furthermore, it was found that deep learning with power spectrum features was the best model with a classification accuracy of 96.55%.
    Keywords: Quran Maqams; neural networks; signal processing; deep learning; convolutional neural networks; CNN; audio signal features; short-term Fourier transform; STFT; power spectrum.
    DOI: 10.1504/IJCVR.2022.10050367
  • Quantitative analysis of transfer and incremental learning for image classification
    by Mohammed Ehsan Ur Rahman, Imran Shafiq Ahmad 
    Abstract: Incremental and transfer learning are becoming increasingly popular and important because of their advantageous nature in data scarcity scenarios. This work entails a quantitative analysis of the incremental learning approach along with various transfer learning methods using the task of image classification. A detailed analysis of the assumptions under which incremental learning should be applied is presented. The degree to which these assumptions hold in most real-world scenarios is also presented. For experiments, MNIST and CIFAR-100 were used. The extensive coverage of incremental and transfer learning techniques on these two datasets showed that a performance improvement is achieved when these techniques are used in data-scarce situations.
    Keywords: transfer learning; incremental learning; deep learning; image classification; image generation; neural networks; artificial intelligence; machine learning; MNIST; CIFAR-10; digit recognition.
    DOI: 10.1504/IJCVR.2022.10050419
  • An improvement in IoT-based smart trash management system using Raspberry Pi
    by Muhammad Shakir, Shahid Karim, Shahzor Memon, Sadiq Ur Rehman, Halar Mustafa 
    Abstract: Our primary aim is to establish an environmentally sustainable and pollution-free community. The responsible states and their citizens carry out all the attempts to make the city neat and clean. In this paper, smart dustbin garbage collection work has been proposed and completed. To avoid all garbage issues, we have developed a project based on a monitoring system with the help of IoT technology. The proposed work is bidirectional; firstly, it connects with hardware, and secondly, it is supported with mobile by developing an Android-based application. Firebase Firestore is used to provide communication between both applications. We have improved the smart trash management system using Raspberry Pi, which is pertinent to developed cities worldwide. This project tracks the dustbins and informs the admin of the amount of garbage collected in the garbage bins via a smartphone application. The proposed approach lowers the total number of waste collection truck trips, thereby lowering the total waste collection budget. It ultimately contributes to society’s cleanliness.
    Keywords: internet of things; IoT; Raspberry Pi; ultrasonic sensor; garbage monitoring; Android.
    DOI: 10.1504/IJCVR.2022.10050420
  • A framework for breast cancer prediction and classification using deep learning
    by Praveen Kumar Shukla, Aditya Ranjan Behera 
    Abstract: Breast cancer is a very common disease nowadays, and it is very important to identify and diagnose it at an early stage. Diagnosis first requires identifying and classifying the cancerous cells. Generally, mammography is more intuitive than other methods for detecting cancerous cells. This is a computer-aided diagnostic method that includes digital image processing for the detection of breast cancer. This article presents a method that detects cancer-affected cells and classifies patients as normal or cancerous. Pre-processing operations are performed on the mammographic images after normalisation. For the prediction of cancer-affected cells, a breast cancer prediction model architecture has been proposed, with an accuracy of 94.87%. For the classification of cancerous and normal patients, the VGG Net 19 architecture has been adopted, with an accuracy of 97.27%. The proposed framework model can be implemented practically as an application in the field of breast cancer diagnosis for a better result in a shorter period.
    Keywords: artificial neural network; benign; breast cancer; fine needle aspirate; malignant; nuclei; rectified linear unit.
    DOI: 10.1504/IJCVR.2022.10050421
  • Image-based deep learning automated grading of date fruit (Alhasa case study Saudi Arabia)
    by Amnah Aldandan, Sajedah AlGhanim, Hawraa Alhashim, Mona A.S. Ali 
    Abstract: Dates are small and popular in the Middle East, and they grow in many countries. Many researchers focus on classifying dates by type, but they have not considered that many date industries sort dates by quality to determine the proper price and use. This paper classifies Tamer stage dates automatically based on quality. This study proposes two ways to differentiate date fruit quality. The first uses a CNN (VGG-16) to extract features from the dataset and then applies an SVM classifier. The second method is based on a developed CNN. Tamar used three different images to train these models. Another contribution is the creation of our own dataset, which was acquired using a smartphone camera under uncontrolled lighting and camera parameter circumstances, such as autofocus and camera stabilisation. In a comparison between the two methods, the CNN model had 97% classification accuracy for Khalas, 95% for Ruzaiz and 90% for Shaishi.
    Keywords: date fruit; classification; Rutab stages; deep learning; convolutional neural network; CNN; support vector machine; SVM; Saudi Arabia.
    DOI: 10.1504/IJCVR.2022.10050650
  • Deep learning method for human activity recognition using heaped LSTM and image pattern of activity
    by P. Rajesh, R. Kavitha 
    Abstract: Deep learning is among the most talked-about and habitually used technologies among researchers across the technical arena. With the tremendous growth of technologies like data analytics, data mining, machine learning methods and IoT applications like health monitoring, safety and security, and smart control, human movement recognition has become a more noteworthy achievement in the field of science. Utilising this booming technology, we propose a unique approach for monitoring the activities of people who aspire to live and lead an independent life, mostly elderly people. In this experiment we discovered a novel method for identifying human activities, and the forte of this approach is that the privacy of the monitored person is ensured. This investigation is moved forward by utilising an improved convolution neural network (CNN) with enriched bi-directional LSTM (BLSTM). The activity recognition model is further optimised by using a heaped LSTM (HLSTM) and a fine trained data clustering algorithm. Our proposed approach, when trained and tested with a prominent dataset that contains sensor data, achieved an overall accuracy of 99.43% across the nine activities considered.
    Keywords: activity recognition; bi-directional LSTM; BLSTM; clustering; convolution neural network; CNN; heaped LSTM.
    DOI: 10.1504/IJCVR.2022.10050893
  • Efficient masked face identification biometric systems based on ResNet and DarkNet convolutional neural networks
    by Freha Mezzoudj, Chahreddine Medjahed 
    Abstract: The COVID-19 pandemic has caused death and serious illness in the entire world. During humanity’s fight against this disease, the wearing of face masks has become and remains a necessity in our daily life. This critical fight encourages us to generate a rich masked face database (noted FEI-SM) with different variations of poses and different emotions. We also employed several robust convolutional neural network systems based on three ResNet and two DarkNet models (ResNet18, ResNet50, ResNet101, DarkNet19, and DarkNet53) to measure the accuracy of biometric identification of masked and un-occluded faces on the challenging masked face database FEI-SM. In general, the compared results show good accuracies for many of the biometric systems used. The experimental runs clearly show that the model based on ResNet18 is the most effective at recognising individuals with masks in different scenarios in terms of recognition rate and testing time.
    Keywords: biometric; masked face identification; ResNet; DarkNet; FEI-SM database; convolutional neural networks; CNN.
    DOI: 10.1504/IJCVR.2022.10052153
  • Applied-behavioural analysis therapy for autism spectrum disorder students through virtual reality
    by T. Subetha, Kayal Padmanandam, L. Lakshmi, S.L. Aruna Rao 
    Abstract: Autism spectrum disorder (ASD) is a neurological disorder that contracts one’s social engagements and suffers from mind blindness, leading to a lack of social-emotional reciprocity. The most unravelling solution for them to learn social behaviour is using interactive virtual reality technologies. This study aims to develop an applied behaviour analysis therapy through virtual reality (VR)-based training to learn the necessary social-communication skills. The proposed system consists of two sessions. The first is the training session, where the students are trained with VR content developed to exhibit social-communication skills. Next, the students enter the practice session, where they are given a chance to practice the lessons taught during the training session. Student gestures are recognised using a multimodal gesture recognition system, and a deep neural network is employed to identify the student’s speech. The successful video snippets are stitched into a video using automatic video self-modelling (VSM), which allows the learner to improve a particular target social behaviour. The system has been evaluated using a comparative study with and without the proposed system, and the results show that students have improved learning and communication in the real world, which is the dream of their parents and family.
    Keywords: virtual reality; augmented reality; gesture detection; voice recognition; video self-modelling.
    DOI: 10.1504/IJCVR.2022.10051122
  • Acute myelogenous leukaemia detection in blood microscope images using particle swarm optimisation
    by Abdullah Mohan, Kedir Beshir, Alemayehu Kebede 
    Abstract: Acute myelogenous leukaemia (AML) is one of the types of acute leukaemia seen in adults. Nowadays, people use manual tests of blood smears to diagnose leukaemia. This manual method requires more time and depends on the operator's ability to diagnose the disease. In this article, a new hybrid technique that detects AML in blood smears is presented. The proposed method uses a texture-based method - local binary pattern (LBP) and a statistical-based method - grey-level co-occurrence matrix (GLCM) to extract the features from WBC cells. The best features are selected by using a PSO algorithm and their accuracy is measured using nearest neighbour (NN)-classifier and extreme learning machine (ELM). The proposed method was tested using American Society of Hematology (ASH) public datasets and achieved promising results. The ASH database consists of 80 images, where 40 images are taken from AML patients and the remaining 40 are from non-AML patients. The proposed method, LBP+GLCM+PSO along with the ELM classifier, achieved an accuracy of 90.44%. The experiment shows that the proposed method outperforms the existing methods in the detection of AML.
    Keywords: NN-classifier; particle swarm optimisation; PSO; acute myelogenous leukaemia; AML; acute lymphoblastic leukaemia; ALL.
    DOI: 10.1504/IJCVR.2022.10051333
  • A new approach to detect cardiovascular diseases using ECG scalograms and ML-based CNN algorithm
    by Lanka Alekhya, P. Rajesh Kumar 
    Abstract: Convolutional neural networks (CNNs) have gained popularity in the classification of cardiovascular diseases using ECG signals. This paper uses a pre-trained CNN model, the Visual Geometry Group16 (VGG16) network, with transfer learning for feature extraction, together with SVM, k-NN and RF algorithms to classify the signals. The inputs to the VGG16 net were ECG signals taken from the MIT-BIH database for four classes of heart ailments. Training the network took around 27 min and 42 sec of elapsed time. The study evaluates how this hybrid CNN model performs on test data; it gives an overall model accuracy and mean MCC of 95.83% and 94.52% for SVM, 96.67% and 95.60% for k-NN, and 96.94% and 95.96% for random forest, respectively, which is better than the pretrained CNN-VGG16Net alone, with an overall accuracy of 95.3% and a mean MCC of 93.75%.
    Keywords: electrocardiogram; ECG; convolutional neural network; CNN; Visual Geometry Group16; VGG16; support vector machine; SVM; k-nearest neighbour; k-NN; random forest; RF; Matthews correlation coefficient; MCC.
    DOI: 10.1504/IJCVR.2022.10051429
  • Comprehensive survey on video anomaly detection using deep learning techniques
    by Sreedevi R. Krishnan, P. Amudha, S. Sivakumari 
    Abstract: The rapid increase in violence and crime leads to the use of video surveillance systems. Handling such huge videos and classifying them as abnormal or not is tedious. Therefore, an automatic anomaly detection method is vital for the real-time detection of anomalous events. Advancements in machine intelligence lead to an automatic anomaly detection system for the timely identification of anomalous events and reducing the after-effects. Recent research uses deep learning techniques for faster and automatic detection of abnormal events from an enormous volume of surveillance videos. Reviewing the video anomaly detection system is very relevant and helps to promote future research in this area. The paper performs a comprehensive study of several video anomaly detection methods using deep learning techniques to detect and predict anomalous events. The paper also surveys various methods used for women’s safety. Various methodologies, datasets, and evaluation metrics for detecting video anomalies and comparisons are included.
    Keywords: deep learning; CNN; LSTM; GAN; autoencoder; women safety.
    DOI: 10.1504/IJCVR.2022.10051823
  • Robust autonomous detection and tracking of moving objects using hybrid tracking approach
    by Mohamed Akli Bousta, Abdelkrim Nemra 
    Abstract: Detecting and tracking mobile objects in video is among the most prevalent and challenging tasks under realistic motion and climatic conditions such as image occlusion, fast camera movement and natural environmental changes (fog, rain, etc.). In this paper, we propose an improved autonomous visual detection and tracking algorithm, which uses the single shot detection algorithm for initialisation followed by an adaptive kernelised correlation filter (KCF) tracker and combined with a predictor-corrector smooth variable structure filter (SVSF) for target recovery and estimation. It is known that the KCF tracker fails to recover the target after occlusion and scale variation. To overcome these limitations, the optimal SVSF filter is combined with the KCF tracker in order to maintain suitable target estimation and update the KCF tracker when the target is lost. The obtained results illustrate that the proposed approach achieves state-of-the-art performance on all tested datasets with many realistic scenarios with different attributes.
    Keywords: visual detection and tracking; single shot multi-box detector; SSD; kernelised correlation filter; KCF; smooth variable structure filter; SVSF.
    DOI: 10.1504/IJCVR.2022.10051959
  • Visual place representation and recognition from depth images
    by Farah Ibelaiden, Slimane Larabi 
    Abstract: We propound a new visual positioning method that recognises previously visited places whose descriptors are stored in a dataset that does not need updates. The descriptor of the unknown location is computed from a depth video, acquired by surrounding the depth camera in the scene, to gradually build the corresponding 3D map, from which the 2D map is derived and described geometrically based on the architectural features to constitute the query descriptor. This descriptor is compared with the database descriptors in order to deduce the location. The experiments show the efficiency and robustness of the proposed descriptor to scenery changes, light variations and appearance changes.
    Keywords: place recognition; depth image; architecture-based descriptor; three dimensional model; two dimensional map.
    DOI: 10.1504/IJCVR.2022.10052055
  • Collation of performance parameters on various machine learning algorithms for breast cancer discernment
    by Mohan Kumar, Sunil Kumar Khatri, Masoud Mohammadian 
    Abstract: In clinical practice, machine learning (ML) technology plays an important and rapidly growing role, as it is likely to help healthcare professionals make decisions and propose new diagnoses. This research study aims at validating and comparing the performance of various ML models that can help in predicting breast cancer in women. Performance parameters of various ML algorithms on a breast cancer dataset have been tested. The testing is performed on 116 participants from the dataset. The features of the dataset include insulin, glucose, resistin, adiponectin, homeostasis model assessment (HOMA), leptin, age, and index of obesity (MCP1). Many clinical features were measured, like BMI. This dataset was experimented on with 11 classification algorithms such as logistic regression (LR), k-nearest neighbour (kNN), support vector machine (SVM), decision tree (DT), random forest (RF), naive Bayes and optimum ML algorithms, etc. The research work detected breast cancer from the published Coimbra breast cancer dataset (CBCD). Each classifier has been utilised for various kinds of parameters tuning and for prediction. These results suggested they could be taken as a very meaningful and useful pair of factors to forecast cancer.
    Keywords: machine learning; ML; optimal algorithms; prediction; breast cancer; support vector machine; SVM.
    DOI: 10.1504/IJCVR.2022.10052056
  • Copy move forgery detection by improved SIFT K-means algorithm
    by Kavita Rathi, Parvinder Singh 
    Abstract: The copy move forgery, due to its copied features from the same image, poses the toughest challenge in image forgery detection. Key-point-based CMFD techniques outperform the block-based CMFD techniques. SIFT is the most used key-point-based technique. The present algorithm improves upon the SIFT algorithm with improvement in the various steps of the workflow by adding Laplace of Gaussian and multiplying it by the square of Gaussian kernel to make it real scale invariant, applying double level filtering at feature extraction and filtering by using g2NN and K-means clustering. The results in the form of recall, precision, and F1 measure outperformed the state-of-the-art key-point-based CMFD techniques over multiple datasets.
    Keywords: copy move forgery; CMF; key-point-based CMFD; SIFT extractor; Laplace of Gaussians; LoGs; K-means clustering.
    DOI: 10.1504/IJCVR.2022.10052099
  • Deep multiple affinity model for proposal-free single instance segmentation
    by Isah Charles Saidu, Lehel Csató 
    Abstract: We improve on an existing instance segmentation model with a probabilistic extension to the encoded neighbourhood branch model (Bailoni et al., 2020) - we call it multiple outputs encoded neighbourhood branch (mENB) model. The mENB predicts, for each voxel in a 3D volume, a distribution of central masks, where each mask represents affinities of its central voxel and the neighbouring voxels within the mask. When post-processed using a graph partition algorithm, these masks collectively delineate the boundaries of each instance of the target class within the input volume. Our algorithm is efficient due to active learning, more accurate, and robust to Gaussian noise and model weight perturbations. We conducted two experiments: 1) the first experiment compared mask predictions of our technique against the baseline (Bailoni et al., 2020) using the CREMI 2016 neuron segmentation dataset and the results showed more accurate mask predictions with uncertainty quantification; 2) in the second experiment, we tested segmented instances against the popular proposal-based mask-RCNN and the results showed that our technique yields better precision and intersection over union.
    Keywords: segmentation; active learning; affinity model; uncertainty quantification.
    DOI: 10.1504/IJCVR.2022.10052466
  • Generic object detection in real-time images under poorly visible conditions: a systematic literature review
    by Perla Sunanda, Dwaram Kavitha 
    Abstract: The invention and usage of CNN in computer vision (CV) have made object detection, the task of locating and identifying objects in an image or video, an emerging task that faces challenges under poorly visible conditions. This review aims to know the research gap for detecting generic objects, to identify the frameworks needed for working with real-time images, to see the importance of image enhancement and the need for designing nighttime datasets. A systematic literature search was carried out in the Scopus and IEEE databases to select object detection studies specifying generic object detection, real-time images, poorly visible or lowlight conditions, image enhancement pre-processing, the type of framework or algorithms needed, and the nighttime datasets. The time frame for the analysis was from January 2010 to the latest month of 2022. The study shows that there is an utmost need for detecting objects in nighttime or lowlight conditions.
    Keywords: computer vision; CV; object detection; obstacle detection; poorly visible; low light condition; nighttime.
    DOI: 10.1504/IJCVR.2023.10053141
  • Improving accuracy of arbitrary-shaped text detection using ResNet-152 backbone-based pixel aggregation network
    by Suresh Shanmugasundaram, Natarajan Palaniappan 
    Abstract: CNN-based scene text detection in real-world applications is facing two major issues. The speed-accuracy trade-off is the first issue. Secondly, the arbitrary-shaped text instance is to be modelled. This work solves both issues by using a ResNet-152 backbone-based pixel aggregation network. Since ResNet-152 provides better accuracy and performance, it is chosen as the backbone. The proposed network has a high-speed segmentation head and a learnable post-processing step. The feature pyramid enhancement module (FPEM) and feature fusion module (FFM) constitute the segmentation head. For high-quality segmentation, multi-level information is introduced by the FPEM, a cascadable U-shaped module. Features of different depths are given by the FPEM. The FFM collects these features into a final feature to segment the arbitrary-shaped text. Pixel aggregation (PA) implements the learnable post-processing, precisely aggregating text pixels using the predicted similarity vectors. The proposed ResNet-152 backbone-based PAN can attain an F-measure of 85.6% on the Total-Text dataset.
    Keywords: arbitrary-shaped text detection; scene text detection; curve text detection; text segmentation; DNN.
    DOI: 10.1504/IJCVR.2023.10053234
  • An implementation of searchable video player
    by Kitae Hwang, In Hwan Jung, Jae Moon Lee 
    Abstract: This paper introduces an Android app, SVPlayer, that searches for scenes in a video. To search for scenes in a video, SVPlayer extracts voice from the video, converts it into text, and searches for words in the text. Voice is converted to text in units of ten seconds, and both voice and text are made into a timeline text. When the user enters a word, the word is searched and a timeline list of all scenes that contain the word is displayed, and the user can select a desired time from the list. The performance was evaluated in various ways through actual measurement, and as a result, it took only 2-3 minutes to create 10-second timeline text from a 20-minute video. SVPlayer processes this task in the background, so the user can jump directly to the desired scene in the middle of watching 2-3 minutes after starting to play the video.
    Keywords: voice; search scene; text.
    DOI: 10.1504/IJCVR.2023.10053409
  • Registration of CT and MR image in multi-resolution framework using embedded entropy and feature fusion
    by Sunita Samant, Pradipta Kumar Nanda, Ashish Ghosh, Subhaluxmi Sahoo, Adya Kinkar Panda 
    Abstract: In this paper, a new scheme for the registration of brain CT and noisy MR images is proposed in a multi-resolution framework based on the notions of embedded entropy and nonlinear combination of the mutual information (MI) corresponding to Renyi’s and Tsallis entropy. Gabor and Sobel’s features are fused probabilistically and the registration is carried out in fused feature space. The weights for the fusion of the two distributions are obtained using the Bhattacharyya distance as the similarity measure. Registration parameter is obtained at different resolutions by maximising the combined mutual information obtained at different resolutions. The proposed algorithm is tested with the real patient data obtained from Retrospective Image Registration Evaluation (RIRE) database. It is found that the optimum registration parameter obtained at a low resolution of (64 x 64) has high accuracy. The proposed scheme exhibits improved performance as compared to other existing algorithms.
    Keywords: multi-modal image registration; embedded entropy; mutual information; fused feature space; multi-resolution.
    DOI: 10.1504/IJCVR.2023.10053410
  • Wireless underwater channel modelling for acoustic communication
    by Sanapala Umamaheswararao, M.N.V.S.S. Kumar, R. Madhu 
    Abstract: Underwater channel modelling is essential to establish acoustic communication underwater. It helps AUVs to navigate safely by avoiding collisions. However, acoustic communication involves many complexities, as there will be reflections from the water surfaces. The main factors that influence underwater communication are transmission loss, noise, multipath, Doppler spread, and propagation delay. These parameters make the available acoustic channel bandwidth restricted and strongly dependent on both range and frequency. The terrestrial communication parameters are not suitable for underwater communication, which hence requires a dedicated system design. Underwater channel modelling includes finding the signal to noise ratio (SNR) at the receiver, the transmission path loss and path gain for a particular path due to multipath propagation, and the noise level in the propagation path. An underwater channel communication model for sonar data is developed by considering the case of multipath propagation in shallow water.
    Keywords: channel modelling; multi-path propagation; path loss.
    DOI: 10.1504/IJCVR.2023.10054629
  • Integrating Thepade SBTC and Niblack thresholding features for identification of land usage from aerial images using ensemble of machine learning algorithms
    by Sudeep D. Thepade, Sandeep Chauhan 
    Abstract: Aerial images taken by satellites and drones are used to identify different types of land utilisation. Land use identification (LUI) is attempted using several machine learning (ML) algorithms, which are trained with aerial image features extracted using global or local content. The work here presents a fusion of globally extracted Thepade SBTC (TSBTC) features and locally extracted Niblack thresholding features for LUI. Features are extracted from an aerial image using TSBTC with ten variations, from 2-ary to 11-ary. Nine ML classifiers and ensembles are considered. The UC-Merced dataset, containing 2,100 photos split over 21 different land-use types, is used for experimentation. F-measure, accuracy and MCC are used as performance metrics. The fusion of TSBTC and Niblack features gives better LUI. Among the TSBTC variations, TSBTC 11-ary gives better LUI. The ensembles give better LUI, with the IBK + RF + SL ensemble performing best.
    Keywords: land use identification; LUI; Niblack; Thepade sorted BTC; Thepade sorted BTC N-ary; aerial image.
    DOI: 10.1504/IJCVR.2023.10055668
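The local part of the feature fusion above relies on Niblack thresholding, whose standard rule sets a per-pixel threshold T = m + k·s from the mean m and standard deviation s of a local window. A minimal pure-Python sketch of that rule follows; the window size, the value k = -0.2 and the function name are assumptions, and the feature extraction the authors build on top of the binarised image is not shown.

```python
import math

def niblack_binarise(image, window=3, k=-0.2):
    """Binarise a 2D list of grey levels with Niblack's local threshold."""
    height, width = len(image), len(image[0])
    radius = window // 2
    output = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighbourhood, clipped at the image border.
            values = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            mean = sum(values) / len(values)
            std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
            threshold = mean + k * std
            output[y][x] = 1 if image[y][x] > threshold else 0
    return output

# Hypothetical 3 x 3 patch with one bright pixel in the centre.
patch = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
print(niblack_binarise(patch))
```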
  • Enhancing stock market prediction through image encoding, pattern recognition, and ensemble learning with custom error correction techniques   Order a copy of this article
    by Ravi Prakash Varshney, Dilip Kumar Sharma 
    Abstract: Financial forecasting is a crucial task in the financial sector and is currently being addressed using various technical pricing patterns. However, the conventional techniques suffer from limitations such as time-consuming computations and lower accuracy due to the stochastic dependency between historical and future values. This research aims to bridge the gap in financial forecasting by proposing a hybrid model that combines time series analysis using LSTM with image processing techniques such as Gramian angular field, line plot methodology, and error correction techniques. The proposed approach leverages the strengths of both techniques to provide a reliable forecasting solution that can capture the stochastic dependency between past and future values. The study aims to contribute to the field of machine learning by providing a novel approach to financial forecasting and expanding the research on intelligent processing methods. For Apple, the final model shows a ~48% decrease in test RMSE and a ~57% decrease in test MAE compared with the LSTM model. For Amazon, the final model shows a ~14% decrease in test RMSE and a ~10% decrease in test MAE compared with the LSTM model. Moreover, the proposed model outperforms state-of-the-art models and addresses their overfitting.
    Keywords: time series forecasting; Gramian angular field; GAF; computer vision; pattern recognition; image encoding; error correction.
    DOI: 10.1504/IJCVR.2023.10055874
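The Gramian angular field mentioned above turns a one-dimensional series into an image. A minimal sketch of the summation variant (GASF) follows, using its usual definition: rescale the series to [-1, 1], map each value to an angle with arccos, and fill the matrix with cos(φ_i + φ_j). Whether the paper uses the summation or the difference variant is not stated in the abstract, so that choice and the function name are assumptions.

```python
import math

def gramian_angular_summation_field(series):
    """Encode a 1D series as a GASF matrix (list of lists)."""
    low, high = min(series), max(series)
    if high == low:
        raise ValueError("series must contain at least two distinct values")
    # Min-max rescale to [-1, 1]; clamp to guard against rounding error.
    scaled = [(2 * v - high - low) / (high - low) for v in series]
    scaled = [max(-1.0, min(1.0, v)) for v in scaled]
    angles = [math.acos(v) for v in scaled]
    return [[math.cos(a + b) for b in angles] for a in angles]

# Hypothetical closing prices.
prices = [1.0, 2.0, 3.0]
for row in gramian_angular_summation_field(prices):
    print([round(v, 3) for v in row])
```

The resulting matrix is symmetric, and its main diagonal preserves the rescaled values of the original series, which is why the encoding keeps temporal order.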
  • Experimental and simulation study of a four-degrees of freedom robot arm moving through space planner path   Order a copy of this article
    by Hajar Abd Al-Sattar Ali, Hatem H. Obeid 
    Abstract: In this work, a 4-DOF robot arm manipulator was built and tested using the software programs LabVIEW and SOLIDWORKS. The model tracks a horizontal path over four time periods and is tested in experimental and theoretical environments. The aim of this work was to build CAD models of the four-degrees-of-freedom robot arm and then use LabVIEW programming to control its motion, and to find the optimum time for the path and the minimum power consumption needed to accomplish the task. The kinematic and dynamic parameters of the robot arm were also calculated and tested over the proposed intervals. Four case studies are used to evaluate the model's performance at two, four, six and eight seconds. The results show that the optimum time to complete the task is between two and four seconds and that the power consumption is reduced by 99.8%.
    Keywords: robotic; arm; power consumption; time periods; kinematics; kinetics; real-time.
    DOI: 10.1504/IJCVR.2023.10055875
  • Fuzzy text/non-text classification of document images based on morphological operator, wavelet transform, and strong feature vector   Order a copy of this article
    by Mobina Ranjbar Malidareh, Amir Masoud Molaei 
    Abstract: In text retrieval systems, the classification of textual and non-textual content is a preliminary step towards accessing semantic information in document images. In this paper, a new structure based on a morphological operator, wavelet transform, and strong feature vector extraction is proposed for classifying textual and non-textual content in document images regardless of text language. In this structure, the image is segmented by an effective mechanism. After training on the patterns of textual and non-textual areas in the images, the text and non-text regions are determined by a fuzzy classifier. Texture features such as coarseness, directionality, contrast and roughness, together with features extracted from the wavelet transform sub-bands, are used to classify and label the regions. The proposed method is evaluated on a database of textual and non-textual images derived from document images available on the Internet. The simulation results show the high efficiency of the proposed method in the segmentation and classification of the image components. It provides an accuracy of 90.1% for the classification of image regions.
    Keywords: fuzzy classification; morphological operator; segmentation; strong feature vector; text/non-text separation; wavelet transform.
    DOI: 10.1504/IJCVR.2023.10056059
  • Improved green wireless sensor network using modified prediction oriented distributed clustering   Order a copy of this article
    by Pranati Mishra, Ranjan Kumar Dash 
    Abstract: In this paper, an improved clustering protocol is suggested for WSNs, in which a new cluster head strategy for non-uniformly distributed nodes and network lifespan is introduced. The network follows a mesh topology of wireless sensor nodes which report to a main station for observation and study. Geographical and contextual analysis and visualisations are provided. It is considered a green network because the overall network consumes less energy. The main applications of this research work include data observation and communication of data to the base station. It includes multi-level node clustering to save energy efficiently at multiple levels. Previously, the PODC algorithm showed optimised results in comparison to the EADC and SA-EADC algorithms. In this work, the improved PODC works with additional multilevel clustering. The results show an improvement in network lifetime and energy dissipation among sensor nodes, and the proposed improved PODC is observed to perform far better than these algorithms. Therefore, it can be concluded that the proposed protocol improves the network lifespan while maintaining the original sensing coverage level for the network.
    Keywords: green network; improved PODC; EADC; SA-EADC; multilevel clustering.
    DOI: 10.1504/IJCVR.2023.10056160
  • Deep learning approach to pedestrian detection and path prediction   Order a copy of this article
    by Ujwalla Gawande, Kamal Hajari, Yogesh Golhar 
    Abstract: Pedestrian detection and path prediction are significant challenges in vision-based surveillance systems. Because of variances in pedestrian postures and scales, backdrops, and occlusion, advanced computer vision applications face many obstacles. To address these issues, we provide an improved YOLOv5 pedestrian recognition and path prediction model. To begin, the revised YOLOv5 model is employed to detect pedestrians of varied sizes and proportions. A pedestrian’s path is estimated using a path prediction approach. The proposed method addresses partial occlusion situations in order to reduce object occlusion-induced progression and loss, as well as linking recognition results to motion properties. The route prediction system then analyses motion and directional data to estimate the direction of pedestrian movement. The experiments show significantly enhanced performance on datasets from Caltech, INRIA, MS COCO, ETH, KITTI, and the proposed pedestrian dataset. Improved YOLOv5 outperforms existing methods. The log-average miss rate is 8.32% on the Caltech dataset, 7.32% on the INRIA dataset and 32.64% on the ETH dataset. Results from the KITTI dataset were promising, at 76%, 64%, and 60%. The miss rate is 8.69% on the proposed pedestrian dataset and 8.57% on the MS COCO dataset. Finally, we conclude and look into future research.
    Keywords: convolutional neural network; CNN; deep learning; YOLOv5; pedestrian detection; tracking; path prediction.
    DOI: 10.1504/IJCVR.2023.10056182
  • Machine learning approaches for early detection and management of musculoskeletal conditions   Order a copy of this article
    by Pawan Whig, Ebtesam Shadadi, Shama Kouser, Lathifah Alamer 
    Abstract: Musculoskeletal conditions have a significant impact on quality of life. This study explores the use of machine learning algorithms for early detection and management of such conditions. Different models were evaluated using a dataset of musculoskeletal images and clinical information. Results demonstrate accurate classification with high sensitivity and specificity. A neural network was developed for detecting chronic lower back pain, achieving an impressive validation F1 score of 89%-93%. This highlights the potential of artificial intelligence in improving early detection and management. Future research should address data outliers to enhance model performance. Overall, neural networks are a valuable tool for early detection and management of musculoskeletal conditions, leading to improved patient outcomes. These findings suggest promising avenues for future research and implications for early detection and management in this field.
    Keywords: musculoskeletal conditions; arthritis; fractures; spinal problems; machine learning; early detection.
    DOI: 10.1504/IJCVR.2023.10057385
  • Localisation and classification of surgical instruments in laparoscopy videos using deep learning techniques   Order a copy of this article
    by Avanti Bhandarkar, Priyanka Verma 
    Abstract: Surgical trainees often use laparoscopic surgery videos to understand the appropriate use of instruments and visualise the surgical workflow better, but these videos may be difficult to interpret without proper annotations. In recent times, neural networks have emerged as an accurate and effective solution for instrument detection and classification in surgical video frames, which can subsequently be used to automate the annotation process. The proposed implementation uses faster-RCNNs and bidirectional LSTMs with (and without) time-distributed layers and attempts to solve some of the problems commonly faced while developing deep learning models for surgical image and video data: severe class imbalance, inaccuracies during multi-label classification and a lack of spatiotemporal context from adjacent video frames. The bidirectional LSTM with time-distributed layers achieved an average accuracy of 80.20% and an average F1 score of 0.7176 on the M2CAI16 tool dataset, while also achieving 63.49% average accuracy and an average F1 score of 0.522 on unseen data. Jaccard distance and Hamming distance have also been used as object detection-specific metrics; the same model registered the lowest values for both distances, implying accurate localisation and identification of surgical instruments.
    Keywords: deep learning; surgical instrument detection; surgical instrument classification; surgical instrument localisation; data augmentation; transfer learning; faster-RCNN; region-based convolutional neural networks; bidirectional LSTMs; long short-term memory networks; Jaccard distance; Hamming distance.
    DOI: 10.1504/IJCVR.2023.10057447
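The Jaccard and Hamming distances reported above can be illustrated on multi-label predictions for a single video frame. The sketch below uses the standard set-based and vector-based definitions; whether the paper computes them on label sets, as here, or on bounding boxes is not stated in the abstract, and the tool names and function names are illustrative assumptions.

```python
def jaccard_distance(true_labels, predicted_labels):
    """1 - |intersection| / |union| of two label sets (0.0 if both are empty)."""
    true_set, predicted_set = set(true_labels), set(predicted_labels)
    union = true_set | predicted_set
    if not union:
        return 0.0
    return 1.0 - len(true_set & predicted_set) / len(union)

def hamming_distance(true_vector, predicted_vector):
    """Fraction of label positions on which two binary vectors disagree."""
    if len(true_vector) != len(predicted_vector):
        raise ValueError("label vectors must have the same length")
    mismatches = sum(a != b for a, b in zip(true_vector, predicted_vector))
    return mismatches / len(true_vector)

# Hypothetical frame: two tools present, one of them detected.
print(jaccard_distance({"grasper", "hook"}, {"grasper"}))
print(hamming_distance([1, 0, 1, 0], [1, 0, 0, 0]))
```

Lower values of both distances indicate closer agreement between predicted and annotated instruments, which matches how the abstract interprets them.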
  • Holistic knuckle recognition through adept texture representation   Order a copy of this article
    by Neeru Bala, Anil Kumar, Rashmi Gupta, Ritesh Vyas 
    Abstract: In recent years, authentication of individuals through their finger knuckle patterns has become a highly active area of research. Finger knuckle patterns are the unique creases on the dorsal surface of the hand, a modality that is more convenient than other hand-related modalities such as fingerprint and palmprint because the dorsal surface of the hand is less abraded than the palm side. This work presents an effective knuckle-based recognition framework that fuses the base, minor and major finger knuckle patterns of an individual's fingers for improved recognition. For this, all the finger knuckle patterns are segmented and features are extracted using an efficient feature descriptor named the curvature Gabor filter (CGF). To validate the proposed methodology, rigorous investigations have been performed on a large, publicly accessible hand dorsal database, the PolyU-Hand Dorsal (HD) dataset. Knuckles are integrated in three different ways to investigate the effect of their fusion: fusion over knuckle, fusion over finger and fusion over hand. All three strategies outperform the individual knuckle recognition framework, with fusion over hand achieving the lowest EER of 0.2009.
    Keywords: information security; multimodal biometrics; information fusion; knuckle recognition; score level fusion.
    DOI: 10.1504/IJCVR.2023.10057530
  • A multi-modal image encoding and self-attention-based transformer framework with sentiment analysis for financial time series prediction   Order a copy of this article
    by Ravi Prakash Varshney, Dilip Kumar Sharma 
    Abstract: In this paper, we propose a novel approach for financial time series forecasting using feature selection, image encoding, and a self-attention-based CNN transformer. We use Markov transition field and candlestick chart encoding to extract features from historical stock data. Additionally, we incorporate sentiment analysis of financial news data in our model to improve the forecast accuracy. The proposed approach is compared to traditional time series forecasting methods, and the results show that our method outperforms the traditional methods in terms of forecasting accuracy. The proposed approach can be used to improve risk management and make more informed trading decisions. Our experiments demonstrate that the proposed framework achieved an improvement of ~17.8% in root mean squared error and ~38.7% in mean absolute error for the securities lending dataset, and an improvement of ~71.5% in root mean squared error and ~83.2% in mean absolute error for the pricing dataset.
    Keywords: candlestick image encoding; computer vision; convolutional neural network transformer; feature selection; Markov transition field image encoding; multivariate time series; MTS; pattern recognition; long-short term memory; LSTM; sentimental analysis.
    DOI: 10.1504/IJCVR.2023.10057531
  • Improved classification of histopathological images with feature fusion of Thepade SBTC and Sauvola thresholding using machine learning   Order a copy of this article
    by Sudeep D. Thepade, Mangesh S. Dudhgaonkar Patil 
    Abstract: Histopathological images play a significant role in selecting effective therapeutics and identifying disorders like cancer. Digital histopathology is a crucial advancement in contemporary medicine. The growth and spread of cancer cells within the body can be significantly controlled or stopped with early identification and therapy. Many machine learning (ML) algorithms are used to study the images in the dataset. Feature extraction is done using Sauvola thresholding and Thepade sorted block truncation coding (TSBTC). This paper presents a fusion of the features computed using the TSBTC and Sauvola thresholding methods for improved classification of histopathological images. The experimental validation is done using 960 images from the KIMIA PATH 960 dataset, with sensitivity, specificity and accuracy as performance metrics. The best performance is obtained by fusing TSBTC 9-ary and Sauvola thresholding features with the logistic model tree (LMT) classifier, which reaches 97.6% accuracy under ten-fold cross-validation.
    Keywords: classification; binarisation; histopathological; feature fusion; ensembles; KIMIA_PATH_960; Thepade SBTC; classifiers.
    DOI: 10.1504/IJCVR.2023.10058302
  • A primitive analysis of resonance frequency and stability simulation of a 2D SCARA drawing robot system for BCIs   Order a copy of this article
    by Ellis Iver David, James Edward Rowe, Yeon-Mo Yang 
    Abstract: In recent years, selective compliance assembly robot arm (SCARA) manipulators related to brain-computer interfaces (BCIs) have been gaining in popularity in industrial applications owing to their significant adaptability. One popular application concerns commercially available drawing robots. For example, the tip ring sleeve drawbot by Hart and Ragan uses an audio output, so WAV files with pulse width modulation are used to control the servomotors. After constructing a drawing robot prototype and analysing the impulses and responses, structural flaws were noticed in this particular design from the perspective of stability, limiting the quality of the final drawing. Indeed, the robot was designed to follow single-line paths, resulting in very sudden movements (e.g., stop-start motions). This caused vibrations in the arm that were more noticeable at high speeds. To counter or mitigate the shaking of the robot arm, in this study, a kinematic model and stability simulation for a two-dimensional (2D) SCARA drawing robot arm were constructed with the aim of improving the overall stability. The eventual aim was to find a model describing the motions of all two-degree-of-freedom (DOF) rotational arm robots, to allow quick access to or derivation of the optimal functional parameters of such robots.
    Keywords: brain-computer interface; BCI; SCARA; drawbot synthesiser; stipple gen; travelling salesman problem; TRS; statistical signal processing; stability; transfer function; impulse response; IR; step response; SR.
    DOI: 10.1504/IJCVR.2023.10058433
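For a two-DOF rotational arm such as the drawing robot described above, the textbook planar kinematics can be sketched as follows. This is the standard two-link model, not the authors' specific kinematic or stability model; link lengths, function names and the `negative_elbow` flag are assumptions.

```python
import math

def forward_kinematics(theta1, theta2, l1, l2):
    """Pen position (x, y) for joint angles in radians and link lengths l1, l2."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

def inverse_kinematics(x, y, l1, l2, negative_elbow=False):
    """Joint angles reaching (x, y); picks one of the two elbow solutions."""
    cos_t2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(cos_t2) > 1.0:
        raise ValueError("target is outside the reachable workspace")
    sin_t2 = math.sqrt(1.0 - cos_t2 * cos_t2)
    if negative_elbow:
        sin_t2 = -sin_t2
    theta2 = math.atan2(sin_t2, cos_t2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * sin_t2, l1 + l2 * cos_t2)
    return theta1, theta2

# Hypothetical arm with two 10 cm links: round trip through both functions.
x, y = forward_kinematics(0.3, 0.8, 0.10, 0.10)
print(inverse_kinematics(x, y, 0.10, 0.10))
```

Sampling a drawing path densely and converting each point with `inverse_kinematics` gives the joint trajectories whose abrupt changes correspond to the stop-start motions the abstract identifies as a source of vibration.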
  • An efficient deep convolutional neural network-based safety monitoring system for construction sites   Order a copy of this article
    by V. Ashwanth, Dhanya Sudarsan 
    Abstract: Worker safety and health are paramount concerns, especially in high-risk occupations such as construction work. Monitoring workers to ensure proper usage of personal protective equipment (PPE) at construction sites is essential. However, manual surveillance via CCTV footage is time-consuming. This paper proposes an automated approach for construction site monitoring without human intervention. Initially, YOLOv4 is employed for construction worker detection, with subsequent division of the bounding boxes into four parts. EfficientNet is then utilised to analyse these cropped sections and identify specific PPE components. Additionally, construction tools and equipment are recognised, and a safety score is assigned based on worker proximity to these objects. Unsafe workers are flagged as being in a danger zone in each frame, alongside the marking of workers. This approach streamlines safety monitoring processes while ensuring worker well-being.
    Keywords: computer vision; YOLOv4; construction safety; EfficientNet-B5; transfer learning; PPE; custom labelling; safety detection; object detection; image classification; machine learning.
    DOI: 10.1504/IJCVR.2023.10058756
  • An approach for speaker diarisation using whale-anti coronavirus optimisation integrated deep fuzzy clustering   Order a copy of this article
    by K. Vijay Kumar, Ramisetty Rajeswara Rao 
    Abstract: In this paper, an anti-corona whale optimisation algorithm (ACWOA) is developed for speaker diarisation and is used to train the deep fuzzy clustering (DFC) algorithm for final clustering. The input audio is fed into a feature extraction procedure to obtain relevant characteristics such as Mel frequency cepstral coefficients (MFCCs), line spectral frequencies (LSF), and line prediction cepstral coefficients (LPCCs). Music and silence removal are used in the speech activity detection (SAD). After identifying speech activities, the speakers are segmented using a Bayesian inference criterion (BIC) score. The ACWOA-based DFC outperformed other methods with the best testing accuracy of 0.891 and the lowest diarisation error, false discovery rate (FDR), false negative rate (FNR) and false positive rate (FPR) of 0.618, 0.289, 0.148, and 0.130, respectively. The proposed approach outperforms the existing approaches active learning, DE+K-means, LSTM, MCGAN, and ANN-ABC-LA in terms of testing accuracy for test case 1 by 9.31%, 7.40%, 6.73%, 5.49%, and 3.59%, respectively.
    Keywords: speaker diarisation; deep fuzzy clustering; DFC; Bayesian inference criterion; BIC; speech activity detection; SAD; speaker segmentation; Mel frequency cepstral coefficients; MFCCs; line prediction cepstral coefficients; LPCCs.
    DOI: 10.1504/IJCVR.2023.10059523
  • A sine-cosine algorithm blended grey wolf optimisation algorithm for partitional clustering   Order a copy of this article
    by Gyanaranjan Shial, Chita Ranjan Tripathy, Sabita Sahoo, Sibarama Panigrahi 
    Abstract: Over the last few decades, partitional clustering algorithms have emerged as one of the most promising families of clustering algorithms for finding groups among data items. Motivated by this, we propose a hybrid sine-cosine algorithm (SCA) blended grey wolf optimisation (GWO) algorithm for partitional data clustering. This algorithm selects near-optimal cluster centres using the leadership approach of GWO and the explorative strategy of SCA. Here, the sine and cosine functions are used to generate more diversified solutions around the mutant wolf of each search agent. A trade-off is therefore maintained between exploration and exploitation, which draws on the benefits of both algorithms. An extensive simulation study is carried out by clustering 11 benchmark datasets using four performance measures. Additionally, a statistical comparative performance analysis is conducted against GWO, PSO, SCA, JAYA and K-means using Duncan's multiple range test and the Friedman and Nemenyi hypothesis tests. The tests confirm the superiority of the proposed algorithm.
    Keywords: grey wolf optimiser; JAYA algorithm; sine-cosine algorithm; SCA; particle swarm optimisation; PSO; partitional clustering; K-means algorithm.
    DOI: 10.1504/IJCVR.2023.10059975
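The explorative step that the sine-cosine algorithm contributes can be sketched with its standard position update, in which each coordinate moves towards or around a destination point by a sine or cosine term whose amplitude r1 decays linearly over the iterations. This shows only the plain SCA rule; how the paper blends it with the GWO leadership hierarchy and the mutant wolf is not reproduced, and the function name and the constant `a = 2.0` are assumptions.

```python
import math
import random

def sca_update(position, destination, t, max_iter, a=2.0, rng=random):
    """One sine-cosine update of a candidate solution towards a destination."""
    r1 = a - t * (a / max_iter)  # amplitude shrinks linearly from a to 0
    new_position = []
    for x, p in zip(position, destination):
        r2 = rng.uniform(0.0, 2.0 * math.pi)  # direction of the move
        r3 = rng.uniform(0.0, 2.0)            # random weight on the destination
        r4 = rng.random()                     # switch between sine and cosine
        trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)
        new_position.append(x + r1 * trig * abs(r3 * p - x))
    return new_position

# Hypothetical 2D cluster centre moving around the best centre found so far.
random.seed(0)
print(sca_update([1.0, 2.0], [0.5, 0.5], t=1, max_iter=10))
```

In a clustering setting, `position` would hold the concatenated coordinates of all cluster centres and `destination` the best such vector found so far.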
  • High-speed optical 3D measurement device for quality control of aircraft rivet   Order a copy of this article
    by Benjamin Bringier, Majdi Khoudeir 
    Abstract: Optical three-dimensional devices are widely used for quality control in industry and allow various defects or properties to be checked. In the context of aeronautics, quality control of part assembly is problematic due to the number of rivets to be checked and the necessary measurement accuracy. This paper presents a new device that makes it possible to measure the positioning of a rivet in less than two seconds with a measurement accuracy close to 10 μm. A standard colour camera and a projector are used to achieve a relatively low-cost device. This device is lightweight and compact enough to be mounted on a robotic arm. Few parameters must be calibrated, and the proposed methodology is accurate even if device positioning errors occur or the appearance of the surface changes. From a single image acquisition, about 2,000 measuring points on the aircraft skin and up to 600 measuring points on a rivet head of 1 cm² are obtained to evaluate its positioning. Our device is validated by a comparative approach and in real conditions of use.
    Keywords: optical three-dimensional device; computer vision; image processing; quality control.
    DOI: 10.1504/IJCVR.2022.10050711
  • Identification of personality traits from handwritten text documents using multi-label classification models   Order a copy of this article
    by Salankara Mukherjee, Ishita De Ghosh 
    Abstract: Handwriting is widely investigated as a marker of emotional states and personality. However, the majority of the studies are based on graphology, and do not utilise personality factor models. We use the well-known five-factor model, which says that people possess five basic traits, together known as the big five. Hence the problem of personality prediction from handwriting is essentially a multi-label problem. In addition, the predicted values should be non-binary decimal numbers since the model says people possess the traits in various degrees. Multi-label classifiers have not been explored for personality assessment using handwriting features. The current work aims to bridge this gap. Multi-label classifiers are trained on trait scores obtained from the big-five inventory together with handwriting features. A number of classifiers including classifier chain, binary relevance and label power-set are employed in the work. Best accuracies of 95.9% with non-binary label values and 97.9% with binary label values are achieved.
    Keywords: multi-label classification; personality assessment; big-five traits; handwriting features; non-binary label values.
    DOI: 10.1504/IJCVR.2022.10049835
  • An investigation into automated age estimation using sclera images: a novel modality   Order a copy of this article
    by Sumanta Das, Ishita De Ghosh, Abir Chattopadhyay 
    Abstract: Automated age estimation attracts attention due to its potential application in fields like customer relationship management, surveillance, and security. Ageing has a significant effect on the human eye, particularly in the sclera region, but age estimation from sclera images is a less explored topic. This work presents a comprehensive investigation of automated human age estimation from sclera images. We employ lightweight deep learning models to identify the changes in sclera colour and texture. Extensive experiments are conducted for three related tasks: estimation of the exact age of a subject, categorical classification of subjects into different age groups, and binary classification of adult and minor subjects. Results demonstrate good performance of the proposed models against state-of-the-art methods. We obtained a mean absolute error of 0.05 for the first task, an accuracy of 0.92 for the second task, and an accuracy of 0.89 for the third task.
    Keywords: human age estimation; age-group classification; adult-minor binary classification; sclera images; deep learning; MASDUM; SBVPI.
    DOI: 10.1504/IJCVR.2022.10049572
  • Unsupervised image transformation for long wave infrared and visual image matching using two channel convolutional autoencoder network   Order a copy of this article
    by Kavitha Kuppala, Sandhya Banda, S. Sagar Imambi 
    Abstract: Pixel level matching of multi-spectral images is an important precursor to a wide range of applications. An efficient feature representation which can address the inherent dissimilar characteristics of acquisition by the respective sensors is essential for finding similarity between visual and thermal image regions. Lack of sufficient benchmark datasets of corresponding visual and LWIR images hinders the training of supervised learning approaches, such as CNN. To address both the issues of nonlinear variations and unavailability of huge data, we propose a novel two channel non-weight sharing convolutional autoencoder architecture, which computes similarity using encodings of the image regions. One channel is used to generate an efficient representation of the visible image patch, whereas the second channel is used to transform an infrared patch to a corresponding visual region using encoded representation. Results are shown by computing patch similarity using representations generated from various encoder architectures, evaluated on two datasets.
    Keywords: convolutional autoencoder; CAE; multi-spectral image matching; transformation network; two channel siamese architecture; structural similarity measure; SSIM; KAIST dataset; mean squared error; MSE; peak signal to noise ratio; PSNR; Earth mover’s distance; EMD.
    DOI: 10.1504/IJCVR.2022.10050246
  • Comparison of convolutional neural networks architectures for mango leaf classification   Order a copy of this article
    by B. Jayanthi, Lakshmi Sutha Kumar 
    Abstract: Plant diseases are a threat to the food supply as they reduce both the yield and the quality of fruits and grains. Hence, early identification and classification of plant diseases are essential. This paper aims to classify mango plant leaves into healthy and diseased using convolutional neural networks (CNNs). A performance comparison of the CNN architectures AlexNet, VGG-16 and ResNet-50 for mango plant disease classification is provided. These models are trained using the Mendeley dataset, and validation accuracies are found and compared with and without the use of transfer learning models. AlexNet (25 layers, 6.2 million parameters) produces a testing accuracy of 94.54% and consumes less training time. ResNet-50 (117 layers, 23 million parameters) and VGG-16 (16 layers, 138 million parameters) give testing accuracies of 98.56% and 98.26% respectively. Therefore, based on the accuracies achieved and complexity, this paper recommends AlexNet followed by ResNet-50 and VGG-16 for plant leaf disease classification.
    Keywords: convolution neural networks; neural network; image classification; precision agriculture.
    DOI: 10.1504/IJCVR.2022.10049962
  • ResNet-based surface normal estimator with multilevel fusion approach with adaptive median filter region growth algorithm for road scene segmentation   Order a copy of this article
    by Yachao Zhang, Yuxia Yuan 
    Abstract: As an integral part of information processing, road information has important application value in map drawing, post-disaster rescue and military applications. In this paper, a convolutional neural network is used to fuse lidar point cloud and image data to achieve road segmentation in traffic scenes. We first use an adaptive median filter region growth algorithm to preprocess the input image. A semantic segmentation convolutional neural network with a ResNet encoding and decoding structure is used as the basic network to cross and fuse the point cloud surface normal features and RGB image features at different levels. After fusion, the data is restored into the decoder. Finally, the detection result is obtained by an activation function. The KITTI dataset is used for evaluation. Experimental results show that the proposed fusion scheme has the best segmentation performance. Compared with other road detection methods, the proposed method achieves better overall performance. In terms of AP, the proposed method exceeds 95% for the UM and UMM scenes.
    Keywords: road segmentation; adaptive median filter region growth; data fusion; point cloud surface normal feature; encoding and decoding structure.
    DOI: 10.1504/IJCVR.2022.10049783