Forthcoming and Online First Articles

International Journal of Computational Vision and Robotics (IJCVR)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with the Open Access icon are Online First articles. They are freely available and openly accessible to all, without any restriction except those stated in their respective CC licenses.

International Journal of Computational Vision and Robotics (51 papers in press)

Regular Issues

  • The recognition of 3-phase power quality events using optimal feature selection and random forest classifier
    by Laxmipriya Samal, Hemanta Kumar Palo, Badri Narayan Sahu 
    Abstract: This article proposes a novel feature vector that combines the K-means Apriori feature selection (KAFS) algorithm with statistical techniques to classify 3-phase power quality disturbance (PQD) events. The K-means algorithm clusters the raw signals, and the Apriori algorithm then extracts the desired discriminative features of the chosen PQD events. These discriminative features are used to compute nine statistical parameters. The reliability of the novel feature vector in classifying the 3-phase PQD events has been measured against similar statistical parameters obtained from the raw PQD samples. Finally, the ability of the short-time Fourier transform (STFT) as a time-frequency tool has been evaluated using the KAFS algorithm for the same task. The random forest (RF) classifier is chosen to validate the efficacy of the proposed feature vectors. Our results show that the optimised feature vectors obtained with KAFS improve recognition accuracy.
    Keywords: power quality; feature selection; classification; recognition accuracy; random forest algorithm.
    DOI: 10.1504/IJCVR.2022.10047736
  • Application of digit and speech recognition in food delivery robot
    by Low Chun Yin, Sarah Atifah Saruchi, Ong Hong Tze, Chew Ying Xin, Chong Han Wei, Jonathan Lam Lit Seng 
    Abstract: In COVID-19 quarantine centres, physical human interaction is limited to prevent the spread of the virus. Food delivery robots have been seen replacing humans to perform the task perfectly. However, there is a limit to the tasks that a single robot can handle. This paper designs an efficient and intelligent food delivery robot that acts as a messenger: it recognises speech from patients so that humans in the background can act on it without any physical interaction. The workload on the microcontroller is greatly reduced when a task like face recognition is replaced with digit recognition, as patients are tagged with numbers. The design of the robot is also modular and scalable for bigger centres, introducing the capability to expand when necessary. The future of robotic delivery relies on the efficiency and scalability of multiple systems.
    Keywords: speech recognition; digit recognition; image processing; computer vision; robotics.
    DOI: 10.1504/IJCVR.2022.10045833
  • An improved sclera recognition using kernel entropy component analysis method
    by B.S. Harish, M.S. Maheshan, C.K. Roopa, R. Kasthuri Rangan 
    Abstract: Among the various biometric traits that exist in the human body, the sclera is considered to be prominent because of its unique characteristics. In this paper, we propose an improved sclera recognition method using kernel entropy component analysis (KECA). The main objective of this paper is to integrate kernel-based methods with entropy to choose the best principal components. Further, the resulting top principal components are given a symbolic interval-valued representation. To evaluate the efficiency of the proposed representation method, we conducted extensive experimentation using various classifiers. The proposed method improves accuracy by over 5.09% with a 50:50 split and by over 10.69% with a 60:40 split. The results show that the proposed method is effective and feasible for sclera recognition.
    Keywords: sclera; recognition; kernel entropy; symbolic representation.
    DOI: 10.1504/IJCVR.2022.10046563
  • Local directional double ternary coding pattern for facial expression recognition
    by Chebah Ouafa, Laskri Mohamed Tayeb 
    Abstract: This paper presents a novel texture descriptor, the local directional double ternary coding pattern (LDDTCP), which combines the directional information from LDP and the ternary description from LTP to represent facial expressions. The proposed LDDTCP operator encodes the image texture by computing the edge and line response values using the eight-direction Frei-Chen masks. To achieve robustness, the eight Frei-Chen masks are partitioned into two groups according to their directions. After calculating the average of each group, we assign three discrimination levels to each pixel based on the edge response values in the first group and the line response values in the second group, obtaining the LDDTCP-1 and LDDTCP-2 codes, respectively. The final LDDTCP feature vector is formed by concatenating the LDDTCP-1 and LDDTCP-2 histograms. Experimental results on the CK and JAFFE databases show that the LDDTCP descriptor achieves superior recognition performance compared to some existing local descriptor methods.
    Keywords: facial expression recognition; human face; appearance descriptor; geometry descriptor; local binary pattern; LBP; local directional pattern; LDP; local ternary pattern; LTP; support vector machine; SVM.
    DOI: 10.1504/IJCVR.2022.10046572
  • Versatile formation patterns for cooperative target tracking using ground vehicles
    by Lili Ma 
    Abstract: In this paper, we investigate the cooperative target tracking problem using a group of autonomous mobile robots. By introducing a tracking control component to existing pursuit-based formation schemes, it is possible to achieve simultaneous tracking and formation in versatile concentric formations. Balanced circular formations can now be achieved with a prescribed formation radius. Elliptical formations with a variety of orientations and shapes can be achieved by applying a transformation matrix. To address the practical issue of obstacle avoidance, a repellent vector field technique is used, which prevents agents from approaching obstacles. Tracking, formation, and avoidance are combined to provide a more comprehensive solution for cooperative target tracking. The models considered include both single-integrator and double-integrator robots. MATLAB simulations are used to demonstrate the effectiveness of the proposed schemes.
    Keywords: Cooperative target tracking; balanced circular formation; prescribed formation radius; elliptical formation; obstacle avoidance.
    DOI: 10.1504/IJCVR.2022.10046573
  • A comparative study between convolution neural networks and multi-layer perceptron networks for hand-written digits recognition
    by Aaron Rasheed Rababaah 
    Abstract: This paper presents an investigation that aims at comparing deep learning (DL) and traditional artificial neural networks (ANNs) in the application of hand-written digits recognition (HDR). In our study, convolution neural networks (CNNs) are a representative model for the DL models and the multi-layer perceptron (MLP) is a representative model for ANN models. The two models, MLP and CNN, were implemented in the MATLAB development environment and tested using a publicly available image database that consists of over 20,000 samples of all ten hand-written digits, each of which is 24 x 24 pixels. The experimental results showed that the CNN model was superior to the MLP model, with average classification accuracies of 95.14% and 89.74%, respectively. Furthermore, the CNN model was observed to have better performance stability and better execution efficiency, as the MLP model requires human intervention to handcraft and pre-process the features of the digit patterns.
    Keywords: hand-written digit; pattern recognition; multi-layer perceptron; MLP; deep learning; convolution neural networks; CNNs; comparative study.
    DOI: 10.1504/IJCVR.2022.10047192
  • Paddy variety identification from field crop images using deep learning techniques
    by Naveen N. Malvade, Rajesh Yakkundimath, Girish B. Saunshi, Mahantesh C. Elemmi 
    Abstract: On-field identification of paddy varieties provides actionable information to farmers and policymakers in many aspects of crop handling and management practices. In this paper, three transfer learning pre-trained models namely ResNet-50, EfficientNet-B7, and CapsNet are presented to effectively classify the field crop images of 15 different paddy varieties captured during the booting plant growth stage. The experiments using the CapsNet model with an image dataset comprising 60,000 labelled images show the significant performance with the testing accuracy of 92.96%, and validation accuracy of 95%. The ResNet-50 and EfficientNet-B7 models have yielded the average validation accuracies of 85% and 90%, respectively. The CapsNet model has achieved both higher accuracy and better computational efficiency over the considered deep learning classification models on the held out paddy field crop image dataset.
    Keywords: paddy variety identification; field crop image classification; deep convolutional neural networks; DCNN; transfer learning; CapsNet; ResNet-50; EfficientNet-B7.
    DOI: 10.1504/IJCVR.2022.10047193
  • Camouflaged object segmentation using saliency maps - a comparative study
    by Sachi Choudhary, Rashmi Sharma 
    Abstract: Camouflage is the most common approach employed by armed forces to conceal something from the enemy's gaze on the battlefield or elsewhere. This article covers the literature on several strategies used to find concealed objects that have features in common with the surrounding environment in terms of colour, texture, orientation, and intensity levels. The concern of this research is the use of a saliency map to locate the camouflaged object in the scene. The proposed methodology generates a saliency map based on region contrast. Another application of detecting the hidden object in the scene is to evaluate the blending ability of a camouflage pattern. Therefore, computations have been performed to locate the hidden object within the surrounding environment and to determine the effectiveness of a camouflaged texture. A comparative study has been conducted that compares the performance of saliency maps based on centre-surround, global contrast and the proposed region contrast. The comparison focuses on the camouflaged object only. The performance of these approaches has been evaluated based on precision, recall and F-measure values.
    Keywords: camouflage object detection; saliency map; camouflage texture evaluation; military camouflage.
    DOI: 10.1504/IJCVR.2022.10047203
  • An experimental evaluation of feature detectors and descriptors for visual SLAM
    by Taihú Pire, Hernán Gonzalez, Emiliano Santano, Lucas Terissi, Javier Civera 
    Abstract: Visual SLAM has, in general, a high computational footprint. Its potential applications, such as augmented reality (AR), virtual reality (VR) and robotics, have hard real-time constraints and limited computational resources. Reducing the cost of visual SLAM systems is hence essential to equip small robots and AR/VR devices with such technology. Feature extraction, description and matching are at the core of feature-based SLAM systems and have a direct impact on their performance. This work presents a thorough experimental analysis of feature detectors, descriptors and matchers for visual SLAM, focusing on their cost and their effect on the estimation accuracy. We also run our visual SLAM system on an embedded platform (Odroid-XU4) and show the effect of such limited hardware on the accuracy and cost of the system. Finally, in order to facilitate future research, our evaluation pipeline is made publicly available.
    Keywords: visual SLAM; local image feature; descriptor extractor; keypoint detector; performance evaluation.
    DOI: 10.1504/IJCVR.2022.10047492
  • Path planning of mobile manipulator for navigation and object clean-up
    by Aaditya Asit Saraiya, Sangram Keshari Das, B.K. Raut, V. Kalaichelvi 
    Abstract: Industry and warehouses have been paying a lot of attention to mobile manipulator-based path planner problems. This paper focuses on multi-target object clean-up operations using a vision sensor, which has ample industrial applications. In this work, a vision-based path planning approach has been implemented using the A* algorithm in order to avoid obstacles and reach the goal location using the shortest path. The algorithm was developed to classify objects in the workspace as handleable/non-handleable from real-time measurements. In the case of multi-object clean-up operations, a priority is set depending on the scenario and a weighted cost function approach is proposed. A series of simulation experiments is conducted to test the effectiveness of the proposed algorithm. The entire workflow of the mobile manipulation-based path planner is demonstrated using various scenarios. This problem has considerable relevance in the real world.
    Keywords: vision-based navigation; mobile manipulation-based path planner; object detection; A* path planning algorithm; OpenCV; ROS framework.
    DOI: 10.1504/IJCVR.2022.10047589
  • Research on the online parameter identification method of train driving dynamic model
    by Dandan Liu, Xiangxian Chen, Zhonghao Guo, Jiaxi Yuan, Shoulin Yin 
    Abstract: Automatic train operation (ATO) system is an important driving control system for train operation, which adjusts traction or braking force in real time according to different operating environments. As an important part of the ATO system, the train dynamic model determines the tracking accuracy of the train to the target speed. Based on the force analysis of the actual train operation, the single-particle dynamic models of train operation were established. Considering the high efficiency of the single-particle model in online identification, the single-particle train model is applied to the actual parameter identification. Firstly, the second-order single particle model is established, and three identification methods and two sets of data are compared and analysed. The auxiliary model and the recursive least square method with variable forgetting factor (AM-VFF-RLS) identification method have good performance. On this basis, a third-order-single-particle model is established. Through the analysis of the identification results, it is found that the model can improve the identification accuracy while ensuring the efficiency.
    Keywords: train dynamic model; online identification; AM-VFF-RLS; ATO system.
    DOI: 10.1504/IJCVR.2022.10047951
  • An optimised local feature compression using statistical and structural approach for face recognition
    by A. Divya, K.B. Raja, K.R. Venugopal 
    Abstract: Face recognition is currently an extensively researched area among the recognition tasks in the field of pattern recognition. Face images captured in an unrestricted environment generally contain variations in pose, illumination and expression (PIE). To improve the robustness of face recognition to PIE variations, an optimised local feature compression (OLFC) is proposed using the matching algorithm and classifier. The pixel values of the images are structured as low picture element values (LPEV) and high picture element values (HPEV). The discrete wavelet transform and statistical methods are applied to LPEV and HPEV, respectively, to obtain substantial data and statistical features, which results in reduced feature dimensions. Experiments performed on six popular face databases (ORL, YALE, JAFFE, EYB, Faces-94 and FERET) show excellent performance, with high recognition accuracies of 95.5%, 99.33%, 100%, 99.69%, 99.86% and 96.39%, respectively, and with reduced error and computation time compared with existing methods.
    Keywords: face recognition; discrete wavelet transform; DWT; Euclidean distance; artificial neural networks; ANNs.
    DOI: 10.1504/IJCVR.2022.10047958
  • Energy-aware automatic video annotation tool for autonomous vehicle
    by N.S. Manikandan, K. Ganesan 
    Abstract: In a self-driving car, real-time video obtained from the camera sensors is analysed using various scene understanding algorithmic modules (object detection, object classification, lane detection and object tracking). In this paper, we propose an annotation tool that uses deep learning techniques for each of the four modules mentioned above, and the best ones are chosen based on suitable metrics. Our tool is 83% accurate when compared with a human annotator. We considered a video with 530 frames of resolution 1,035 x 1,800 pixels. Our proposed tool consumed 43 minutes of computation with 36.73 g of CO2 emission in a CPU-based system and 2.58 minutes of computation with 7.75 g of CO2 emission in a GPU-based system to process all four modules. But the same video took nearly 3,060 minutes of computational usage with 2.56 kg of CO2 emission for one human annotator to narrate the scene using a normal computer.
    Keywords: automatic annotation; deep learning; object classification; object detection; lane detection; object tracking.
    DOI: 10.1504/IJCVR.2022.10048219
  • Image retrieval by using texture and shape correlated hand crafted features
    by Suresh Kumar Kanaparthi, U.S.N. Raju 
    Abstract: Content-based image retrieval (CBIR) has become one of the trending areas of research in computer vision. In this paper, consonance on hue, saturation, and intensity is used by applying inter-channel voting between them. The diagonally symmetric pattern (DSP) is computed from the intensity component of the image. The grey level co-occurrence matrix (GLCM) is applied to the DSP to extract texture features. Histogram of oriented gradients (HOG) features are used to extract the shape information. All three features are concatenated. To evaluate the efficiency of our method, five performance measures are calculated, i.e., average precision rate (APR), average recall rate (ARR), F-measure, average normalised modified retrieval rank (ANMRR) and total minimum retrieval epoch (TMRE). The Corel-1K, Corel-5K, Corel-10K, VisTex, STex, and colour Brodatz datasets are used. The experimental results show an improvement in 100% of cases for the Corel-1K dataset, 80% of cases for Corel-5K and 80% of cases for each of the three texture datasets.
    Keywords: content-based image retrieval; CBIR; interchannel voting; texture; hand crafted features; shape.
    DOI: 10.1504/IJCVR.2022.10048323
  • An automated system to detect crop diseases using deep learning
    by Purushottam Sharma, Manoj Kumar, Richa Sharma, Shashi Bhushan, Sunil Gupta 
    Abstract: Food is one of the necessities for a human being to survive. Moreover, since the population is increasing with each passing day, the need to grow sufficient crops to feed such a vast population becomes evident. Also, the country’s economy is based on agricultural production as well. However, there is a significant threat to agricultural crop production in today’s times, and hence the analysis of crop diseases becomes essential. Thus, the automatic identification and analysis of plant diseases are highly desired in agricultural information. The main objective of the research is to develop an optimised approach for system automation to detect crop diseases. Here we propose an approach for building an automated system that primarily detects diseases using leaf images and offers other features, such as recommending a remedy for the disease. We created a model using a convolution neural network algorithm and used the transfer learning approach with the Inception v3 and ResNet 50 models. Further, we used this model and collected some data on remedies for the diseased classes and added that feature to our system.
    Keywords: convolutional neural network; CNN; leaf image; transfer learning; crop disease; Inception v3; ResNet 50.
    DOI: 10.1504/IJCVR.2022.10048421
  • An improved multi-criteria-based feature selection approach for detection of coronary artery disease in machine learning paradigm
    by Bikesh Kumar Singh, Sonali Dutta, Poonam Chand, Khilesh Kumar, Sumit Kumar Banchhor 
    Abstract: This paper presents an accurate approach for the detection of coronary artery disease (CAD) using an improved multi-criteria feature selection (IMCFS) approach in a machine learning (ML)-based paradigm. This study uses the Z-Alizadeh Sani dataset of CAD, consisting of 303 patients with 56 different attributes. The proposed IMCFS-based approach uses seven different traditional feature selection techniques. For classification, the support vector machine is used with four different kernel functions and is evaluated using three cross-validation protocols. Lastly, performance is evaluated using five measures. The proposed IMCFS-based approach using the 30 most relevant features outperforms all other traditional feature selection techniques and achieves the highest classification accuracy, sensitivity, specificity, area under the receiver operating characteristic curve, and Matthews correlation coefficient of 91.9%, 95.7%, 82.1%, 88.9% and 79.7%, respectively. The proposed IMCFS-based approach is an entirely reliable, automated, and highly accurate ML tool for detecting CAD.
    Keywords: coronary artery disease; CAD; multi-criteria feature selection; machine learning; classification; support vector machine; SVM; kernel functions; cross-validation; accurate; automated; reliable.
    DOI: 10.1504/IJCVR.2022.10048601
  • A combination of 'feature mapping' and 'block' approaches to reduce the matching area of stereoscopic algorithms
    by Djaber Rouabhia, Nour Eddine Djedi 
    Abstract: In this paper, we propose a new approach to restrict the matching field of stereoscopic algorithms. For a wide range of stereoscopic methods, computing the disparity map involves using the whole image, which leads to extra computation time and visual artefacts in the results. Based on this observation, we derived an approach that significantly reduces the evaluation time of stereoscopic algorithms and avoids noise appearing in the result. The proposed approach introduces a strong association between silhouette edges and stereoscopic algorithms by using only the geometric information present in the images to restrict the matching area. The proposed method aims to limit the matching zone to the exact geometry of the analysed object, thereby avoiding extra time and undesirable noise. We did not use hard-coded algorithms or expensive equipment, and we obtained acceptable results in terms of time and accuracy.
    Keywords: 3D reconstruction; stereo-vision; multi-view stereo; MVS; disparity map; feature mapping; block.
    DOI: 10.1504/IJCVR.2022.10048705
  • Non-intersecting curved paths using Bezier curves in the 2D-Euclidean plane for multiple autonomous robots
    by Utsa Roy, Krishnendu Saha, Chintan Kr Mandal 
    Abstract: This paper proposes an algorithm for generating non-intersecting continuous paths for multiple robots, each having a unique source and destination, among convex polygonal obstacles in the 2D-Euclidean plane. It generates paths using Dijkstra's algorithm on the edges of the visibility graph of the map. Although Dijkstra's algorithm is used, the generated paths between the source-destination pairs will not be explicitly 'short paths', as an edge of the visibility graph cannot be used in multiple paths. The robots are prioritised with respect to the Euclidean distance between them. The discrete paths are converted into continuous paths using Bezier curves along the corners of the edges. The algorithm generates paths sequentially, one after another, according to a priority based on the Euclidean distance between their source and destination.
    Keywords: visibility graph; Dijkstra’s algorithm; convex hull; Bezier curve.
    DOI: 10.1504/IJCVR.2022.10048706
  • Energy-aware vehicle/pedestrian detection and close movement alert at nighttime in dense slow traffic on Indian urban roads using a depth camera
    by N.S. Manikandan, K. Ganesan 
    Abstract: In recent times, infrared cameras, thermal cameras, RADAR, LIDAR, and depth cameras have been widely used in nighttime vehicle/pedestrian detection systems. In the present research, we propose a novel way of detecting vehicles/pedestrians in dense, slow traffic conditions at night. To train and build the necessary artificial intelligence model, daytime depth images are used. For training, the necessary datasets are created using a customised method. The trained model was used to detect vehicles/pedestrians at night. In addition, a tracking algorithm was used to follow the vehicle/pedestrian, predict its close movement and direction, and provide the necessary warnings to the vehicle drivers. The proposed model was tested against a variety of object detection and tracking techniques involving embedded GPUs and depth cameras. The best-suited algorithms were identified based on metrics such as accuracy, execution time, and lower carbon emission. Our proposed method has detected and tracked the vehicles/pedestrians accurately within the 17-metre range.
    Keywords: green computing; artificial intelligence; deep learning; object detection; object tracking.
    DOI: 10.1504/IJCVR.2022.10048790
  • Developed a late fusion of multi facial components for facial recognition with a voting method and global weights
    by Nguyen Van Danh, Vo Hoang Trong, Pham The Bao 
    Abstract: With the development of deep learning, many solutions have achieved outstanding performance in solving facial recognition problems. Nevertheless, many challenges still stand, such as occluded face or illumination. This paper proposes a late fusion of many weighted weak classifiers to form a strong classifier for facial recognition. We train convolutional neural network models as weak classifiers on specific facial components. We build a strong classifier by lately fusing those weak classifiers with corresponding weights calculated locally or globally. A voting method is applied to determine the identity of the face. We experimented on five databases: ORL, CyberSoft, Georgia Tech, Essex Grimace and Essex Faces96. Performances of our method in those databases varied between 99% and 100%. Our proposed method can be used efficiently when a facial image only contains a few facial components. Also, our proposed global weights worked well on many facial databases.
    Keywords: facial recognition; facial components; multi-CNNs; late fusion; voting method.
    DOI: 10.1504/IJCVR.2022.10048849
  • RGB-depth map formation from cili-padi plant imaging using stereo vision camera
    by Wira Hidayat Bin Mohd Saad, Muhammad Haziq Bin Abd Razak, Muhammad Noorazlan Shah Zainudin, Syafeeza Binti Ahmad Radzi, Muhd. Shah Jehan Bin Abd. Razak 
    Abstract: Stereo vision is one of the advancements in computer vision and pattern recognition applications using a dual camera to mimic human visuals. This study focused on RGB-depth (RGB-d) map image formation selection parameters, specifically from the stereo image captured on the cili-padi (birds-eye chilli) plant. The process starts from calibrating the camera used with a checkerboard image to obtain the camera’s intrinsic and extrinsic resolution. The stereo images were rectified to facilitate the disparity computation between the left and right images. Then, point cloud plotting is acquired by using a triangulation function on the image disparity with the camera parameter value. RGB-d images are computed by normalising the depth information of each point plot into greyscale value or any other suitable colourmap. Comparing the different types of disparity map transformation function algorithms used to produce the RGB-d image shows that using SGM-function provides the best output of RGB-d image formation.
    Keywords: cili-padi plant; depth map formation; RGB-depth map; stereo camera vision.
    DOI: 10.1504/IJCVR.2022.10049137
  • An investigation into automated age estimation using sclera images: a novel modality
    by Sumanta Das, Ishita De Ghosh, Abir Chattopadhyay 
    Abstract: Automated age estimation attracts attention due to its potential application in fields like customer relationship management, surveillance, and security. Ageing has a significant effect on human eye, particularly in the sclera region, but age estimation from sclera images is a less explored topic. This work presents a comprehensive investigation on automated human age estimation from sclera images. We employ light-weight deep learning models to identify the changes in the sclera colour and texture. Extensive experiments are conducted for three related tasks: estimation of exact-age of a subject, categorical classification of subjects in different age-groups, and binary classification of adult and minor subjects. Results demonstrate good performance of the proposed models against the state-of-the-art methods. We have obtained mean-absolute-error of 0.05 for the first task, accuracy of 0.92 for the second task, and accuracy of 0.89 for the third task.
    Keywords: human age estimation; age-group classification; adult-minor binary classification; sclera images; deep learning; MASDUM; SBVPI.
    DOI: 10.1504/IJCVR.2022.10049572
  • ResNet-based surface normal estimator with multilevel fusion approach with adaptive median filter region growth algorithm for road scene segmentation
    by Yachao Zhang, Yuxia Yuan 
    Abstract: As an integral part of information processing, road information has important application value in map drawing, post-disaster rescue and military application. In this paper, convolutional neural network is used to fuse lidar point cloud and image data to achieve road segmentation in traffic scenes. We first use adaptive median filter region growth algorithm for preprocessing the input image. The semantic segmentation convolutional neural network with encoding and decoding structure of ResNet is used as the basic network to cross and fuse the point cloud surface normal features and RGB image features at different levels. After fusion, the data is restored into the decoder. Finally, the detection result is obtained by activation function. The KITTI data set is used for evaluation. Experimental results show that the proposed fusion scheme has the best segmentation performance. Compared with other road detection methods, the results show that the proposed method can achieve better overall performance. In terms of AP, the value of proposed method exceeds 95% for UM, UMM scene.
    Keywords: road segmentation; adaptive median filter region growth; data fusion; point cloud surface normal feature; encoding and decoding structure.
    DOI: 10.1504/IJCVR.2022.10049783
  • Texture-based approach to classification meningioma using pathology images   Order a copy of this article
    by Yasmeen O. Sayaheen 
    Abstract: Manual analysis and judgement suffer from two limitations: first, studying histological slides manually is time-consuming and human specialists are not always available. Secondly, while a lot of work has been done to outline diagnostic standards for all tumour components. Computer-aided diagnosis (CAD) is developing quickly owing to the availability of up-to-date computing procedures, new imaging tools, and patient data for disease diagnosis. Computer-assisted decision making can support histopathologists by providing additional objective diagnostic and analytic parameters. Recently, tumours have become one of the diseases that most affect human health. The brain is the central system of the human body that controls, organises and arranges routine tasks. This paper discusses meningioma, which is considered one of the common brain tumours. Colour-based segmentation and morphological operations are used to enhance the appearance of cells. Texture-based features (FOS, GLCM, GLRS, GLDS and NGTDM) are used to enhance the CAD feature extraction process, and two classifiers (SVM and KNN) are used to improve decision making.
    Keywords: meningiomas tumour; texture feature; first order statistics; FOS; grey-level co-occurrence matrix; GLCM; grey-level run length statistic; GLRS; GLDS; neighbourhood grey-tone difference matrix; NGTDM; classification; support vector machine; SVM; k-nearest neighbours; KNN.
    DOI: 10.1504/IJCVR.2022.10049784
  • Identification of personality traits from handwritten text documents using multi-label classification models   Order a copy of this article
    by Salankara Mukherjee, Ishita De Ghosh 
    Abstract: Handwriting is widely investigated to mark emotional states and personality. However, the majority of the studies are based on graphology, and do not utilise personality factor models. We use the well-known five-factor model which says that people possess five basic traits, together known as big-five. Hence the problem of personality prediction from handwriting is essentially a multi-label problem. In addition to that, the predicted values should be non-binary decimal numbers since the model says people possess the traits in various degrees. Multi-label classifiers are not explored yet for personality assessment using handwriting features. Current work aims to bridge the gap. Multi-label classifiers are trained by trait scores obtained by big-five inventory as well as handwriting features. A number of classifiers including classifier chain, binary relevance and label power-set are employed in the work. Best accuracies of 95.9% with non-binary label values and 97.9% with binary label values are achieved.
    Keywords: multi-label classification; personality assessment; big-five traits; handwriting features; non-binary label values.
    DOI: 10.1504/IJCVR.2022.10049835
  • Comparison of convolutional neural networks architectures for mango leaf classification   Order a copy of this article
    by B. Jayanthi, Lakshmi Sutha Kumar 
    Abstract: Plant diseases are a threat to the food supply as they reduce the yield and the quality of fruits and grains. Hence, early identification and classification of plant diseases are essential. This paper aims to classify mango plant leaves as healthy or diseased using convolutional neural networks (CNNs). A performance comparison of the CNN architectures AlexNet, VGG-16 and ResNet-50 for mango plant disease classification is provided. These models are trained using the Mendeley dataset, and validation accuracies are obtained and compared with and without the use of transfer learning models. AlexNet (25 layers, 6.2 million parameters) produces a testing accuracy of 94.54% and consumes less training time. ResNet-50 (117 layers, 23 million parameters) and VGG-16 (16 layers, 138 million parameters) have given testing accuracies of 98.56% and 98.26% respectively. Therefore, based on the accuracies achieved and complexity, this paper recommends AlexNet followed by ResNet-50 and VGG-16 for plant leaf disease classification.
    Keywords: convolution neural networks; neural network; image classification; precision agriculture.
    DOI: 10.1504/IJCVR.2022.10049962
  • Edge feature enhanced convolutional neural networks for face recognition using IoT devices   Order a copy of this article
    by Ankur, Mohit Kumar Rohilla, Rahul Gupta 
    Abstract: The COVID-19 pandemic has turned the world upside down, with almost everything coming to a halt. In the current period, where we are slowly returning to normal lives, organisations have become more concerned about safety and health. In the post-COVID period, fingerprint-based biometric systems can be dangerous; moreover, real-time attendance tracking of employees and students joining online is a challenge. Real-time face recognition is a challenging task in terms of accuracy and reliability, especially when deep convolutional neural networks (DCNNs) are used for face recognition. DCNNs are data-hungry, and in real-life scenarios, the amount of data per subject or class is minimal, and the number of subjects/classes can be huge. Hence, the need arises for research on image processing and data augmentation for face recognition, as there are many scenarios where the number of classes (subjects) is vast.
    Keywords: face recognition; edge enhancement; face edge processing; deep convolutional neural network; DCNN; data augmentation; image processing.
    DOI: 10.1504/IJCVR.2022.10050239
  • Unsupervised image transformation for long wave infrared and visual image matching using two channel convolutional autoencoder network   Order a copy of this article
    by Kavitha Kuppala, Sandhya Banda, S. Sagar Imambi 
    Abstract: Pixel level matching of multi-spectral images is an important precursor to a wide range of applications. An efficient feature representation which can address the inherent dissimilar characteristics of acquisition by the respective sensors is essential for finding similarity between visual and thermal image regions. Lack of sufficient benchmark datasets of corresponding visual and LWIR images hinders the training of supervised learning approaches, such as CNN. To address both the issues of nonlinear variations and unavailability of huge data, we propose a novel two channel non-weight sharing convolutional autoencoder architecture, which computes similarity using encodings of the image regions. One channel is used to generate an efficient representation of the visible image patch, whereas the second channel is used to transform an infrared patch to a corresponding visual region using encoded representation. Results are shown by computing patch similarity using representations generated from various encoder architectures, evaluated on two datasets.
    Keywords: convolutional autoencoder; CAE; multi-spectral image matching; transformation network; two channel siamese architecture; structural similarity measure; SSIM; KAIST dataset; mean squared error; MSE; peak signal to noise ratio; PSNR; Earth mover’s distance; EMD.
    DOI: 10.1504/IJCVR.2022.10050246
  • Intelligent classification model for holy Quran recitation Maqams   Order a copy of this article
    by Aaron Rasheed Rababaah 
    Abstract: Quranic recitation is a field that has been studied for centuries by scholars from different disciplines including tajweed scholars, musicians and historians. Maqams are a system of scales of melodic vocal patterns that have been established and practiced by Quran reciters all over the world for centuries. Traditionally, Maqams are taught by an expert in Quran recitation. We propose a process model for intelligent classification of Quran Maqams using a comparative study of neural networks, deep learning and clustering techniques. We utilised a publicly available dataset of Maqam-labelled audio signals consisting of the eight primary Maqams: Ajam, Bayat, Hijaz, Kurd, Nahawand, Rast, Saba, and Seka. The experimental work showed that all three classifiers (nearest neighbour, multi-layered perceptron and deep learning) performed well. Furthermore, it was found that deep learning with power spectrum features was the best model, with a classification accuracy of 96.55%.
    Keywords: Quran Maqams; neural networks; signal processing; deep learning; convolutional neural networks; CNN; audio signal features; short-term Fourier transform; STFT; power spectrum.
    DOI: 10.1504/IJCVR.2022.10050367
  • Quantitative analysis of transfer and incremental learning for image classification   Order a copy of this article
    by Mohammed Ehsan Ur Rahman, Imran Shafiq Ahmad 
    Abstract: Incremental and transfer learning are becoming increasingly popular and important because of their advantages in data scarcity scenarios. This work presents a quantitative analysis of the incremental learning approach along with various transfer learning methods using the task of image classification. A detailed analysis of the assumptions under which incremental learning should be applied is presented. The degree to which these assumptions hold in most real-world scenarios is also presented. For experiments, MNIST and CIFAR-100 were used. The extensive coverage of incremental and transfer learning techniques on these two datasets showed that a performance improvement is achieved when these techniques are used in data-scarce situations.
    Keywords: transfer learning; incremental learning; deep learning; image classification; image generation; neural networks; artificial intelligence; machine learning; MNIST; CIFAR-10; digit recognition.
    DOI: 10.1504/IJCVR.2022.10050419
  • An improvement in IoT-based smart trash management system using Raspberry Pi   Order a copy of this article
    by Muhammad Shakir, Shahid Karim, Shahzor Memon, Sadiq Ur Rehman, Halar Mustafa 
    Abstract: Our primary aim is to establish an environmentally sustainable and pollution-free community. Responsible states and their citizens make every attempt to keep the city neat and clean. In this paper, a smart dustbin garbage collection system is proposed and implemented. To avoid garbage issues, we have developed a project based on a monitoring system with the help of IoT technology. The proposed work is bidirectional; firstly, it connects with hardware, and secondly, it is supported on mobile through an Android-based application. Firebase Firestore is used to provide communication between both applications. We have improved the smart trash management system using Raspberry Pi, which is pertinent to developed cities worldwide. This project tracks the dustbins and informs the admin of the amount of garbage collected in the garbage bins via a smartphone application. The proposed approach lowers the total number of waste collection truck trips, thereby lowering the total waste collection budget. It ultimately contributes to society’s cleanliness.
    Keywords: internet of things; IoT; Raspberry Pi; ultrasonic sensor; garbage monitoring; Android.
    DOI: 10.1504/IJCVR.2022.10050420
  • A framework for breast cancer prediction and classification using deep learning   Order a copy of this article
    by Praveen Kumar Shukla, Aditya Ranjan Behera 
    Abstract: Breast cancer is a very common disease nowadays, and it is very important to identify and diagnose it at an early stage. Diagnosis first requires identifying and classifying the cancerous cells. Generally, mammography is more intuitive than other methods for detecting cancerous cells. It is a computer-aided diagnostic method that uses digital image processing to detect breast cancer. This article presents a method for detecting cancer-affected cells and classifying patients as normal or cancerous. Pre-processing operations are performed on the mammographic images after normalisation. For the prediction of cancer-affected cells, a breast cancer prediction model architecture is proposed, with an accuracy of 94.87%. For the classification of cancerous and normal patients, the VGG Net 19 architecture has been adopted, with an accuracy of 97.27%. The proposed framework can be implemented practically as an application in the field of breast cancer diagnosis for a better result in a shorter period.
    Keywords: artificial neural network; benign; breast cancer; fine needle aspirate; malignant; nuclei; rectified linear unit.
    DOI: 10.1504/IJCVR.2022.10050421
  • Digital image watermarking based on hybrid FRT-HD-DWT domain and flamingo search optimisation   Order a copy of this article
    by P.J.R. Shalem Raju, K.V.D. Kasula, Pokkuluri Kiran Sree 
    Abstract: Image watermarking topologies afford a promising solution for digital media copyright protection. However, it is essential to take into account the robustness of watermarking methods. Therefore, in this paper, a digital image watermarking technique based on finite ridgelet transform (FRT), discrete wavelet transform (DWT), and Hessenberg decomposition (HD) is proposed. The FRT and HD methods are hybridised with DWT to enhance the embedding capacity. Likewise, the embedding strength factor is optimised through a flamingo search algorithm (FSA). After embedding, the extraction process is carried out with a deep belief neural (DBN) network. The experiments are conducted on four host images (Lena, house, baboon, and Barbara) on the MATLAB platform. The results are compared with different models in terms of peak signal to noise ratio (PSNR), normalisation coefficient (NC), and structural similarity index measure (SSIM) under various geometric attacks.
    Keywords: image watermarking; finite ridgelet transform; FRT; flamingo search algorithm; FSA; Hessenberg decomposition; strength factor; discrete wavelet transform; DWT.
    DOI: 10.1504/IJCVR.2022.10050520
  • Image-based deep learning automated grading of date fruit (Alhasa case study Saudi Arabia)   Order a copy of this article
    by Amnah Aldandan, Sajedah AlGhanim, Hawraa Alhashim, Mona A.S. Ali 
    Abstract: Dates are small and popular in the Middle East, and they grow in many countries. Many researchers focus on classifying dates by type, but they have not considered that many date industries sort dates by quality to determine the proper price and use. This paper classifies Tamer stage dates automatically based on quality. The study proposes two ways to differentiate date fruit quality. The first uses a CNN (VGG-16) to extract features from the dataset and then applies an SVM classifier. The second is based on a custom-developed CNN. Images of three Tamar date varieties were used to train these models. Another contribution is the creation of our own dataset, which was acquired using a smartphone camera under uncontrolled lighting and camera parameter circumstances, such as autofocus and camera stabilisation. In a comparison between the two methods, the CNN model had 97% classification accuracy for Khalas, 95% for Ruzaiz and 90% for Shaishi.
    Keywords: date fruit; classification; Rutab stages; deep learning; convolutional neural network; CNN; support vector machine; SVM; Saudi Arabia.
    DOI: 10.1504/IJCVR.2022.10050650
  • High-speed optical 3D measurement device for quality control of aircraft rivet   Order a copy of this article
    by Benjamin Bringier, Majdi Khoudeir 
    Abstract: Optical three-dimensional devices are widely used for quality control in industry and allow various defects or properties to be checked. In the context of aeronautics, quality control of parts assembly is problematic due to the number of rivets to be checked and the necessary measurement accuracy. This paper presents a new device that makes it possible to measure the positioning of a rivet in less than two seconds with a measurement accuracy close to 10 µm. A standard colour camera and a projector are used to achieve a relatively low-cost device. This device is lightweight and compact enough to be mounted on a robotic arm. Few parameters must be calibrated, and the proposed methodology is accurate even if device positioning errors occur or the appearance of the surface changes. From a single image acquisition, about 2,000 measuring points on the aircraft skin and up to 600 measuring points on a rivet head of 1 cm² are obtained to evaluate its positioning. Our device is validated by a comparative approach and in real conditions of use.
    Keywords: optical three-dimensional device; computer vision; image processing; quality control.
    DOI: 10.1504/IJCVR.2022.10050711
  • Deep learning method for human activity recognition using heaped LSTM and image pattern of activity   Order a copy of this article
    by P. Rajesh, R. Kavitha 
    Abstract: Deep learning is among the most talked-about and frequently used technologies among researchers worldwide. With the tremendous growth of technologies like data analytics, data mining, machine learning methods and IoT applications such as health monitoring, safety and security, and smart control, human activity recognition has become a noteworthy achievement in the field of science. Using this technology, we propose a unique approach for monitoring the activities of people who aspire to live an independent life, mostly the elderly. In this experiment, we developed a novel method for identifying human activities; the strength of this approach is that the privacy of the monitored person is ensured. The investigation uses an improved convolution neural network (CNN) with an enriched bi-directional LSTM (BLSTM). The activity recognition model is further optimised by using a heaped LSTM (HLSTM) and a finely trained data clustering algorithm. Our proposed approach, when trained and tested with a prominent dataset that contains sensor data, achieved an overall accuracy of 99.43% across the nine activities considered.
    Keywords: activity recognition; bi-directional LSTM; BLSTM; clustering; convolution neural network; CNN; heaped LSTM.
    DOI: 10.1504/IJCVR.2022.10050893
  • Efficient masked face identification biometric systems based on ResNet and DarkNet convolutional neural networks   Order a copy of this article
    by Freha Mezzoudj, Chahreddine Medjahed 
    Abstract: The COVID-19 pandemic has caused death and serious illness across the entire world. During humanity’s fight against this disease, the wearing of face masks has become and remains a necessity in our daily life. This critical fight encouraged us to generate a rich masked face database (denoted FEI-SM) with different variations of poses and different emotions. We also employed several robust convolutional neural network systems based on three ResNet and two DarkNet models (ResNet18, ResNet50, ResNet101, DarkNet19, and DarkNet53) to measure the accuracy of biometric identification of masked and un-occluded faces on the challenging masked face database FEI-SM. In general, the comparison shows good accuracies for many of the biometric systems used. The experimental results show clearly that the model based on ResNet18 is the most effective at recognising individuals with masks in different scenarios, in terms of recognition rate and testing time.
    Keywords: biometric; masked face identification; ResNet; DarkNet; FEI-SM database; convolutional neural networks; CNN.
    DOI: 10.1504/IJCVR.2022.10052153
  • Applied-behavioural analysis therapy for autism spectrum disorder students through virtual reality   Order a copy of this article
    by T. Subetha, Kayal Padmanandam, L. Lakshmi, S.L. Aruna Rao 
    Abstract: Autism spectrum disorder (ASD) is a neurological disorder that restricts a person's social engagement and involves mind blindness, leading to a lack of social-emotional reciprocity. A promising way for people with ASD to learn social behaviour is to use interactive virtual reality technologies. This study aims to develop an applied behaviour analysis therapy through virtual reality (VR)-based training to teach the necessary social-communication skills. The proposed system consists of two sessions. The first is the training session, where the students are trained with VR content developed to exhibit social-communication skills. Next, the students enter the practice session, where they are given a chance to practice the lessons taught during the training session. Student gestures are recognised using a multimodal gesture recognition system, and a deep neural network is employed to identify the student’s speech. The successful video snippets are stitched into a video using automatic video self-modelling (VSM), which allows the learner to improve a particular target social behaviour. The system has been evaluated in a comparative study with and without the proposed system, and the results show that students have improved learning and communication in the real world, which is the hope of their parents and family.
    Keywords: virtual reality; augmented reality; gesture detection; voice recognition; video self-modelling.
    DOI: 10.1504/IJCVR.2022.10051122
  • Acute myelogenous leukaemia detection in blood microscope images using particle swarm optimisation   Order a copy of this article
    by Abdullah Mohan, Kedir Beshir, Alemayehu Kebede 
    Abstract: Acute myelogenous leukaemia (AML) is one of the types of acute leukaemia seen in adults. Nowadays, leukaemia is diagnosed by manual tests of blood smears. This manual method is time-consuming and depends on the operator's ability to diagnose the disease. In this article, a new hybrid technique that detects AML in blood smears is presented. The proposed method uses a texture-based method, local binary pattern (LBP), and a statistical method, grey-level co-occurrence matrix (GLCM), to extract features from white blood cells (WBCs). The best features are selected using a PSO algorithm and their accuracy is measured using a nearest neighbour (NN) classifier and an extreme learning machine (ELM). The proposed method was tested using American Society of Hematology (ASH) public datasets and achieved promising results. The ASH database consists of 80 images, of which 40 are taken from AML patients and the remaining 40 from non-AML patients. The proposed method, LBP+GLCM+PSO with the ELM classifier, achieved an accuracy of 90.44%. The experiment shows that the proposed method outperforms the existing methods in the detection of AML.
    Keywords: NN-classifier; particle swarm optimisation; PSO; acute myelogenous leukaemia; AML; acute lymphoblastic leukaemia; ALL.
    DOI: 10.1504/IJCVR.2022.10051333
  • A new approach to detect cardiovascular diseases using ECG scalograms and ML-based CNN algorithm   Order a copy of this article
    by Lanka Alekhya, P. Rajesh Kumar 
    Abstract: Convolutional neural networks (CNNs) have gained popularity in the classification of cardiovascular diseases using ECG signals. This paper uses a pre-trained CNN model, the Visual Geometry Group 16 (VGG16) network, with transfer learning for feature extraction, together with SVM, k-NN and RF algorithms to classify the signals. The inputs to the VGG16 network were ECG signals taken from the MIT-BIH database for four classes of heart ailments. Training the network took around 27 min and 42 sec. The study evaluates how this hybrid CNN model performs on test data. The overall model accuracy and mean MCC are 95.83% and 94.52% for SVM, 96.67% and 95.60% for k-NN, and 96.94% and 95.96% for random forest, respectively. This is better than the pretrained CNN-VGG16Net alone, which gives an overall accuracy of 95.3% and a mean MCC of 93.75%.
    Keywords: electrocardiogram; ECG; convolutional neural network; CNN; Visual Geometry Group16; VGG16; support vector machine; SVM; k-nearest neighbour; k-NN; random forest; RF; Matthews correlation coefficient; MCC.
    DOI: 10.1504/IJCVR.2022.10051429
  • Comprehensive survey on video anomaly detection using deep learning techniques   Order a copy of this article
    by Sreedevi R. Krishnan, P. Amudha, S. Sivakumari 
    Abstract: The rapid increase in violence and crime has led to the widespread use of video surveillance systems. Handling such huge volumes of video and classifying footage as abnormal or not is tedious. Therefore, an automatic anomaly detection method is vital for the real-time detection of anomalous events. Advancements in machine intelligence have led to automatic anomaly detection systems that identify anomalous events in a timely manner and reduce their after-effects. Recent research uses deep learning techniques for faster, automatic detection of abnormal events in an enormous volume of surveillance video. Reviewing video anomaly detection systems is therefore relevant and helps to promote future research in this area. The paper performs a comprehensive study of several video anomaly detection methods that use deep learning techniques to detect and predict anomalous events. The paper also surveys various methods used for women’s safety. Various methodologies, datasets, and evaluation metrics for detecting video anomalies, along with comparisons, are included.
    Keywords: deep learning; CNN; LSTM; GAN; autoencoder; women safety.
    DOI: 10.1504/IJCVR.2022.10051823
  • Robust autonomous detection and tracking of moving objects using hybrid tracking approach   Order a copy of this article
    by Mohamed Akli Bousta, Abdelkrim Nemra 
    Abstract: Detecting and tracking mobile objects in video is among the most prevalent and challenging tasks under realistic motion and climatic conditions such as image occlusion, fast camera movement and natural environmental changes (fog, rain, etc.). In this paper, we propose an improved autonomous visual detection and tracking algorithm, which uses the single shot detection algorithm for initialisation, followed by an adaptive kernelised correlation filter (KCF) tracker combined with a predictor-corrector smooth variable structure filter (SVSF) for target recovery and estimation. The KCF tracker is known to fail to recover the target after occlusion and scale variation. To overcome these limitations, the optimal SVSF filter is combined with the KCF tracker in order to maintain suitable target estimation and update the KCF tracker when the target is lost. The results illustrate that the proposed approach achieves state-of-the-art performance on all tested datasets, which cover many realistic scenarios with different attributes.
    Keywords: visual detection and tracking; single shot multi-box detector; SSD; kernelised correlation filter; KCF; smooth variable structure filter; SVSF.
    DOI: 10.1504/IJCVR.2022.10051959
  • Visual place representation and recognition from depth images   Order a copy of this article
    by Farah Ibelaiden, Slimane Larabi 
    Abstract: We propose a new visual positioning method that recognises previously visited places whose descriptors are stored in a dataset that does not need updates. The descriptor of the unknown location is computed from a depth video acquired by moving the depth camera around the scene to gradually build the corresponding 3D map. From this, the 2D map is derived and described geometrically based on architectural features to constitute the query descriptor, which is compared with the database descriptors in order to deduce the location. The experiments show the efficiency of the proposed descriptor and its robustness to scenery changes, light variations and appearance changes.
    Keywords: place recognition; depth image; architecture-based descriptor; three dimensional model; two dimensional map.
    DOI: 10.1504/IJCVR.2022.10052055
  • Collation of performance parameters on various machine learning algorithms for breast cancer discernment   Order a copy of this article
    by Mohan Kumar, Sunil Kumar Khatri, Masoud Mohammadian 
    Abstract: In clinical practice, machine learning (ML) technology plays an important and rapidly growing role, as it is likely to help healthcare professionals make decisions and propose new diagnoses. This research study aims to validate and compare the performance of various ML models that can help in predicting breast cancer in women. Performance parameters of various ML algorithms have been tested on a breast cancer dataset. The testing is performed on 116 participants from the dataset. The features of the dataset include insulin, glucose, resistin, adiponectin, homeostasis model assessment (HOMA), leptin, age, and index of obesity (MCP1). Many clinical features, such as BMI, were measured. The dataset was evaluated with 11 classification algorithms, including logistic regression (LR), k-nearest neighbour (kNN), support vector machine (SVM), decision tree (DT), random forest (RF), naive Bayes and optimum ML algorithms. The research work detected breast cancer from the published Coimbra breast cancer dataset (CBCD). Each classifier was tuned over various parameters and used for prediction. These results suggested they could be taken as a very meaningful and useful pair of factors to forecast cancer.
    Keywords: machine learning; ML; optimal algorithms; prediction; breast cancer; support vector machine; SVM.
    DOI: 10.1504/IJCVR.2022.10052056
  • Copy move forgery detection by improved SIFT K-means algorithm   Order a copy of this article
    by Kavita Rathi, Parvinder Singh 
    Abstract: Copy move forgery, because its copied features come from the same image, poses the toughest challenge in image forgery detection. Key-point-based copy move forgery detection (CMFD) techniques outperform block-based CMFD techniques, and SIFT is the most widely used key-point-based technique. The present algorithm improves upon the SIFT algorithm at various steps of the workflow: adding the Laplace of Gaussian and multiplying it by the square of the Gaussian kernel to make it truly scale invariant, applying double-level filtering at feature extraction, and filtering using g2NN and K-means clustering. The results, in terms of recall, precision, and F1 measure, outperformed the state-of-the-art key-point-based CMFD techniques over multiple datasets.
    Keywords: copy move forgery; CMF; key-point-based CMFD; SIFT extractor; Laplace of Gaussians; LoGs; K-means clustering.
    DOI: 10.1504/IJCVR.2022.10052099
  • Deep multiple affinity model for proposal-free single instance segmentation   Order a copy of this article
    by Isah Charles Saidu, Lehel Csató 
    Abstract: We improve on an existing instance segmentation model with a probabilistic extension to the encoded neighbourhood branch model (Bailoni et al., 2020), which we call the multiple outputs encoded neighbourhood branch (mENB) model. For each voxel in a 3D volume, the mENB predicts a distribution of central masks, where each mask represents the affinities between its central voxel and the neighbouring voxels within the mask. When post-processed using a graph partition algorithm, these masks collectively delineate the boundaries of each instance of the target class within the input volume. Our algorithm is efficient due to active learning, more accurate, and robust to Gaussian noise and model weight perturbations. We conducted two experiments: 1) the first experiment compared the mask predictions of our technique against the baseline (Bailoni et al., 2020) using the CREMI 2016 neuron segmentation dataset, and the results showed more accurate mask predictions with uncertainty quantification; 2) in the second experiment, we tested segmented instances against the popular proposal-based mask-RCNN, and the results showed that our technique yields better precision and intersection over union.
    Keywords: segmentation; active learning; affinity model; uncertainty quantification.
    DOI: 10.1504/IJCVR.2022.10052466
  • Generic object detection in real-time images under poorly visible conditions: a systematic literature review   Order a copy of this article
    by Perla Sunanda, Dwaram Kavitha 
    Abstract: The invention and use of CNNs in computer vision (CV) have made object detection, the task of locating and identifying objects in an image or video, an emerging research area, but it still faces challenges under poorly visible conditions. This review aims to identify the research gap in detecting generic objects, to identify the frameworks needed for working with real-time images, and to assess the importance of image enhancement and the need for designing nighttime datasets. A systematic literature search was carried out in the Scopus and IEEE databases to select object detection studies addressing generic object detection, real-time images, poorly visible or low-light conditions, image enhancement pre-processing, the type of framework or algorithms needed, and nighttime datasets. The time frame for the analysis was from January 2010 to the latest month of 2022. The study shows that there is a pressing need for detecting objects in nighttime or low-light conditions.
    Keywords: computer vision; CV; object detection; obstacle detection; poorly visible; low light condition; nighttime.
    DOI: 10.1504/IJCVR.2023.10053141
  • Improving accuracy of arbitrary-shaped text detection using ResNet-152 backbone-based pixel aggregation network   Order a copy of this article
    by Suresh Shanmugasundaram, Natarajan Palaniappan 
    Abstract: CNN-based scene text detection in real-world applications faces two major issues: the speed-accuracy trade-off and the modelling of arbitrary-shaped text instances. This work addresses both by using a ResNet-152 backbone-based pixel aggregation network (PAN). ResNet-152 is chosen as the backbone because it provides better accuracy and performance. The proposed network has a high-speed segmentation head and learnable post-processing. The segmentation head consists of a feature pyramid enhancement module (FPEM) and a feature fusion module (FFM). The FPEM is a cascadable U-shaped module that introduces multi-level information for high-quality segmentation and produces features of different depths. The FFM gathers these features into a final feature used to segment arbitrary-shaped text. The learnable post-processing is implemented by pixel aggregation (PA), which precisely aggregates text pixels using the predicted similarity vectors. The proposed ResNet-152 backbone-based PAN attains an F-measure of 85.6% on the Total-Text dataset.
    Keywords: arbitrary-shaped text detection; scene text detection; curve text detection; text segmentation; DNN.
    DOI: 10.1504/IJCVR.2023.10053234
  • An implementation of searchable video player   Order a copy of this article
    by Kitae Hwang, In Hwan Jung, Jae Moon Lee 
    Abstract: This paper introduces an Android app, SVPlayer, that searches for scenes in a video. To do so, SVPlayer extracts the voice from the video, converts it into text, and searches for words in the text. Voice is converted to text in units of ten seconds, and both voice and text are made into a timeline text. When the user enters a word, the app searches for it and displays a timeline list of all scenes that contain the word, from which the user can select a desired time. Performance was evaluated through actual measurements: it took only 2-3 minutes to create the 10-second timeline text from a 20-minute video. SVPlayer processes this task in the background, so 2-3 minutes after starting playback the user can jump directly to the desired scene while still watching.
    Keywords: voice; search scene; text.
    DOI: 10.1504/IJCVR.2023.10053409
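    The timeline search described in the abstract can be sketched as follows. This is only an illustration of the idea, written in Python rather than as Android code: it assumes speech recognition has already produced one text string per ten-second unit, and all function names and sample strings are hypothetical.

```python
SEGMENT_SECONDS = 10  # the abstract states that voice is converted in ten-second units


def build_timeline(segment_texts):
    """Pair each recognised text segment with its start time in seconds."""
    return [(i * SEGMENT_SECONDS, text) for i, text in enumerate(segment_texts)]


def search_timeline(timeline, word):
    """Return the start times of all segments whose text contains the word."""
    needle = word.lower()
    return [start for start, text in timeline if needle in text.lower().split()]


def format_time(seconds):
    """Format a start time as MM:SS for display in a timeline list."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


# Hypothetical recognised text for the first 30 seconds of a video.
timeline = build_timeline(["hello world", "nothing here", "Hello again"])
hits = search_timeline(timeline, "hello")
labels = [format_time(t) for t in hits]
```

    Each hit is the start of a ten-second unit, so a player would seek to that offset when the user selects an entry from the list.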
  • Registration of CT and MR image in multi-resolution framework using embedded entropy and feature fusion   Order a copy of this article
    by Sunita Samant, Pradipta Kumar Nanda, Ashish Ghosh, Subhaluxmi Sahoo, Adya Kinkar Panda 
    Abstract: In this paper, a new scheme for the registration of brain CT and noisy MR images is proposed in a multi-resolution framework based on the notions of embedded entropy and a nonlinear combination of the mutual information (MI) corresponding to the Rényi and Tsallis entropies. Gabor and Sobel features are fused probabilistically and the registration is carried out in the fused feature space. The weights for the fusion of the two distributions are obtained using the Bhattacharyya distance as the similarity measure. The registration parameter is obtained by maximising the combined mutual information computed at different resolutions. The proposed algorithm is tested with real patient data obtained from the Retrospective Image Registration Evaluation (RIRE) database. It is found that the optimum registration parameter obtained at a low resolution of 64 × 64 has high accuracy. The proposed scheme exhibits improved performance compared with other existing algorithms.
    Keywords: multi-modal image registration; embedded entropy; mutual information; fused feature space; multi-resolution.
    DOI: 10.1504/IJCVR.2023.10053410
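    For reference, the two entropies named in the abstract have standard textbook definitions, sketched below for a discrete probability distribution. How the paper builds mutual information from them, and how it combines the two, is not stated in the abstract, so this sketch covers the entropies only; the function names and the example distribution are illustrative assumptions.

```python
from math import log


def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha (alpha > 0, alpha != 1), in nats."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return log(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)


def tsallis_entropy(p, q):
    """Tsallis entropy of index q (q != 1)."""
    if q == 1:
        raise ValueError("q must be different from 1")
    return (1 - sum(x ** q for x in p if x > 0)) / (q - 1)


# A uniform distribution over four intensity bins (illustrative only).
uniform = [0.25, 0.25, 0.25, 0.25]
h_renyi = renyi_entropy(uniform, alpha=2)   # equals log(4) for a uniform distribution
h_tsallis = tsallis_entropy(uniform, q=2)   # equals 1 - 4 * 0.25**2 = 0.75
```

    Both measures approach the Shannon entropy as their parameter tends to 1, which is why that value is excluded here.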
  • Wireless underwater channel modelling for acoustic communication   Order a copy of this article
    by Sanapala Umamaheswararao, M.N.V.S.S. Kumar, R. Madhu 
    Abstract: Underwater channel modelling is essential for establishing underwater acoustic communication. It helps autonomous underwater vehicles (AUVs) navigate safely by avoiding collisions. However, acoustic communication involves many complexities because of reflections from the water surfaces. The main factors influencing underwater communication are transmission loss, noise, multipath, Doppler spread, and propagation delay. These factors make the available bandwidth of acoustic channels limited and strongly dependent on both range and frequency. Terrestrial communication parameters are not suitable for underwater communication, which therefore requires a dedicated system design. Underwater channel modelling includes determining the signal-to-noise ratio (SNR) at the receiver, the transmission path loss and path gain for a particular path under multipath propagation, and the noise level in the propagation path. An underwater channel communication model for sonar data is developed by considering the case of multipath propagation in shallow water.
    Keywords: channel modelling; multi-path propagation; path loss.
    DOI: 10.1504/IJCVR.2023.10054629