International Journal of Computational Vision and Robotics (44 papers in press)
Intelligent Fuzzy Logic based Sliding Mode Control Methodologies for Pick & Drop Operation of Robotic Manipulator
by Mohd Salim Qureshi, Pushpendra Singh, Pankaj Swarnkar
Abstract: This work presents a robust adaptive control methodology for tracking control of robot manipulators, based on combining fuzzy control with sliding mode control (SMC). In robotics, the motivation for adopting SMC lies in its substantial advantages. Nonetheless, the drawbacks of classical SMC, such as chattering and the need for prior knowledge of the uncertainty bounds, can be severely detrimental. This article proposes two robust adaptive control techniques. Firstly, an Adaptive Fuzzy PI Sliding Mode Control (AF-PI-SMC) is proposed, in which a fuzzy controller is the main tracking controller and the difference between the ideal computational controller and the fuzzy controller is compensated by a compensation controller. The uncertainty bound of the compensation controller is obtained by an estimation mechanism. Secondly, an Auto-Tuned Adaptive Fuzzy Sliding Mode Controller (AT-AFSMC) is proposed, in which the control gain is treated as an individual vector and is adjusted by an adaptive SISO fuzzy system. Here, the control gain is tuned online, which makes the controller adaptive. Mathematical analysis shows that the controllers achieve global asymptotic stability in the Lyapunov sense when tracking a robot manipulator in the presence of uncertainties. Finally, the proposed controllers are tested on a 2-degree-of-freedom (DOF) robot manipulator with the real-time digital simulator Opal-RT (OP-4500). The experimental results show the superiority of the proposed control techniques in the presence of structured and unstructured uncertainties.
Keywords: Robotic manipulator; Sliding mode control; Fuzzy sliding mode control; Pick & drop operation.
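The chattering drawback mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' AF-PI-SMC or AT-AFSMC controller: it assumes a hypothetical double-integrator plant (x'' = u + d) rather than a 2-DOF manipulator, and it compares the classical discontinuous switching term with a boundary-layer saturation, a common textbook remedy. The gains, the disturbance and all function names are illustrative assumptions.

```python
import math

def sign(s):
    """Discontinuous switching term used by classical SMC."""
    return (s > 0) - (s < 0)

def make_sat(width):
    """Boundary-layer saturation: linear for |s| < width, clipped to +/-1 outside."""
    return lambda s: max(-1.0, min(1.0, s / width))

def simulate(switch, steps=2000, dt=0.001, gain=5.0, lam=4.0):
    """Regulate x to 0 for the assumed plant x'' = u + d, with surface s = x' + lam*x."""
    x, v = 1.0, 0.0
    controls = []
    for k in range(steps):
        d = 0.5 * math.sin(10.0 * k * dt)   # bounded disturbance, |d| < gain
        s = v + lam * x                     # sliding variable
        u = -lam * v - gain * switch(s)     # equivalent control plus switching term
        v += (u + d) * dt                   # Euler integration of the plant
        x += v * dt
        controls.append(u)
    return x, controls

def total_variation(values):
    """Sum of absolute step-to-step changes; large values indicate chattering."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))
```

Calling `simulate(sign)` and `simulate(make_sat(0.1))` drives the state close to zero in both cases, while `total_variation` of the control signal is far larger for the discontinuous version, which is the chattering effect.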
Kidney image classification using transfer learning with convolutional neural network
by Priyanka, Dharmender Kumar
Abstract: For abdominal studies, one of the most widely used diagnostic methods is ultrasound imaging. Several chronic kidney diseases (CKDs), such as kidney stones, cystic kidney, and hydronephrosis, can be present in the human kidney. These CKDs can later lead to a number of severe diseases, particularly heart diseases, pulmonary attacks, cardiomyopathy, etc. Early detection of CKDs is therefore highly desirable in clinical practice, as it can save hundreds of lives. Nowadays, the main focus of researchers is to develop automatic disease detection methods that avoid the need for human interaction. Deep learning models are playing a critical role in various healthcare applications, not only because of their fast and accurate results but also because they require minimal manual intervention. In this paper, two approaches are proposed for the detection of CKDs in ultrasound kidney images. The first is a conventional approach that uses a GA optimised neural network (GAONN) as the classifier, whereas the other approach uses a convolution neural network model, AlexNet, for automatic detection of diseases. AlexNet is trained using the transfer learning process. Experimental results show that the CNN performs better than the GA optimised neural network in classifying kidney images.
Keywords: convolution neural network; CNN; GA optimised neural network; transfer learning; accuracy; principal component analysis; grey level co-occurrence matrix.
A novel restricted Boltzmann machine-based temporal-spatial correlation method for student behaviour recognition in depth video
by Fan Zhang
Abstract: Human behaviour recognition is an important research hotspot in the field of artificial intelligence. Current behaviour recognition methods have low recognition accuracy under different viewing angles. Therefore, this paper proposes a novel restricted Boltzmann machine (RBM)-based temporal-spatial correlation method for student behaviour recognition in depth video. The RBM is used to map human behaviour from different viewing angles to a high-dimensional space. A time-level pooling function is applied to the time series of activations of each neuron to encode the video time sub-series. Finally, behaviour recognition and classification experiments are conducted on different public datasets and real classroom student behaviour datasets, and the results are compared with other methods. The results show that the proposed method improves the accuracy of depth video recognition under different viewing angles and has good generalisation performance. The data analysis of abnormal behaviour in class can play an auxiliary role in dynamic classroom management.
Keywords: restricted Boltzmann machine; RBM; student behaviour recognition; temporal-spatial correlation; Fourier time pyramid algorithm.
Plant leaf disease classification using deep neural network
by N. Kasthuri, T. Meera Devi, Arivazhagan T. Shangar, R. Yashwin, J.S. Shabhareesh
Abstract: Agriculture is the backbone of the Indian economy. Most people living in rural areas depend on agriculture for their livelihood. Nevertheless, farmers face many difficulties in crop production due to climate change. In addition, plant diseases drastically affect crop production. At present, farmers identify plant diseases by visual inspection, which requires an expert's help and is time consuming. Hence, in this paper, deep learning networks are used to identify different types of diseases in the leaves of different plants. The model is trained with 45,562 images and validated with 8,049 images belonging to 17 categories of diseases. It is fine-tuned with hyperparameters such as learning rate, epochs, batch size and input image size, and then tested with 9,469 images, yielding a total classification accuracy of 96.8%.
Keywords: multiclass classification; convolutional neural network; CNN model; model parameters; activation function; support vector machine; SVM classifier; transfer learning; plant leaf diseases; hyperparameters; performance metric.
Convolutional neural networks for obstacle detection on the road and driving assistance
by Ramzi Mosbah, Larbi Guezouli
Abstract: Generally, a driver has moments of inattention that can cause considerable damage. To deal with this issue, obstacles on the road have to be detected automatically. Doing so raises several challenges. Firstly, we have to locate the region of interest, which is the road part of the frame; then we have to detect objects inside the region of interest. In this work, we propose an improved driver assistance system using a camera on the front of the car. Images acquired from this camera feed our system. In the frames to be processed, we reduce the region of interest to the area of the road. Obstacles on the road are sought in this region of interest. At the same time, we take care of the driver by detecting whether he is drowsy. Experimental results were evaluated using the KITTI Vision Benchmark Suite and short videos recorded on streets in Batna.
Keywords: obstacle detection; image edge detection; driving assistance; object recognition; convolutional neural networks.
Design of a solar-powered mobile manipulator using fuzzy logic controller of agriculture application
by Fradina Septiarini, Tresna Dewi, Rusdianasari
Abstract: This paper shows the feasibility of applying a mobile manipulator powered by solar energy as a harvesting robot in agriculture. The design starts with the robot's mechanics and the mobile manipulator control. The motions of the mobile base and the arm manipulator are handled using a fuzzy logic controller (FLC), whose inputs are based on target detection using image segmentation. The FLC design is also intended to predict the robot's charging source based on the light sensor attached to the charging system. The robot can take power directly from sunlight on a sunny day or from the battery on a cloudy day. The robot motion is simulated using MobotSim to show how the robot moves from one spot to another while harvesting the agricultural product. The simulations conducted in this study show that the solar-powered mobile manipulator can feasibly be applied in agriculture as a harvesting robot.
Keywords: agriculture; fuzzy logic; mobile manipulator; robot vision; solar energy.
Facial expression recognition based on convolutional block attention module and multi-feature fusion
by Man Jiang, Shoulin Yin
Abstract: In this paper, we focus on the research of facial expression recognition. A novel convolutional block attention module and multi-feature fusion method are proposed for facial expression recognition. The local feature clustering loss function is proposed, which can reduce the difference between the same class of images and enlarge the difference between different classes of images in the training process. The convolutional block attention module is adopted to better express facial expressions in local areas with rich expressions. Experimental results show that the proposed method can effectively recognise different expressions on the RAF dataset and CK+ dataset compared with other state-of-the-art methods.
Keywords: facial expression recognition; convolutional block attention module; CBAM; multi-feature fusion; local feature clustering; LFC.
Multi-agents system for breast tumour detection in mammography by deep learning pre-processing and watershed segmentation
by Hayet Saadi, Hayet Farida Merouani, Ahlem Melouah, Zahia Guessoum, Saida Lemnadjlia, Nacereddine Boukabach
Abstract: Mammography is the most widely used procedure for diagnosing and screening breast cancer in women. In this paper, we present an enhanced automatic watershed segmentation for breast tumour detection and segmentation, reinforced with a group of interactive agents. First, we start with a pre-processing step based on deep learning (DL), in which a convolution neural network (CNN) with the AlexNet architecture is applied to classify the breast density. Second, classic watershed segmentation is applied to these images. Afterwards, a multi-agent system (MAS) is introduced. The information within pixels, regions and breast density is explored to create a region of interest (ROI), from which the MAS segmentation emerges. Experimental results are promising in terms of accuracy (ACC), with an overall value of 97.18% over three datasets: Mammographic Image Analysis Society (MIAS), INBreast, and a local dataset called Database of Digital Mammograms of Annaba (DDMA). In some cases, our approach was able to accurately detect breast calcification.
Keywords: mammography; tumour detection; watershed segmentation; multi-agent systems; multi-agents system; deep learning; convolution neural network; AlexNet architecture; pre-processing; breast density; computational vision; computer-aided diagnosis systems.
Obstacle detection technique to solve poor texture appearance of the obstacle by categorising image's region using cues from expansion of feature points for small UAV
by Muhammad Faiz Ramli, Syariful Syafiq Shamsudin
Abstract: Achieving a reliable obstacle detection system for a small unmanned aerial vehicle (UAV) is very challenging due to its size and weight constraints. Prior works tend to employ a vision sensor as the main detection sensor, which results in a high dependency on texture appearance and a lack of distance sensing capability. Besides, most wide-spectrum range sensors are heavy and expensive. The first contribution of this work is a technique that integrates different types of sensors to increase the reliability of detection. The second is a method to create a trusted avoidance path by categorising the environment into two regions, the obstacle region and the free region. Cues from the expansion of feature points are used to extract the depth information of the environment and classify the regions in the image frame. The results show that the proposed system is able to handle multiple obstacles and create a safe path regardless of the texture and size of the obstacle.
Keywords: obstacle detection; feature points; region classification; safe avoidance path; vision-based-sensor; range-based-sensor; speeded up robust features; SURF; convex hull; depth perception.
Bilateral filter-oriented multi-scale CNN fusion model for single image dehazing
by Jiangjiang Li, Jianjun Zhu, Huili Chen
Abstract: This paper proposes a bilateral filter-oriented multi-scale CNN fusion model for single image dehazing. A multi-scale CNN model with low-frequency and high-frequency dehazing sub-networks is designed. First, the haze image is decomposed by a bilateral filter to obtain its low-frequency and high-frequency components. Second, the mapping relationship between the high/low-frequency components and the high/low-frequency transmittance is learned by the designed network model. Third, the high-frequency and low-frequency transmittances obtained from the model are fused to obtain the scene transmittance map corresponding to the original haze image. Finally, according to the atmospheric scattering model, the haze image is restored to a clear, haze-free image, and a haze image data set is used to train and test the model. The experimental results show that the proposed method achieves a better dehazing effect in both subjective and objective evaluation.
Keywords: single image dehazing; bilateral filter; multi-scale CNN fusion; map relationship.
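The final restoration step in the abstract above relies on the atmospheric scattering model, commonly written as I = J·t + A·(1 − t), where I is the hazy intensity, J the clear scene intensity, t the transmittance and A the atmospheric light. A minimal per-pixel sketch of inverting that model follows. The function names, the lower bound `t_min` and the [0, 1] intensity range are assumptions for illustration; the paper's network, which estimates t, is not reproduced here.

```python
def add_haze(clear, transmittance, airlight):
    """Forward atmospheric scattering model: I = J*t + A*(1 - t)."""
    return clear * transmittance + airlight * (1.0 - transmittance)

def restore_pixel(hazy, transmittance, airlight, t_min=0.1):
    """Invert the model: J = (I - A) / max(t, t_min) + A, clipped to [0, 1].

    t_min is an assumed lower bound that avoids dividing by a near-zero
    transmittance in dense haze.
    """
    t = max(transmittance, t_min)
    clear = (hazy - airlight) / t + airlight
    return min(1.0, max(0.0, clear))

def restore_image(hazy_rows, transmittance_rows, airlight):
    """Apply the per-pixel inversion to a grey image stored as nested lists."""
    return [
        [restore_pixel(i, t, airlight) for i, t in zip(hazy_row, t_row)]
        for hazy_row, t_row in zip(hazy_rows, transmittance_rows)
    ]
```

For example, a clear intensity of 0.3 seen through a transmittance of 0.5 with atmospheric light 0.9 gives a hazy value of 0.6, and `restore_pixel(0.6, 0.5, 0.9)` recovers 0.3 up to floating-point rounding.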
A modified Coye algorithm for retinal vessel segmentation
by Sakambhari Mahapatra, Uma Ranjan Jena, Sonali Dash, S. Agrawal
Abstract: According to a scientific study, eyes are the best predictors of numerous disorders, including glaucoma, diabetic retinopathy, hypertension, and stroke. An ophthalmologist can learn about these problems by looking at the segmented retinal blood vessel network. The goal of this study is to provide ophthalmologists with reliable segmented retinal blood vessels to help them pinpoint the issue. This work puts forward an automated method of vessel extraction by incorporating curvelet-based enhancement with the Coye algorithm. Further, the segmentation performance is fine-tuned by incorporating a pair of complementary gamma functions (PCGF) for contrast improvement. The suggested approach is evaluated on the DRIVE and STARE databases and shows outstanding results compared with state-of-the-art algorithms.
Keywords: curvelet transform; Coye algorithm; gamma transform; pair of complementary gamma function; PCGF; vessel segmentation.
Automated inspection of spur gears using machine vision approach
by Ketaki Joshi, Bhushan Patil
Abstract: The paper presents a machine-vision-based system for automated inspection of standard spur gears. Image processing algorithms are used for the measurement of important gear dimensions such as radii of addendum circle, dedendum circle and pitch circle, module, number of teeth, pressure angle, tooth thickness, circular pitch, radial runout and tooth alignment error. Deviations from theoretical values according to gear standards are computed and a decision is made regarding acceptance/rejection. The performance of machine vision inspection system is evaluated in terms of its accuracy and precision. Accuracy is based on deviation of machine vision values from those obtained using traditional metrology instruments and gear standards. Precision is measured using partial gauge R&R study. The results obtained for gear images taken by different operators using different imaging devices are repeatable, reproducible and in good agreement with the true values. The results indicate that the machine vision approach is accurate and precise.
Keywords: machine vision; inspection; spur gear; image processing; accuracy; precision; gauge R&R.
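The accept/reject decision described in the abstract above compares measured gear dimensions with theoretical values. A minimal sketch of that comparison is given below. It assumes standard full-depth tooth proportions (addendum equal to the module, dedendum equal to 1.25 times the module) and a single hypothetical tolerance for every dimension; the paper's actual gear standard, tolerances and image-processing measurements are not reproduced.

```python
import math

def spur_gear_geometry(module, teeth):
    """Nominal dimensions of a spur gear, assuming full-depth tooth proportions."""
    pitch_radius = module * teeth / 2.0
    return {
        "pitch_radius": pitch_radius,
        "addendum_radius": pitch_radius + module,          # addendum = 1.0 * module
        "dedendum_radius": pitch_radius - 1.25 * module,   # dedendum = 1.25 * module
        "circular_pitch": math.pi * module,
        "tooth_thickness": math.pi * module / 2.0,         # measured on the pitch circle
    }

def inspect(measured, module, teeth, tolerance):
    """Return (accepted, deviations) for the measured dimensions supplied.

    `measured` maps dimension names to values from the vision system;
    `tolerance` is an assumed common limit on the absolute deviation.
    """
    nominal = spur_gear_geometry(module, teeth)
    deviations = {name: value - nominal[name] for name, value in measured.items()}
    accepted = all(abs(dev) <= tolerance for dev in deviations.values())
    return accepted, deviations
```

For a gear with module 2 and 20 teeth, the nominal pitch, addendum and dedendum radii are 20, 22 and 17.5 in the same length unit as the module, and a measured addendum radius of 22.03 would pass a tolerance of 0.05.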
The recognition of 3-phase power quality events using optimal feature selection and random forest classifier
by Laxmipriya Samal, Hemanta Kumar Palo, Badri Narayan Sahu
Abstract: This article proposes a novel feature vector by combining the K-means Apriori feature selection algorithm (KAFS) and a statistical technique to classify 3-phase Power Quality Disturbance (PQD) events. While the K-means algorithm has clustered the raw signals, the Apriori algorithm has been able to fetch the desired discriminative features of the chosen PQD events. Further, these extracted discriminative features have been utilized to compute nine statistical parameters. The reliability of the novel feature vector has been measured in classifying the 3-phase PQD events with similar statistical parameters obtained from the raw PQD samples. Finally, the ability of the Short-time Fourier Transform (STFT) as a time-frequency tool has been evaluated using the KAFS algorithms for the said task. The Random Forest (RF) classifier is chosen to validate the efficacy of the proposed feature vectors. The novel optimized feature vectors using the KAFS have indeed enhanced the recognition accuracy, as our results reveal.
Keywords: power quality; feature selection; classification; recognition accuracy; random forest algorithm.
IoT-based real time clinical healthcare system for aging and underprivileged areas
by Muhammad Shakir, Shahid Karim, Muhammad Imran Saeed, Halar Mustafa, Shahzor Memon, Syed Abbad Kazmi
Abstract: In past decades, numerous lives have frequently been put in danger because patients were not treated properly and in time. Furthermore, patient parameters cannot be estimated at home as accurately as in hospitals. In underprivileged areas of Pakistan, caretakers lack both the experience and the right devices to follow the patient's situation. The proposed device is helpful for guardians because it is an all-in-one unit that is portable, small in size, and easy to operate. The specialist or caretaker can see real-time digital outcomes on an Android application, which are converted for stability into analogue waveforms, and the data are also sent to the Google Firebase cloud database, which can be accessed worldwide. The framework also produces an alert when an outcome goes beyond the normal range. Our framework is valuable for checking the health of each individual by simply attaching the device and recording its readings.
Keywords: internet of things; monitoring system; mobile application; sensors; cloud database.
Identifying optimised speaker identification model using hybrid GRU-CNN feature extraction technique
by Md. Iftekharul Alam Efat, Md. Shazzad Hossain, Shuvra Aditya, Jahanggir Hossain Setu, K.M. Imtiaz-Ud-Din
Abstract: Extracting robust and discriminative features and selecting an appropriate classifier model to identify speakers from voice clips are challenging tasks. Thus, we considered signal processing techniques and deep neural networks for feature extraction, along with state-of-the-art machine-learning models as classifiers. We also introduced a hybrid Gated Recurrent Unit (GRU) and Convolutional Neural Network (CNN) as a novel feature extractor that optimizes the subspace loss to extract the best feature vector. Additionally, space and time costs are considered as computational parameters for finding the optimal speaker identification pipeline. We then evaluated the pipeline on the large-scale VoxCeleb dataset, comprising 6,000 real-world speakers with multiple voices, where GRU-CNN+R-CNN achieved the highest accuracy and F1-score, GRU-CNN+CNN the maximum precision, and LPC+KNN the highest recall. Also, LPCC+R-CNN and MFCC+R-CNN were found to be optimal in terms of memory usage and time, respectively.
Keywords: computational complexity; deep learning; feature extraction; speaker identification; VoxCeleb dataset.
Application of digit and speech recognition in food delivery robot
by Low Chun Yin, Sarah Atifah Saruchi, Ong Hong Tze, Chew Ying Xin, Chong Han Wei, Jonathan Lam Lit Seng
Abstract: In COVID-19 quarantine centres, physical human interaction is limited to prevent the spread of the virus. Food delivery robots have been seen replacing humans and performing the task perfectly. However, there is a limit to the tasks that a single robot can handle. This paper designs an efficient and intelligent food delivery robot that acts as a messenger: it recognises speech from patients so that humans in the background can act on the requests without any physical interaction. The workload on the microcontroller is greatly reduced when a task like face recognition is replaced with digit recognition, as patients are tagged with numbers. The design of the robot is also modular and scalable for bigger centres, introducing the capability to expand when necessary. The future of robotic delivery relies on the efficiency and scalability of multiple systems.
Keywords: speech recognition; digit recognition; image processing; computer vision; robotics.
A systematic study of intelligent autism spectrum disorder detector
by Indu Jamwal, Deepti Malhotra, Mehak Mengi
Abstract: Autism spectrum disorder, also known as ASD, is a complex developmental condition, particularly related to the nervous system, that affects people's communication, social behaviour, and underlying social knowledge. Autism is not confined to a particular age group; it has been rising rapidly among all age groups. Earlier prediction of this developmental disorder will greatly help in sustaining the subject's physical as well as mental health. With advances in technology, early detection of certain neurological disorders has now become a reality. Machine-learning methods are mostly used for the analysis of ASD. This research paper presents a systematic review of existing AI models for ASD detection based on screening methods, eye movements, and MRI data. Based on the limitations of existing studies, the authors propose ASD_esfMRI for earlier detection of autism, which can be implemented in future by using eye gaze data and MRI data collectively.
Keywords: autism spectrum disorder; ASD; machine learning; magnetic resonance imaging; MRI; structural MRI; functional MRI; neurological; detection; prediction.
An improved sclera recognition using kernel entropy component analysis method
by B.S. Harish, M.S. Maheshan, C.K. Roopa, R. Kasthuri Rangan
Abstract: Among the various biometric traits that exist in the human body, the sclera is considered to be prominent because of its unique characteristics. In this paper, we propose an improved sclera recognition method using kernel entropy component analysis (KECA). The main objective of this paper is to integrate kernel-based methods with entropy to choose the best principal components. Further, the resulting top principal components are given a symbolic interval-valued representation. To evaluate the efficiency of the newly proposed representation method, we conducted extensive experimentation using various classifiers. The proposed method achieves an accuracy improvement of over 5.09% with a 50:50 split and over 10.69% with a 60:40 split. The results show that the proposed method is effective and feasible for sclera recognition.
Keywords: sclera; recognition; kernel entropy; symbolic representation.
Irregularities recognition system for automotive pieces
by Ignacio Algredo-Badillo, German Portillo-García, Kelsey A. Ramírez-Gutierrez, Luis A. Morales-Rosales
Abstract: The automotive industry is a growing sector in Mexico that requires many production processes. One of the most important is auto parts manufacturing with high-quality standards to avoid economic losses. Hence, the failure detection of pieces must be carried out in the early process, discarding those that do not reach the desired quality. This paper deals with an object recognition system to automatically find failures in circular automotive pieces. This is an open problem in the automobile assembly process to guarantee product quality. We detect imperfections (above 3.5 mm) on small pieces, such as scratches and dents on edges, by using an image processing stage, where no training is included, with a low-cost camera. The average processing time to detect failures is 2.7 seconds, which allows us to examine more pieces in a short time compared with other works and with manual inspections carried out by human experts. The proposed system reaches an accuracy of 98% and is implemented in the LabVIEW tool.
Keywords: automotive industry; vehicle pieces; defects detection; Hough transform; irregularities recognition.
Local directional double ternary coding pattern for facial expression recognition
by Chebah Ouafa, Laskri Mohamed Tayeb
Abstract: This paper presents a novel texture descriptor, the local directional double ternary coding pattern (LDDTCP), which combines the directional information from LDP and the ternary description from LTP for representing facial expressions. The proposed LDDTCP operator encodes the image texture by computing the edge and line response values using the eight-direction Frei-Chen masks. To achieve robustness, the eight Frei-Chen masks are partitioned into two groups according to their directions. After calculating the average of each group, we assign three discrimination levels to each pixel based on the edge response values in the first group and the line response values in the second group, obtaining the LDDTCP-1 and LDDTCP-2 codes, respectively. The final feature descriptor vector LDDTCP is formed by concatenating the LDDTCP-1 and LDDTCP-2 histograms. Experimental results using the CK and JAFFE databases show that the LDDTCP descriptor achieves superior recognition performance compared to some existing local descriptor methods.
Keywords: facial expression recognition; human face; appearance descriptor; geometry descriptor; local binary pattern; LBP; local directional pattern; LDP; local ternary pattern; LTP; support vector machine; SVM.
Versatile formation patterns for cooperative target tracking using ground vehicles
by Lili Ma
Abstract: In this paper, we investigate the cooperative target tracking problem using a group of autonomous mobile robots. By introducing a tracking control component to existing pursuit-based formation schemes, it is possible to achieve simultaneous tracking and formation in versatile concentric formations. Balanced circular formations can now be achieved with a prescribed formation radius. Elliptical formations with a variety of orientations and shapes can be achieved by applying a transformation matrix. To address the practical issue of obstacle avoidance, a repellent vector field technique is used, which prevents agents from approaching obstacles. Tracking, formation, and avoidance are combined to provide a more comprehensive solution for cooperative target tracking. The models considered include both single-integrator and double-integrator robots. MATLAB simulations are used to demonstrate the effectiveness of the proposed schemes.
Keywords: Cooperative target tracking; balanced circular formation; prescribed formation radius; elliptical formation; obstacle avoidance.
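The abstract above states that elliptical formations follow from applying a transformation matrix to a balanced circular formation. The geometry alone can be sketched as follows, assuming the transformation is a rotation composed with an axis-wise scaling and that positions are expressed relative to the tracked target. The pursuit-based control law that drives the robots to these positions is not reproduced, and all function and parameter names are illustrative.

```python
import math

def balanced_circle(n_agents, radius):
    """Equally spaced positions on a circle of the prescribed radius."""
    return [
        (radius * math.cos(2.0 * math.pi * k / n_agents),
         radius * math.sin(2.0 * math.pi * k / n_agents))
        for k in range(n_agents)
    ]

def to_ellipse(points, scale_x, scale_y, angle):
    """Apply the matrix R(angle) * diag(scale_x, scale_y) to every point."""
    c, s = math.cos(angle), math.sin(angle)
    transformed = []
    for x, y in points:
        sx, sy = scale_x * x, scale_y * y          # stretch the circle into an ellipse
        transformed.append((c * sx - s * sy,       # then rotate it by `angle`
                            s * sx + c * sy))
    return transformed

def formation_around(target, offsets):
    """Shift target-relative offsets to world coordinates around the target."""
    tx, ty = target
    return [(tx + dx, ty + dy) for dx, dy in offsets]
```

For instance, `formation_around((5.0, 2.0), to_ellipse(balanced_circle(6, 1.0), 2.0, 1.0, math.pi / 4))` places six agents on an ellipse with semi-axes 2 and 1, rotated by 45 degrees and centred on a target at (5, 2).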
Generalised homomorphic and root filtering in 2D-nonseparable discrete linear canonical transform domains in the image enhancement applications
by Shobha Sharma, Tarun Varma
Abstract: In this paper, the generalised homomorphic filtering (HF) and root filtering (RF) techniques are extended to 2D-nonseparable discrete linear canonical transform (2D-NsDLCT) domains in the low light image enhancement applications. The objective is to improve the visual appearance for the benefit of further processing. The input image is first transformed into 2D-NsDLCT domains in the proposed methodology, and then HF or RF is applied to it. The filtered image is inverse transformed to the spatial domain. The advantage of the proposed technique is based on the fact that the 2D-NsDLCT domains provide many free parameters that can be varied to improve the visual quality of the given images. We have compared the simulation results of the proposed methods with the special cases of 2D-NsDLCT and state-of-the-art methods. The computed quality metrics reveal that the output images of the proposed methods have better quality than the competing techniques.
Keywords: NsLCT; homomorphic filtering; root filtering; image enhancement.
Real-time sign language recognition and speech conversion using VGG16
by Dona Mary Cherian, Jincy J. Fernandez
Abstract: Sign language is used by the deaf and mute community to communicate non-verbally. It consists of hand gestures or signs that represent the language. Hand gesture recognition makes human-computer interaction (HCI) more convenient and flexible for society. Therefore, it is important to classify each character correctly. At present, online interpreters are available for translating sign language or gestures into the corresponding common language and vice versa, but this requires an expert or intermediary who can translate in both directions. Sensors are also used with hand gloves for tracking hand articulations. Thus, communication between the deaf and mute community and the rest of society has become difficult and costly. This paper mainly describes the classification of sign language hand gestures into their corresponding alphabet letters in text form using deep neural networks. After classification, the text is converted to speech, which helps visually challenged people to understand the sign. The method classifies real-time images captured using a desktop camera. The accuracy of the model obtained using a convolution neural network was 97%.
Keywords: American sign language; ASL; convolutional neural network; CNN; visual geometry group 16; VGG16.
A comparative study between convolution neural networks and multi-layer perceptron networks for hand-written digits recognition
by Aaron Rasheed Rababaah
Abstract: This paper presents an investigation that aims at comparing deep learning (DL) and traditional artificial neural networks (ANNs) in the application of hand-written digits recognition (HDR). In our study, convolution neural networks (CNNs) are the representative model for DL models and the multi-layer perceptron (MLP) is the representative model for ANN models. The two models, MLP and CNN, were implemented using the MATLAB development environment and tested using a publicly available image database that consists of over 20,000 samples of all ten hand-written digits, each of which is 24 × 24 pixels. The experimental results showed that the CNN model was superior to the MLP model, with average classification accuracies of 95.14% and 89.74%, respectively. Furthermore, the CNN model was observed to have better performance stability and better execution efficiency, as the MLP model requires human intervention to handcraft and pre-process the features of the digit patterns.
Keywords: hand-written digit; pattern recognition; multi-layer perceptron; MLP; deep learning; convolution neural networks; CNNs; comparative study.
Paddy variety identification from field crop images using deep learning techniques
by Naveen N. Malvade, Rajesh Yakkundimath, Girish B. Saunshi, Mahantesh C. Elemmi
Abstract: On-field identification of paddy varieties provides actionable information to farmers and policymakers in many aspects of crop handling and management practices. In this paper, three pre-trained transfer learning models, namely ResNet-50, EfficientNet-B7, and CapsNet, are presented to effectively classify field crop images of 15 different paddy varieties captured during the booting stage of plant growth. The experiments using the CapsNet model with an image dataset comprising 60,000 labelled images show significant performance, with a testing accuracy of 92.96% and a validation accuracy of 95%. The ResNet-50 and EfficientNet-B7 models yielded average validation accuracies of 85% and 90%, respectively. The CapsNet model achieved both higher accuracy and better computational efficiency than the other deep learning classification models considered on the held-out paddy field crop image dataset.
Keywords: paddy variety identification; field crop image classification; deep convolutional neural networks; DCNN; transfer learning; CapsNet; ResNet-50; EfficientNet-B7.
An experimental evaluation of feature detectors and descriptors for visual SLAM
by Taihú Pire, Hernán Gonzalez, Emiliano Santano, Lucas Terissi, Javier Civera
Abstract: Visual SLAM has, in general, a high computational footprint. Its potential applications, such as augmented reality (AR), virtual reality (VR) and robotics, have hard real-time constraints and limited computational resources. Reducing the cost of visual SLAM systems is hence essential to equip small robots and AR/VR devices with such technology. Feature extraction, description and matching are at the core of feature-based SLAM systems and have a direct impact on their performance. This work presents a thorough experimental analysis of feature detectors, descriptors and matchers for visual SLAM, focusing on their cost and their effect on the estimation accuracy. We also run our visual SLAM system on an embedded platform (Odroid-XU4) and show the effect of using such limited hardware on the accuracy and cost of the system. Finally, in order to facilitate future research, our evaluation pipeline is made publicly available.
Keywords: visual SLAM; local image feature; descriptor extractor; keypoint detector; performance evaluation.
Path planning of mobile manipulator for navigation and object clean-up
by Aaditya Asit Saraiya, Sangram Keshari Das, B.K. Raut, V. Kalaichelvi
Abstract: Industry and warehouses have been paying considerable attention to mobile manipulator-based path planning problems. This paper focuses on multi-target object clean-up operations using a vision sensor, which has ample industrial applications. In this work, a vision-based path planning approach has been implemented using the A* algorithm in order to avoid obstacles and reach the goal location using the shortest path. The algorithm was developed to classify objects in the workspace as handleable or non-handleable from real-time measurements. In the case of multi-object clean-up operations, a priority is set depending on the scenario and a weighted cost function approach is proposed. A series of simulation experiments is conducted to test the effectiveness of the proposed algorithm. The entire workflow of the mobile manipulation-based path planner is demonstrated using various scenarios. This problem is highly relevant in real-world settings.
Keywords: vision-based navigation; mobile manipulation-based path planner; object detection; A* path planning algorithm; OpenCV; ROS framework.
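The A* search named in the abstract above can be illustrated with a minimal sketch. The abstract does not describe the authors' map representation, priority scheme or weighted cost function, so everything here is an assumption made for illustration: a small occupancy grid (0 = free, 1 = obstacle), 4-connected unit-cost moves, a Manhattan-distance heuristic, and the hypothetical function name `astar`.

```python
import heapq


def astar(grid, start, goal):
    """Return a shortest 4-connected path on an occupancy grid, or None.

    grid is a list of rows; cells equal to 1 are obstacles (assumed layout).
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # (f = g + h, g, cell)
    best_cost = {start: 0}
    parent = {start: None}

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry; a cheaper route was already found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    parent[nxt] = cell
                    heapq.heappush(
                        open_heap, (new_cost + heuristic(nxt), new_cost, nxt)
                    )
    return None  # goal is unreachable
```

Calling `astar(grid, (0, 0), (2, 0))` on a grid whose middle row is blocked except for its last cell returns the detour around the obstacle; a scenario-dependent priority could be layered on top by changing the per-move cost, which this sketch fixes at 1.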
Research on the online parameter identification method of train driving dynamic model
by Dandan Liu, Xiangxian Chen, Zhonghao Guo, Jiaxi Yuan, Shoulin Yin
Abstract: The automatic train operation (ATO) system is an important driving control system for train operation, which adjusts traction or braking force in real time according to different operating environments. As an important part of the ATO system, the train dynamic model determines the accuracy with which the train tracks the target speed. Based on the force analysis of actual train operation, single-particle dynamic models of train operation were established. Considering the high efficiency of the single-particle model in online identification, the single-particle train model is applied to actual parameter identification. Firstly, a second-order single-particle model is established, and three identification methods and two sets of data are compared and analysed. The identification method that combines the auxiliary model with recursive least squares with a variable forgetting factor (AM-VFF-RLS) performs well. On this basis, a third-order single-particle model is established. Analysis of the identification results shows that the model can improve the identification accuracy while maintaining efficiency.
Keywords: train dynamic model; online identification; AM-VFF-RLS; ATO system.
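To make the recursive least squares idea in the abstract above concrete, here is a minimal sketch of one RLS update with a forgetting factor. It is not the authors' AM-VFF-RLS: the forgetting factor `lam` is held constant rather than varied, no auxiliary model is used, and the regressor is a generic linear-in-parameters model rather than the train dynamics. The function name `rls_update` and all numeric values are assumptions for illustration.

```python
def rls_update(theta, P, phi, y, lam=0.98):
    """One recursive least squares step with a constant forgetting factor.

    theta: current parameter estimate (list of n floats)
    P:     current covariance matrix (n x n list of lists, symmetric)
    phi:   regressor vector for this sample (list of n floats)
    y:     measured output for this sample
    lam:   forgetting factor in (0, 1]; smaller values discount old data faster
    """
    n = len(theta)
    # P * phi, reused for both the gain and the covariance update.
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    # Prediction error of the current estimate on the new sample.
    error = y - sum(phi[i] * theta[i] for i in range(n))
    theta_new = [theta[i] + gain[i] * error for i in range(n)]
    # P is symmetric, so phi^T P equals (P phi)^T.
    P_new = [
        [(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)]
        for i in range(n)
    ]
    return theta_new, P_new
```

A typical loop starts from `theta = [0.0] * n` and a large diagonal `P` (for example 1000 on the diagonal) and calls `rls_update` once per sample; a variable forgetting factor scheme would adjust `lam` between calls, for example based on the prediction error.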
Energy-aware automatic video annotation tool for autonomous vehicle
by N.S. Manikandan, K. Ganesan
Abstract: In a self-driving car, real-time video obtained from the camera sensors is analysed using various scene understanding algorithmic modules (object detection, object classification, lane detection and object tracking). In this paper, we propose an annotation tool that uses deep learning techniques for each of the four modules mentioned above, and the best ones are chosen based on suitable metrics. Our tool is 83% accurate when compared with a human annotator. We considered a video with 530 frames of resolution 1,035 x 1,800 pixels. Our proposed tool consumed 43 minutes of computation with 36.73 g of CO2 emission in a CPU-based system and 2.58 minutes of computation with 7.75 g of CO2 emission in a GPU-based system to process all four modules. But the same video took nearly 3,060 minutes of computational usage with 2.56 kg of CO2 emission for one human annotator to narrate the scene using a normal computer.
Keywords: automatic annotation; deep learning; object classification; object detection; lane detection; object tracking.
Image retrieval by using texture and shape correlated hand crafted features
by Suresh Kumar Kanaparthi, U.S.N. Raju
Abstract: Content-based image retrieval (CBIR) has become one of the trending areas of research in computer vision. In this paper, consonance on hue, saturation, and intensity is used by applying inter-channel voting between them. A diagonally symmetric pattern (DSP) is computed from the intensity component of the image. The grey level co-occurrence matrix (GLCM) is applied to the DSP to extract texture features. Histogram of oriented gradients (HOG) features are used to extract the shape information. All three features are concatenated. To evaluate the efficiency of our method, five performance measures are calculated, i.e., average precision rate (APR), average recall rate (ARR), F-measure, average normalised modified retrieval rank (ANMRR) and total minimum retrieval epoch (TMRE). The Corel-1K, Corel-5K, Corel-10K, VisTex, STex, and colour Brodatz datasets are used. The experimental results show an improvement in 100% of cases for the Corel-1K dataset, 80% of cases for Corel-5K and 80% of cases for each of the three texture datasets.
Keywords: content-based image retrieval; CBIR; interchannel voting; texture; hand crafted features; shape.
An automated system to detect crop diseases using deep learning
by Purushottam Sharma, Manoj Kumar, Richa Sharma, Shashi Bhushan, Sunil Gupta
Abstract: Food is one of the necessities for a human being to survive. Moreover, since the population is increasing with each passing day, growing sufficient crops to feed such a vast population becomes imperative. The country's economy is also based on agricultural production. However, there is a significant threat to agricultural crop production in today's times, and hence the analysis of crop diseases becomes essential. Thus, the automatic identification and analysis of plant diseases are highly desired in agricultural information. The main objective of the research is to develop an optimised approach for system automation to detect crop diseases. Here we propose an approach for building an automated system that primarily detects diseases using leaf images and offers additional features such as recommending a remedy for the detected disease. We created a model using a convolutional neural network algorithm and used the transfer learning approach with the Inception v3 and ResNet 50 models. Further, we used this model, collected data on remedies for the diseased classes, and added that feature to our system.
Keywords: convolutional neural network; CNN; leaf image; transfer learning; crop disease; Inception v3; ResNet 50.
An improved multi-criteria-based feature selection approach for detection of coronary artery disease in machine learning paradigm
by Bikesh Kumar Singh, Sonali Dutta, Poonam Chand, Khilesh Kumar, Sumit Kumar Banchhor
Abstract: This paper presents an accurate approach for the detection of coronary artery disease (CAD) using an improved multi-criteria feature selection (IMCFS) approach in a machine learning (ML)-based paradigm. This study uses the Z-Alizadeh Sani dataset of CAD, consisting of 303 patients with 56 different attributes. The proposed IMCFS-based approach uses seven different traditional feature selection techniques. For classification, the support vector machine is used with four different kernel functions and is evaluated using three cross-validation protocols. Lastly, performance is evaluated using five measures. The proposed IMCFS-based approach using the 30 most relevant features outperforms all other traditional feature selection techniques and achieved the highest classification accuracy, sensitivity, specificity, area under the receiver operating characteristic curve, and Matthews correlation coefficient of 91.9%, 95.7%, 82.1%, 88.9% and 79.7%, respectively. The proposed IMCFS-based approach is an entirely reliable, automated, and highly accurate ML tool for detecting CAD.
Keywords: coronary artery disease; CAD; multi-criteria feature selection; machine learning; classification; support vector machine; SVM; kernel functions; cross-validation; accurate; automated; reliable.
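Four of the five measures reported in the abstract above follow directly from a binary confusion matrix. The sketch below uses the standard definitions of accuracy, sensitivity, specificity and the Matthews correlation coefficient; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper. The area under the ROC curve is omitted because it needs per-sample scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification measures from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```

For example, `binary_metrics(tp=45, fp=10, tn=40, fn=5)` describes a made-up test set of 100 cases. Unlike accuracy, the MCC uses all four cells of the confusion matrix, which makes it informative when the classes are imbalanced.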
A secure identity and access management system for decentralising user data using blockchain
by Tripti Rathee, Parvinder Singh
Abstract: The arrival of blockchain technology has brought about a revolution in the field of cybersecurity. Since almost every interaction on the internet involves some digital identity, stronger means of protecting digital identities online are needed. In this paper, a blockchain-based identity and access management system, 'MedSecureChain', has been implemented on a medical ecosystem. An OAuth-based authentication mechanism is used to provide delegated access, so as to protect user data and provide control over it. Further, a document verification system using the interplanetary file system (IPFS) and blockchain technology has been proposed. IPFS is used to store the users' data in a decentralised manner, thus reducing the size of the data. The proposed system provides security and privacy to the identity of the user by using smart contracts. The use of blockchain helps in decentralising the system, thus eliminating the control of a single authority over the data.
Keywords: access control; security; blockchain; distributed ledger; identity management.
Quantum neural network application for exudate affected retinal image patch identification
by Mahua Nandy Pal, Minakshi Banerjee, Ankit Sarkar
Abstract: In the field of retinal disease identification, deep neural networks are extensively used, but the efficiency of quantum neural networks in this field has not yet been explored. Recently, quantum neural networks have attracted the attention of researchers, as it remains to be explored whether quantum networks have any scope in the relevant field in terms of resource utilisation and decision-making during network learning. In this paper, the efficiency of a simple quantum network model is evaluated experimentally. At present, quantum classical models are unable to handle more than a few qubits. Experimentally, it is found that the quantum neural network is quite efficient in representing the features of exudate-affected retinal image patches. The accuracy of the quantum neural network model is 84.28%. Comparable deep neural network and convolutional neural network models achieve accuracies of 51.80% and 88%, respectively.
Keywords: quantum neural network model; deep neural network model; deep convolutional neural network model; retinal fundus image; exudates; classification; TensorFlow quantum; TFQ; parameterised quantum circuit; PQC.
New descriptors' combination for 3D mesh correspondence and retrieval
by Roaa Soloh, Abdallah El Chakik, Hassan Alabboud, Ahmad Shahin, Adnan Yassine
Abstract: The 3D models that are widely used nowadays are mostly represented by meshes or point clouds. These models appear in many fields, such as computer vision, informatics, engineering, and medicine. In this paper, we aim to find a superior one-to-one correspondence between 3D models in order to obtain optimal matching and retrieval. To do so, we detect feature points using the well-known 3D Harris detector, and then propose a combination of local shape descriptors that forms a compact feature vector for the extracted keypoints, consisting of Gaussian curvature, curvature index, and shape index. Lastly, we model the matching problem as a combinatorial problem solved using both a brute-force approach and the Hungarian algorithm, and compare their efficiency. Our proposed combination of descriptors shows good performance and promising numerical values, particularly with the Hungarian algorithm, whose results support our proposed approach. Moreover, the retrieval system uses cosine similarity between the features of each pair of models in the database; it gives accurate retrieval for several models and acceptable percentages for others.
Keywords: 3D meshes; feature detection and extraction; matching problems; shape retrieval; Hungarian algorithm; brute-force algorithm.
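The matching step described in the abstract above can be sketched as an assignment problem over descriptor vectors. This is only an illustration under stated assumptions: each keypoint is represented by a hypothetical three-element descriptor, similarity is measured with cosine similarity (which the abstract mentions for retrieval, not necessarily for matching), and only the brute-force solver is shown. The function names are invented for this example.

```python
import math
from itertools import permutations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_match(source, target):
    """One-to-one assignment maximising total cosine similarity.

    Tries every permutation, so the cost grows as n!; assumes both lists
    hold the same number of descriptors.
    """
    n = len(source)
    sim = [[cosine_similarity(s, t) for t in target] for s in source]

    def total(perm):
        return sum(sim[i][perm[i]] for i in range(n))

    best = max(permutations(range(n)), key=total)
    # best[i] is the index of the target descriptor matched to source[i].
    return list(best), total(best)
```

The Hungarian algorithm finds the same optimal assignment in polynomial time, which is why it scales to far more keypoints than the factorial-time search above; SciPy provides one such solver as `scipy.optimize.linear_sum_assignment`.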
A deep hybrid model for advertisements detection in broadcast TV and radio content
by Abdesalam Amrane, Abdelkrim Meziane, Abdelmounaam Rezgui, Abdelhamid Lebal
Abstract: Media monitoring is essential for measuring the influence of companies over their consumers. It consists of building, reporting, and providing a full view of media sources in near real-time, allowing the data to be synthesised. Advertisement detection and classification in electronic media (TV and radio) is an essential part of a media monitoring system and is very useful for companies that work in a competitive environment. Advertisement detection entails many difficulties, including unbalanced data, misclassification caused by outliers, and variation in loudness levels between TV/radio channels. To overcome these challenges, we propose a deep hybrid model for advertisement detection (DHM-ADS). We conduct several experiments by combining different methods: deep neural network models (ANN, CNN, and RNN) with dynamic time warping and multi-level deep neural networks such as autoencoders. The evaluation shows that the ANN classifier combined with an autoencoder gives the best result for advertisement detection in TV/radio broadcast, even compared to the conventional framework 'DejaVu'.
Keywords: advertisement detection; media monitoring; audio outliers removal; deep learning; autoencoder.
Ship identification from SAR image using novel deep learning method with reduced false prediction
by J. Anil Raj, Sumam Mary Idicula, Binu Paul
Abstract: Many research works using deep learning techniques for automatic vessel (or ship) detection from SAR images have good detection accuracy. But the main problem with these methods is false detection, primarily due to the presence of speckle. Therefore, we propose a novel pre-processing and deep learning model for vessel detection to address this problem. First, a three-channelled image (SarNeDe image) is generated from a greyscale SAR image. Then, this image is used to train the model to predict the vessel's position in the SAR image. We studied the performance of different models using the SarNeDe technique and designed a lightweight model with the highest detection accuracy. We experimented on the public SAR ship detection dataset (SSDD) and the dataset of ship detection for deep learning under complex backgrounds (SDCD) to validate the proposed method's feasibility. The experimental results indicated that our proposed method increases vessel detection accuracy while reducing the false detection percentage.
Keywords: convolutional neural network; image processing; ship detection; SAR target detection.
Deep learning solution for machine vision problem of vehicle body damage classification
by Aaron Rasheed Rababaah
Abstract: The automation of vehicle damage classification into classes of interest has benefits over manual solutions, such as efficiency, accuracy, reliability and repeatability. Industries such as automotive dealerships, car rentals and car insurance are among those most likely to be interested in such a solution. In this paper, we present a machine vision and deep learning-based method for vehicle damage classification based on convolutional neural network (CNN) models. For training and validation, we used a publicly available dataset along with our own to increase the input data, as CNN models require significantly more data than classical machine learning models. Our best performing model demonstrated a remarkable classification accuracy of 98.7%. As future work, we intend to consider a wider range of damage classes and significantly extend the current dataset to further validate the current solution.
Keywords: vehicle damage classification; image processing; machine vision; deep learning; convolutional neural networks.
Special Issue on: NAMSP2021 Intelligent Processing and Analysis of Multidimensional Signals and Images (IPAMSI)
DDVM: dual decision voting mechanism for brain tumour identification with LBP2Q-SVM type classifier
by Mansi Lather, Parvinder Singh
Abstract: Brain tumour classification plays a significant role in medical science, as diagnosis of a brain tumour at its early stage of development can improve the recovery of the patient after treatment. In this paper, effective brain tumour presence and type classification methods are proposed. A pre-processing phase of the proposed model is capable of handling dull medical images through contrast enhancement and noise filtering. In the first phase, to detect the tumour, a dual decision voting mechanism (DDVM) for convolutional neural network (CNN) and bidirectional long short-term memory (Bi-LSTM) classification models is proposed. The final tumour identification is done by score maximisation. In the second phase, to identify the type of tumour as high-grade glioma or low-grade glioma, a novel algorithm named the LBP2Q featured support vector machine classification model is designed. The results of both phases demonstrate that the proposed scheme outperforms the existing techniques in terms of various performance metrics.
Keywords: biomedical image processing; brain tumour detection; classification model; machine learning; medical image analysis.
Adaptive kernel-based active contour
by Gunjan Naik, Shubhangi Kelkar, Bhushan Garware, Aditya Abhayankar
Abstract: The geodesic active contour model (GACM) is a standard deterministic method for the segmentation of complex organ structures based on edge maps. For MRI images, the GACM performs poorly due to noise and weak edges, which might result from a low scanning period, low Tesla scanner machines, and other environmental conditions. The performance of GACM suffers because its edge detector kernels are constant and based only on intensity values. To improve this performance, we have proposed a method involving adaptive kernels and phase-based edge detection called 'phase congruency'. The kernels used in phase congruency for the calculation of edges are log Gabor kernels. Instead of log Gabor kernels, we have proposed to use ICA kernels, which have anisotropic properties similar to those of log Gabor kernels and are also adaptive. This adaptive kernel-based phase congruency provides a robust edge map to be used in GACM. Experimentation shows that, when compared with state-of-the-art edge detection techniques, adaptive kernels enhance weak as well as strong edges and improve the overall performance.
Keywords: active contour model; image segmentation; phase congruency; edge detection; geodesic active contour model; GACM.
Camouflaged object segmentation using saliency maps - a comparative study
by Sachi Choudhary, Rashmi Sharma
Abstract: Camouflage is the most common approach employed by armed forces to conceal something from the enemy's gaze on the battlefield or elsewhere. This article covers the literature on several strategies used to find concealed objects that have features in common with the surrounding environment in terms of colour, texture, orientation, and intensity levels. The concern of this research is the use of a saliency map to locate the camouflaged object in the scene. The proposed methodology generates a saliency map based on region contrast. Another application of detecting the hidden object in the scene is evaluating the blending ability of a camouflage pattern. Therefore, computations have been performed to locate the hidden object within the surrounding environment and to find the effectiveness of a camouflaged texture. A comparative study has been conducted that compares the performance of saliency maps based on centre surround, global contrast and the proposed region contrast. The focus of this comparison is on the camouflaged object only. The performance of the mentioned approaches has been evaluated based on precision, recall and F-measure values.
Keywords: camouflage object detection; saliency map; camouflage texture evaluation; military camouflage.
An optimised local feature compression using statistical and structural approach for face recognition
by A. Divya, K.B. Raja, K.R. Venugopal
Abstract: Face recognition is currently one of the most extensively studied recognition tasks in the field of pattern recognition. Face images captured in an unrestricted environment generally contain discrepancies in pose, illumination and expression (PIE). To improve the robustness of face recognition to PIE variations, an optimised local feature compression (OLFC) is proposed using the matching algorithm and classifier. The pixel values of the images are structured as low picture element values (LPEV) and high picture element values (HPEV). The discrete wavelet transform and statistical methods are applied to LPEV and HPEV, respectively, to obtain substantial data and statistical features, which results in reduced feature dimensions. Experiments performed on six popular face databases (ORL, YALE, JAFFE, EYB, Faces-94 and FERET) show excellent performance, with high recognition accuracies of 95.5%, 99.33%, 100%, 99.69%, 99.86% and 96.39%, respectively, and with reduced error and computation time compared with existing methods.
Keywords: face recognition; discrete wavelet transform; DWT; Euclidean distance; artificial neural networks; ANNs.
Special Issue on: The Role of Computer Vision for Smart Cities
Image enhancement based on skin-colour segmentation and smoothness
by Haitao Sang, Bo Chen, Shifeng Chen, Li Yan
Abstract: Image restoration tasks such as image denoising, super-resolution and image deblurring have a wide range of applications and have become a research hotspot in academia and industry. A novel image enhancement algorithm based on skin texture preservation is proposed in this paper. The mask is obtained using Gaussian fitting and can be box-blurred several times to feather the skin region. The denoised, smoothed image is fused with the original image mask to preserve the hair details of the original image and enhance the edge details of the contour, so as to provide more effective information for the extraction of edge features. Compared with other image smoothing algorithms, this algorithm is more effective in smoothing the skin edge contour and achieves better detection in images. Experimental results show that the proposed algorithm has strong adaptive capacity and a significant effect on detection for most images. Specifically, it can moderately smooth the edges of areas with many details, leaving no traces of artificial processing. The proposed image enhancement algorithm has a wide range of practical uses.
Keywords: image enhancement; image restoration; image generation and synthesis; texture preserving smoother; skin-colour model.
Supervised learning software model for the diagnosis of diabetic retinopathy
by M. Padmapriya, S. Pasupathy
Abstract: Diabetic retinopathy (DR) is the leading cause of eye disease and vision loss among people with diabetes. Diabetic patients often suffer from DR due to damage to the retinal blood vessels, so retinal blood vessel segmentation plays a crucial role in the diagnosis of DR. Vision loss or blindness can be prevented if the diagnosis happens during the early stages. Early diagnosis and initial investigation would help lower the risk of vision loss by 50%. This article exploits a supervised classification approach to detect blood vessels by applying features such as grey level and invariant moments. Image pre-processing and blood vessel segmentation are the two essential steps used in this study, along with the proposed classification framework using neural network models. Two publicly available retinal image datasets, DRIVE and STARE, are used to assess the proposed supervised classification framework. The supervised classification methodology suggested in this study attains an average retinal blood vessel segmentation accuracy of 93.94% on the DRIVE dataset and 95.00% on the STARE dataset.
Keywords: diabetic retinopathy; fundus imaging; grey level features; invariant moments; vessel segmentation.